Overcoming Computational Complexity Barriers for Optimal Transport in Discrete and Semi-Discrete Settings
dc.contributor.author | Shirzadian, Pouyan | en |
dc.contributor.committeechair | Raghvendra, Sharath | en |
dc.contributor.committeemember | Sikora, Jamie | en |
dc.contributor.committeemember | Ji, Bo | en |
dc.contributor.committeemember | Agarwal, Pankaj | en |
dc.contributor.committeemember | Heath, Lenwood S. | en |
dc.contributor.department | Computer Science and#38; Applications | en |
dc.date.accessioned | 2025-10-02T08:00:16Z | en |
dc.date.available | 2025-10-02T08:00:16Z | en |
dc.date.issued | 2025-10-01 | en |
dc.description.abstract | Given two probability distributions $mu$ and $nu$, Optimal Transport (OT) measures the minimum effort required to transport mass between $mu$ and $nu$. OT provides a meaningful distance between distributions and acts as a dissimilarity measure between them. Due to its useful statistical properties, OT cost has been used in numerous machine-learning applications. More formally, given a $d$-dimensional continuous (resp. discrete) probability distribution $mu$ defined over a support $Asubsetmathbb{R}^d$ and a discrete distribution $nu$ defined over a support $Bsubsetmathbb{R}^d$, the semi-discrete (resp. discrete) optimal transport problem asks for computing a minimum-cost plan to transport mass between $mu$ and $nu$. In the special case of the discrete OT problem, where $mu$ and $nu$ are defined on sets $A$ and $B$ of $n$ points each, and any point in $Acup B$ is given $1/n$ mass, the problem is equivalent to the minimum-cost bipartite matching problem. In this thesis, we study the semi-discrete OT and the minimum-cost bipartite matching problems. The sensitivity of OT to noise has motivated the study of robust variants. Two such formulations are: (i) the $alpha$-optimal partial transport, which is the minimum cost required to transport a mass of $alpha$, and (ii) the $lambda$-robust optimal transport, which regularizes the OT problem using the total variation (TV) distance. For a continuous distribution $mu$ defined over a bounded support $Asubset mathbb{R}^d$ and a discrete distribution $nu$ with a support $Bsubset mathbb{R}^d$ of $n$ points, suppose $C_{max}$ denotes the diameter of the supports of $mu$ and $nu$ and assume we have access to an oracle that outputs the mass of $mu$ inside a constant-complexity region in $O(1)$ time. In this thesis, given a parameter $varepsilon>0$, we present the following results for the semi-discrete OT problem between $mu$ and $nu$: 1. Additive approximation for semi-discrete OT: We present an $varepsilon$-additive approximation algorithm for the semi-discrete OT problem that runs in $n^{O(d)}logfrac{C_{max}}{varepsilon}$ time. Our algorithm works for several ground distances, including the $ell_p$-norm and the squared-Euclidean distance. 2. Combinatorial algorithm for semi-discrete OT: We propose a combinatorial framework for the semi-discrete OT, which can be viewed as an extension of the combinatorial framework for the discrete OT but requires several new ideas. We present a new algorithm that for $2$-dimensional settings, computes an $varepsilon$-additive approximate semi-discrete transport plan in $tilde{O}(n^{4}log frac{1}{varepsilon})$ time. 3. Combinatorial algorithm for semi-discrete robust OT: We provide a novel characterization of the optimal solutions to the semi-discrete robust OT problem and show that they can be represented as a restricted Voronoi diagram. We also present an algorithm that computes an $varepsilon$-additive approximate solution in $O(n^{4}log nlog frac{1}{varepsilon})$ time in $2$ dimensions and $n^{d+2}log (1/ varepsilon)$ time in $d$ dimensions. Furthermore, for point sets $A$ and $B$ of size $n$, let $Delta$ be the spread, i.e., the ratio of the distance of the farthest to the closest pair of points in $Acup B$. Furthermore, let $Phi(n)$ be the query/update time of a $d$-dimensional dynamic weighted bichromatic closest pair data structure. For point sets in $d$ dimensions, in this thesis, we present the following result for the bipartite matching problem: 4. Robust algorithm for stochastic Euclidean matching: We present a new algorithm to compute a minimum-cost bipartite matching under Euclidean distances between $A$ and $B$ with a worst-case execution time of $tilde{O}(n^2Phi(n) log Delta)$. However, when $A$ and $B$ are drawn independently and identically from a fixed distribution that is not known to the algorithm, the execution time of our algorithm is, in expectation, $tilde{O}(n^{2-frac{1}{2d}}Phi(n)log Delta)$. For any given metric space, obtaining an optimal solution to the offline version of the classical $k$-server problem reduces to solving a minimum-cost partial bipartite matching between two point sets $A$ and $B$ within that metric space. Our final result in this thesis is the following: 5. Efficient exact offline algorithm for $k$-server: We present an ${tilde{O}(n^{2-frac{1}{2d+1}}Phi(n)log Delta)}$ time algorithm for solving the instances of minimum-cost partial bipartite matching defined by the offline version of the $k$-server problem. | en |
dc.description.abstractgeneral | Every day, we encounter matching problems, such as assigning taxis to passengers or pairing deliveries with addresses. Similar challenges arise in computer science, particularly in fields like machine learning and generative modeling, where we often need to compare sets of data. For example, to evaluate the quality of images produced by a model, we compare them to real-world images. A mathematical tool called Optimal Transport (OT) helps solve such problems by finding the most efficient way to move ``mass'' from one group (called a distribution) to another. OT has become a popular method in machine learning, computer vision, and statistics because it offers a meaningful way to compare complex data distributions. In the first part of this thesis, we focus on the semi-discrete OT problem, where one distribution is continuous (i.e., a smooth map of mass or values over a region) and the other consists of a finite set of points. We develop new algorithms that make these computations faster and more accurate. Unfortunately, real-world data is often noisy, which can make OT results less reliable. To address this, researchers have introduced robust variants of OT that allow for transporting only a portion of the mass to tolerate small errors in the data. In this thesis, we provide new characterizations of the solutions to these robust OT problems and design efficient algorithms that remain accurate even when the input data is imperfect. In the second part of the thesis, we study a special case of OT known as bipartite matching, a classic computer science problem where the goal is to pair points from two sets at minimum cost. We develop a new algorithm that performs particularly well when the input datasets are similar, an advantage in many practical settings. Finally, we apply our matching techniques to the well-known $k$-server problem, which models how systems with limited resources can respond efficiently to a stream of requests. Our algorithm is efficient and outperforms existing methods when the number of servers is not too small. In summary, this thesis introduces new methods to solve optimal transport and matching problems. These advances have practical implications for data science, logistics, and foundational areas of computer science. | en |
dc.description.degree | Doctor of Philosophy | en |
dc.format.medium | ETD | en |
dc.identifier.other | vt_gsexam:44668 | en |
dc.identifier.uri | https://hdl.handle.net/10919/137888 | en |
dc.language.iso | en | en |
dc.publisher | Virginia Tech | en |
dc.rights | In Copyright | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.subject | Optimal Transport | en |
dc.subject | Exact Algorithms | en |
dc.subject | Wasserstein Distance | en |
dc.title | Overcoming Computational Complexity Barriers for Optimal Transport in Discrete and Semi-Discrete Settings | en |
dc.type | Dissertation | en |
thesis.degree.discipline | Computer Science & Applications | en |
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
thesis.degree.level | doctoral | en |
thesis.degree.name | Doctor of Philosophy | en |