Overcoming Computational Complexity Barriers for Optimal Transport in Discrete and Semi-Discrete Settings

TR Number

Date

2025-10-01

Journal Title

Journal ISSN

Volume Title

Publisher

Virginia Tech

Abstract

Given two probability distributions mu and nu, Optimal Transport (OT) measures the minimum effort required to transport mass between mu and nu. OT provides a meaningful distance between distributions and acts as a dissimilarity measure between them. Due to its useful statistical properties, OT cost has been used in numerous machine-learning applications.

More formally, given a d-dimensional continuous (resp. discrete) probability distribution mu defined over a support AsubsetmathbbRd and a discrete distribution nu defined over a support BsubsetmathbbRd, the semi-discrete (resp. discrete) optimal transport problem asks for computing a minimum-cost plan to transport mass between mu and nu. In the special case of the discrete OT problem, where mu and nu are defined on sets A and B of n points each, and any point in AcupB is given 1/n mass, the problem is equivalent to the minimum-cost bipartite matching problem. In this thesis, we study the semi-discrete OT and the minimum-cost bipartite matching problems.

The sensitivity of OT to noise has motivated the study of robust variants. Two such formulations are: (i) the alpha-optimal partial transport, which is the minimum cost required to transport a mass of alpha, and (ii) the lambda-robust optimal transport, which regularizes the OT problem using the total variation (TV) distance.

For a continuous distribution mu defined over a bounded support AsubsetmathbbRd and a discrete distribution nu with a support BsubsetmathbbRd of n points, suppose Cmax denotes the diameter of the supports of mu and nu and assume we have access to an oracle that outputs the mass of mu inside a constant-complexity region in O(1) time. In this thesis, given a parameter varepsilon>0, we present the following results for the semi-discrete OT problem between mu and nu:

1. Additive approximation for semi-discrete OT: We present an $varepsilon$-additive approximation algorithm for the semi-discrete OT problem that runs in  $n^{O(d)}logfrac{C_{max}}{varepsilon}$ time. Our algorithm works for several ground distances, including the $ell_p$-norm and the squared-Euclidean distance.

2. Combinatorial algorithm for semi-discrete OT: We propose a combinatorial framework for the semi-discrete OT, which can be viewed as an extension of the combinatorial framework for the discrete OT but requires several new ideas. We present a new algorithm that for $2$-dimensional settings, computes an $varepsilon$-additive approximate semi-discrete transport plan in $tilde{O}(n^{4}log frac{1}{varepsilon})$ time.

3. Combinatorial algorithm for semi-discrete robust OT: We provide a novel characterization of the optimal solutions to the semi-discrete robust OT problem and show that they can be represented as a restricted Voronoi diagram. We also present an algorithm that computes an $varepsilon$-additive approximate solution in $O(n^{4}log nlog frac{1}{varepsilon})$ time in $2$ dimensions and $n^{d+2}log (1/ varepsilon)$ time in $d$ dimensions.

Furthermore, for point sets A and B of size n, let Delta be the spread, i.e., the ratio of the distance of the farthest to the closest pair of points in AcupB. Furthermore, let Phi(n) be the query/update time of a d-dimensional dynamic weighted bichromatic closest pair data structure. For point sets in d dimensions, in this thesis, we present the following result for the bipartite matching problem:

  1. Robust algorithm for stochastic Euclidean matching: We present a new algorithm to compute a minimum-cost bipartite matching under Euclidean distances between A and B with a worst-case execution time of tildeO(n2Phi(n)logDelta). However, when A and B are drawn independently and identically from a fixed distribution that is not known to the algorithm, the execution time of our algorithm is, in expectation, tildeO(n2−frac12dPhi(n)logDelta).

For any given metric space, obtaining an optimal solution to the offline version of the classical k-server problem reduces to solving a minimum-cost partial bipartite matching between two point sets A and B within that metric space. Our final result in this thesis is the following:

  1. Efficient exact offline algorithm for k-server: We present an tildeO(n2−frac12d+1Phi(n)logDelta) time algorithm for solving the instances of minimum-cost partial bipartite matching defined by the offline version of the k-server problem.

Description

Keywords

Optimal Transport, Exact Algorithms, Wasserstein Distance

Citation