Scalable Combinatorial Algorithms for Optimal Transport Based Similarity Metrics

TR Number

Date

2024-11-22

Journal Title

Journal ISSN

Volume Title

Publisher

Virginia Tech

Abstract

Optimal Transport (OT), also known as Wasserstein distance, is a valuable metric for comparing probability distributions. Owing to its appealing statistical properties, researchers in various fields, such as machine learning, use OT within applications. However, computing both exact and approximate OT is computationally expensive and impractical for large datasets. Furthermore, OT is sensitive to small noise in the input distributions. In this document, we propose to use combinatorial methods to design scalable and noise-resistant solutions for OT.

We present four key contributions in this work. First, we introduce a novel combinatorial parallel algorithm for approximating OT, which achieves a parallel time complexity of O(logn/varepsilon2), where n is the input size and varepsilon is the addtitive error, Our algorithm outperforms the state-of-the-art in experiments. Second, we propose a new concept, OT-profile, representing the function of minimum partial optimal transport cost sigmaalpha versus the transported mass alpha. This can be used to identify outliers in real-world data. The utility of OT-profile is demonstrated in outlier detection and PU-learning jobs and outperforms the state-of-the-art. Third, building upon the OT-profile, we propose a new OT-based metric for comparing distributions that is more robust to noise. This metric preserves desirable properties while reducing its sensitivity to noise for high p values, providing a robust solution for real-world datasets. Lastly, we have developed a Python library that integrates our algorithms and methods into a user-friendly framework, making it easier for practitioners to adopt our methods. Our work enhances the computational efficiency and robustness of OT, making it practical for machine learning applicaitons.

Description

Keywords

Combinatorial Optimization, Optimal Transport, Wasserstein distance

Citation