Emails: jasonalt@mit.edu, eboix@mit.edu
Polynomial-time algorithms for Multimarginal Optimal Transport problems with structure (Work partially supported by NSF GRFP 1122374, a TwoSigma PhD fellowship, and a Siebel Scholarship.)
Abstract
Multimarginal Optimal Transport (MOT) has attracted significant interest due to applications in machine learning, statistics, and the sciences. However, in most applications, the success of MOT is severely limited by a lack of efficient algorithms. Indeed, MOT in general requires exponential time in the number of marginals $k$ and their support sizes $n$. This paper develops a general theory about what "structure" makes MOT solvable in $\mathrm{poly}(n,k)$ time.
We develop a unified algorithmic framework for solving MOT in $\mathrm{poly}(n,k)$ time by characterizing the structure that different algorithms require in terms of simple variants of the dual feasibility oracle. This framework has several benefits. First, it enables us to show that the Sinkhorn algorithm, which is currently the most popular MOT algorithm, requires strictly more structure than other algorithms do to solve MOT in $\mathrm{poly}(n,k)$ time. Second, our framework makes it much simpler to develop $\mathrm{poly}(n,k)$-time algorithms for a given MOT problem. In particular, it is necessary and sufficient to (approximately) solve the dual feasibility oracle—which is much more amenable to standard algorithmic techniques.
We illustrate this ease-of-use by developing $\mathrm{poly}(n,k)$-time algorithms for three general classes of MOT cost structures: (1) graphical structure; (2) set-optimization structure; and (3) low-rank plus sparse structure. For structure (1), we recover the known result that Sinkhorn has $\mathrm{poly}(n,k)$ runtime (h20gm; teh2002unified); moreover, we provide the first $\mathrm{poly}(n,k)$-time algorithms for computing solutions that are exact and sparse. For structures (2)-(3), we give the first $\mathrm{poly}(n,k)$-time algorithms, even for approximate computation. Together, these three structures encompass many—if not most—current applications of MOT.
Keywords:
Multimarginal optimal transport · polynomial-time algorithms · implicit linear programming · structured linear programs
MSC: 90C08 · 90C06
1 Introduction
Multimarginal Optimal Transport (MOT) is the problem of linear programming over joint probability distributions with fixed marginal distributions. In this way, MOT generalizes the classical Kantorovich formulation of Optimal Transport from marginal distributions to an arbitrary number of them.
More precisely, an MOT problem is specified by a cost tensor $C$ in the $k$-fold tensor product space $(\mathbb{R}^n)^{\otimes k}$, and $k$ marginal distributions $\mu_1,\dots,\mu_k$ in the simplex $\Delta_n$. (For simplicity, all $k$ marginals are assumed to have the same support size $n$. Everything in this paper extends in a straightforward way to non-uniform support sizes $n_1,\dots,n_k$, where $n$ is replaced by $\max_i n_i$, and $n^k$ is replaced by $\prod_i n_i$.) The MOT problem is to compute
$\min_{P \in \mathcal{M}(\mu_1,\dots,\mu_k)} \langle P, C \rangle$   (MOT)
where $\mathcal{M}(\mu_1,\dots,\mu_k)$ is the "transportation polytope" consisting of all entrywise non-negative tensors $P \in (\mathbb{R}^n)^{\otimes k}$ satisfying the marginal constraints $\sum_{j_1,\dots,j_{i-1},j_{i+1},\dots,j_k} P_{j_1,\dots,j_{i-1},j,j_{i+1},\dots,j_k} = [\mu_i]_j$ for all $i \in [k]$ and $j \in [n]$.
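To make these definitions concrete, here is a small self-contained sketch (our own illustration, not code from the paper; all names are ours) that represents a tensor as a Python dictionary over $[n]^k$ and checks membership in the transportation polytope by brute-force marginalization:

```python
import itertools

def marginal(P, i, n, k):
    """i-th one-dimensional marginal of a tensor P, given as a dict
    mapping index tuples in [n]^k to entries."""
    m = [0.0] * n
    for j in itertools.product(range(n), repeat=k):
        m[j[i]] += P.get(j, 0.0)
    return m

def in_transport_polytope(P, mus, tol=1e-9):
    """Check entrywise non-negativity and all k marginal constraints."""
    n, k = len(mus[0]), len(mus)
    if any(v < -tol for v in P.values()):
        return False
    return all(
        abs(marginal(P, i, n, k)[j] - mus[i][j]) <= tol
        for i in range(k) for j in range(n)
    )

# The product coupling P = mu_1 x mu_2 x mu_3 is always feasible.
mus = [[0.5, 0.5], [0.25, 0.75], [0.1, 0.9]]
P = {j: mus[0][j[0]] * mus[1][j[1]] * mus[2][j[2]]
     for j in itertools.product(range(2), repeat=3)}
print(in_transport_polytope(P, mus))  # True
```

The product coupling used here is the simplest feasible point; MOT asks for the feasible point minimizing $\langle P, C\rangle$.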
This MOT problem has many applications throughout machine learning, computer science, and the natural sciences since it arises in tasks that require "stitching" together aggregate measurements. For instance, applications of MOT include inference from collective dynamics (h20gm; h20partial), information fusion for Bayesian learning (srivastava2018scalable), averaging point clouds (AguCar11; CutDou14), the $n$-coupling problem (ruschendorf2002n), quantile aggregation (makarov1982estimates; ruschendorf1982random), matching for teams (ChiMccNes10; CarEke10), image processing (RabPeyDel11; solomon2015convolutional), random combinatorial optimization (zemel1982polynomial; weiss1986stochastic; Nat21extremal; Agr12; MeiNad79; nadas1979probabilistic; Han86), Distributionally Robust Optimization (Nat18dro; Nat09; MisNat14), simulation of incompressible fluids (Bre08; BenCarNen19), and Density Functional Theory (cotar2013density; buttazzo2012optimal; BenCarNen16).
However, in most applications, the success of MOT is severely limited by the lack of efficient algorithms. Indeed, in general, MOT requires exponential time in the number of marginals $k$ and their support sizes $n$. For instance, applying a linear program solver out-of-the-box takes $n^{\Theta(k)}$ time because MOT is a linear program with $n^k$ variables, $n^k$ non-negativity constraints, and $nk$ equality constraints. Specialized algorithms in the literature such as the Sinkhorn algorithm yield similar runtimes. Such runtimes currently limit the applicability of MOT to tiny-scale problems.
Polynomial-time algorithms for MOT.
This paper develops polynomial-time algorithms for MOT, where here and henceforth "polynomial" means in the number of marginals $k$ and their support sizes $n$—and possibly also in $C_{\max}/\varepsilon$ for $\varepsilon$-additive approximation, where $C_{\max}$ is a bound on the entries of the cost tensor $C$.
At first glance, this may seem impossible for at least two "trivial" reasons. One is that it takes exponential time to read the input cost $C$ since it has $n^k$ entries. We circumvent this issue by considering costs $C$ with $\mathrm{poly}(n,k)$-size implicit representations, which encompasses essentially all MOT applications. (E.g., in the MOT problems of Wasserstein barycenters, generalized Euler flows, and Density Functional Theory, $C$ has entries that are simple explicit functions of the indices, and thus $C$ can be implicitly input via these functions.) A second obvious issue is that it takes exponential time to write the output variable $P$ since it has $n^k$ entries. We circumvent this issue by returning solutions $P$ with $\mathrm{poly}(n,k)$-size implicit representations, for instance sparse solutions.
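As a toy sketch of such an implicit input representation (our own illustration, with a hypothetical pairwise squared-distance cost in the spirit of barycenter-type problems), the following evaluates any single entry of a tensor with $n^k$ entries on demand, without ever materializing the tensor:

```python
# Hypothetical implicit cost: n support points x_1..x_n on the line, and
# C_{j_1..j_k} = sum over index pairs (a,b) of (x_{j_a} - x_{j_b})^2.
# The full tensor has n**k entries, but any single entry costs O(k^2).
x = [0.0, 1.0, 3.0]  # n = 3 support points (illustrative values)
k = 4

def cost_entry(j):
    """Evaluate one entry of the implicitly represented cost tensor."""
    return sum((x[j[a]] - x[j[b]]) ** 2
               for a in range(len(j)) for b in range(a + 1, len(j)))

n = len(x)
print(n ** k)                    # 81 entries in the full tensor
print(cost_entry((0, 1, 2, 2)))  # 27.0: one entry, evaluated on demand
```

The point is that the input size is $O(n + k)$ here, even though the cost tensor itself is exponentially large.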
But, of course, circumventing these issues of input/output size is not enough to actually solve MOT in polynomial time. See (AltBoi20hard) for examples of NP-hard MOT problems with costs that have $\mathrm{poly}(n,k)$-size implicit representations.
Remarkably, for several MOT problems, there are specially-tailored algorithms that run in polynomial time—notably, for MOT problems with graphically-structured costs of constant treewidth (h20tree; h20gm; teh2002unified), variational mean-field games (BenCarDi18), computing generalized Euler flows (BenCarCut15), computing low-dimensional Wasserstein barycenters (CarObeOud15; BenCarCut15; AltBoi20bary), and filtering and estimation tasks based on target tracking (h20tree; h20gm; h20incremental; h19hmc; h20partial). However, the number of MOT problems that are known to be solvable in polynomial time is small, and it is unknown if these techniques can be extended to the many other MOT problems arising in applications. This motivates the central question driving this paper: for which structured MOT problems can one design $\mathrm{poly}(n,k)$-time algorithms?
This paper is conceptually divided into two parts. In the first part of the paper, we develop a unified algorithmic framework for MOT that characterizes the structure required for different algorithms to solve MOT in $\mathrm{poly}(n,k)$ time, in terms of simple variants of the dual feasibility oracle. This enables us to prove that some algorithms can solve MOT problems in polynomial time whenever any algorithm can; whereas the popular SINKHORN algorithm cannot. Moreover, this algorithmic framework makes it significantly easier to design a $\mathrm{poly}(n,k)$-time algorithm for a given MOT problem (when possible) because it now suffices to solve the dual feasibility oracle—and this is much more amenable to standard algorithmic techniques. In the second part of the paper, we demonstrate the ease-of-use of our algorithmic framework by applying it to three general classes of MOT cost structures.
1.1 Contribution 1: unified algorithmic framework for MOT
In order to understand what structural properties make MOT solvable in polynomial time, we first lay a more general groundwork. The purpose of this is to understand the following fundamental questions:
Q1. What are reasonable candidate algorithms for solving structured MOT problems in polynomial time?
Q2. What structure must an MOT problem have for these algorithms to have polynomial runtimes?
Q3. Is the structure required by a given algorithm more restrictive than the structure required by a different algorithm (or any algorithm)?
Q4. How can one check whether this structure occurs for a given MOT problem?
We detail our answers to these four questions below in §1.1.1 to §1.1.4, and then briefly discuss practical tradeoffs beyond polynomial-time solvability in §1.1.5; see Table 4 for a summary. We expect that this general groundwork will prove useful in future investigations of tractable MOT problems.
Algorithm | Oracle | Runtime | Always applicable? | Exact solution? | Sparse solution? | Practical? |
ELLIPSOID | MIN | Theorem 4.1 | Yes | Yes | Yes | No |
MWU | AMIN | Theorem 4.7 | Yes | No | Yes | Yes |
SINKHORN | SMIN | Theorem 4.18 | No | No | No | Yes |
1.1.1 Answer to Q1: candidate $\mathrm{poly}(n,k)$-time algorithms
We consider three algorithms for MOT whose exponential runtimes can be isolated into a single bottleneck—and thus can be implemented in polynomial time whenever that bottleneck can. These algorithms are the Ellipsoid algorithm ELLIPSOID (GLSbook, ), the Multiplicative Weights Update algorithm MWU (young2001sequential, ), and the natural multidimensional analog of Sinkhorn’s scaling algorithm SINKHORN (BenCarCut15, ; PeyCut17, ). SINKHORN is specially tailored to MOT and is currently the predominant algorithm for it. To foreshadow our answer to Q3, the reason that we restrict to these candidate algorithms is: we show that ELLIPSOID and MWU can solve an MOT problem in polynomial time if and only if any algorithm can.
1.1.2 Answer to Q2: structure necessary to run candidate algorithms
These three algorithms only access the cost tensor through polynomially many calls of their respective bottlenecks. Thus the structure required to implement these candidate algorithms in polynomial time is equivalent to the structure required to implement their respective bottlenecks in polynomial time.
In §4, we show that the bottlenecks of these three algorithms are polynomial-time equivalent to natural analogs of the feasibility oracle for the dual LP to MOT. Namely, given weights $p = (p_1,\dots,p_k)$, where each $p_i \in \mathbb{R}^n$, compute
$\min_{(j_1,\dots,j_k) \in [n]^k} C_{j_1,\dots,j_k} - \sum_{i=1}^{k} [p_i]_{j_i}$   (1.1)
either exactly for ELLIPSOID, approximately for MWU, or with the “min” replaced by a “softmin” for SINKHORN. We call these three tasks the MIN, AMIN, and SMIN oracles, respectively. See Remark 3.4 for the interpretation of these oracles as variants of the dual feasibility oracle.
These three oracles take $n^{\Theta(k)}$ time to implement in general. However, for a wide range of structured cost tensors $C$ they can be implemented in $\mathrm{poly}(n,k)$ time, see §1.2 below. For such structured costs $C$, our oracle abstraction immediately implies that the MOT problem with cost $C$ and any input marginals $\mu_1,\dots,\mu_k$ can be (approximately) solved in polynomial time by any of the three respective algorithms.
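To fix ideas, here is a brute-force $n^{\Theta(k)}$ sketch of the MIN and SMIN oracles (our own illustration, not the paper's implementation; the softmin convention shown is one standard choice of entropic regularization at level $\eta$):

```python
import itertools, math

def min_oracle(C, p, n, k):
    """MIN: exact minimum of C[j] - sum_i p[i][j_i] (brute force, n^k time)."""
    return min(C[j] - sum(p[i][j[i]] for i in range(k))
               for j in itertools.product(range(n), repeat=k))

def smin_oracle(C, p, n, k, eta):
    """SMIN: softmin at regularization eta, i.e.
    -(1/eta) * log sum_j exp(-eta * (C[j] - sum_i p[i][j_i]))."""
    vals = [C[j] - sum(p[i][j[i]] for i in range(k))
            for j in itertools.product(range(n), repeat=k)]
    m = min(vals)  # shift for numerical stability
    return m - math.log(sum(math.exp(-eta * (v - m)) for v in vals)) / eta

n, k = 2, 3
C = {j: sum(j) for j in itertools.product(range(n), repeat=k)}
p = [[0.0, 0.1]] * k
print(min_oracle(C, p, n, k))               # exact minimum
print(smin_oracle(C, p, n, k, eta=100.0))   # close to the minimum for large eta
```

The AMIN oracle is the same quantity as MIN, computed only to additive accuracy $\varepsilon$; as $\eta \to \infty$ the softmin converges to the exact min.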
Our characterization of the algorithms’ bottlenecks as variations of the dual feasibility oracle has two key benefits—which are the answers to Q3 and Q4, described below.
1.1.3 Answer to Q3: characterizing what MOT problems each algorithm can solve
A key benefit of our characterization of the algorithms’ bottlenecks as variations of the dual feasibility oracles is that it enables us to establish whether the structure required by a given MOT algorithm is more restrictive than the structure required by a different algorithm (or by any algorithm).
In particular, this enables us to answer the natural question: why restrict to just the three algorithms described above? Can other algorithms solve MOT in $\mathrm{poly}(n,k)$ time in situations where these algorithms cannot? Critically, the answer is no: restricting ourselves to the ELLIPSOID and MWU algorithms is at no loss of generality.
Theorem 1.1 (Informal statement of part of Theorems 4.1 and 4.7).
For any family of cost tensors $C$:
• ELLIPSOID computes an exact solution for MOT in $\mathrm{poly}(n,k)$ time if and only if any algorithm can.
• MWU computes an $\varepsilon$-approximate solution for MOT in $\mathrm{poly}(n,k,C_{\max}/\varepsilon)$ time if and only if any algorithm can.
The statement for ELLIPSOID is implicit from classical results about LP (GLSbook, ) combined with arguments from (AltBoi20bary, ), see the previous work section §1.3. The statement for MWU is new to this paper.
The oracle abstraction helps us show Theorem 1.1 because it reduces this question of what structure is needed for the algorithms to solve MOT in polynomial time, to the question of what structure is needed to solve their respective bottlenecks in polynomial time. Thus Theorem 1.1 is a consequence of the following result. (The “if” part of this result is a contribution of this paper; the “only if” part was shown in (AltBoi20hard, ).)
Theorem 1.2 (Informal statement of part of Theorems 4.1 and 4.7).
For any family of cost tensors $C$:
• MOT can be exactly solved in $\mathrm{poly}(n,k)$ time if and only if MIN can.
• MOT can be $\varepsilon$-approximately solved in $\mathrm{poly}(n,k,C_{\max}/\varepsilon)$ time if and only if AMIN can.
Interestingly, a further consequence of our unified algorithm-to-oracle abstraction is that it enables us to show that SINKHORN—which is currently the most popular algorithm for MOT by far—requires strictly more structure to solve an MOT problem than other algorithms require. This is in sharp contrast to the complete generality of the other two algorithms shown in Theorem 1.1.
Theorem 1.3 (Informal statement of Theorem 4.19).
Under standard complexity-theoretic assumptions, there exists a family of MOT problems that can be solved exactly in $\mathrm{poly}(n,k)$ time using ELLIPSOID; however, it is impossible to implement a single iteration of SINKHORN (even approximately) in $\mathrm{poly}(n,k)$ time.
The reason that our unified algorithm-to-oracle abstraction helps us show Theorem 1.3 is that it puts SINKHORN on equal footing with the other two classical algorithms in terms of their reliance on variants of the dual feasibility oracle. This reduces proving Theorem 1.3 to showing the following separation between the SMIN oracle and the other two oracles.
Theorem 1.4 (Informal statement of Lemma 3.7).
Under standard complexity-theoretic assumptions, there exists a family of cost tensors $C$ such that there are $\mathrm{poly}(n,k)$-time algorithms for MIN and AMIN; however, it is impossible to solve SMIN (even approximately) in $\mathrm{poly}(n,k)$ time.
1.1.4 Answer to Q4: ease-of-use for checking if MOT is solvable in polynomial time
The second key benefit of this oracle abstraction is that it is helpful for showing that a given MOT problem (whose cost $C$ is input implicitly through some concise representation) is solvable in polynomial time: without loss of generality, it suffices to solve any of the three corresponding oracles in polynomial time. The upshot is that these oracles are directly amenable to standard algorithmic techniques since they are phrased as conventional combinatorial-optimization problems. In the second part of the paper, we illustrate this ease-of-use via applications to three general classes of structured MOT problems; for an overview see §1.2.
1.1.5 Practical algorithmic tradeoffs beyond polynomial-time solvability
From a theoretical perspective, the most important aspect of an algorithm is whether it can solve MOT in polynomial time if and only if any algorithm can. As we have discussed, this is true for ELLIPSOID and MWU (Theorem 1.1) but not for SINKHORN (Theorem 1.3). Nevertheless, for a wide range of MOT cost structures, all three oracles can be implemented in polynomial time, which means that all three algorithms ELLIPSOID, MWU, and SINKHORN can be implemented in polynomial time. Which algorithm is best in practice depends on the relative importance of the following considerations for the particular application.
• Error. ELLIPSOID computes exact solutions, whereas MWU and SINKHORN only compute low-precision solutions due to $\mathrm{poly}(1/\varepsilon)$ runtime dependence.
• Solution sparsity. ELLIPSOID and MWU output solutions with polynomially many non-zero entries (roughly $nk$), whereas SINKHORN outputs fully dense solutions with $n^k$ non-zero entries (through a polynomial-size implicit representation, see §4.3). Solution sparsity enables interpretability, visualization, and efficient downstream computation—benefits which are helpful in diverse applications, for example ranging from computer graphics (solomon2015convolutional; blondel2018smooth; pitie2007automated) to facility location problems (anderes2016discrete) to machine learning (AltBoi20bary; flamary2016optimal) to ecological inference (muzellec2017tsallis) to fluid dynamics (see §5.3), and more. Furthermore, in §7.4, we show that sparse solutions for MOT (a.k.a. linear optimization over the transportation polytope) enable efficiently solving certain non-linear optimization problems over the transportation polytope.
• Practical runtime. Although all three algorithms enjoy polynomial runtime guarantees, the polynomials are smaller for some algorithms than for others. In particular, SINKHORN has remarkably good scalability in practice as long as the error $\varepsilon$ is not too small and its bottleneck oracle SMIN is practically implementable. By Theorems 1.1 and 1.3, MWU can solve strictly more MOT problems in polynomial time than SINKHORN; however, it is less scalable in practice when both MWU and SINKHORN can be implemented. ELLIPSOID is not practical and is used solely as a proof of concept that problems are tractable to solve exactly; in practice, we use Column Generation (see, e.g., (BerTsi97, §6.1)) rather than ELLIPSOID as it has better empirical performance, yet still has the same bottleneck oracle MIN, see §4.1.3. Column Generation is not as practically scalable as SINKHORN in $n$ and $k$, but has the benefit of computing exact, sparse solutions.
To summarize: which algorithm is best in practice depends on the application. For example, Column Generation produces the qualitatively best solutions for the fluid dynamics application in §5.3, SINKHORN is the most scalable for the risk estimation application in §7.3, and MWU is the most scalable for the network reliability application in §6.3 (for that application there is no known implementation of SINKHORN that is practically efficient).
1.2 Contribution 2: applications to general classes of structured MOT problems
In the second part of the paper, we illustrate the algorithmic framework developed in the first part of the paper by applying it to three general classes of MOT cost structures:
1. Graphical structure (in §5).
2. Set-optimization structure (in §6).
3. Low-rank plus sparse structure (in §7).
Specifically, if the cost $C$ is structured in any of these three ways, then MOT can be (approximately) solved in $\mathrm{poly}(n,k)$ time for any input marginals $\mu_1,\dots,\mu_k$.
Previously, it was known how to solve MOT problems with structure (1) using SINKHORN (h20gm; teh2002unified), but this only computes solutions that are dense (with $n^k$ non-zero entries) and low-precision (due to $\mathrm{poly}(1/\varepsilon)$ runtime dependence). We therefore provide the first solutions that are sparse and exact for structure (1). For structures (2) and (3), we provide the first polynomial-time algorithms, even for approximate computation. These three structures are incomparable: it is in general not possible to model a problem falling under any of the three structures in a non-trivial way using any of the others, for details see Remarks 6.7 and 7.3. This means that the new structures (2) and (3) enable capturing a wide range of new applications.
Below, we detail these structures individually in §1.2.1, §1.2.2, and §1.2.3. See Table 2 for a summary.
Structure | Definition | Complexity measure | Approximate | Exact
Graphical (§5) | $C_{j_1,\dots,j_k} = \sum_{S \in \mathcal{S}} f_S(\vec{j}_S)$ | treewidth | Known (teh2002unified; h20gm) | Corollary 5.6
Set-optimization (§6) | $C_{j_1,\dots,j_k} = \mathbb{1}[(j_1,\dots,j_k) \in S]$ | optimization oracle over $S$ | Corollary 6.9 | Corollary 6.9
Low-rank + sparse (§7) | $C = R + S$ | rank of $R$, sparsity of $S$ | Corollary 7.5 | Unknown
1.2.1 Graphical structure
In §5, we apply our algorithmic framework to MOT problems with graphical structure, a broad class of MOT problems that have been previously studied (h20gm; h20tree; teh2002unified). Briefly, an MOT problem has graphical structure if its cost tensor $C$ decomposes as
$C_{j_1,\dots,j_k} = \sum_{S \in \mathcal{S}} f_S(\vec{j}_S)$,
where the $f_S$ are arbitrary "local interactions" that depend only on the tuples $\vec{j}_S = (j_i)_{i \in S}$ of the variables indexed by small subsets $S \subseteq [k]$.
In order to derive efficient algorithms, it is necessary to restrict how local the interactions are because otherwise MOT is NP-hard (even if all interaction sets $S$ have size $2$) (AltBoi20hard). We measure the locality of the interactions via the standard complexity measure of the "treewidth" of the associated graphical model. See §5.1 for formal definitions. While the runtimes of our algorithms (and all previous algorithms) depend exponentially on the treewidth, we emphasize that the treewidth is a very small constant (either $1$ or $2$) in all current applications of MOT falling under this framework; see the related work section.
We show that for MOT cost tensors that have graphical structure of constant treewidth, all three oracles can be implemented in $\mathrm{poly}(n,k)$ time. We accomplish this by leveraging the known connection between graphically structured MOT and graphical models shown in (h20gm). In particular, the MIN, AMIN, and SMIN oracles are respectively equivalent to the mode, approximate mode, and log-partition function of an associated graphical model. Thus we can implement all oracles in $\mathrm{poly}(n,k)$ time by simply applying classical algorithms from the graphical models community (KolFri09; wainwright2008graphical).
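As a toy illustration of why constant treewidth helps (our own sketch, not code from the paper), the following computes the MIN oracle for a chain-structured cost (treewidth $1$) by dynamic programming in $O(n^2 k)$ time, and checks it against the $n^k$-time brute force:

```python
import itertools, random

def chain_min_dp(f, p):
    """MIN oracle min_j sum_i f_i(j_i, j_{i+1}) - sum_i p_i[j_i] for a
    chain-structured cost, via dynamic programming in O(n^2 k) time."""
    k, n = len(p), len(p[0])
    # best[b] = minimum value of a prefix whose last coordinate equals b
    best = [-p[0][b] for b in range(n)]
    for i in range(k - 1):
        best = [min(best[a] + f[i][a][b] for a in range(n)) - p[i + 1][b]
                for b in range(n)]
    return min(best)

def chain_min_brute(f, p):
    """Same quantity by enumerating all n^k index tuples."""
    k, n = len(p), len(p[0])
    return min(sum(f[i][j[i]][j[i + 1]] for i in range(k - 1))
               - sum(p[i][j[i]] for i in range(k))
               for j in itertools.product(range(n), repeat=k))

random.seed(0)
n, k = 3, 5
f = [[[random.random() for _ in range(n)] for _ in range(n)]
     for _ in range(k - 1)]                       # local interactions
p = [[random.random() for _ in range(n)] for _ in range(k)]  # dual weights
print(abs(chain_min_dp(f, p) - chain_min_brute(f, p)) < 1e-9)  # True
```

For trees and bounded-treewidth graphs the same idea goes through with (junction-tree) message passing; replacing min by softmin in the recursion gives the SMIN oracle.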
Theorem 1.5 (Informal statement of Theorem 5.5).
Let $C$ have graphical structure of constant treewidth. Then the MIN, AMIN, and SMIN oracles can be computed in $\mathrm{poly}(n,k)$ time.
It is an immediate corollary of Theorem 1.5 and our algorithms-to-oracles reduction described in §1.1 that one can implement ELLIPSOID, MWU, and SINKHORN in polynomial time. Below, we record the theoretical guarantee of ELLIPSOID since it is the best of the three algorithms as it computes exact, sparse solutions.
Theorem 1.6 (Informal statement of Corollary 5.6).
Let $C$ have graphical structure of constant treewidth. Then an exact, sparse solution for MOT can be computed in $\mathrm{poly}(n,k)$ time.
Previously, it was known how to solve such MOT problems (teh2002unified; h20gm) using SINKHORN, but this only computes a solution that is fully dense (with $n^k$ non-zero entries) and low-precision (due to $\mathrm{poly}(1/\varepsilon)$ runtime dependence). Details in the related work section. Our result improves over this state-of-the-art algorithm by producing solutions that are exact and sparse in $\mathrm{poly}(n,k)$ time.
In §5.3, we demonstrate the benefit of Theorem 1.6 on the application of computing generalized Euler flows, which was historically the motivation of MOT and has received significant attention, e.g., (BenCarCut15, ; BenCarNen19, ; Bre08, ; Bre89, ; Bre93, ; Bre99, ). While there is a specially-tailored version of the SINKHORN algorithm for this problem that runs in polynomial time (BenCarCut15, ; BenCarNen19, ), it produces solutions that are approximate and fully dense. Our algorithm produces exact, sparse solutions which lead to sharp visualizations rather than blurry ones (see Figure 10).
1.2.2 Set-optimization structure
In §6, we apply our algorithmic framework to MOT problems whose cost tensors take value $0$ or $1$ in each entry. That is, costs of the form
$C_{j_1,\dots,j_k} = \mathbb{1}[(j_1,\dots,j_k) \in S]$
for some subset $S \subseteq [n]^k$. Such MOT problems arise naturally in applications where one seeks to minimize the probability that some event $S$ occurs, given marginal probabilities on each variable $j_i$; see Example 6.1.
In order to derive efficient algorithms, it is necessary to restrict the (otherwise arbitrary) set $S$. We parametrize the complexity of such MOT problems via the complexity of finding the minimum-weight object in $S$. This opens the door to combinatorial applications of MOT because finding the minimum-weight object in $S$ is well-known to be polynomial-time solvable for many "combinatorially-structured" sets $S$ of interest—e.g., the set of cuts in a graph, or the set of independent sets in a matroid.
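As a toy illustration of this phenomenon (our own sketch; the set shown is hypothetical and chosen only for simplicity), the following finds the minimum-weight element of an exponentially large combinatorially-structured set, namely binary strings with an even number of ones, in linear time by dynamic programming:

```python
import itertools

def min_weight_even_parity(w):
    """Minimum total weight over binary strings with an even number of ones.
    w[i][b] = weight of choosing bit b in coordinate i. The feasible set has
    2^(k-1) elements, but the parity DP runs in O(k) time."""
    INF = float("inf")
    best = {0: 0.0, 1: INF}  # best weight by parity of ones chosen so far
    for wi in w:
        best = {par: min(best[par] + wi[0], best[1 - par] + wi[1])
                for par in (0, 1)}
    return best[0]

def min_weight_brute(w):
    """Same quantity by enumerating all 2^k strings."""
    k = len(w)
    return min(sum(w[i][j[i]] for i in range(k))
               for j in itertools.product((0, 1), repeat=k)
               if sum(j) % 2 == 0)

w = [[0.3, -0.2], [0.1, 0.4], [0.0, -0.5], [0.2, 0.2]]
print(abs(min_weight_even_parity(w) - min_weight_brute(w)) < 1e-12)  # True
```

The same template (an efficient min-weight oracle over $S$, despite $|S|$ being exponential) is what the set-optimization framework exploits for sets such as graph cuts or matroid independent sets.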
We show that for MOT cost tensors with this structure, all three oracles can be implemented efficiently.
Theorem 1.7 (Informal statement of Theorem 6.8).
Let $C$ have set-optimization structure. Then the MIN, AMIN, and SMIN oracles can be computed in $\mathrm{poly}(n,k)$ time.
It is an immediate corollary of Theorem 1.7 and our algorithms-to-oracles reduction described in §1.1 that one can implement ELLIPSOID, MWU, and SINKHORN in polynomial time. Below, we record the theoretical guarantee for ELLIPSOID since it is the best of these three algorithms as it computes exact, sparse solutions.
Theorem 1.8 (Informal statement of Corollary 6.9).
Let $C$ have set-optimization structure. Then an exact, sparse solution for MOT can be computed in $\mathrm{poly}(n,k)$ time.
This is the first polynomial-time algorithm for this class of MOT problems. We note that a more restrictive class of MOT problems was studied in (zemel1982polynomial) under the additional restriction that $S$ is upwards-closed.
In §6.3, we show how this general class of set-optimization structure captures, for example, the classical application of computing the extremal reliability of a network with stochastic edge failures. Network reliability is a fundamental topic in network science and engineering (gertsbakh2011network, ; ball1995network, ; ball1986computational, ) which is often studied in an average-case setting where each edge fails independently with some given probability (moore1956reliable, ; karger2001randomized, ; valiant1979complexity, ; provan1983complexity, ). The application investigated here is a robust notion of network reliability in which edge failures may be maximally correlated (e.g., by an adversary) or minimally correlated (e.g., by a network maintainer) subject to a marginal constraint on each edge’s failure probability, a setting that dates back to the 1980s (zemel1982polynomial, ; weiss1986stochastic, ). We show how to express both the minimally and maximally correlated network reliability problems as MOT problems with set-optimization structure, recovering as a special case of our general framework the known polynomial-time algorithms in (zemel1982polynomial, ; weiss1986stochastic, ) as well as more practical polynomial-time algorithms that scale to input sizes that are an order-of-magnitude larger.
1.2.3 Low-rank and sparse structure
In §7, we apply our algorithmic framework to MOT problems whose cost tensors $C$ decompose as
$C = R + S$,
where $R$ is a constant-rank tensor and $S$ is a polynomially-sparse tensor. We assume that $R$ is represented in factored form and that $S$ is represented through its non-zero entries, which overall yields a $\mathrm{poly}(n,k)$-size representation of $C$.
We show that for MOT cost tensors with low-rank plus sparse structure, the AMIN and SMIN oracles can be implemented in polynomial time. (It is an interesting open question whether the MIN oracle can similarly be implemented in $\mathrm{poly}(n,k)$ time; this would enable implementing ELLIPSOID in $\mathrm{poly}(n,k)$ time by our algorithms-to-oracles reduction, and thus would enable computing exact solutions for this class of MOT problems, cf. Theorem 1.10.) This may be of independent interest because, by taking all oracle inputs $p_i = 0$ in (1.1), this generalizes the previously open problem of approximately computing the smallest entry of a constant-rank tensor with $n^k$ entries in $\mathrm{poly}(n,k)$ time.
Theorem 1.9 (Informal statement of Theorem 7.4).
Let $C$ have low-rank plus sparse structure. Then the AMIN and SMIN oracles can be computed in $\mathrm{poly}(n,k)$ time.
It is an immediate corollary of Theorem 1.9 and our algorithms-to-oracles reduction described in §1.1 that one can implement MWU and SINKHORN in polynomial time. Of these two algorithms, MWU computes sparse solutions, yielding the following theorem.
Theorem 1.10 (Informal statement of Corollary 7.5).
Let $C$ have low-rank plus sparse structure. Then a sparse, $\varepsilon$-approximate solution for MOT can be computed in $\mathrm{poly}(n,k,C_{\max}/\varepsilon)$ time.
This is the first polynomial-time result for this class of MOT problems. We note that the runtime of our MOT algorithm depends exponentially on the rank of $R$, which is why we take the rank to be constant. Nevertheless, such a restriction on the rank is unavoidable: unless $\mathsf{P} = \mathsf{NP}$, there does not exist an algorithm with runtime that is jointly polynomial in $n$, $k$, and the rank (AltBoi20hard).
We demonstrate this polynomial-time algorithm concretely on two applications. First, in §7.3 we consider the risk estimation problem of computing an investor’s expected profit in the worst-case over all future prices that are consistent with given marginal distributions. We show that this is equivalent to an MOT problem with a low-rank tensor and thereby provide the first efficient algorithm for it.
Second, in §7.4, we consider the fundamental problem of projecting a joint distribution onto the transportation polytope. We provide the first polynomial-time algorithm for solving this when decomposes into a constant-rank and sparse component, which models mixtures of product distributions with polynomially many corruptions. This application illustrates the versatility of our algorithmic results beyond polynomial-time solvability of MOT, since this projection problem is a quadratic optimization over the transportation polytope rather than linear optimization (a.k.a. MOT). In order to achieve this, we develop a simple quadratic-to-linear reduction tailored to this problem that crucially exploits the sparsity of the MOT solutions enabled by the MWU algorithm.
1.3 Related work
1.3.1 MOT algorithms
MOT algorithms fall into two categories. One category consists of general-purpose algorithms that do not depend on the specific MOT cost. For example, this includes running an LP solver out-of-the-box, or running the Sinkhorn algorithm where each iteration sums over all $n^k$ entries of the cost tensor to implement the marginalization bottleneck (LinHoJor19; Fri20; tupitsa2020multimarginal). These approaches are robust in the sense that they do not need to be changed based on the specific MOT problem. However, they are impractical beyond tiny input sizes because their runtimes scale as $n^{\Theta(k)}$.
The second category consists of algorithms that are much more scalable but require extra structure of the MOT problem. Specifically, these are algorithms that somehow exploit the structure of the relevant cost tensor $C$ in order to (approximately) solve an MOT problem in $\mathrm{poly}(n,k)$ time (teh2002unified; h20gm; h20tree; BenCarDi18; h20incremental; h19hmc; h20partial; CarObeOud15; Nen16; BenCarCut15; BenCarNen19; AltBoi20bary; Nat18dro; Nat09; MisNat14; zemel1982polynomial; weiss1986stochastic; Nat21extremal; Agr12; MeiNad79; nadas1979probabilistic; Han86). Such a runtime is far more tractable—but it is not well understood for which MOT problems such a runtime is possible. The purpose of this paper is to clarify this question.
To contextualize our answer to this question with the rapidly growing literature requires further splitting this second category of algorithms.
Sinkhorn algorithm.
Currently, the predominant approach in the second category is to solve an entropically regularized version of MOT with the Sinkhorn algorithm, a.k.a. Iterative Proportional Fitting or Iterative Bregman Projections or RAS algorithm or Iterative Scaling algorithm, see e.g., (teh2002unified; BenCarDi18; BenCarNen16; BenCarNen19; Nen16; h20tree; h20gm). Recent work has shown that a polynomial number of iterations of this algorithm suffices (LinHoJor19; Fri20; tupitsa2020multimarginal). However, the bottleneck is that each iteration requires $n^{\Theta(k)}$ operations in general because it requires marginalizing a tensor with $n^k$ entries. The critical question is therefore: what structure of an MOT problem enables implementing this marginalization bottleneck in polynomial time?
This paper makes two contributions to this question. First, we identify new broad classes of MOT problems for which this bottleneck can be implemented in polynomial time, and thus SINKHORN can be implemented in polynomial time (see §1.2). Second, we propose other algorithms that require strictly less structure than SINKHORN does in order to solve an MOT problem in polynomial time (Theorem 4.19).
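For concreteness, the following naive sketch (our own illustration, not the paper's implementation) runs a multimarginal Sinkhorn scaling loop on a tiny instance, with the $n^k$-time marginalization bottleneck written out explicitly:

```python
import itertools, math

def sinkhorn_mot(C, mus, eta=10.0, iters=50):
    """Naive multimarginal Sinkhorn: P[j] = exp(-eta*C[j]) * prod_i d[i][j_i],
    with each scaling vector d[i] updated to match the marginal mus[i].
    Each update marginalizes the full kernel -- the n^k bottleneck."""
    k, n = len(mus), len(mus[0])
    d = [[1.0] * n for _ in range(k)]
    K = {j: math.exp(-eta * C[j])
         for j in itertools.product(range(n), repeat=k)}
    for _ in range(iters):
        for i in range(k):
            # marginalization bottleneck: a sum over all n^k entries
            m = [0.0] * n
            for j, Kj in K.items():
                m[j[i]] += Kj * math.prod(d[a][j[a]]
                                          for a in range(k) if a != i)
            d[i] = [mus[i][b] / m[b] for b in range(n)]
    return d, K

n, k = 2, 3
C = {j: float(sum(j)) for j in itertools.product(range(n), repeat=k)}
mus = [[0.5, 0.5]] * k
d, K = sinkhorn_mot(C, mus)
P = {j: K[j] * math.prod(d[a][j[a]] for a in range(k))
     for j in itertools.product(range(n), repeat=k)}
m0 = [0.0] * n
for j, v in P.items():
    m0[j[0]] += v
print([round(x, 6) for x in m0])  # (approximately) matches mus[0]
```

The structured algorithms discussed in this paper replace the explicit sum over $K$ with a $\mathrm{poly}(n,k)$-time SMIN-style computation.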
Ellipsoid algorithm.
The Ellipsoid algorithm is among the most classical algorithms for implicit LP (GLSbook, ; GroLovSch81, ; khachiyan1980polynomial, ), however it has taken a back seat to the SINKHORN algorithm in the vast majority of the MOT literature.
In §4.1, we make explicit the fact that the variant of ELLIPSOID from (AltBoi20bary) can solve MOT exactly in $\mathrm{poly}(n,k)$ time if and only if any algorithm can (Theorem 4.1). This is implicit from combining several known results (AltBoi20hard; AltBoi20bary; GLSbook). In the process of making this result explicit, we exploit the special structure of the MOT LP to significantly simplify the reduction from the dual violation oracle to the dual feasibility oracle. The previously known reduction is highly impractical as it requires an indirect "back-and-forth" use of the Ellipsoid algorithm (GLSbook, page 107). In contrast, our reduction is direct and simple; this is critical for implementing our practical alternative to ELLIPSOID, namely COLGEN, with the dual feasibility oracle.
Multiplicative Weights Update algorithm.
This algorithm, first introduced by (young2001sequential, ), has been studied in the context of optimal transport in the matrix case k = 2 (BlaJamKenSid18, ; Qua18, ), in which case implicit LP is not necessary for a polynomial runtime. MWU lends itself to implicit LP (young2001sequential, ), but is notably absent from the MOT literature.
In §4.2, we show that MWU can be applied to MOT in polynomial time if and only if the approximate dual feasibility oracle can be solved in polynomial time. To do this, we show that in the special case of MOT, the well-known “softmax-derivative” bottleneck of MWU is polynomial-time equivalent to the approximate dual feasibility oracle. Since it is known that the approximate dual feasibility oracle is polynomial-time reducible to approximate MOT (AltBoi20hard, ), we therefore establish that MWU can solve MOT approximately in polynomial time if and only if any algorithm can (Theorem 4.7).
1.3.2 Graphically structured MOT problems with constant treewidth
We isolate here graphically structured costs with constant treewidth because this framework encompasses all MOT problems that were previously known to be tractable in polynomial time (teh2002unified, ; h20gm, ), with the exceptions of the fixed-dimensional Wasserstein barycenter problem and MOT problems related to combinatorial optimization—both of which are described below in §1.3.3. With treewidth 1 (a.k.a. “tree-structured costs” (h20tree, )), this family of graphically structured costs includes applications in economics such as variational mean-field games (BenCarDi18, ), interpolating histograms on trees (akagi2020probabilistic, ), and matching for teams (CarObeOud15, ; Nen16, ); it also encompasses applications in filtering and estimation for collective dynamics such as target tracking (h20tree, ; h20gm, ; h20incremental, ; h19hmc, ; h20partial, ) and Wasserstein barycenters in the case of fixed support (h20partial, ; Nen16, ; BenCarCut15, ; CarObeOud15, ). With treewidth 2, this family of costs also includes dynamic multi-commodity flow problems (haasler2021scalable, ), as well as the application of computing generalized Euler flows in fluid dynamics (BenCarCut15, ; BenCarNen19, ; Nen16, ), which was historically the original motivation of MOT (Bre89, ; Bre93, ; Bre99, ; Bre08, ).
Previous polynomial-time algorithms for graphically structured MOT compute approximate, dense solutions.
Implementing SINKHORN for graphically structured MOT problems by using belief propagation to efficiently implement the marginalization bottleneck was first proposed twenty years ago in (teh2002unified, ). There have been recent advancements in understanding connections of this algorithm to the Schrödinger bridge problem in the case of trees (h20tree, ), as well as developing more practically efficient single-loop variations (h20gm, ).
All of these works prove theoretical runtime guarantees only in the case of tree structure (i.e., treewidth 1). However, this graphical model perspective for efficiently implementing SINKHORN readily extends to any constant treewidth: simply implement the marginalization bottleneck using junction trees. This, combined with the iteration complexity of SINKHORN, which is known to be polynomial (LinHoJor19, ; Fri20, ; tupitsa2020multimarginal, ), immediately yields an overall polynomial runtime. This is why we cite (teh2002unified, ; h20gm, ) throughout this paper regarding the fact that SINKHORN can be implemented in polynomial time for graphical structure with any constant treewidth.
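To make the marginalization bottleneck concrete, the following self-contained Python sketch compares naive marginalization of an entropically scaled chain-structured (treewidth-1) cost, which touches all n^k tensor entries, against the backward-message (belief propagation) recursion, which uses only on the order of k·n^2 arithmetic operations. The instance data and variable names are made up for illustration and are not from the paper.

```python
import itertools, math

# Toy chain-structured ("treewidth-1") cost: C(j_1,...,j_k) = sum_i c_i(j_i, j_{i+1}).
# All instance data here is made up for illustration.
n, k, eta = 3, 4, 1.0
c = [[[((i + 1) * a - 2 * b) % 5 for b in range(n)] for a in range(n)]
     for i in range(k - 1)]

def kernel(j):
    # entry of the entropically scaled tensor exp(-eta * C(j))
    return math.exp(-eta * sum(c[i][j[i]][j[i + 1]] for i in range(k - 1)))

# Naive marginalization of coordinate 1: touches all n^k entries.
naive_m1 = [sum(kernel(j) for j in itertools.product(range(n), repeat=k) if j[0] == a)
            for a in range(n)]

# Belief propagation: backward messages, O(k n^2) arithmetic operations.
beta = [1.0] * n                      # message at the last node
for i in reversed(range(k - 1)):
    beta = [sum(math.exp(-eta * c[i][a][b]) * beta[b] for b in range(n))
            for a in range(n)]
bp_m1 = beta                          # unnormalized marginal of coordinate 1

assert all(abs(x - y) <= 1e-9 * abs(x) for x, y in zip(naive_m1, bp_m1))
```

The same backward/forward message pattern is what a junction-tree implementation generalizes to costs of any constant treewidth.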
While the use of SINKHORN for graphically structured MOT is mathematically elegant and can be impressively scalable in practice, it has two drawbacks. The first drawback of this algorithm is that it computes (implicit representations of) solutions that are fully dense with n^k non-zero entries. Indeed, it is well-known that SINKHORN finds the unique optimal solution to the entropically regularized MOT problem (RMOT), and that this solution is fully dense (PeyCut17, ). For example, in the simple case of cost C = 0, uniform marginals μ_i, and any strictly positive regularization parameter η, this solution has value n^{−k} in each entry.
The second drawback of this algorithm is that it only computes solutions that are low-precision, due to runtime dependence that is polynomial in the inverse accuracy 1/ε. This is because the number of SINKHORN iterations is known to scale polynomially in the entropic regularization parameter η even in the matrix case k = 2 (linial1998deterministic, , §1.2), and it is known that η must grow as the accuracy ε shrinks in order for the converged solution of SINKHORN to be an ε-approximate solution to the (unregularized) original MOT problem (LinHoJor19, ).
Improved algorithms for graphically structured MOT problems.
The contribution of this paper to the study of graphically structured MOT problems is that we give the first poly(n, k)-time algorithms that can compute solutions which are exact and sparse (Corollary 5.6). Our framework also directly recovers all known results about SINKHORN for graphically structured MOT problems—namely, that it can be implemented in polynomial time for trees (h20tree, ; teh2002unified, ) and for constant treewidth (h20gm, ; teh2002unified, ).
1.3.3 Tractable MOT problems beyond graphically structured costs
The two new classes of MOT problems studied in this paper—namely, set-optimization structure and low-rank plus sparse structure—are incomparable to each other as well as to graphical structure. Details in Remarks 6.7 and 7.3. This lets us handle a wide range of new MOT problems that could not be handled before.
There are two other classes of MOT problems studied in the literature which do not fall under the three structures studied in this paper. We elaborate on both below.
Remark 1.11 (Low-dimensional Wasserstein barycenter).
This MOT problem has cost C(j_1, …, j_k) = min_{y ∈ R^d} Σ_{i=1}^k ‖y − x_i^{(j_i)}‖^2, where x_i^{(j_i)} denotes the j_i-th atom in the distribution μ_i. Clearly this cost is not a graphically structured cost of constant treewidth—indeed, representing it through the lens of graphical structure requires the complete graph of interactions, which means a maximal treewidth of k − 1.666 We remark that the related but different problem of fixed-support Wasserstein barycenters has graphical structure with treewidth 1 (h20partial, ; Nen16, ; BenCarCut15, ; CarObeOud15, ). However, it should be emphasized that the fixed-support Wasserstein barycenter problem is different from the Wasserstein barycenter problem: it only approximates the latter to ε accuracy if the fixed support is restricted to an ε-net, which requires discretization size n = ε^{−Θ(d)} for the barycenter’s support, and thus (i) even in constant dimension, does not lead to high-precision algorithms due to poly(1/ε) runtime; and (ii) scales exponentially in the dimension d. See (AltBoi20baryhard, , §1.3) for further details about the complexity of Wasserstein barycenters. This problem also does not fall under the set-optimization or constant-rank structures. Nevertheless, this MOT problem can be solved in poly(n, k) time for any fixed dimension d by exploiting the low-dimensional geometric structure of the points that implicitly define the cost (AltBoi20bary, ).
Remark 1.12 (Random combinatorial optimization).
MOT problems also appear in the random combinatorial optimization literature since the 1970s, see e.g., (MeiNad79, ; zemel1982polynomial, ; weiss1986stochastic, ; nadas1979probabilistic, ; Han86, ), although under a different name and in a different community. These papers consider MOT problems with costs of the form for polytopes given through a list of their extreme points. Applications include PERT (Program Evaluation and Review Technique), extremal network reliability, and scheduling. Recently, applications to Distributionally Robust Optimization were investigated in (Nat18dro, ; Nat09, ; MisNat14, ) which considered general polytopes , as well as in (Nat21extremal, ) which considered MOT costs of the related form , and in (Agr12, ) which considers other combinatorial costs such as sub/supermodular functions. These papers show that these random combinatorial optimization problems are in general intractable, and give sufficient conditions on when they can be solved in polynomial time. In general, these families of MOT problems are different from the three structures studied in this paper, although some MOT applications fall under multiple umbrellas (e.g., extremal network reliability). It is an interesting question to understand to what extent these structures can be reconciled (as well as the algorithms, which sometimes use extended formulations in these papers).
1.3.4 Intractable MOT problems
These algorithmic results raise the question: what are the fundamental limitations of this line of work on polynomial-time algorithms for structured MOT problems? To this end, the recent paper (AltBoi20hard, ) provides a systematic investigation of NP-hardness results for structured MOT problems, including converses to several results in this paper. In particular, (AltBoi20hard, , Propositions 4.1 and 4.2) justify the constant-rank regime studied in §7 by showing that unless P = NP, there does not exist an algorithm with runtime that is jointly polynomial in the rank r and the input parameters n and k. Similarly, (AltBoi20hard, , Propositions 5.1 and 5.2) justify the constant-treewidth regime for graphically structured costs studied in §5 and all previous work by showing that unless P = NP, there does not exist an algorithm with polynomial runtime even for the seemingly simple class of MOT costs that decompose into pairwise interactions. The paper (AltBoi20hard, ) also shows NP-hardness for several MOT problems with repulsive costs, including for example the MOT formulation of Density Functional Theory with the Coulomb-Buckingham potential. It is an open problem whether the Coulomb potential, studied in (BenCarNen16, ; cotar2013density, ; buttazzo2012optimal, ), also leads to an NP-hard MOT problem (AltBoi20hard, , Conjecture 6.4).
1.3.5 Variants of MOT
The literature has studied several other variants of the MOT problem, notably with entropic regularization and/or with constraints on only a subset of the marginals, see, e.g., (BenCarCut15, ; BenCarDi18, ; BenCarNen19, ; BenCarNen16, ; h20partial, ; haasler2021scalable, ; h20incremental, ; h20gm, ; h20tree, ; h19hmc, ; LinHoJor19, ). Our techniques readily apply with little change. Briefly, to handle entropic regularization, simply use the SMIN oracle and the SINKHORN algorithm with a fixed regularization parameter η (rather than one that grows with the desired accuracy ε) as described in §4.3. And to handle partial marginal constraints, essentially the only change is that in the MIN, AMIN, and SMIN oracles, the potentials p_i are zero for all indices i corresponding to unconstrained marginals μ_i. Full details are omitted for brevity since they are straightforward modifications of our main results.
1.3.6 Optimization over joint distributions
Optimization problems over exponential-size joint distributions appear in many domains. For instance, they arise in game theory when computing correlated equilibria (pap08, ); however, in that case the optimization has different constraints which lead to different algorithms. Such problems also arise in variational inference (wainwright2003variational, ); however, the optimization there typically constrains this distribution to ensure tractability (e.g., mean-field approximation restricts to product distributions). The different constraints in these optimization problems over joint distributions versus MOT lead to significant differences in computational complexity, and thus also necessitate different algorithmic techniques.
1.4 Organization
In §2 we recall preliminaries about MOT and establish notation. The first part of the paper then establishes our unified algorithmic framework for MOT. Specifically, in §3 we define and compare three variants of the dual feasibility oracle; and in §4 we characterize the structure that MOT algorithms require for polynomial-time implementation in terms of these three oracles. For an overview of these results, see §1.1. The second part of the paper applies this algorithmic framework to three general classes of MOT cost structures: graphical structure (§5), set-optimization structure (§6), and low-rank plus sparse structure (§7). For an overview of these results, see §1.2. These three application sections are independent of each other and can be read separately. We conclude in §8.
2 Preliminaries
General notation.
The set {1, …, n} is denoted by [n]. For shorthand, we write poly(n, k, …) to denote a function that grows at most polynomially fast in those parameters. Throughout, we assume for simplicity of exposition that all entries of the input C and μ_1, …, μ_k have bit complexity at most poly(n, k), and same with the components defining C in structured settings. As such, throughout, runtimes refer to the number of arithmetic operations. The set R ∪ {−∞} is denoted by R̄, and note that the value −∞ can be represented efficiently by adding a single flag bit. We use the standard O and Ω notation, and use Õ and Ω̃ to denote that polylogarithmic factors may be omitted.
Tensor notation.
The k-fold tensor product space R^n ⊗ ⋯ ⊗ R^n is denoted by (R^n)^{⊗k}, and similarly for (R^n_{≥0})^{⊗k}. Let P ∈ (R^n)^{⊗k}. Its i-th marginal, for i ∈ [k], is denoted by m_i(P) ∈ R^n and has entries [m_i(P)]_{j_i} := Σ_{j_1, …, j_{i−1}, j_{i+1}, …, j_k} P_{j_1, …, j_k}. For shorthand, we often denote an index (j_1, …, j_k) by j. The sum of P’s entries is denoted by m(P) := Σ_j P_j. The maximum absolute value of P’s entries is denoted by ‖P‖_max, or simply ‖P‖ for short. For j ∈ [n]^k, we write δ_j to denote the tensor with value 1 at entry j, and 0 elsewhere. The operations ⊙ and ⊗ respectively denote the entrywise product and the Kronecker product. The notation ⊗_{i=1}^k μ_i is shorthand for μ_1 ⊗ ⋯ ⊗ μ_k. A non-standard notation we use throughout is that f[P] denotes a function f (typically exp, log, or a polynomial) applied entrywise to a tensor P.
2.1 Multimarginal Optimal Transport
The transportation polytope between measures μ_1, …, μ_k is
M(μ_1, …, μ_k) := { P ∈ (R^n_{≥0})^{⊗k} : m_i(P) = μ_i for all i ∈ [k] }.   (2.1)
For a fixed cost C ∈ (R^n)^{⊗k}, the MOT_C problem is to solve the following linear program, given input measures μ = (μ_1, …, μ_k):
min_{P ∈ M(μ_1, …, μ_k)} ⟨P, C⟩.   (MOT)
In the matrix case k = 2, (MOT) is the Kantorovich formulation of OT (Vil03, ). Its dual LP is
max_{p_1, …, p_k ∈ R^n} Σ_{i=1}^k ⟨p_i, μ_i⟩   subject to   C_{j_1, …, j_k} − Σ_{i=1}^k [p_i]_{j_i} ≥ 0 for all (j_1, …, j_k) ∈ [n]^k.   (MOT-D)
A basic, folklore fact about MOT is that it always has a sparse optimal solution (e.g., (anderes2016discrete, , Lemma 3)). This follows from elementary facts about standard-form LP; we provide a short proof for completeness.
Lemma 2.1 (Sparse solutions for MOT).
For any cost C ∈ (R^n)^{⊗k} and any marginals μ_1, …, μ_k, there exists an optimal solution P to MOT_C(μ) that has at most nk − k + 1 non-zero entries.
Proof.
Since (MOT) is an LP over a compact domain, it has an optimal solution at a vertex (BerTsi97, , Theorem 2.7). Since (MOT) is a standard-form LP, these vertices are in correspondence with basic solutions, thus their sparsity is bounded by the number of linearly independent constraints defining M(μ_1, …, μ_k) (BerTsi97, , Theorem 2.4). We bound this quantity by nk − k + 1 via two observations. First, M(μ_1, …, μ_k) is defined by nk equality constraints in (2.1), one for each coordinate of each marginal constraint m_i(P) = μ_i. Second, at least k − 1 of these constraints are linearly dependent because we can construct k distinct linear combinations of them, namely Σ_{j=1}^n [m_i(P)]_j = Σ_{j=1}^n [μ_i]_j for each marginal i ∈ [k], which all simplify to the same constraint m(P) = 1, and thus are redundant with each other. ∎
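For k = 2, the sparsity count in Lemma 2.1 can be illustrated with the classical northwest-corner rule, which constructs a feasible (not necessarily optimal) vertex of the transportation polytope with at most 2n − 1 = nk − k + 1 nonzero entries. The integer masses below are made-up toy data, chosen so all arithmetic is exact.

```python
# Illustrative toy check of the sparsity count in Lemma 2.1 for k = 2:
# the northwest-corner rule yields a feasible transport plan with at most
# 2n - 1 nonzero entries (integer masses, made-up data).
n = 5
mu = [2, 6, 4, 5, 3]   # row marginal
nu = [4, 4, 4, 4, 4]   # column marginal (same total mass)
P = [[0] * n for _ in range(n)]
r, c = mu[:], nu[:]
i = j = 0
while i < n and j < n:
    t = min(r[i], c[j])   # ship as much mass as possible at cell (i, j)
    P[i][j] = t
    r[i] -= t
    c[j] -= t
    if r[i] == 0:
        i += 1            # row exhausted: move down
    else:
        j += 1            # column exhausted: move right

nnz = sum(1 for row in P for x in row if x > 0)
assert nnz <= 2 * n - 1
assert [sum(row) for row in P] == mu
assert [sum(P[i][j] for i in range(n)) for j in range(n)] == nu
```

Each loop iteration fills one cell and permanently exhausts a row or column, which is exactly why at most 2n − 1 cells can be nonzero.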
Definition 2.2 (-approximate MOT solution).
P is an ε-approximate solution to MOT_C(μ) if P is feasible (i.e., P ∈ M(μ_1, …, μ_k)) and ⟨P, C⟩ is at most ε more than the optimal value.
2.2 Regularization
We introduce two standard regularization operators. First is the Shannon entropy H(P) := −Σ_j P_j log P_j of a tensor P ∈ (R^n_{≥0})^{⊗k} with entries summing to m(P) = 1. We adopt the standard notational convention that 0 log 0 = 0. Second is the softmin operator, which is defined for parameter η > 0 as
smin^η_{i ∈ [m]} a_i := −(1/η) log ( Σ_{i=1}^m e^{−η a_i} ).   (2.2)
This softmin operator naturally extends to inputs a_i ∈ R ∪ {∞} by adopting the standard notational conventions that exp(−∞) = 0 and log 0 = −∞.
We make use of the following folklore fact, which bounds the error between the and operators based on the regularization and the number of points. For completeness, we provide a short proof.
Lemma 2.3 (Softmin approximation bound).
For any a_1, …, a_m ∈ R ∪ {∞} and η > 0,
smin^η_{i ∈ [m]} a_i ≤ min_{i ∈ [m]} a_i ≤ smin^η_{i ∈ [m]} a_i + (log m)/η.
Proof.
Assume without loss of generality that all a_i are finite, else the entries a_i = ∞ can be dropped (if all a_i = ∞ then the claim is trivial). For shorthand, denote min_{i ∈ [m]} a_i by a*. For the first inequality, use the non-negativity of the exponential function to bound
smin^η_{i ∈ [m]} a_i = −(1/η) log Σ_{i=1}^m e^{−η a_i} ≤ −(1/η) log e^{−η a*} = a*.
For the second inequality, use the fact that each e^{−η a_i} ≤ e^{−η a*} to bound
smin^η_{i ∈ [m]} a_i = −(1/η) log Σ_{i=1}^m e^{−η a_i} ≥ −(1/η) log ( m e^{−η a*} ) = a* − (log m)/η.
∎
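The bound in Lemma 2.3 is easy to check numerically. The following Python sketch (illustrative only, with randomly generated finite inputs) implements the softmin operator directly and verifies both inequalities.

```python
import math, random

def softmin(a, eta):
    # smin over finite inputs: -(1/eta) * log(sum of exp(-eta * a_i))
    return -math.log(sum(math.exp(-eta * x) for x in a)) / eta

random.seed(0)
for _ in range(200):
    m = random.randint(1, 20)
    eta = random.choice([0.5, 1.0, 10.0])
    a = [random.uniform(-5.0, 5.0) for _ in range(m)]
    s = softmin(a, eta)
    assert s <= min(a) + 1e-12                      # softmin lower-bounds the min
    assert min(a) <= s + math.log(m) / eta + 1e-12  # within (log m)/eta of the min
```

As the lemma predicts, larger η makes the softmin a tighter approximation of the min.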
The entropically regularized MOT problem (RMOT for short) is the convex optimization problem
min_{P ∈ M(μ_1, …, μ_k)} ⟨P, C⟩ − η^{−1} H(P).   (RMOT)
This is the natural multidimensional analog of entropically regularized OT, which has a rich literature in statistics (Leo13, ) and transportation theory (Wil69, ), and has recently attracted significant interest in machine learning (Cut13, ; PeyCut17, ). The convex dual of (RMOT) is the convex optimization problem
(RMOT-D)
In contrast to MOT, there is no analog of Lemma 2.1 for RMOT: the unique optimal solution to RMOT is dense. Further, this solution may not even be “approximately” sparse. For example, when C = 0, all marginals μ_i are uniform, and η is any positive number, the solution is fully dense with all entries equal to n^{−k}.
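For intuition in the matrix case k = 2, the following toy Python sketch runs Sinkhorn scaling with zero cost and uniform marginals and confirms that the resulting plan is fully dense with all entries equal. The instance is made up for illustration and is not the paper's implementation.

```python
# Toy illustration: entropic OT with zero cost and uniform marginals gives
# the fully dense plan with every entry equal to 1/n^2 (k = 2, any eta).
n = 4
K = [[1.0] * n for _ in range(n)]   # Gibbs kernel exp(-eta * 0) = all-ones
mu = [1.0 / n] * n                  # uniform marginals
u, v = [1.0] * n, [1.0] * n
for _ in range(10):                 # Sinkhorn scaling iterations
    u = [mu[a] / sum(K[a][b] * v[b] for b in range(n)) for a in range(n)]
    v = [mu[b] / sum(K[a][b] * u[a] for a in range(n)) for b in range(n)]
P = [[u[a] * K[a][b] * v[b] for b in range(n)] for a in range(n)]

assert all(abs(x - 1.0 / n ** 2) < 1e-12 for row in P for x in row)
```

With zero cost the scaling converges immediately to the product of the uniform marginals, so every one of the n^2 entries is strictly positive.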
We define an ε-approximate RMOT solution in the analogous way as in Definition 2.2. A basic, folklore fact about RMOT is that if the regularization η is sufficiently large, then RMOT and MOT are equivalent in terms of approximate solutions.
Lemma 2.4 (MOT and RMOT are close for large regularization η).
The optimal values of (MOT) and (RMOT) differ by at most η^{−1} k log n.
Proof.
Since a discrete distribution supported on n^k atoms has entropy at most k log n (CoverThomas, ), the objectives of (MOT) and (RMOT) differ pointwise by at most η^{−1} k log n. Since (MOT) and (RMOT) also have the same feasible sets, their optimal values therefore differ by at most η^{−1} k log n. ∎
3 Oracles
Here we define the three oracle variants described in the introduction and discuss their relations. In the below definitions, let C ∈ (R^n)^{⊗k} be a cost tensor.
Definition 3.1 (MIN oracle).
For weights p = (p_1, …, p_k) ∈ (R^n)^k, MIN_C(p) returns
min_{j ∈ [n]^k} C_j − Σ_{i=1}^k [p_i]_{j_i}.
Definition 3.2 (AMIN oracle).
For weights p = (p_1, …, p_k) ∈ (R^n)^k and accuracy ε > 0, AMIN_C(p, ε) returns MIN_C(p) up to additive error ε.
Definition 3.3 (SMIN oracle).
For weights p = (p_1, …, p_k) ∈ (R̄^n)^k and regularization parameter η > 0, SMIN_C(p, η) returns
smin^η_{j ∈ [n]^k} ( C_j − Σ_{i=1}^k [p_i]_{j_i} ).
An algorithm is said to “solve” or “implement” MIN_C if, given input p, it outputs MIN_C(p); and similarly for AMIN_C and SMIN_C. Note that the weights p that are input to SMIN have values inside R̄ = R ∪ {−∞}; this simplifies the notation in the treatment of the SINKHORN algorithm below and does not increase the bit-complexity by more than 1 bit, by adding a flag for the value −∞.
Remark 3.4 (Interpretation as variants of the dual feasibility oracle).
Since these oracles form the respective bottlenecks of all algorithms from the MOT and implicit linear programming literatures (see the overview in the introduction §1.1), an important question is: if one oracle can be implemented in polynomial time, does this imply that the other can be too?
Two reductions are straightforward: the AMIN oracle can be implemented in polynomial time whenever either the MIN oracle or the SMIN oracle can be implemented in polynomial time. We record these simple observations in remarks for easy recall.
Remark 3.5 (MIN implies AMIN).
For any accuracy ε > 0, the MIN_C(p) oracle provides a valid answer to the AMIN_C(p, ε) oracle by definition.
Remark 3.6 (SMIN implies AMIN).
For any accuracy ε > 0 and regularization η ≥ ε^{−1} k log n, the SMIN_C(p, η) oracle provides a valid answer to the AMIN_C(p, ε) oracle due to the approximation property of the smin operator (Lemma 2.3).
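Remark 3.6 can be checked numerically with brute-force (exponential-time) reference implementations of the oracles. In the toy Python sketch below, the cost, weights, and accuracy are made-up illustrative choices; the softmin is computed with a standard max-shift for numerical stability.

```python
import itertools, math

# Brute-force reference implementations of the MIN and SMIN oracles
# (exponential time; purely illustrative -- the toy cost and weights are made up).
n, k = 3, 3
C = {j: (j[0] - j[1]) ** 2 + j[2] for j in itertools.product(range(n), repeat=k)}
p = [[0.3 * i + 0.1 * a for a in range(n)] for i in range(k)]

def value(j):
    return C[j] - sum(p[i][j[i]] for i in range(k))

def MIN():
    return min(value(j) for j in itertools.product(range(n), repeat=k))

def SMIN(eta):
    vals = [value(j) for j in itertools.product(range(n), repeat=k)]
    v0 = min(vals)  # shift for numerical stability
    return v0 - math.log(sum(math.exp(-eta * (v - v0)) for v in vals)) / eta

# Lemma 2.3: the softmin over n^k terms is within (k log n)/eta of the min,
# so eta = (k log n)/eps makes SMIN a valid answer to AMIN with accuracy eps.
eps = 0.01
eta = k * math.log(n) / eps
assert abs(SMIN(eta) - MIN()) <= eps
```

The point of the remark is exactly this calculation: over n^k terms the softmin error is at most (log n^k)/η = (k log n)/η.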
In the remainder of this section, we show a separation between the SMIN oracle and both the MIN and AMIN oracles by exhibiting a family of cost tensors for which there exist polynomial-time algorithms for MIN and AMIN, however there is no polynomial-time algorithm for SMIN. The non-existence of a polynomial-time algorithm of course requires a complexity-theoretic assumption; our result holds under #BIS-hardness—which is a by-now standard complexity theory assumption introduced in (dyer2000relative, ), and in words is the statement that there does not exist a polynomial-time algorithm for counting the number of independent sets in a bipartite graph.
Lemma 3.7 (Restrictiveness of the SMIN oracle).
There exists a family of cost tensors for which the MIN and AMIN oracles can be solved in poly(n, k) time; however, the SMIN oracle is #BIS-hard.
Proof.
In order to prove hardness for general n and k, it suffices to exhibit such a family of cost tensors when n = 2. Since n = 2, it is convenient to abuse notation slightly by indexing a cost tensor C ∈ (R^2)^{⊗k} by x ∈ {0, 1}^k rather than by j ∈ [2]^k. The family we exhibit is {C_{A,b}}, where the cost tensors are parameterized by a non-negative square matrix A ∈ R^{k×k}_{≥0} and a vector b ∈ R^k, and have entries of the form
C_{A,b}(x) := −Σ_{1 ≤ i < i′ ≤ k} A_{i,i′} x_i x_{i′} + Σ_{i=1}^k b_i x_i.
Polynomial-time algorithm for MIN and AMIN. We show that given a matrix A, vector b, and weights p = (p_1, …, p_k), it is possible to compute MIN_{C_{A,b}}(p) on the cost tensor C_{A,b} in poly(k) time. Clearly this also implies a poly(k)-time algorithm for AMIN_{C_{A,b}}(p, ε) for any ε; see Remark 3.5.
To this end, we first re-write the MIN_{C_{A,b}}(p) problem in a more convenient form that enables us to “ignore” the weights p. Recall that MIN_{C_{A,b}}(p) is the problem of
min_{x ∈ {0,1}^k} C_{A,b}(x) − Σ_{i=1}^k [p_i]_{x_i}.
Note that the linear part of the cost is equal to
Σ_{i=1}^k b_i x_i − Σ_{i=1}^k [p_i]_{x_i} = ⟨b′, x⟩ + c,   (3.1)
where b′ ∈ R^k is the vector with entries b′_i = b_i − [p_i]_1 + [p_i]_0, and c is the scalar −Σ_{i=1}^k [p_i]_0. Thus, since c is clearly computable in poly(k) time, the MIN_{C_{A,b}}(p) problem is equivalent to solving
min_{x ∈ {0,1}^k} −Σ_{1 ≤ i < i′ ≤ k} A_{i,i′} x_i x_{i′} + ⟨b′, x⟩   (3.2)
when given as input a non-negative matrix A and a vector b′.
To show that this task is solvable in poly(k) time, note that the objective in (3.2) is a submodular function because it is a quadratic whose Hessian has non-positive off-diagonal terms (Bach13, , Proposition 6.3). Therefore (3.2) is a submodular optimization problem, and thus can be solved in poly(k) time using classical algorithms from combinatorial optimization (GLSbook, , Chapter 10.2).
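The submodularity of such quadratics can be verified by brute force on a small instance. In the Python sketch below, the matrix A (entrywise non-negative) and the vector b are made-up toy data, and the assertion checks the lattice inequality f(x ∨ y) + f(x ∧ y) ≤ f(x) + f(y) over all pairs of binary vectors.

```python
import itertools

# Brute-force sanity check that f(x) = -x^T A x + <b, x> with entrywise
# nonnegative A is submodular on {0,1}^k (toy instance, illustrative only).
k = 4
A = [[(i * j + 1) % 3 for j in range(k)] for i in range(k)]   # nonnegative entries
b = [2, -1, 0, 3]

def f(x):
    quad = sum(A[i][j] * x[i] * x[j] for i in range(k) for j in range(k))
    return -quad + sum(b[i] * x[i] for i in range(k))

for x in itertools.product((0, 1), repeat=k):
    for y in itertools.product((0, 1), repeat=k):
        join = tuple(max(a, c) for a, c in zip(x, y))   # x OR y
        meet = tuple(min(a, c) for a, c in zip(x, y))   # x AND y
        assert f(join) + f(meet) <= f(x) + f(y)
```

The linear term ⟨b, x⟩ is modular and cancels from both sides, so only the negated non-negative quadratic matters, matching the Hessian condition cited in the text.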
SMIN oracle is #BIS-hard. On the other hand, by using the definition of the SMIN oracle, the re-parameterization (3.1), and then the definition of the softmin operator,
SMIN_{C_{A,b}}(p, η) = c + smin^η_{x ∈ {0,1}^k} ( −Σ_{1 ≤ i < i′ ≤ k} A_{i,i′} x_i x_{i′} + ⟨b′, x⟩ ) = c − (1/η) log Z,
where Z is the partition function of the ferromagnetic Ising model with inconsistent external fields given by
Z := Σ_{x ∈ {0,1}^k} exp( η Σ_{1 ≤ i < i′ ≤ k} A_{i,i′} x_i x_{i′} − η ⟨b′, x⟩ ).
Because it is #BIS-hard to compute the partition function of a ferromagnetic Ising model with inconsistent external fields (goldberg2007complexity, ), it is #BIS-hard to compute the value of the SMIN_{C_{A,b}}(p, η) oracle. ∎
Remark 3.8 (The restrictiveness of SMIN extends to approximate computation).
The separation between the oracles shown in Lemma 3.7 further extends to approximate computation of the SMIN oracle under the assumption that #BIS is hard to approximate, since under this assumption it is hard to approximate the partition function of a ferromagnetic Ising model with inconsistent external fields (goldberg2007complexity, ).
4 Algorithms to oracles
In this section, we consider three algorithms for MOT. Each is iterative and requires only polynomially many iterations. The key issue for each algorithm is the per-iteration runtime, which is in general exponential in k (roughly n^k). We isolate the respective bottlenecks of these three algorithms into the three variants of the dual feasibility oracle defined in §3. See §1.1 and Table 4 for a high-level overview of this section’s results.
4.1 The Ellipsoid algorithm and the MIN oracle
Among the most classical algorithms for implicit LP is the Ellipsoid algorithm (GLSbook, ; GroLovSch81, ; khachiyan1980polynomial, ). However it has taken a back seat to the SINKHORN algorithm in the vast majority of the MOT literature. The very recent paper (AltBoi20bary, ), which focuses on the specific MOT application of computing low-dimensional Wasserstein barycenters, develops a variant of the classical Ellipsoid algorithm specialized to MOT; henceforth this is called ELLIPSOID, see §4.1.1 for a description of this algorithm. The objective of this section is to analyze ELLIPSOID in the context of general MOT problems in order to prove the following.
Theorem 4.1.
For any family of cost tensors C ∈ (R^n)^{⊗k}, the following are equivalent:
-
(i)
ELLIPSOID takes poly(n, k) time to solve the MOT_C problem. (Moreover, it outputs a vertex solution represented as a sparse tensor with at most nk − k + 1 non-zeros.)
-
(ii)
There exists a poly(n, k)-time algorithm that solves the MOT_C problem.
-
(iii)
There exists a poly(n, k)-time algorithm that solves the MIN_C problem.
Interpretation of results.
In words, the equivalence “(i) ⇔ (ii)” establishes that ELLIPSOID can solve any MOT problem in polynomial time that any other algorithm can. Thus from a theoretical perspective, this paper’s restriction to ELLIPSOID is at no loss of generality for developing polynomial-time algorithms that exactly solve MOT. In words, the equivalence “(ii) ⇔ (iii)” establishes that the MOT and MIN problems are polynomial-time equivalent. Thus we may investigate when MOT is tractable by instead investigating the more amenable question of when MIN is tractable (see §1.1.4) at no loss of generality.
As stated in the related work section, Theorem 4.1 is implicit from combining several known results (AltBoi20hard, ; AltBoi20bary, ; GLSbook, ). Our contribution here is to make this result explicit, since this allows us to unify algorithms from the implicit LP literature with the SINKHORN algorithm. We also significantly simplify part of the implication “(iii) ⇒ (i)”, which is crucial for making an algorithm that relies on the MIN oracle practical—namely, the Column Generation algorithm discussed below.
Organization of §4.1.
In §4.1.1, we recall this ELLIPSOID algorithm and how it depends on the violation oracle for (MOT-D). In §4.1.2, we give a significantly simpler proof that the violation and feasibility oracles are polynomial-time equivalent in the case of (MOT-D), and use this to prove Theorem 4.1. In §4.1.3, we describe a practical implementation that replaces the ELLIPSOID outer loop with Column Generation.
4.1.1 Algorithm
A key component of the proof of Theorem 4.1 is the ELLIPSOID algorithm introduced in (AltBoi20bary, ) for MOT, which we describe below. In order to present this, we first define a variant of the MIN oracle that returns a minimizing tuple rather than the minimizing value.
Definition 4.2 (Violation oracle for (MOT-D)).
Given weights p = (p_1, …, p_k) ∈ (R^n)^k, ARGMIN_C(p) returns the minimizing solution j ∈ [n]^k and value of
min_{j ∈ [n]^k} C_j − Σ_{i=1}^k [p_i]_{j_i}.
ARGMIN_C can be viewed as a violation oracle777Recall that a violation oracle for a polytope K is an algorithm that, given a point, either asserts the point is in K, or otherwise outputs a violated constraint of K. for the decision set to (MOT-D). This is because, given p, the tuple output by ARGMIN_C(p) either provides a violated constraint of (MOT-D) if the returned value is negative, or otherwise certifies that p is feasible. In (AltBoi20bary, ) it is proved that MOT can be solved with polynomially many calls to the ARGMIN_C oracle.
Theorem 4.3 (ELLIPSOID guarantee; Proposition 12 of (AltBoi20bary, )).
Algorithm 1 finds an optimal vertex solution for MOT_C using poly(n, k) calls to the ARGMIN_C oracle and poly(n, k) additional time. The solution is returned as a sparse tensor with at most nk − k + 1 non-zero entries.
Sketch of algorithm.
Full details and a proof are in (AltBoi20bary, ). We give a brief overview here for completeness. First, recall from the implicit LP literature that the classical Ellipsoid algorithm can be implemented in polynomial time for an LP with arbitrarily many constraints so long as it has polynomially many variables and the violation oracle for its decision set is solvable in polynomial time (GLSbook, ). This does not directly apply to the LP (MOT) because that LP has n^k variables. However, it can apply to the dual LP (MOT-D) because that LP only has nk variables.
This suggests a natural two-step algorithm for MOT. First, compute an optimal dual solution p = (p_1, …, p_k) by directly applying the Ellipsoid algorithm to (MOT-D). Second, use this dual solution to construct a sparse primal solution. Although this dual-to-primal conversion does not extend to arbitrary LP (BerTsi97, , Exercise 4.17), the paper (AltBoi20bary, ) provides a solution by exploiting the standard-form structure of MOT. The procedure is to solve
min_{P ∈ M(μ_1, …, μ_k) : P_j = 0 for all j ∉ S} ⟨P, C⟩,   (4.1)
which is the MOT problem restricted to the sparsity pattern S, where S ⊆ [n]^k is the set of tuples returned by the violation oracle during the execution of step one of Algorithm 1. This second step takes poly(n, k) time using a standard LP solver, because running the Ellipsoid algorithm in the first step only calls the violation oracle poly(n, k) times, and thus S has poly(n, k) size, and therefore the LP (4.1) has poly(n, k) variables and constraints. In (AltBoi20bary, ) it is proved that this produces a primal vertex solution to the original MOT problem.
4.1.2 Equivalence of bottleneck to MIN
Although Theorem 4.3 shows that ELLIPSOID can solve MOT in poly(n, k) time using the ARGMIN oracle, this is not sufficient to prove the implication “(iii) ⇒ (i)” in Theorem 4.1. Proving that implication requires showing the polynomial-time equivalence between MIN and ARGMIN.
Lemma 4.4 (Equivalence of MIN and ARGMIN).
Each of the oracles MIN_C and ARGMIN_C can be implemented using poly(n, k) calls of the other oracle and poly(n, k) additional time.
This equivalence follows from classical results about the equivalence of violation and feasibility oracles (yudin1976informational, ). However, the known proof of that general result requires an involved and indirect argument based on “back-and-forth” applications of the Ellipsoid algorithm (GLSbook, , §4.3). Here we exploit the special structure of MOT to give a direct and elementary proof. This is essential to practical implementations (see §4.1.3).
Proof.
It is obvious how the MIN_C oracle can be implemented via a single call of the ARGMIN_C oracle; we now show the converse. Specifically, given weights p = (p_1, …, p_k), we show how to compute a solution (j_1, …, j_k) for ARGMIN_C(p) using nk calls to the MIN_C oracle and polynomial additional time. We use the first n calls to compute the first index j_1, the next n calls to compute the next index j_2, and so on.
Formally, for i ∈ {0, 1, …, k}, let us say that (j_1, …, j_i) is a “partial solution” of size i if there exists a solution j* for ARGMIN_C(p) that satisfies j*_l = j_l for all l ≤ i. Then it suffices to show that for every i ∈ [k], it is possible to compute a partial solution of size i from a partial solution of size i − 1 using n calls to the MIN_C oracle and polynomial additional time.
The simple but key observation enabling this is the following. Below, for λ ∈ R and j ∈ [n], define λδ_j to be the vector in R^n with value λ on entry j, and value 0 on all other entries. In words, the following observation states that if the constant λ is sufficiently large, then for any indices j_1, …, j_i, replacing the vectors p_1, …, p_i with the vectors p_1 + λδ_{j_1}, …, p_i + λδ_{j_i} in a MIN oracle query effectively performs a MIN oracle query on the original input, except that now the minimization is only over tuples j′ ∈ [n]^k satisfying j′_l = j_l for all l ≤ i.
Observation 4.5.
Set λ := 2‖C‖_max + 2 Σ_{i=1}^k ‖p_i‖_max + 1. Then for any i ∈ [k] and any j_1, …, j_i ∈ [n],
MIN_C(p_1 + λδ_{j_1}, …, p_i + λδ_{j_i}, p_{i+1}, …, p_k) = min_{j′ ∈ [n]^k : j′_l = j_l ∀ l ≤ i} ( C_{j′} − Σ_{l=1}^k [p_l]_{j′_l} ) − iλ.
Proof.
By definition of the MIN oracle,
MIN_C(p_1 + λδ_{j_1}, …, p_i + λδ_{j_i}, p_{i+1}, …, p_k) = min_{j′ ∈ [n]^k} ( C_{j′} − Σ_{l=1}^k [p_l]_{j′_l} − λ Σ_{l=1}^i 1[j′_l = j_l] ).
It suffices to prove that every minimizing tuple j′ for the right hand side satisfies j′_l = j_l for all l ≤ i. Suppose not, for the sake of contradiction. Then there exists a minimizing tuple j′ for which j′_l ≠ j_l for some l ≤ i. But then Σ_{l=1}^i 1[j′_l = j_l] ≤ i − 1, so the objective value of j′ is at least
−‖C‖_max − Σ_{l=1}^k ‖p_l‖_max − (i − 1)λ.
But this is strictly larger (by at least 1) than the value of any tuple with prefix (j_1, …, j_i), contradicting the optimality of j′. ∎
Thus, given a partial solution (j_1, …, j_{i−1}) of size i − 1, we construct a partial solution (j_1, …, j_i) of size i by setting j_i to be a minimizer of
min_{j ∈ [n]} MIN_C(p_1 + λδ_{j_1}, …, p_{i−1} + λδ_{j_{i−1}}, p_i + λδ_j, p_{i+1}, …, p_k).   (4.2)
The runtime bound is clear; it remains to show correctness. To this end, note that
min_{j′ ∈ [n]^k : j′_l = j_l ∀ l ≤ i} ( C_{j′} − Σ_{l=1}^k [p_l]_{j′_l} )
 = MIN_C(p_1 + λδ_{j_1}, …, p_i + λδ_{j_i}, p_{i+1}, …, p_k) + iλ
 = min_{j ∈ [n]} MIN_C(p_1 + λδ_{j_1}, …, p_{i−1} + λδ_{j_{i−1}}, p_i + λδ_j, p_{i+1}, …, p_k) + iλ
 = min_{j ∈ [n]} min_{j′ ∈ [n]^k : j′_l = j_l ∀ l < i, j′_i = j} ( C_{j′} − Σ_{l=1}^k [p_l]_{j′_l} )
 = min_{j′ ∈ [n]^k : j′_l = j_l ∀ l < i} ( C_{j′} − Σ_{l=1}^k [p_l]_{j′_l} )
 = min_{j′ ∈ [n]^k} ( C_{j′} − Σ_{l=1}^k [p_l]_{j′_l} ),
where above the first and third steps are by Observation 4.5, the second step is by construction of j_i, the fourth step is by simplifying, and the final step is by definition of (j_1, …, j_{i−1}) being a partial solution of size i − 1. We conclude that (j_1, …, j_i) is a partial solution of size i, as desired. ∎
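The reduction in the proof of Lemma 4.4 is short enough to sketch in code. The toy Python example below recovers a minimizing tuple using only calls to a brute-force MIN oracle, forcing one coordinate at a time by adding a large bonus to the corresponding weight entry. All instance data and the constant lam are illustrative choices, not the paper's notation.

```python
import itertools

# Toy sketch of Lemma 4.4's reduction: recover an ARGMIN tuple using only a
# (here brute-force) MIN oracle. Instance data is made up; integer weights
# keep all arithmetic exact.
n, k = 3, 3
C = {j: (2 * j[0] - j[1]) ** 2 + (j[2] - 1) ** 2
     for j in itertools.product(range(n), repeat=k)}
p = [[(i + 1) * a for a in range(n)] for i in range(k)]

def MIN(weights):
    return min(C[j] - sum(weights[i][j[i]] for i in range(k))
               for j in itertools.product(range(n), repeat=k))

# Large bonus: strictly dominates any possible change in objective value.
lam = 2 * max(abs(v) for v in C.values()) \
    + 2 * sum(max(abs(x) for x in pi) for pi in p) + 1

prefix = []
for i in range(k):
    # n MIN calls to determine coordinate i, as in the proof
    best_val, best_j = None, None
    for cand in range(n):
        w = [list(pi) for pi in p]
        for l, jl in enumerate(prefix + [cand]):
            w[l][jl] += lam          # bonus forces coordinate l to equal jl
        val = MIN(w)
        if best_val is None or val < best_val:
            best_val, best_j = val, cand
    prefix.append(best_j)

# The recovered tuple attains the optimal MIN value.
assert C[tuple(prefix)] - sum(p[i][prefix[i]] for i in range(k)) == MIN(p)
```

This uses nk oracle calls total (n per coordinate), matching the count in the proof.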
We can now conclude the proof of the main result of §4.1.
Proof of Theorem 4.1.
The implication “(i) ⇒ (ii)” is trivial, and the implication “(ii) ⇒ (iii)” is shown in (AltBoi20hard, ). It therefore suffices to show the implication “(iii) ⇒ (i)”. This follows from combining the fact that ELLIPSOID solves MOT_C in polynomial time given an efficient implementation of ARGMIN_C (Theorem 4.3), with the fact that the MIN_C and ARGMIN_C oracles are polynomial-time equivalent (Lemma 4.4). ∎
4.1.3 Practical implementation via Column Generation
Although ELLIPSOID enjoys powerful theoretical runtime guarantees, it is slow in practice because the classical Ellipsoid algorithm is. Nevertheless, whenever ELLIPSOID is applicable (i.e., whenever the ARGMIN_C oracle can be efficiently implemented), we can use an alternative practical algorithm, namely the delayed Column Generation method COLGEN, to compute exact, sparse solutions to MOT.
For completeness, we briefly recall the idea behind COLGEN; for further details see the standard textbook (BerTsi97, , §6.1). COLGEN runs the Simplex method, keeping only basic variables in the tableau. Each time that COLGEN needs to find a Simplex variable on which to pivot, it solves the “pricing problem” of finding a variable with negative reduced cost. This is the key subroutine in COLGEN. In the present context of the MOT LP, this pricing problem is equivalent to a call to the ARGMIN violation oracle (see (BerTsi97, , Definition 3.2) for the definition of reduced costs). By the polynomial-time equivalence of the ARGMIN and MIN oracles shown in Lemma 4.4, this bottleneck subroutine in COLGEN can be computed in polynomial time whenever the MIN oracle can. For easy recall, we summarize this discussion as follows.
Theorem 4.6 (Standard guarantee for COLGEN; Section 6.1 of (BerTsi97, )).
For any number of iterations T, one can implement T iterations of COLGEN in polynomial time and polynomially many calls to the ARGMIN oracle. When COLGEN terminates, it returns an optimal vertex solution, which is given as a polynomially sparse tensor.
Note that COLGEN does not have a theoretical guarantee stating that it terminates after a polynomial number of iterations. But it often performs well in practice and terminates after a small number of iterations, leading to much better empirical performance than ELLIPSOID.
4.2 The Multiplicative Weights Update algorithm and the AMIN oracle
The second classical algorithm for solving implicitly-structured LPs that we study in the context of MOT is the Multiplicative Weights Update algorithm MWU (young2001sequential, ). The objective of this section is to prove the following guarantees for its specialization to MOT.
Theorem 4.7.
For any family of cost tensors, the following are equivalent:

(i) For any accuracy parameter, MWU takes polynomial time to solve the MOT problem approximately. (Moreover, it outputs a polynomially sparse solution.)

(ii) There exists a polynomial-time algorithm that solves the MOT problem approximately, for any accuracy parameter.

(iii) There exists a polynomial-time algorithm that solves the AMIN problem approximately, for any accuracy parameter.
Interpretation of results.
Similarly to the analogous Theorem 4.1 for ELLIPSOID, the equivalence “(i) ⇔ (ii)” establishes that MWU can approximately solve any MOT problem in polynomial time that any other algorithm can. Thus, from a theoretical perspective, restricting to MWU for approximately solving MOT problems is at no loss of generality. In words, the equivalence “(ii) ⇔ (iii)” establishes that approximating MOT and approximating MIN are polynomial-time equivalent. Thus we may investigate when MOT is tractable to approximate by instead investigating the more amenable question of when MIN is tractable to approximate (see §1.1.4), at no loss of generality.
Theorem 4.7 is new to this work. In particular, equivalences between problems with polynomially small error do not fall under the purview of classical LP theory, which deals with exponentially small error (GLSbook, ). Our use of the MWU algorithm exploits a simple reduction of MOT to a mixed packing-covering LP that has appeared in the matrix case of Optimal Transport in (BlaJamKenSid18, ; Qua18, ), where implicit LP is not necessary for polynomial runtime.
Organization of §4.2.
4.2.1 Algorithm
Here we present the MWU algorithm, which combines the generic Multiplicative Weights Update algorithm of (young2001sequential, ) specialized to MOT, along with a final rounding step that ensures feasibility of the solution.
In order to present MWU, it is convenient to assume that the cost has entries in the range [1, 2], which is at no loss of generality by simply translating and rescaling the cost (see §4.2.2), and can be done implicitly given a bound on the magnitude of the cost's entries. This is why, in the rest of this subsection, every runtime dependence on the accuracy is polynomial in its inverse for costs in the range [1, 2]; after transforming back to the original cost, this is polynomial dependence in the standard scale-invariant ratio of the maximum cost magnitude to the accuracy.
Since the cost is assumed to have non-negative entries, for any target value, the polytope of couplings with cost at most that value is a mixed packing-covering polytope (i.e., all variables are non-negative and all constraints have non-negative coefficients). Note that this polytope is non-empty if and only if MOT has value at most the target value. Thus, modulo a binary search on the target value, computing the value of MOT reduces to the task of detecting whether this polytope is empty. Since it is a mixed packing-covering polytope, the Multiplicative Weights Update algorithm of (young2001sequential, ) determines whether it is empty, and runs in polynomial time apart from one bottleneck, which we now define.
In order to define the bottleneck, we first define a potential function. For this, we define the softmax analogously to the softmin as
Here we use a single fixed regularization parameter, since this suffices for analyzing MWU, and thus we drop the corresponding index for shorthand.
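Numerically, the softmin and softmax are computed stably by factoring out the extreme value (the logsumexp trick). The sketch below uses the convention softmin_η(x) = −η log Σ_j e^{−x_j/η}, under which the softmin tends to the true min as η → 0⁺; conventions for the regularization parameter vary, so treat this as illustrative.

```python
import math

def softmin(values, eta):
    """Stable softmin: -eta * log(sum_j exp(-x_j / eta)).
    Factoring out the true min avoids overflow in the exponentials."""
    m = min(values)
    s = sum(math.exp(-(v - m) / eta) for v in values)
    return m - eta * math.log(s)

def softmax(values, eta):
    """Soft maximum, defined analogously as -softmin of the negated values."""
    return -softmin([-v for v in values], eta)
```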
Definition 4.8 (Potential function for MWU).
Fix a cost , target marginals , and target value . Define the potential function by
The softmax in the above expression is interpreted as a softmax over the values in the concatenation of vectors and scalars in its input. (This slight abuse of notation significantly reduces notational overhead.)
Given this potential function, we now define the bottleneck operation for MWU: find a tuple whose mass can be increased such that the potential is increased as little as possible.
Definition 4.9 (Bottleneck oracle for MWU).
Given iterate , target marginals , target value , and accuracy , either:
•
Outputs “null”, certifying that the polytope of couplings with cost at most the target value is empty, or
•
Outputs a tuple such that increasing its mass increases the potential by at most the allowed amount.
(If the bound holds only up to the accuracy tolerance, then either return behavior is a valid output.)
Pseudocode for the MWU algorithm is given in Algorithm 2. We prove that MWU runs in polynomial time given access to this bottleneck oracle.
Theorem 4.10.
Let the entries of the cost lie in the range [1, 2]. Given a target value and an accuracy parameter, MWU either certifies that the corresponding polytope is empty, or returns a polynomially sparse tensor that approximately satisfies the marginal and cost constraints.
Furthermore, the loop in Step 1 runs in polynomially many iterations, and Step 2 runs in polynomial time.
The MWU algorithm can be used to output an approximate solution for MOT via an outer loop that performs binary search over the target value; this only incurs a logarithmic multiplicative overhead in runtime.
Proof.
We analyze Step 1 (Multiplicative Weights Update) and Step 2 (rounding) of MWU separately.
Lemma 4.11 (Correctness of Step 1).
Step 1 of Algorithm 2 runs in polynomially many iterations. It either returns (i) “infeasible”, certifying that the polytope is empty; or (ii) finds a polynomially sparse tensor that approximately satisfies the marginal and cost constraints.
Step 1 is the Multiplicative Weights Update algorithm of (young2001sequential, ) applied to this polytope, so correctness follows from the analysis of (young2001sequential, ). We briefly recall the main idea behind this algorithm for the convenience of the reader, and provide a full proof in Appendix A.1 for completeness. The main idea behind the algorithm is that on each iteration, the tuple whose mass is increased is chosen so that the increase in the potential is approximately bounded by the increase in the total mass. If this is impossible, then the bottleneck oracle returns null, which means the polytope is empty. So assume otherwise. Then once the total mass has increased sufficiently, the potential must be correspondingly bounded. By exploiting the inequality between the max and the softmax, this means that the hard constraints are approximately satisfied as well. Thus the rescaling in Line 10 yields an approximately feasible iterate. See Appendix A.1 for full details and a proof of the runtime and sparsity claims.
Lemma 4.12 (Correctness of Step 2).
Step 2 of Algorithm 2 runs in time and returns satisfying . Furthermore, only has non-zero entries.
Proof of Lemma 4.12.
By Lemma 4.11, the output of Step 1 approximately satisfies each marginal constraint. Observe that this is an invariant that holds throughout the execution of Step 2. This, along with the fact that the total mass is equal across all marginals, implies that the indices found in Line 14 satisfy the required condition for each marginal. Thus in particular the minimum in Line 15 is well-defined. It follows that Line 16 makes at least one more constraint satisfied (in particular the constraint corresponding to the minimizer in Line 15). Since there are finitely many constraints in total to be satisfied, Step 2 terminates in at most that many iterations. Each iteration increases the number of non-zero entries by at most one, thus the iterate remains polynomially sparse throughout. This sparsity also implies that each iteration can be performed in polynomial time, thus Step 2 takes polynomial time overall.
4.2.2 Equivalence of bottleneck to AMIN
In order to prove Theorem 4.7, we show that the MWU algorithm can be implemented in polynomial time and calls to the AMIN oracle. First, we prove this fact for the ARGAMIN oracle, which differs from the AMIN oracle in that it also returns a tuple that is an approximate minimizer.
Definition 4.13 (Approximate violation oracle for (MOT-D)).
Given dual weights and an accuracy parameter, the oracle returns a tuple that minimizes the reduced cost up to the given additive error, together with its value up to the same additive error.
Lemma 4.14.
Let the entries of the cost lie in the range [1, 2]. The MWU algorithm (Algorithm 2) can be implemented in polynomial time using polynomially many calls to the ARGAMIN oracle with polynomially small accuracy parameter.
Proof.
We show that on each iteration of Step 1 of Algorithm 2 we can emulate the call to the MWU_BOTTLENECK oracle with one call to the ARGAMIN oracle. Recall that MWU_BOTTLENECK seeks to find a tuple such that
is at most , or to certify that for all it is greater than . By explicit computation,
(4.3)
where the weights in the last line are defined as
Note that the second term in the product in (4.3) is positive and does not depend on the tuple. This suggests that in order to minimize (4.3), it suffices to call the ARGAMIN oracle on the first term with a suitable accuracy parameter.
The main technical difficulty with formalizing this intuitive approach is that the weights are not necessarily efficiently computable. Nevertheless, using polynomial extra time on each iteration, we can compute the marginals of the current iterate. Since the ARGAMIN oracle returns an additive approximation of the cost, we can also maintain a running estimate of the cost such that, on each iteration,
Therefore, we define weights , which approximate and which can be computed in time on each iteration:
We also define the approximate value for any :
It holds that returns a that minimizes up to multiplicative error , because the entries of the cost are lower-bounded by 1, and for all . In particular, minimizes up to multiplicative error . We prove the following claim relating and :
Claim 4.15.
For any , on iteration , we have .
By the above claim, the returned tuple therefore minimizes the bottleneck quantity up to the desired multiplicative error for a suitable choice of the accuracy parameter. Thus one can implement MWU_BOTTLENECK by returning the tuple if its value is estimated to be at most the threshold, and returning “null” otherwise. The bound on the accuracy parameter follows from Theorem 4.10.
Proof of Claim.
We compare the expressions for and . Each of these is a product of two terms. Since , and for all , the ratio of the first terms is
where we have used that, for all ,
Similarly the ratio of the second terms in the expression for and is also in the range . This concludes the proof of the claim. ∎
∎
Finally, we show that the ARGAMIN oracle can be reduced to the AMIN oracle, which completes the proof that MWU can be run with AMIN.
Lemma 4.16 (Equivalence of AMIN and ARGAMIN).
Each of the oracles AMIN and ARGAMIN can be implemented using polynomially many calls to the other oracle with polynomially related accuracy parameter, plus polynomial additional time.
It is worth remarking that the equivalence that we show between AMIN and ARGAMIN is not known to hold for the feasibility and separation oracles of general LPs, since the known result for general LPs requires exponentially small error in (GLSbook, , §4.3). However, in the case of MOT the equivalence follows from a direct and practical reduction, similar to the proof of the equivalence of the exact oracles (Lemma 4.4). The main difference is that some care is needed to bound the propagation of the errors of the approximate oracles. For completeness, we provide the full proof of Lemma 4.16 in Appendix A.
We conclude by proving Theorem 4.7.
Proof of Theorem 4.7.
The implication “(i) ⇒ (ii)” is trivial, and the implication “(ii) ⇒ (iii)” is shown in (AltBoi20hard, ). It therefore suffices to show the implication “(iii) ⇒ (i)”. For costs with entries in the range [1, 2], this follows from combining the fact that MWU can be implemented to solve MOT in polynomial time given an efficient implementation of ARGAMIN with polynomially small accuracy parameter (Lemma 4.14), along with the fact that the ARGAMIN and AMIN oracles are polynomial-time equivalent with polynomially small accuracy parameter (Lemma 4.16).
The assumption that the cost has entries within the range [1, 2] can be removed at no loss by translating and rescaling the original cost, and running Algorithm 2 on the transformed cost with a correspondingly rescaled approximation parameter. Each approximate query to the AMIN oracle for the transformed cost can then be simulated by an approximate query to the AMIN oracle for the original cost with a rescaled accuracy parameter. ∎
Remark 4.17 (Practical optimizations).
Our numerical implementation of MWU has two modifications that provide practical speedups. One is maintaining a cached list of the tuples previously returned by calls to MWU_BOTTLENECK. Whenever MWU_BOTTLENECK is called, we first check whether any tuple in the cache satisfies the oracle's desiderata, in which case we use it to answer the oracle query. Otherwise, we answer the oracle query using AMIN as explained above. In practice, this cache allows us to avoid many calls to the potentially expensive AMIN bottleneck. Our second optimization is that, at each iteration of MWU, we check whether the current iterate can be rescaled in order to satisfy the guarantees in Lemma 4.11 required from Step 1. If so, we stop Step 1 early and use this rescaled iterate.
4.3 The Sinkhorn algorithm and the SMIN oracle
The Sinkhorn algorithm (SINKHORN) is specially tailored to MOT, and does not apply to general exponential-size LPs. Currently it is by far the most popular algorithm in the MOT literature (see §1.3). However, in general each iteration of SINKHORN takes exponential time, and it is not well understood when it can be implemented in polynomial time. The objective of this section is to show that this bottleneck is polynomial-time equivalent to the SMIN oracle, and in doing so put SINKHORN on equal footing with classical implicit LP algorithms in terms of their reliance on variants of the dual feasibility oracle for MOT. Concretely, this lets us establish the following two results.
First, SINKHORN can solve MOT in polynomial time whenever SMIN can be solved in polynomial time.
Theorem 4.18.
For any family of cost tensors and accuracy parameter, SINKHORN solves MOT to the desired accuracy in polynomial time and polynomially many calls to the SMIN oracle with polynomially large regularization parameter. (The solution is output through a polynomial-size implicit representation, see §4.3.1.)
Second, we show that SINKHORN requires strictly more structure than other algorithms do to solve an MOT problem. This is why the results about ELLIPSOID (Theorem 4.1) and MWU (Theorem 4.7) state that those algorithms solve an MOT problem whenever possible, whereas Theorem 4.18 cannot be analogously extended to such an “if and only if” characterization.
Theorem 4.19.
There is a family of cost tensors for which ELLIPSOID solves MOT exactly in polynomial time; however, it is #BIS-hard to implement a single iteration of SINKHORN in polynomial time.
Organization of §4.3.
4.3.1 Algorithm
Here we recall SINKHORN and its known guarantees. To do this, we first define the following oracle. While this oracle does not have an interpretation as a dual feasibility oracle, we show below that it is polynomial-time equivalent to SMIN, which is a specific type of approximate dual feasibility oracle (Remark 3.6).
Definition 4.20 (MARG).
Given scaling vectors, a regularization parameter, and an index, the marginalization oracle MARG returns the corresponding marginal vector of the scaled kernel tensor.
It is known that SINKHORN can solve MOT with only polynomially many calls to this oracle (LinHoJor19, ). The approximate solution that SINKHORN computes is a fully dense tensor with exponentially many non-zero entries, but it is output implicitly in polynomial space through “scaling vectors” and “rounding vectors”, described below.
Theorem 4.21 (SINKHORN guarantee, (LinHoJor19, )).
Algorithm 3 computes an approximate solution to MOT using polynomially many calls to the MARG oracle with polynomially large regularization parameter, and polynomial additional time. The solution is of the form
(4.4)
and is output implicitly via the scaling vectors and rounding vectors .
Input: cost tensor and target marginals
Output: implicit representation of a tensor of the form (4.4) that is an approximate solution to MOT
Sketch of algorithm.
Full details and a proof are in (LinHoJor19, ). We give a brief overview here for completeness. The main idea of SINKHORN is to solve RMOT, the entropically regularized variant of MOT described in §2.2. On one hand, this provides an -approximate solution to MOT by taking the regularization parameter sufficiently high (Lemma 2.4). On the other hand, solving RMOT rather than MOT enables exploiting the first-order optimality conditions of RMOT, which imply that the unique solution to RMOT is the unique tensor in of the form
(4.5)
where the kernel denotes the entrywise exponentiation of the negated, rescaled cost tensor, and the scaling vectors are non-negative. The SINKHORN algorithm approximately computes this solution in two steps.
The first and main step of Algorithm 3 is the natural multimarginal analog of the Sinkhorn scaling algorithm (Sin67, ). It computes an approximate solution by finding scaling vectors such that the scaled kernel is nearly feasible in the sense that each of its marginals is close to the corresponding target marginal. Briefly, it does this via alternating optimization: initialize each scaling vector to the all-ones vector, and then iteratively update one scaling vector so that the corresponding marginal of the current scaled iterate matches its target. Although correcting one marginal can detrimentally affect the others, this algorithm nevertheless converges—in fact, in a polynomial number of iterations (LinHoJor19, ).
The second step of Algorithm 3 is the natural multimarginal analog of the rounding algorithm (AltWeeRig17, , Algorithm 2). It rounds the solution found in step one to the transportation polytope. Briefly, it performs this by scaling each marginal to be entrywise less than the desired target marginal, and then adding mass back so that all marginal constraints are exactly satisfied. The former adjustment is done by adjusting the diagonal scalings, and the latter adjustment is done by adding a rank-one term.
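As an illustration of the scaling step (our toy sketch, not Algorithm 3 itself), the round-robin update can be written directly when the kernel tensor is stored densely. The dense marginal computation below is exactly the exponential-time bottleneck that the MARG oracle replaces in the structured setting.

```python
import numpy as np

def sinkhorn_scaling(K, mus, n_iter=200):
    """Round-robin multimarginal Sinkhorn scaling.
    K: dense k-dimensional kernel tensor (entrywise positive), e.g. the
       entrywise exponential of the negated, rescaled cost.
    mus: list of k target marginal vectors.
    Returns scaling vectors d_1, ..., d_k such that the marginals of the
    scaled tensor K * (d_1 x ... x d_k) approach the targets."""
    k = K.ndim
    d = [np.ones(K.shape[i]) for i in range(k)]
    for t in range(n_iter):
        i = t % k  # round-robin choice of the update index
        scaled = K
        for j in range(k):  # apply all current scalings
            shape = [1] * k
            shape[j] = -1
            scaled = scaled * d[j].reshape(shape)
        # i-th marginal of the scaled tensor: the (here dense) MARG oracle
        m_i = scaled.sum(axis=tuple(a for a in range(k) if a != i))
        d[i] = d[i] * mus[i] / m_i  # make the i-th marginal exact
    return d
```

After the loop, the most recently updated marginal is exact and the others are nearly so; the rounding step then repairs the residual infeasibility.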
Critically, note that Algorithm 3 takes polynomial time except for possibly the calls to the MARG oracle. In the absence of structure in the cost tensor, evaluating this oracle takes exponential time because it requires computing marginals of a tensor with exponentially many entries.
We conclude this discussion with several remarks about SINKHORN.
Remark 4.22 (Choice of update index in SINKHORN).
In line 4 there are several ways to choose update indices, all of which lead to the polynomial iteration complexity we desire. Iteration-complexity bounds are shown for a greedy choice in (LinHoJor19, ; Fri20, ). Similar bounds can be shown for random and round-robin choices by adapting the techniques of (AltPar20bal, ). These latter two choices do not incur the overhead of computing every marginal in each iteration, as the greedy choice requires, which is helpful in practice. Empirically, we observe that round-robin works quite well, and we use this in our experiments.
Remark 4.23 (Alternative implementations of SINKHORN).
For simplicity, Algorithm 3 provides pseudocode for the “vanilla” version of SINKHORN, as it performs well in practice and achieves the polynomial iteration complexity we desire. There are several variants in the literature, including accelerated versions and versions that first round small entries of the marginals—these variants have iteration-complexity bounds with better polynomial dependence on some parameters, albeit sometimes at the expense of larger polynomial factors in others (LinHoJor19, ; tupitsa2020multimarginal, ).
Remark 4.24 (Output of SINKHORN and efficient downstream tasks).
While the output of SINKHORN is fully dense with exponentially many non-zero entries, its specific form (4.4) enables performing downstream tasks in polynomial time. This is conditional on a polynomial-time MARG oracle, which is at no loss of generality since that is required for running SINKHORN in polynomial time in the first place. The basic idea is that the output is a mixture of two simple distributions (modulo normalization). The first is the scaled kernel, which is marginalizable using MARG. The second is the rounding term, which is easily marginalizable since it is a product distribution (as the rounding vectors are non-negative). Together, this enables efficient marginalization of the output. By recursively marginalizing conditional distributions, this enables efficiently sampling from the output. This in turn enables efficient estimation of bounded statistics of the output (e.g., the cost) by Hoeffding’s inequality.
4.3.2 Equivalence of bottleneck to SMIN
Although Theorem 4.21 shows that SINKHORN solves MOT in polynomial time using the MARG oracle, this is neither sufficient to prove the implication “(ii) ⇒ (i)” in Theorem 4.18, nor to prove Theorem 4.19. In order to prove these results, we show that SMIN and MARG are polynomial-time equivalent.
Lemma 4.25 (Equivalence of MARG and SMIN).
For any regularization parameter, each of the oracles MARG and SMIN can be implemented using polynomially many calls of the other oracle and polynomial additional time.
Proof.
Reduction from SMIN to MARG. First, we show how to compute using one call to the marginalization oracle and additional time. Consider the entrywise exponentiated matrix , and let be the answer to . Observe that
where above the first step is by definition of , the second step is by definition of and combining the sums, and the third step is by definition of . We conclude that is a valid answer to . Since this is clearly computable from in time, this establishes the claimed reduction.
Reduction from MARG to SMIN. Next, we show that for any marginalization index and entry , it is possible to compute the -th entry of the vector using one call to the oracle and additional time. Define to be the vector with -th entry equal to , and all other entries . Define the matrix , where recall that (see §2). Let denote the answer to . Observe that
where the first step is by definition of the constructed vector, the second step is by combining the sums, and the third step is by definition of the marginalization notation. We conclude that this quantity is a valid answer for the requested entry of the marginal vector. This establishes the claimed reduction, since we may repeat this procedure for each entry to compute the whole vector.
∎
We can now conclude the proofs of the main results of §4.3.
Proof of Theorem 4.18.
This follows from combining the guarantee for SINKHORN given the MARG oracle (Theorem 4.21) with the polynomial-time equivalence of the MARG and SMIN oracles (Lemma 4.25). ∎
Proof of Theorem 4.19.
Consider the family of cost tensors in Lemma 3.7 for which the MIN oracle admits a polynomial-time algorithm, but for which the SMIN oracle is #BIS-hard. Then on one hand, the ELLIPSOID algorithm solves MOT in polynomial time by Theorem 4.1. And on the other hand, it is #BIS-hard to implement a single iteration of SINKHORN because that requires implementing the MARG oracle, which is polynomial-time equivalent to the SMIN oracle by Lemma 4.25. ∎
5 Application: MOT problems with graphical structure
In this section, we illustrate our algorithmic framework on MOT problems with graphical structure. Although a polynomial-time algorithm is already known for this particular structure (h20gm, ; teh2002unified, ), that algorithm computes solutions that are approximate and dense; see the related work section for details. By combining our algorithmic framework developed above with classical facts about graphical models, we show that it is possible to compute solutions that are exact and sparse in polynomial time.
The section is organized as follows. In §5.1, we recall the definition of graphical structure. In §5.2, we show that the MIN, AMIN, and SMIN oracles can be implemented in polynomial time for cost tensors with graphical structure; from this it immediately follows that all of the MOT algorithms discussed in part 1 of this paper can be implemented in polynomial time. Finally, in §5.3, we demonstrate our results on the popular application of computing generalized Euler flows, which was the original motivation of MOT. Numerical simulations demonstrate how the exact, sparse solutions produced by our new algorithms provide qualitatively better solutions than previously possible in polynomial time.
5.1 Setup
We begin by recalling preliminaries about undirected graphical models, a.k.a., Markov Random Fields. We recall only the relevant background; for further details we refer the reader to the textbooks (KolFri09, ; wainwright2008graphical, ).
In words, graphical models provide a way of encoding the independence structure of a collection of random variables in terms of a graph. The formal definition is as follows. Below, all graphs are undirected, and the notation means the power set of (i.e., the set of all subsets of ).
Definition 5.1 (Graphical model structure).
Let . The graphical model structure corresponding to is the graph with vertices and edges .
Definition 5.2 (Graphical model).
Let . A probability distribution over is a graphical model with structure if there exist functions and normalizing constant such that
A standard measure of complexity for graphical models is the treewidth of the underlying graphical model structure because this captures not just the storage complexity, but also the algorithmic complexity of performing fundamental tasks such as computing the mode, log-partition function, and marginal distributions (KolFri09, ; wainwright2008graphical, ). There are a number of equivalent definitions of treewidth (bodlaender2007treewidth, ). Each requires defining intermediate combinatorial concepts. We recall here the definition that is based on the concept of a junction tree because this is perhaps the most standard definition in the graphical models community.
Definition 5.3 (Junction tree, treewidth).
A junction tree for a graph consists of a tree and a set of bags satisfying:
•
For each variable of the graph, the set of bags containing that variable induces a subtree of the tree.
•
For each edge of the graph, there is some bag containing both endpoints of the edge.
The width of a junction tree is one less than the size of its largest bag. The treewidth of a graph is the width of its minimum-width junction tree.
We now formally recall the definition of graphical structure for MOT.
Definition 5.4 (Graphical structure for MOT).
An MOT cost tensor has graphical structure with treewidth if there is a graphical model structure and functions such that
(5.1) |
and such that the graph has treewidth .
We make three remarks about this structure. First, note that the functions can be arbitrary so long as the corresponding graphical model structure has treewidth at most .
Second, if Definition 5.4 did not constrain the treewidth, then every tensor would trivially have graphical structure with maximal treewidth (take the collection of subsets to be the singleton consisting of all the variables, in which case the graph is complete and the single function is the cost itself). Just like all previous algorithms, our algorithms have runtimes that depend exponentially (only) on the treewidth of the structure. This is optimal in the sense that, unless P = NP, there is no algorithm with jointly polynomial runtime in the input size and treewidth (AltBoi20hard, ). We also point out that in all current applications of graphically structured MOT, the treewidth is either 1 or 2, see §1.3.
Third, as in all previous work on graphically structured MOT, we make the natural assumptions that the cost is input implicitly through the functions , and that each function can be evaluated in polynomial time, since otherwise graphical structure is useless for designing polynomial-time algorithms. In all applications in the literature, these two basic assumptions are always satisfied. Note also that if the treewidth of the graphical structure is constant, then there is a linear-time algorithm to compute the treewidth and a corresponding minimum-width junction tree (bodlaender1996linear, ).
5.2 Polynomial-time algorithms
By our oracle reductions in §4, in order to design polynomial-time algorithms for MOT with graphical structure, it suffices to design polynomial-time algorithms for the MIN, AMIN, or SMIN oracles. This follows directly from classical algorithmic results in the graphical models literature (KolFri09, ).
Theorem 5.5 (Polynomial-time algorithms for the MIN, AMIN, and SMIN oracles for costs with graphical structure).
Let the cost tensor have graphical structure with constant treewidth (see Definition 5.4). Then the MIN, AMIN, and SMIN oracles can be computed in polynomial time.
Proof.
Consider an input to the oracles, and let the corresponding probability distribution on tuples be given by
(5.2) |
where the normalizing constant ensures the distribution sums to one. Observe that the MIN oracle amounts to computing the mode of this distribution, since the oracle's minimizing tuple is a maximizer of the distribution. (In fact, for the purpose of computing the mode, the distribution can be defined using any regularization parameter.) Also, the SMIN oracle amounts to computing the partition function of this distribution. Thus it suffices to compute the mode and partition function of the distribution in polynomial time. (The AMIN oracle follows from the MIN oracle by Remark 3.5.)
To this end, observe that by assumption on the cost, there is a graphical model structure and functions such that the corresponding graph has constant treewidth and the distribution factors over this structure.
It follows that the distribution is a graphical model with respect to the same graphical model structure, because the “vertex potentials” do not affect the underlying graphical model structure. Thus the distribution is a graphical model with constant treewidth, so we may compute its mode and partition function in polynomial time using, respectively, the classical max-product and sum-product algorithms (KolFri09, , Chapters 13.3 and 10.2). For convenience, pseudocode summarizing this discussion is provided in Algorithms 4 and 5. ∎
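To illustrate the max-product idea in its simplest case, the following sketch (ours, not the paper's pseudocode) implements the MIN oracle for a chain-structured cost, i.e., treewidth 1, via a min-sum dynamic program. It runs in O(k n²) time instead of the n^k brute force; the names `pair_costs` and `p` are our own.

```python
import numpy as np

def chain_min_oracle(pair_costs, p):
    """MIN oracle for a chain-structured cost
    C(j_1, ..., j_k) = sum_i c_i(j_i, j_{i+1}): returns
    min over tuples j of C(j) - sum_i p_i(j_i) by min-sum DP.
    pair_costs: list of k-1 (n x n) arrays; p: list of k weight vectors."""
    k = len(p)
    val = -np.asarray(p[0], dtype=float)  # best value ending at each j_1
    for i in range(k - 1):
        # extend every partial tuple by one coordinate and re-minimize
        val = np.min(val[:, None] + pair_costs[i] - p[i + 1][None, :], axis=0)
    return float(val.min())
```

The sum-product (SMIN/partition-function) analog replaces min with logsumexp in the same recursion.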
An immediate consequence of Theorem 5.5 combined with our oracle reductions is that all candidate MOT algorithms in §4 can be efficiently implemented for MOT problems with graphical structure. From a theoretical perspective, ELLIPSOID gives the best guarantee since it produces an exact, sparse solution.
Corollary 5.6 (Polynomial-time algorithms for MOT problems with graphical structure).
Let be a cost tensor that has graphical structure with constant treewidth (see Definition 5.4). Then:
•
The ELLIPSOID algorithm in §4.1 computes an exact solution to MOT in polynomial time.
•
The MWU algorithm in §4.2 computes an approximate solution to MOT in polynomial time.
•
The SINKHORN algorithm in §4.3 computes an approximate solution to MOT in polynomial time.
•
The COLGEN algorithm in §4.1.3 can be run for any polynomial number of iterations in polynomial time.
Moreover, ELLIPSOID, MWU, and COLGEN output a polynomially sparse tensor, whereas SINKHORN outputs a fully dense tensor through the implicit representation described in §4.3.1.
5.3 Application vignette: fluid dynamics
In this section, we numerically demonstrate our new results for graphically structured MOT—namely the ability to compute exact, sparse solutions in polynomial time (Corollary 5.6). We illustrate this on the problem of computing generalized Euler flows—an MOT application which has received significant interest and which was historically the motivation of MOT, see e.g., (BenCarCut15, ; BenCarNen16, ; Bre08, ; Bre89, ; Bre93, ; Bre99, ). This MOT problem is already known to be tractable via a popular, specially-tailored modification of SINKHORN (BenCarCut15, )—which can be interpreted as implementing SINKHORN using graphical structure (h20gm, ; teh2002unified, ). However, that algorithm is based on SINKHORN and thus unavoidably produces solutions that are low-precision (due to runtime that scales polynomially in the inverse accuracy), fully dense (with exponentially many non-zero entries), and have well-documented numerical precision issues. We offer the first polynomial-time algorithm for computing exact and/or sparse solutions.
We briefly recall the premise of this MOT problem; for further background see (Bre08, ; BenCarCut15, ). An incompressible fluid (e.g., water) is modeled by particles which are uniformly distributed in space (due to incompressibility) at all times. We observe each particle’s location at the initial time and the final time. The task is to infer the particles’ locations at all intermediate times, and this is modeled by an MOT problem as follows.
Specifically, the locations of the fluid particles are discretized to a finite set of points, and a known permutation on this set encodes the relation between each initial location and each final location. The total movement of a particle that takes a given trajectory is given by
(5.3) |
By the principle of least action, the generalized Euler flow problem of inferring the most likely trajectories of the fluid particles is given by the solution to the MOT problem with this cost and uniform marginals which impose the constraint that the fluid is incompressible.
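For concreteness, here is a small sketch of the trajectory cost (5.3) as described above: the sum of squared displacements between consecutive time steps, plus the squared displacement from the permuted initial location to the final observed location. We use one-dimensional positions for simplicity; the names `traj`, `sigma`, and `positions` are our own, not the paper's.

```python
def euler_flow_cost(traj, sigma, positions):
    """Total squared movement of a particle along a trajectory:
    consecutive squared displacements between time steps, plus the
    squared displacement between the permuted initial position and the
    last position.  traj: tuple of position indices (j_1, ..., j_k);
    sigma: permutation given as a list; positions: 1-D locations."""
    cost = sum((positions[traj[i + 1]] - positions[traj[i]]) ** 2
               for i in range(len(traj) - 1))
    cost += (positions[sigma[traj[0]]] - positions[traj[-1]]) ** 2
    return cost
```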
Corollary 5.7 (Exact, sparse solutions for generalized Euler flows).
The MOT problem with cost (5.3) can be solved in polynomial time. The solution is returned as a polynomially sparse tensor.
Proof.
This cost tensor can be expressed in graphical form where the collection of subsets consists of the pairs of adjacent time points as well as the pair consisting of the first and last time points. Moreover, each function can be computed in polynomial time since this simply requires computing squared distances between pairs of points. Once this graphical representation is computed, Corollary 5.6 implies a polynomial-time algorithm for this MOT problem because the graphical model structure is a cycle graph and thus has treewidth 2 (cf., Figure 3). ∎
[Figure 10: particle trajectory solutions computed by SINKHORN and by COLGEN.]
Figure 10 illustrates how the exact, sparse solutions found by our new algorithm provide visually sharper estimates than the popular modification of SINKHORN in (BenCarCut15, ), which blurs the trajectories. The latter is the state-of-the-art algorithm in the literature and in particular is the only previously known non-heuristic algorithm that has polynomial-time guarantees. Note that this algorithm is identical to implementing SINKHORN by exploiting the graphical structure to perform exact marginalization efficiently (teh2002unified, ; h20gm, ).
The numerical simulation is on a standard benchmark problem used in the literature (see e.g., (BenCarCut15, Figure 9) and (Bre08, Figure 2)) in which the particle at initial location moves to final location . This is run with and marginals uniformly supported on positions in . See Appendix B for numerics on other standard benchmark instances. Note that this amounts to solving an MOT LP with variables, which is infeasible for standard LP solvers. Our algorithm is the first to compute exact solutions for problem instances of this scale.
Two remarks are in order. First, since this MOT problem is a discretization of the underlying PDE, an exact solution is of course not necessary; however, there is an important—even qualitative—difference between low-precision solutions (computable with runtime) and high-precision solutions (computable with runtime) for the discretized problem. Second, a desirable feature of SINKHORN that should be emphasized is its practical scalability, which might make it advantageous for problems where very fine discretization is required. It is an interesting direction of practical relevance to develop algorithms that can compute high-precision solutions at a similarly large scale in practice (see the discussion in §8).
6 Application: MOT problems with set-optimization structure
In this section, we consider MOT problems whose cost tensors take values and —or more generally any two values, by a straightforward reduction (footnote 11: if takes two values , then define the tensor with -entries by ; it is straightforward to see that the MOT problems with costs and have identical solutions). Such MOT problems arise naturally in applications where one seeks to minimize or maximize the probability that some event occurs given marginal probabilities on each variable (see Example 6.1). We establish that this general class of MOT problems can be solved in polynomial time under a condition on the sparsity pattern of that is often simple to check due to its connection to classical combinatorial optimization problems.
The section is organized as follows. In §6.1 we formally describe this setup and discuss why it is incomparable to all other structures discussed in this paper. In §6.2, we show that for costs with this structure, the MIN, AMIN, and SMIN oracles can be implemented in polynomial time; from this it immediately follows that the ELLIPSOID, MWU, SINKHORN, and COLGEN algorithms discussed in part 1 of this paper can be implemented in polynomial time. In §6.3, we illustrate our results via a case study on network reliability.
6.1 Setup
Example 6.1 (Motivation for binary-valued MOT costs: minimizing/maximizing probability of an event).
Let . If , then the problem amounts to minimizing the probability that event occurs, given marginals on each variable. On the other hand, if , then the problem amounts to maximizing the probability that event occurs since
Even if the cost is binary-valued, there is no hope to solve MOT in polynomial time without further assumptions—essentially because in the worst case, any algorithm must query all entries if is a completely arbitrary -valued tensor.
We show that MOT is polynomial-time solvable under the general and often simple-to-check condition that the MIN, AMIN, and SMIN oracles introduced in §3 are polynomial-time solvable when restricted to the set of indices for which . For simplicity, our definition of these set oracles removes the cost as it is constant on . Of course it is also possible to remove the negative sign in by re-parameterizing the inputs as ; however, we keep this notation in order to parallel the original oracles.
Definition 6.2 (MIN set oracle).
Let . For weights , returns
Definition 6.3 (AMIN set oracle).
Let . For weights and accuracy , returns up to additive error .
Definition 6.4 (SMIN set oracle).
Let . For weights and regularization parameter , returns
The key motivation behind these set oracle definitions (aside from the syntactic similarity to the original oracles) is that they encode the problem of (approximately) finding the min-weight object in . This opens the door to combinatorial applications of MOT because finding the min-weight object in is well-known to be polynomial-time solvable for many “combinatorial-structured” sets of interest—e.g., the set of cuts in a graph, or the set of independent sets in a matroid. See §6.3 for fully-detailed applications.
Definition 6.5 (Set-optimization structure for MOT).
An MOT cost tensor has exact, approximate, or soft set-optimization structure of complexity if
for a set for which there is an algorithm solving , , or , respectively, in time.
We make two remarks about this structure.
Remark 6.6 (Only require set oracle for , not for ).
Note that Definition 6.5 only requires the set oracles for the set of entries where is , and does not need the set oracles for the set where is . The fact that both set oracles are not needed makes set-optimization structure easier to check than the original oracles in §3, because those effectively require optimization over both and .
Remark 6.7 (Set-optimization structure is incomparable to graphical and low-rank plus sparse structure).
Costs that satisfy Definition 6.5 in general do not have non-trivial graphical structure or low-rank plus sparse structure. Specifically, there are costs that satisfy Definition 6.5, yet require maximal treewidth to model via graphical structure, and super-constant rank or exponential sparsity to model via low-rank plus sparse structure. (A concrete example is the network reliability application in §6.3.) Because of the -hardness of MOT problems with -treewidth graphical structure or super-constant rank (AltBoi20hard, ), simply modeling such problems with graphical structure or low-rank plus sparse structure is therefore useless for the purpose of designing polynomial-time MOT algorithms.
6.2 Polynomial-time algorithms
By our oracle reductions in part 1 of this paper, in order to design polynomial-time algorithms for MOT with set-optimization structure, it suffices to design polynomial-time algorithms for the MIN, AMIN, or SMIN oracles. We show how to do this for all three oracles in a straightforward way by exploiting the set-optimization structure.
Theorem 6.8 (Polynomial-time algorithms for the MIN, AMIN, and SMIN oracles for costs with set-optimization structure).
If is a cost tensor with exact, approximate, or soft set-optimization structure of complexity (see Definition 6.5), then the , , and oracles, respectively, can be computed in time.
Input:
Access to via oracle, matrix
Output: Solution to
Input:
Access to via oracle, matrix , regularization
Output: Solution to
Proof.
Polynomial-time algorithm for MIN. We first claim that Algorithm 6 implements the oracle. To this end, define
(6.1)
By re-arranging the sum and max, it follows that
(6.2)
Therefore
(6.3)
where above the first step is by definition of ; the second step is by partitioning the minimization over into such that or , and then plugging in the definitions of and ; the third step is by manipulating in both cases; and the last step is because as shown above. We conclude that Algorithm 6 correctly outputs . Since the algorithm uses one call to the oracle and additional time, the claim is proven.
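The partition argument above can be checked numerically. The sketch below assumes the convention that the binary cost is 0 on the structured set S and 1 off it (the paper's exact notation is elided here), so that the MIN oracle value is the minimum of (a) the set-oracle value over S and (b) 1 plus the unconstrained minimum, which decomposes coordinate-wise; the set oracle is brute-forced purely for verification.

```python
import itertools
import random

def min_oracle_from_set_oracle(weights, set_oracle):
    """MIN oracle for a binary cost C_j = 0 if j in S else 1.

    weights: list of k lists; weights[i][j] is the weight w_i(j).
    set_oracle(weights) returns min_{j in S} -sum_i weights[i][j_i].
    Returns min_j C_j - sum_i weights[i][j_i].
    """
    val_S = set_oracle(weights)                  # minimum over j in S
    # Minimum over ALL j decomposes coordinate-wise:
    val_all = -sum(max(w) for w in weights)
    # Entries outside S pay cost 1, and the minimum over the complement of S
    # is at least val_all, so min(val_S, 1 + val_all) is always correct.
    return min(val_S, 1.0 + val_all)

# Brute-force check on a random small instance.
random.seed(0)
k, n = 3, 4
weights = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(k)]
S = {j for j in itertools.product(range(n), repeat=k) if sum(j) % 3 == 0}

def brute_set_oracle(w):
    return min(-sum(w[i][j[i]] for i in range(k)) for j in S)

got = min_oracle_from_set_oracle(weights, brute_set_oracle)
want = min((0.0 if j in S else 1.0) - sum(weights[i][j[i]] for i in range(k))
           for j in itertools.product(range(n), repeat=k))
```

The correctness of `min(val_S, 1.0 + val_all)` follows by the same case analysis as in the proof: if the unconstrained minimizer lies in S then the set-oracle value is at least as small, and otherwise the unconstrained minimum equals the minimum over the complement.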
Polynomial-time algorithm for AMIN. Next, we claim that the same Algorithm 6, now run with the approximate oracle in the first step instead of the exact oracle , computes a valid solution to . To prove this, let , , and be as defined in (6.1) and (6.2) for the MIN analysis, and let . By the same logic as in (6.3), except now reversed, the output
is equal to . Now because is within additive error of (by definition of the oracle), it follows that the above output is within additive error of
Thus the output is a valid answer to , establishing correctness. The runtime claim is obvious.
Polynomial-time algorithm for SMIN. Finally, we claim that Algorithm 7 implements the oracle. To this end, define
By re-arranging products and sums, it follows that
Therefore
where above the first step is by definition of ; the second step is by partitioning the sum over into such that or , and then plugging in the definitions of and ; and the third step is because as shown above. We conclude that Algorithm 7 correctly outputs . Since the algorithm uses one call to the oracle and additional time, the claim is proven. ∎
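The SMIN reduction admits the same kind of numerical check. The sketch below again assumes the cost is 0 on S and 1 off S, and that softmin is defined as \(-\eta^{-1}\log\sum_j e^{-\eta a_j}\) (both assumptions of ours, since the paper's symbols are elided): the full exponential sum splits into the sum over S (from the SMIN set oracle) plus a factor-weighted sum over all entries, which factorizes across coordinates.

```python
import itertools
import math
import random

def smin_oracle_from_set_oracle(weights, eta, smin_set_oracle):
    """SMIN oracle for a binary cost C_j = 0 if j in S else 1.

    smin_set_oracle(weights, eta) returns
        -(1/eta) * log( sum_{j in S} exp(eta * sum_i weights[i][j_i]) ).
    Returns -(1/eta) * log( sum_j exp(-eta * (C_j - sum_i weights[i][j_i])) ).
    """
    t_S = math.exp(-eta * smin_set_oracle(weights, eta))   # sum over S
    t_all = 1.0                                            # sum over ALL j factorizes
    for w in weights:
        t_all *= sum(math.exp(eta * wj) for wj in w)
    # sum_j e^{-eta(C_j - w)} = (1 - e^{-eta}) * t_S + e^{-eta} * t_all
    total = (1.0 - math.exp(-eta)) * t_S + math.exp(-eta) * t_all
    return -math.log(total) / eta

random.seed(1)
k, n, eta = 3, 3, 2.0
weights = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(k)]
S = {j for j in itertools.product(range(n), repeat=k) if sum(j) % 2 == 0}

def brute_smin_set(w, eta):
    return -math.log(sum(math.exp(eta * sum(w[i][j[i]] for i in range(k)))
                         for j in S)) / eta

got = smin_oracle_from_set_oracle(weights, eta, brute_smin_set)
want = -math.log(sum(math.exp(-eta * ((0.0 if j in S else 1.0)
                                      - sum(weights[i][j[i]] for i in range(k))))
                     for j in itertools.product(range(n), repeat=k))) / eta
```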
An immediate consequence of Theorem 6.8 combined with our oracle reductions is that all of the candidate MOT algorithms described in §4 can be efficiently implemented for MOT problems with set-optimization structure. From a theoretical perspective, the ELLIPSOID algorithm gives the best guarantee since it produces an exact, sparse solution in polynomial time.
Corollary 6.9 (Polynomial-time algorithms for MOT problems with set-optimization structure).
Let be a cost tensor that has set-optimization structure with complexity (see Definition 6.5).
•	Exact set-optimization structure. The ELLIPSOID algorithm computes an exact, sparse solution to in time.
•	Approximate set-optimization structure. The MWU algorithm in §4.2 computes an -approximate solution to in time.
•	Soft set-optimization structure. The SINKHORN algorithm in §4.3 computes an -approximate solution to in time.
Moreover, ELLIPSOID, MWU, and COLGEN output a polynomially sparse tensor, whereas SINKHORN outputs a fully dense tensor through the implicit representation described in §4.3.1.
6.3 Application vignette: network reliability with correlations
In this section, we illustrate this class of MOT structures via an application to network reliability, a central topic in network science, engineering, and operations research, see e.g., the textbooks (gertsbakh2011network, ; ball1995network, ; ball1986computational, ). The basic network reliability question is: given an undirected graph where each edge is reliable with some probability and fails with probability , what is the probability that all vertices are reachable from all others? This connectivity is desirable in applications, e.g., if is a computer cluster, the vertices are the machines, and the edges are communication links, then connectivity corresponds to the reachability of all machines. See the aforementioned textbooks for many other applications.
Of course, the above network reliability question is not yet well-defined since the edge failures are only prescribed up to their marginal distributions. The choice of joint distribution greatly impacts the answer.
The most classical setup posits that edge failures are independent (moore1956reliable, ). Denote the network reliability probability for this setting by . This quantity is #P-complete (valiant1979complexity, ; provan1983complexity, ) and thus NP-hard to compute, but there exist fully polynomial randomized approximation schemes (a.k.a. FPRAS) for multiplicatively approximating both the connection probability (karger2001randomized, ) and the failure probability (guo2019polynomial, ).
Here we investigate the setting of coordinated edge failures, which dates back to the 1980s (weiss1986stochastic, ; zemel1982polynomial, ). This coordination may optimize for disconnection (e.g., by an adversary), or for connection (e.g., maximize the time a network is connected while performing maintenance on each edge during fraction of the time). We define these notions below; see also Figure 5 for an illustration. Below, denotes the Bernoulli distribution with parameter .
Definition 6.10 (Network reliability with correlations).
For an undirected graph and edge reliability probabilities :
•	The worst-case network reliability is
•	The best-case network reliability is
Clearly . These gaps can be large (e.g., see Figure 5), which promises large opportunities for applications in which coordination is possible. However, realizing such an opportunity requires being able to compute and , and both of these problems require solving an exponentially large LP with variables. Below we show how to use set-optimization structure to compute these quantities in time, thereby recovering as a special case of our general framework the known polynomial-time algorithms for this particular problem in (zemel1982polynomial, ; weiss1986stochastic, ), as well as more practical polynomial-time algorithms that scale to input sizes that are an order of magnitude larger.
Corollary 6.11 (Polynomial-time algorithm for network reliability with correlations).
The worst-case and best-case network reliability can both be computed in time.
Proof.
By the observation in Example 6.1, the optimization problems defining and are instances of MOT in which , , , and each entry of the cost is the indicator of whether that subset of edges is a connected or disconnected subgraph of , respectively. It therefore suffices to show that both of these MOT cost tensors satisfy exact set-optimization structure (Definition 6.5) since that implies a polynomial-time algorithm for exactly solving MOT (Corollary 6.9).
Set-optimization structure for .
In this case, is the set of connected subgraphs of . Thus the problem is: given weights , compute
Note that this objective is equal to where . Since the latter sum is independent of , the problem therefore reduces to the problem of finding a minimum-weight connected subgraph in ; that is, given edge weights , compute
(6.4)
We first show how to solve this in polynomial time in the case that all edge weights are positive. In this case, the optimal solution is a minimum-weight spanning tree of . This can be found by Kruskal’s algorithm in time (kruskal1956shortest, ).
For the general case of arbitrary edge weights, note that the edges with non-positive weight can be added to any solution without worsening the cost or feasibility. Thus these edges are without loss of generality in every solution , and so it suffices to solve the same problem (6.4) on the graph obtained by contracting these non-positively-weighted edges in . This reduces (6.4) to the same problem of finding a minimum-weight connected subgraph, except now in the special case that all edge weights are positive. Since we have already shown how to solve this case in polynomial time, the proof is complete.
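The contraction-plus-Kruskal argument above can be sketched in a few lines. The implementation below (our own illustration) takes every non-positive edge outright, then runs Kruskal's greedy step on the remaining positive edges to connect the contracted components; a brute-force enumeration over all connected edge subsets verifies it on a small mixed-sign instance.

```python
import itertools

def min_weight_connected_subgraph(n, edges):
    """edges: list of (u, v, w). Minimum total weight of an edge set E'
    such that (V, E') is connected (spanning all n vertices).
    Non-positive edges are always taken (they never hurt cost or
    connectivity); positive edges are then added greedily, Kruskal-style,
    to connect the remaining contracted components."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    total = 0.0
    for u, v, w in edges:                        # contract non-positive edges
        if w <= 0:
            total += w
            parent[find(u)] = find(v)
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if w > 0 and find(u) != find(v):         # MST on the contracted graph
            total += w
            parent[find(u)] = find(v)
    return total

# Brute-force check on a small graph with mixed-sign weights.
n = 4
edges = [(0, 1, -2.0), (1, 2, 3.0), (2, 3, 1.0), (0, 3, 5.0), (0, 2, -1.0)]

def connected(edge_subset):
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    for u, v, _ in edge_subset:
        parent[find(u)] = find(v)
    return len({find(v) for v in range(n)}) == 1

want = min(sum(w for _, _, w in sub)
           for r in range(len(edges) + 1)
           for sub in itertools.combinations(edges, r) if connected(sub))
got = min_weight_connected_subgraph(n, edges)
```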
Set-optimization structure for .
In this case, is the set of disconnected subgraphs of . We may simplify the problem for by re-parameterizing the input to edge weights as done above in (6.4) for . Thus the problem for is: given weights , compute
(6.5)
We first show how to solve this in the case that all edge weights are negative. In that case, the optimal solution is of the form , where is a maximum-weight cut of the graph with weights . Equivalently, by negating all edge weights, is a minimum-weight cut of the graph with weights . Since a minimum-weight cut of a graph with positive weights can be found in polynomial time (stoer1997simple, ), the problem (6.5) can be solved in polynomial time when all are negative.
Now in the general case of arbitrary edge weights, note that the edges with non-negative weight can be removed from any solution without worsening the cost or feasibility. Thus these edges are without loss of generality not in every solution , and so it suffices to solve the same problem (6.5) on the graph obtained by deleting these non-negatively-weighted edges in . This reduces (6.5) to the same problem of finding a minimum-weight disconnected subgraph, except now in the special case that all edge weights are negative. Since we have already shown how to solve this case in polynomial time, the proof is complete. ∎
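The deletion-plus-min-cut argument above can likewise be sketched. In the code below (our own illustration), non-negative edges are dropped, and on the remaining negative edges the optimum keeps everything except the edges crossing some cut, so we minimize the kept weight over bipartitions; the cut is brute-forced here for clarity, whereas Stoer-Wagner would give polynomial time. A brute-force enumeration over disconnected edge subsets verifies the result.

```python
import itertools

def min_weight_disconnected_subgraph(n, edges):
    """edges: list of (u, v, w). Minimum total weight of an edge set E'
    such that (V, E') is disconnected. Non-negative edges never help and
    are dropped; among the negative edges, the optimum keeps all edges
    except those crossing some vertex bipartition, so we minimize
    total_negative - cut_weight over bipartitions.
    (Cut brute-forced here; Stoer-Wagner gives polynomial time.)"""
    neg = [(u, v, w) for u, v, w in edges if w < 0]
    total = sum(w for _, _, w in neg)
    best = float("inf")
    for mask in range(1, 2 ** (n - 1)):      # vertex n-1 fixed on one side
        side = {v for v in range(n) if (mask >> v) & 1}
        cut = sum(w for u, v, w in neg if (u in side) != (v in side))
        best = min(best, total - cut)
    return best

n = 4
edges = [(0, 1, -2.0), (1, 2, 3.0), (2, 3, -1.0), (0, 3, -4.0), (0, 2, 1.0)]

def disconnected(edge_subset):
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    for u, v, _ in edge_subset:
        parent[find(u)] = find(v)
    return len({find(v) for v in range(n)}) > 1

want = min(sum(w for _, _, w in sub)
           for r in range(len(edges) + 1)
           for sub in itertools.combinations(edges, r) if disconnected(sub))
got = min_weight_disconnected_subgraph(n, edges)
```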
[Figure 6: runtime comparison of algorithms for best-case and worst-case network reliability.]
In Figure 6, we compare the numerical performance of the algorithms in Corollary 6.11—COLGEN and MWU with polynomial-time implementations of their bottlenecks—with the fastest previous algorithms for both best-case and worst-case network reliability. Previously, the fastest algorithms that apply to this problem were (1) out-of-the-box LP solvers run on MOT, (2) the brute-force implementation of SINKHORN which marginalizes over all entries in each iteration, and (3) the COLGEN algorithm that we recover (zemel1982polynomial, ; weiss1986stochastic, ). It is unknown if there is a practically efficient implementation of the oracle (and thus of SINKHORN) for either best-case or worst-case reliability. Since the previous algorithms (1) and (2) have exponential runtime that scales as , they do not scale past tiny input sizes. In contrast, the algorithms in Corollary 6.11 scale to much larger inputs. Indeed, the COLGEN algorithm that our framework recovers can compute exact solutions roughly an order of magnitude faster than the other algorithms, and the new MWU algorithm computes reasonably approximate solutions beyond , which amounts to an MOT LP with variables.
7 Application: MOT problems with low-rank plus sparse structure
In this section, we consider MOT problems whose cost tensors decompose into low-rank and sparse components. We propose the first polynomial-time algorithms for this general class of MOT problems.
The section is organized as follows. In §7.1 we formally describe this setup and discuss why it is incomparable to all other structures discussed in this paper. In §7.2, we show that for costs with this structure, the AMIN and SMIN oracles can be implemented in polynomial time; from this it immediately follows that MWU and SINKHORN can be implemented in polynomial time. Finally, in §7.3 and §7.4, we provide two illustrative applications of these algorithms. The former regards portfolio risk management and is a direct application of our result for MOT with low-rank cost tensors. The latter regards projecting mixture distributions to the transportation polytope and illustrates the versatility of our algorithmic results since this problem is quadratic optimization over the transportation polytope rather than linear (a.k.a. MOT).
7.1 Setup
We begin by recalling the definition of tensor rank. It is the direct analog of the standard concept of matrix rank. See the survey (kolda2009tensor, ) for further background.
Definition 7.1 (Tensor rank).
A rank- factorization of a tensor is a collection of vectors satisfying
The rank of a tensor is the minimal for which there exists a rank- factorization.
In this section we consider MOT problems with the following “low-rank plus sparse” structure.
Definition 7.2 (Low-rank plus sparse structure for MOT).
An MOT cost tensor has low-rank plus sparse structure of rank and sparsity if it decomposes as
(7.1)  C = R + S
where is a rank- tensor and is an -sparse tensor.
Throughout, we make the natural assumption that is input through its non-zero entries, and that is input through a rank- factorization. We also make the natural assumption that the entries of both and are of size —this rules out the case of having extremely large entries of and , one positive and one negative, which cancel to yield a small entry of .
Remark 7.3 (Neither low-rank structure nor sparse structure can be modeled by graphical structure or set-optimization structure).
In general, both rank- costs and polynomially sparse costs do not have non-trivial graphical structure. Specifically, modeling these costs with graphical structure requires the complete graph (a.k.a., maximal treewidth of )—and because MOT problems with graphical structure of treewidth are -hard to solve in the absence of further structure (AltBoi20hard, ), modeling such problems with graphical structure is useless for the purpose of designing polynomial-time MOT algorithms. It is also clear that neither low-rank structure nor sparse structure can be modeled by set-optimization structure because in general, neither nor nor has binary-valued entries.
7.2 Polynomial-time algorithms
From a technical perspective, the main result of this section is that there is a polynomial-time algorithm for approximating the minimum entry of a tensor that decomposes into constant-rank and sparse components. Previously, this was not known even for constant-rank tensors. This result may be of independent interest. We remark that this result is optimal in the sense that unless , there does not exist an algorithm with runtime that is jointly polynomial in the input size and the rank (AltBoi20hard, ).
Theorem 7.4 (Polynomial-time algorithm solving AMIN and SMIN for low-rank + sparse costs).
We make three remarks about Theorem 7.4. First, we are unaware of any polynomial-time implementation of for the cost . Instead, Theorem 7.4 solves the oracle for an -approximate cost tensor since this is sufficient for implementing SINKHORN on the original cost tensor (see Corollary 7.5 below). Second, it is an interesting open question if the runtime for the -approximate oracle can be improved to , as this would imply a runtime for the oracle and thus for this class of MOT problems (see also Footnote 5 in the introduction). Third, we remark about practical efficiency: the runtime of Algorithm 8 is not just polynomially small in and , but in fact linear in and near-linear in . However, since this improved runtime is not needed for the theoretical results in the sequel, we do not pursue this further.
Combining the efficient oracle implementations in Theorem 7.4 with our algorithm-to-oracles reductions in §4 implies the first polynomial-time algorithms for MOT problems with costs that have constant-rank plus sparse structure. This is optimal in the sense that unless , there does not exist an algorithm with runtime that is jointly polynomial in the input size and the rank (AltBoi20hard, ).
Corollary 7.5 (Polynomial-time algorithms solving MOT for low-rank + sparse costs).
Consider cost tensors that have low-rank plus sparse structure of constant rank and sparsity (see Definition 7.2). For any :
•	The MWU algorithm in §4.2 computes an -approximate solution to in time.
•	The SINKHORN algorithm in §4.3 computes an -approximate solution to in time.
Moreover, MWU outputs a polynomially sparse tensor, whereas SINKHORN outputs a fully dense tensor through the implicit representation described in §4.3.1.
Proof.
For MWU, simply combine the polynomial-time reduction to the oracle (Theorem 4.7) with the polynomial-time algorithm for the AMIN oracle (Theorem 7.4). For SINKHORN, combining the polynomial-time reduction to the oracle (Theorem 4.18) with the polynomial-time algorithm for the oracle (Theorem 7.4) yields a algorithm for -approximating the MOT problem with cost tensor . It therefore suffices to show that the values of the MOT problems with cost tensors and differ by at most , that is,
But this holds because both MOT problems have the same feasible set, and for any feasible it follows from Hölder’s inequality that the objectives of the two MOT problems differ by at most
∎
Below, we describe the algorithm in Theorem 7.4. Specifically, in §7.2.1, we give four helper lemmas which form the technical core of our algorithm; then in §7.2.2, we combine these ingredients to design the algorithm and prove its correctness. Throughout, recall that we use the bracket notation to denote the entrywise application of a univariate function (e.g., , , or a polynomial) to .
7.2.1 Technical ingredients
At a high level, our approach to designing the algorithm in Theorem 7.4 is to approximately compute the SMIN oracle in polynomial time by synthesizing four facts:
1.	By expanding the softmin and performing simple operations, it suffices to compute the total sum of all entries of the entrywise exponentiated tensor (modulo simple transforms).
2.	Although is in general a full-rank tensor, we can exploit the fact that is a low-rank tensor in order to approximate by a low-rank tensor . (Moreover, we can efficiently compute a low-rank factorization of in closed form.)
3.	There is a simple algorithm for computing the sum of all entries of in polynomial time because is low-rank. (And thus we may approximate the sum of all entries of as desired in step 1.)
4.	This approximation is sufficient for computing both the AMIN and SMIN oracle in Theorem 7.4.
Of these four steps, the main technical step is the low-rank approximation in step two. Below, we formalize these four steps individually in Lemmas 7.6, 7.7, 7.8, and 7.9. Further detail on how to synthesize these four steps is then provided afterwards, in the proof of Theorem 7.4.
It is convenient to write the first lemma in terms of an approximate tensor rather than the original cost .
Lemma 7.6 (Softmin for cost with sparse component).
Let and . Then
where ,
(7.2)
and
(7.3)
Proof.
By expanding the definition of softmin, and then substituting with and with ,
By simple manipulations, we conclude that the above quantity is equal to the desired quantity:
Above, the first step is by partitioning the sum over based on if , the second step is by adding and subtracting , and the last step is by definition of and . ∎
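The decomposition in this proof can be verified numerically. The sketch below assumes softmin is \(-\eta^{-1}\log\sum_j e^{-\eta a_j}\) and that the cost splits into a dense part plus a sparse part (our notation; the paper's symbols are elided): the full softmin is recovered from the exponential sum over the dense part alone, corrected only at the polynomially many sparse entries. Here the dense sum is brute-forced; in the algorithm it is computed through the low-rank structure instead.

```python
import itertools
import math
import random

random.seed(2)
k, n, eta = 3, 3, 1.5
# Dense component R and a sparse component with 3 non-zero entries.
R = {j: random.uniform(-1, 1) for j in itertools.product(range(n), repeat=k)}
sparse = {(0, 0, 0): 2.0, (1, 2, 0): -1.0, (2, 2, 2): 0.5}
w = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(k)]

def wsum(j):
    return sum(w[i][j[i]] for i in range(k))

# Softmin over the full cost C = R + sparse, computed directly.
direct = -math.log(sum(math.exp(-eta * (R[j] + sparse.get(j, 0.0) - wsum(j)))
                       for j in R)) / eta

# Lemma 7.6-style computation: exponential sum over the dense part, then
# corrections at only the sparse entries.
t_dense = sum(math.exp(-eta * (R[j] - wsum(j))) for j in R)
t_corr = sum(math.exp(-eta * (R[j] + s - wsum(j)))
             - math.exp(-eta * (R[j] - wsum(j)))
             for j, s in sparse.items())
via_lemma = -math.log(t_dense + t_corr) / eta
```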
Lemma 7.7 (Low-rank approximation of the exponential of a low-rank tensor).
There is an algorithm that given in rank- factored form, , and a precision , takes time to compute a rank- tensor in factored form satisfying , where
(7.4)
Proof.
By classical results from approximation theory (see, e.g., (Tre19, )), there exists a polynomial of degree satisfying
For instance, the Taylor or Chebyshev expansion of suffices. Thus the tensor with entries
approximates to error
We now show that has rank , and moreover that a rank- factorization can be computed in time. Denote and . By definition of , definition of and , and then the Multinomial Theorem,
where the sum is over -tuples with non-negative entries summing to at most . Thus
where denotes the vector with -th entry for , and for . This yields the desired low-rank factorization of because
Finally, since each of the vectors in the factorization of can be computed efficiently from the closed-form expression above, the desired runtime follows. ∎
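The multinomial expansion in this proof can be reproduced concretely. The sketch below (our own illustration, with a degree-8 Taylor polynomial standing in for the polynomial approximation of exp) expands the entrywise polynomial of a rank-2 tensor into explicit rank-one terms, one per tuple of exponents, and checks the result against the directly exponentiated tensor.

```python
import itertools
import math
import numpy as np

rng = np.random.default_rng(0)
k, n, r, m = 3, 4, 2, 8          # k modes, n points, rank r, poly degree m

# Rank-r factorization: R_j = sum_l prod_i u[i][l][j_i], entries bounded.
u = rng.uniform(-0.5, 0.5, size=(k, r, n))
R = sum(np.einsum('a,b,c->abc', u[0, l], u[1, l], u[2, l]) for l in range(r))

# Degree-m Taylor polynomial of exp, expanded by the Multinomial Theorem
# into rank-one terms: one term per tuple (d_1,...,d_r) with sum <= m.
approx = np.zeros_like(R)
rank_used = 0
for degs in itertools.product(range(m + 1), repeat=r):
    if sum(degs) > m:
        continue
    coeff = 1.0 / math.prod(math.factorial(d) for d in degs)
    vecs = [np.prod([u[i, l] ** degs[l] for l in range(r)], axis=0)
            for i in range(k)]
    approx += coeff * np.einsum('a,b,c->abc', *vecs)
    rank_used += 1

err = np.max(np.abs(approx - np.exp(R)))   # entries of R are small here
```

The number of rank-one terms is the number of exponent tuples, i.e. binomial(m + r, r) = 45 for m = 8 and r = 2, matching the rank bound in the lemma's spirit.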
Lemma 7.8 (Marginalizing a scaled low-rank tensor).
Given vectors and a tensor through a rank factorization, we can compute in time.
Proof.
Denote the factorization of by . Then
where the first step is by definition of the operation that sums over all entries, the second step is by definition of , and the third step is by swapping products and sums. Thus computing the desired quantity amounts to computing inner products of -dimensional vectors. This can be done in time. ∎
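This marginalization identity is easy to check numerically. The sketch below (our own illustration) computes the total sum of a coordinate-wise scaled rank-r tensor using only inner products of the scaling vectors with the factors, never forming the full tensor, and compares against brute force.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n, r = 3, 5, 2
u = rng.uniform(size=(k, r, n))                # rank-r factors of the tensor
d = [rng.uniform(size=n) for _ in range(k)]    # scaling vectors

# Sum of all entries of (d_1 x ... x d_k) * R via
# sum_l prod_i <d_i, u_i^{(l)}> : O(n k r) time, no tensor materialized.
fast = sum(np.prod([d[i] @ u[i, l] for i in range(k)]) for l in range(r))

# Brute-force check: materialize the n^k tensor.
R = sum(np.einsum('a,b,c->abc', u[0, l], u[1, l], u[2, l]) for l in range(r))
scaled = np.einsum('a,b,c,abc->', d[0], d[1], d[2], R)
```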
Lemma 7.9 (Precision of the low-rank approximation).
Let . Suppose satisfies . Then the matrix satisfies
(7.5)
Proof.
Observe that the minimum entry of is at least
(7.6)
Since this is strictly positive, the tensor is well defined. Furthermore,
where above the first step is by definition of the max norm; the second step is by the elementary inequality which holds for positive scalars and (AltBacRud19, Lemma K); and the third step is by (7.6) and the approximation bound of . Since , we therefore conclude that . By adding and subtracting , this implies . ∎
7.2.2 Proof of Theorem 7.4
We are now ready to state the algorithm in Theorem 7.4. Pseudocode is in Algorithm 8. Note that and are never explicitly computed because in both Lines 3 and 4, the algorithm performs the relevant operations only through the low-rank tensor and the sparse tensor .
Input:
Low-rank tensor , sparse tensor , matrix ,
accuracy
Output: Solution to both on cost tensor , and also on some approximate cost tensor satisfying
Proof of Theorem 7.4.
Proof of correctness for SMIN. Consider any oracle inputs . By Lemma 7.9, the tensor satisfies . Therefore it suffices to show that Algorithm 8 correctly computes . This is true because that quantity is equal to by Lemma 7.6.
Proof of correctness for AMIN. We have just established that Algorithm 8 computes . Because and the fact that SMIN is a special case of AMIN (Remark 3.6), it follows that is within additive accuracy of . Therefore, by the triangle inequality, it suffices to show that is within additive accuracy of . That is, it suffices to show that
But this is true because by Lemma 7.9, and thus the quantities and are within additive accuracy for each .
Proof of runtime. We prove the claimed runtime bound simultaneously for the AMIN and SMIN computation because we use the same algorithm for both. To this end, we first bound the rank of the low-rank approximation computed in Lemma 7.7. Note that since and since it is assumed that , we have . Therefore
Above, the first step is by Lemma 7.7, and the final step is because is assumed constant.
7.3 Application vignette: risk estimation
Here we consider an application to portfolio risk management. For simplicity of exposition, let us first describe the setting of financial instrument (“stock”). Consider investing in one unit of a stock for years. For , let denote the price of the stock at year . Suppose that the return of the stock between years and is believed to follow some distribution . A fundamental question about the riskiness of this stock is to compute the investor’s expected profit in the worst-case over all joint probability distributions on future returns that are consistent with the modeled marginal distributions . This is an MOT problem with cost given by
where here we view as a function rather than a tensor for notational simplicity. If each return has possible values (e.g., after quantization), then the cost is equivalently represented as a rank-1 tensor in (by assigning an index to each of the possible values of each ). Therefore our result Corollary 7.5 provides a polynomial-time algorithm for solving this MOT problem defining the investor’s worst-case profit.
Rather than formalize this proof for stock, we proceed directly to the general case of investing in stocks, . This is essentially identical to the simple case of stock, modulo additional notation.
Corollary 7.10 (Polynomial-time algorithm for expected profit given marginals on the returns).
Suppose an investor holds unit of stocks for years. For each stock and each year , let denote the relative price of stock between years and . Suppose has distribution , and that each has at most atoms. Let denote the maximal possible return. For any constant number of stocks , there is a time algorithm for -approximating the expected profit in the worst-case over all futures that are consistent with the returns’ marginal distributions.
Proof.
This is the optimization problem
over all joint distributions on the returns that are consistent with the marginal distributions . This is an MOT problem with marginals, each over atoms, with cost function
(7.7)
By viewing this cost function as a cost tensor in the natural way (i.e., assigning an index to each of the possible values of ), this representation (7.7) shows that the corresponding cost tensor has rank . Moreover, observe that the maximum entry of the cost is . Therefore we may appeal to our polynomial-time MOT algorithms in Corollary 7.5 for costs with constant rank. ∎
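The rank-one structure of the cost in (7.7) is simple to see in code. The sketch below (our own illustration, with hypothetical quantized return values) builds the cost tensor for one unit of one stock over k years as the negated outer product of the yearly return vectors, so each entry is a product of k scalars and the tensor has rank one.

```python
import numpy as np

rng = np.random.default_rng(2)
k, n = 4, 5                          # k years, n quantized return values per year
returns = [rng.uniform(0.8, 1.2, size=n) for _ in range(k)]

# The profit of a trajectory (j_1,...,j_k) is the product of yearly returns,
# so the cost tensor C_j = -prod_t returns[t][j_t] is rank one: it is the
# negated outer product of the k return vectors.
C = -np.einsum('a,b,c,d->abcd', *returns)

# Rank-one structure: any entry is a product of k scalars from the factors.
j = (1, 3, 0, 2)
entry = -np.prod([returns[t][j[t]] for t in range(k)])
```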
The algorithm is readily generalized, e.g., if the investor has different units of a stock, or if a stock is held for a different number of years. The former is modeled simply by adding an extra year in which the return of stock is equal to the number of units, with probability . The latter is modeled simply by setting the return of stock to be for all years after it is held, with probability .
In Figure 7, we provide a numerical illustration comparing our new polynomial-time algorithms for this risk estimation task with the previous fastest algorithms. Previously, the fastest algorithms that apply to this problem are out-of-the-box LP solvers run on MOT, and the brute-force implementation of SINKHORN which marginalizes over all entries in each iteration. Since both of these previous algorithms have exponential runtime that scales as , they do not scale beyond tiny input sizes of and even with two minutes of computation time. In contrast, our new polynomial-time algorithms compute high-quality solutions for problems that are orders-of-magnitude larger. For example, our polynomial-time implementation of SINKHORN takes less than a second to solve an MOT LP with variables.
Details for this numerical experiment: we consider $k=1$ stock over $T$ timesteps, where each marginal distribution is uniform over its support, discretized with $n$ atoms. We implement the AMIN and SMIN oracles efficiently by using our above algorithm to exploit the rank-one structure of the cost tensor. In particular, the polynomial approximation we use here (cf. Lemma 7.7) is a fixed-degree Taylor approximation. This lets us run SINKHORN and MWU in polynomial time, as described above. In the numerical experiment, we also implement an approximate version of COLGEN using our polynomial-time implementation of the approximate violation oracle AMIN. Since the algorithms compute an upper bound, lower values are better in the right plot of Figure 7. We observe that MWU yields the loosest approximation for this application, whereas our implementations of SINKHORN and COLGEN produce high-quality approximations, as is evident from comparison with the exact LP solver in the regime where the latter is tractable to run.
[Figure 7: numerical comparison of the algorithms; the right plot shows the computed upper bounds (lower is better).]
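To illustrate how such oracles can exploit low-rank structure, here is a minimal numpy sketch (not the paper's implementation; the cost atoms, regularization, and Taylor degree are illustrative) of Sinkhorn marginalization for a rank-one cost whose kernel is approximated by a truncated Taylor series of the exponential:

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(1)
n, T, eta, deg = 4, 3, 1.0, 15

# atoms of the per-timestep return distributions (hypothetical values near 1)
r = [rng.uniform(0.9, 1.1, n) for _ in range(T)]
mu = [np.full(n, 1.0 / n) for _ in range(T)]  # uniform marginals

# cost C(x) = -prod_t r_t[x_t], so the kernel is exp(prod_t r_t[x_t] / eta).
# A truncated Taylor series gives a rank-(deg+1) kernel:
#   K ~ sum_j (1 / (eta^j j!)) prod_t r_t[x_t]^j
coefs = [1.0 / (eta**j * factorial(j)) for j in range(deg + 1)]

def marginal(b, s):
    """s-th marginal of K * (b_1 x ... x b_T), in O(deg * T * n) time."""
    out = np.zeros(n)
    for j, c in enumerate(coefs):
        scale = np.prod([r[t]**j @ b[t] for t in range(T) if t != s])
        out += c * scale * (r[s]**j * b[s])
    return out

# sanity check against brute-force marginalization of the exact kernel
K = np.exp(np.einsum('a,b,c->abc', *r) / eta)
b = [rng.uniform(0.5, 1.5, n) for _ in range(T)]
brute = np.einsum('abc,a,b,c->a', K, *b)
assert np.allclose(marginal(b, 0), brute, rtol=1e-6)

# a few Sinkhorn sweeps using only the fast marginal oracle
b = [np.ones(n) for _ in range(T)]
for _ in range(50):
    for s in range(T):
        b[s] *= mu[s] / marginal(b, s)
err = max(np.abs(marginal(b, s) - mu[s]).sum() for s in range(T))
```

The point of the sketch is that the marginalization never materializes the $n^T$-entry kernel: each Sinkhorn update touches only the rank-$(\mathrm{deg}+1)$ factors.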
7.4 Application vignette: projection to the transportation polytope
Here we consider the fundamental problem of projecting a joint probability distribution $P$ onto the transportation polytope $\mathcal{M}(\mu_1,\dots,\mu_k)$, i.e.,
(7.8) $\min_{Q \in \mathcal{M}(\mu_1,\dots,\mu_k)} \|P - Q\|_2^2$
We provide the first polynomial-time algorithm for solving this problem in the case where $P$ is a distribution that decomposes into a low-rank component plus a sparse component. The low-rank component enables modeling mixtures of product distributions (e.g., mixtures of isotropic Gaussians), which arise frequently in statistics and machine learning; see, e.g., (feldman2008learning). In such applications, the number of product distributions in the mixture corresponds to the tensor rank. The sparse component further enables modeling arbitrary corruptions to the distribution in polynomially many entries.
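A minimal numpy sketch of such an input, taking $k=2$ for readability (a mixture of two product distributions is a tensor of rank at most two, here corrupted in a few entries; all values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5  # atoms per axis; k = 2 marginals, so the tensor is an n x n matrix

# mixture of 2 product distributions -> tensor of rank at most 2
p1, q1 = rng.dirichlet(np.ones(n)), rng.dirichlet(np.ones(n))
p2, q2 = rng.dirichlet(np.ones(n)), rng.dirichlet(np.ones(n))
R = 0.5 * np.outer(p1, q1) + 0.5 * np.outer(p2, q2)
assert np.linalg.matrix_rank(R) <= 2

# sparse corruption: move a little mass between two entries
S = np.zeros((n, n))
S[0, 3], S[2, 1] = 0.01, -0.01
P = R + S
assert abs(P.sum() - 1.0) < 1e-12 and (S != 0).sum() == 2
```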
We emphasize that this projection problem (7.8) is not an MOT problem, since its objective is quadratic rather than linear; this illustrates the versatility of our algorithmic results. Our algorithm is based on a reduction from quadratic optimization to linear optimization over $\mathcal{M}(\mu_1,\dots,\mu_k)$ that is tailored to this problem. Crucial to this reduction is the fact that the MOT algorithms in §4 can compute sparse solutions. In particular, this reduction does not work with SINKHORN, because SINKHORN cannot compute sparse solutions.
Corollary 7.11 (Efficient projection to the transportation polytope).
Let $P = R + S$, where $R$ has constant rank and $S$ is polynomially sparse. Suppose that the entries of $R$ and $S$ are bounded. Given $R$ in factored form, $S$ through its non-zero entries, measures $\mu_1,\dots,\mu_k$, and accuracy $\varepsilon > 0$, we can compute in $\mathrm{poly}(n,k,1/\varepsilon)$ time a feasible $Q \in \mathcal{M}(\mu_1,\dots,\mu_k)$ that has $\varepsilon$-suboptimal cost for the projection problem (7.8). This solution $Q$ is a polynomially sparse tensor, output through its non-zero entries.
Proof.
We apply the Frank–Wolfe algorithm (a.k.a. Conditional Gradient Descent) to solve (7.8), specifically using approximate LP solutions for the descent direction as in (Jag13, Algorithm 2). By the known convergence guarantee of this algorithm (Jag13, Theorem 1.1), if each LP is solved to $O(\varepsilon)$ additive accuracy, then $O(1/\varepsilon)$ Frank–Wolfe iterations suffice to obtain an $\varepsilon$-suboptimal solution to (7.8).
The crux, therefore, is to show that each Frank–Wolfe iteration can be computed efficiently, and that the final solution is sparse. Initialize $Q^{(0)}$ to be an arbitrary vertex of $\mathcal{M}(\mu_1,\dots,\mu_k)$. Then $Q^{(0)}$ is feasible and polynomially sparse (see §2.1). Let $Q^{(t)}$ denote the $t$-th Frank–Wolfe iterate. Performing the next iteration requires two computations:
1. Approximately solve the following LP to $O(\varepsilon)$ additive accuracy:
(7.9) $\min_{M \in \mathcal{M}(\mu_1,\dots,\mu_k)} \langle 2(Q^{(t)} - P),\, M \rangle$
2. Update $Q^{(t+1)} \leftarrow (1 - \gamma_t) Q^{(t)} + \gamma_t M^{(t)}$, where $M^{(t)}$ is the approximate solution of (7.9) and $\gamma_t$ is the current stepsize.
For the first computation, note that the LP (7.9) is an MOT problem whose cost $2(Q^{(t)} - P) = 2(Q^{(t)} - S) - 2R$ decomposes into a polynomially sparse tensor plus a constant-rank tensor. Therefore the algorithm in Corollary 7.5 can solve the LP (7.9) to $O(\varepsilon)$ additive accuracy in $\mathrm{poly}(n,k,1/\varepsilon)$ time, and it outputs a solution $M^{(t)}$ that is polynomially sparse. It follows that $Q^{(t+1)}$ can be computed in polynomial time and, moreover, is polynomially sparse, since it is a convex combination of the similarly sparse tensors $Q^{(t)}$ and $M^{(t)}$. By repeating this argument identically for the $O(1/\varepsilon)$ iterations, it follows that each iteration takes polynomial time, and that each iterate is polynomially sparse. ∎
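The reduction in this proof can be illustrated on a toy instance with $k=2$ marginals over $n=2$ atoms, where the transportation polytope is a segment and the LP oracle reduces to comparing its two vertices (values are illustrative; this is a sketch of the Frank–Wolfe loop, not the full algorithm):

```python
import numpy as np

# toy instance: k = 2 uniform marginals over n = 2 atoms.
# The transportation polytope is the segment between two vertices:
V0 = np.array([[0.0, 0.5], [0.5, 0.0]])
V1 = np.array([[0.5, 0.0], [0.0, 0.5]])

P = np.array([[0.4, 0.1], [0.2, 0.3]])  # tensor to project (hypothetical)

# Frank-Wolfe: the LP oracle picks the vertex minimizing <grad, vertex>
Q = V0.copy()
for t in range(5000):
    grad = 2.0 * (Q - P)
    M = V0 if (grad * V0).sum() <= (grad * V1).sum() else V1
    gamma = 2.0 / (t + 2)  # standard diminishing stepsize
    Q = (1 - gamma) * Q + gamma * M

# closed-form projection onto the segment, for comparison
theta = np.clip(((P - V0) * (V1 - V0)).sum() / ((V1 - V0)**2).sum(), 0.0, 1.0)
Q_star = (1 - theta) * V0 + theta * V1
assert np.allclose(Q, Q_star, atol=1e-3)
```

Note that every iterate is a convex combination of vertices, so it stays feasible; in the sparse high-dimensional setting this is exactly why the iterates remain polynomially sparse.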
8 Discussion
In this paper, we investigated what structure enables MOT (an LP with $n^k$ variables) to be solved in $\mathrm{poly}(n,k)$ time. We developed a unified algorithmic framework for MOT by characterizing the “structure” that different algorithms require to solve MOT in polynomial time, in terms of simple variants of the dual feasibility oracle. On the one hand, this enabled us to show that ELLIPSOID and MWU solve MOT in polynomial time whenever any algorithm can, whereas SINKHORN requires strictly more structure. On the other hand, this made the design of polynomial-time algorithms for MOT much simpler, as we illustrated on three general classes of MOT cost structures.
Our results suggest several natural directions for future research. One exciting direction is to identify further tractable classes of MOT cost structures beyond the three studied in this paper, since this may enable new applications of MOT. Our results help guide this search because they make it significantly easier to identify if an MOT problem is polynomial-time solvable (see §1.1.4).
Another important direction is practicality. While the focus of this paper is to characterize when MOT problems can be solved in polynomial time, in practice there is of course a difference between small and large polynomial runtimes. It is therefore a question of practical significance to improve our “proof of concept” polynomial-time algorithms by designing algorithms with smaller polynomial runtimes. Our theoretical results help guide this search for practical algorithms because they make it significantly easier to identify if an MOT problem is polynomial-time solvable in the first place.
In order to develop more practical algorithms, recall that, roughly speaking, our approach for designing MOT algorithms consisted of three parts:
• An “outer loop” algorithm, such as ELLIPSOID, MWU, or SINKHORN, that solves MOT in polynomial time conditionally on a polynomial-time implementation of a certain bottleneck oracle.
• An “intermediate” algorithm that reduces this bottleneck oracle to polynomially many calls to a variant of the dual feasibility oracle.
• An “inner loop” algorithm that solves the relevant variant of the dual feasibility oracle for the structured MOT problem at hand.
Obtaining a smaller polynomial runtime for any of these three parts immediately implies smaller polynomial runtimes for the overall MOT algorithm. Another approach is to design altogether different algorithms that avoid the polynomial blow-up of the runtime that arises from composing these three parts. Understanding how to solve an MOT problem more “directly” in this way is an interesting question.
Acknowledgements.
We are grateful to Jonathan Niles-Weed, Pablo Parrilo, and Philippe Rigollet for insightful conversations; to Frederic Koehler for suggesting a simpler proof of Lemma 3.7; to Ben Edelman and Siddhartha Jayanti, who were involved in the brainstorming stages and provided helpful references; and to Karthik Natarajan for references to the random combinatorial optimization literature.

Appendix A Deferred proof details
A.1 Proof of Lemma 4.11
Our proof is based on two helper claims.
Claim A.1 (Lemma 3 of (young2001sequential)).
If MWU_BOTTLENECK returns “null”, then the problem is infeasible.
Proof.
We prove the contrapositive. Let such that . Then
Since is non-negative and , there must exist a satisfying . ∎
This first claim ensures that if the algorithm returns “infeasible”, then the problem is indeed infeasible. This proves the first part of the lemma. We now prove a second claim, useful for bounding the running time and the quality of the returned solution when the algorithm does not return “infeasible”.
Claim A.2 (Lemmas 1 and 4 of (young2001sequential)).
Proof.
Note that initially and . So it suffices to prove that does not increase on each iteration. Indeed, if is returned by MWU_BOTTLENECK, then . Furthermore, let . By the smoothness of the softmax, it holds that
So, in Line 9, increases by , and increases by at most . This proves the claim. ∎
We use this second claim to bound the running time of Step 1. Each iteration of the loop from Line 5 to Line 9 of Algorithm 2 increases the value of
by at least . So after iterations we must have
where we have used Jensen’s inequality to relate and . By the claim, this means that on any iteration , we must have
so the loop must terminate after at most iterations. Since each iteration can increase the number of non-zero entries by at most 1, this also proves the sparsity bound on .
A.2 Proof of Lemma 4.16
It is obvious how the AMIN oracle can be implemented via a single call of the corresponding argmin oracle; we now show the converse. Specifically, we show how to compute a solution for the argmin oracle using $nk$ calls to the AMIN oracle with appropriately smaller accuracy. As in the proof of Lemma 4.4, we use the first $n$ calls to compute the first index $x_1$ of the solution, the next $n$ calls to compute the next index $x_2$, and so on.
Formally, for $j \in \{0,\dots,k\}$, let us say that $(x_1,\dots,x_j)$ is an approximate “partial solution” of size $j$ if there exists a solution $(x_1^*,\dots,x_k^*)$ for the argmin oracle that satisfies $x_i^* = x_i$ for all $i \leq j$. Then it suffices to show that for every $j$, it is possible to compute an approximate partial solution of size $j$ from an approximate partial solution of size $j-1$, using $n$ calls to the AMIN oracle and polynomial additional time.
We do this by setting $x_j$ to be a minimizer of
(A.1)
where the vectors are defined as in the proof of Lemma 4.4. The runtime claim is obvious; it suffices to prove correctness. To this end, observe that
Above, the first and fifth steps are by Observation 4.5, the second and fourth steps are by definition of the AMIN oracle, the third step is by construction of $x_j$, the penultimate step is by simplification, and the final step is by definition of an approximate partial solution of size $j-1$. We conclude that $(x_1,\dots,x_j)$ is an approximate partial solution of size $j$, as desired.
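The coordinate-by-coordinate construction in this proof can be sketched as follows, with a brute-force MIN oracle standing in for the efficient structured one (with an exact oracle, the greedy construction recovers an exact minimizer using $nk$ oracle calls; the approximate analysis above tracks how errors accumulate when the oracle is only approximate):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(3)
n, k = 4, 3
C = rng.standard_normal((n,) * k)  # arbitrary cost tensor

def min_oracle(prefix):
    """MIN oracle: min of C over all completions of the fixed prefix.
    Brute force here; for structured costs this is the poly-time subroutine."""
    j = len(prefix)
    return min(C[prefix + rest] for rest in product(range(n), repeat=k - j))

# build a minimizer one coordinate at a time, using n*k oracle calls total
prefix = ()
for _ in range(k):
    prefix += (min(range(n), key=lambda x: min_oracle(prefix + (x,))),)

assert C[prefix] == C.min()
```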
Appendix B Additional numerical experiments
Here, we provide additional numerics for the generalized Euler flow application in §5.3 in order to demonstrate that similar behavior is observed on other standard benchmark inputs from the literature (Bre08). These instances are identical to those in Figure 10, except with different input permutations between the initial and final positions of the particles. Note that our algorithm COLGEN computes an exact, sparse solution with at most $nk - k + 1$ non-zero entries (Theorem 4.6). In contrast, the SINKHORN algorithm of (BenCarCut15) computes approximate, fully dense solutions with $n^k$ non-zero entries, which leads to blurry visualizations.
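For intuition on why vertex solutions are sparse, the classical northwest-corner rule constructs a vertex of the two-marginal transportation polytope with at most $2n-1$ non-zero entries (a sketch for $k=2$ only; this is not COLGEN itself):

```python
import numpy as np

def northwest_corner(mu, nu):
    """Greedily construct a vertex of the transportation polytope M(mu, nu)."""
    mu, nu = mu.astype(float).copy(), nu.astype(float).copy()
    Q = np.zeros((len(mu), len(nu)))
    i = j = 0
    while i < len(mu) and j < len(nu):
        m = min(mu[i], nu[j])   # ship as much mass as possible
        Q[i, j] = m
        mu[i] -= m
        nu[j] -= m
        if mu[i] <= 1e-12:      # row exhausted: move down
            i += 1
        else:                   # column exhausted: move right
            j += 1
    return Q

rng = np.random.default_rng(4)
n = 6
mu, nu = rng.dirichlet(np.ones(n)), rng.dirichlet(np.ones(n))
Q = northwest_corner(mu, nu)
assert np.allclose(Q.sum(axis=1), mu) and np.allclose(Q.sum(axis=0), nu)
assert (Q > 0).sum() <= 2 * n - 1   # sparse: a dense coupling has n**2 entries
```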
[Visualizations for two further benchmark instances: SINKHORN (left panels) vs. COLGEN (right panels).]
References
- [1] S. Agrawal, Y. Ding, A. Saberi, and Y. Ye. Price of correlations in stochastic optimization. Operations Research, 60(1):150–162, 2012.
- [2] M. Agueh and G. Carlier. Barycenters in the Wasserstein space. SIAM Journal on Mathematical Analysis, 43(2):904–924, 2011.
- [3] Y. Akagi, Y. Tanaka, T. Iwata, T. Kurashima, and H. Toda. Probabilistic optimal transport based on collective graphical models. Preprint at arXiv:2006.08866, 2020.
- [4] J. Altschuler, F. Bach, A. Rudi, and J. Niles-Weed. Massively scalable Sinkhorn distances via the Nyström method. In Advances in Neural Information Processing Systems, pages 4429–4439, 2019.
- [5] J. Altschuler, J. Weed, and P. Rigollet. Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration. In Advances in Neural Information Processing Systems, 2017.
- [6] J. M. Altschuler and E. Boix-Adserà. Hardness results for Multimarginal Optimal Transport problems. Discrete Optimization, 42:100669, 2021.
- [7] J. M. Altschuler and E. Boix-Adserà. Wasserstein barycenters can be computed in polynomial time in fixed dimension. Journal of Machine Learning Research, 22:44–1, 2021.
- [8] J. M. Altschuler and E. Boix-Adserà. Wasserstein barycenters are NP-hard to compute. SIAM Journal on Mathematics of Data Science, 2022.
- [9] J. M. Altschuler and P. A. Parrilo. Near-linear convergence of the Random Osborne algorithm for Matrix Balancing. Mathematical Programming, pages 1–35, 2022.
- [10] E. Anderes, S. Borgwardt, and J. Miller. Discrete Wasserstein barycenters: Optimal transport for discrete data. Mathematical Methods of Operations Research, 84(2):389–409, 2016.
- [11] F. Bach. Learning with submodular functions: a convex optimization perspective. Foundations and Trends in Machine Learning, 6(2-3):145–373, 2013.
- [12] M. O. Ball. Computational complexity of network reliability analysis: An overview. IEEE Transactions on Reliability, 35(3):230–239, 1986.
- [13] M. O. Ball, C. J. Colbourn, and J. S. Provan. Network reliability. Handbooks in Operations Research and Management Science, 7:673–762, 1995.
- [14] J.-D. Benamou, G. Carlier, M. Cuturi, L. Nenna, and G. Peyré. Iterative Bregman projections for regularized transportation problems. SIAM Journal on Scientific Computing, 37(2):A1111–A1138, 2015.
- [15] J.-D. Benamou, G. Carlier, S. Di Marino, and L. Nenna. An entropy minimization approach to second-order variational mean-field games. Mathematical Models and Methods in Applied Sciences, 29(08):1553–1583, 2019.
- [16] J.-D. Benamou, G. Carlier, and L. Nenna. A numerical method to solve multi-marginal optimal transport problems with Coulomb cost. In Splitting Methods in Communication, Imaging, Science, and Engineering, pages 577–601. Springer, 2016.
- [17] J.-D. Benamou, G. Carlier, and L. Nenna. Generalized incompressible flows, multi-marginal transport and Sinkhorn algorithm. Numerische Mathematik, 142(1):33–54, 2019.
- [18] D. Bertsimas and J. N. Tsitsiklis. Introduction to linear optimization, volume 6. Athena Scientific Belmont, MA, 1997.
- [19] J. Blanchet, A. Jambulapati, C. Kent, and A. Sidford. Towards optimal running times for optimal transport. Preprint at arXiv:1810.07717, 2018.
- [20] M. Blondel, V. Seguy, and A. Rolet. Smooth and sparse optimal transport. In International conference on Artificial Intelligence and Statistics, pages 880–889. PMLR, 2018.
- [21] H. L. Bodlaender. A linear-time algorithm for finding tree-decompositions of small treewidth. SIAM Journal on computing, 25(6):1305–1317, 1996.
- [22] H. L. Bodlaender. Treewidth: Structure and algorithms. In International Colloquium on Structural Information and Communication Complexity, pages 11–25. Springer, 2007.
- [23] Y. Brenier. The least action principle and the related concept of generalized flows for incompressible perfect fluids. Journal of the American Mathematical Society, 2(2):225–255, 1989.
- [24] Y. Brenier. The dual least action problem for an ideal, incompressible fluid. Archive for Rational Mechanics and Analysis, 122(4):323–351, 1993.
- [25] Y. Brenier. Minimal geodesics on groups of volume-preserving maps and generalized solutions of the Euler equations. Communications on Pure and Applied Mathematics, 52(4):411–452, 1999.
- [26] Y. Brenier. Generalized solutions and hydrostatic approximation of the Euler equations. Physica D: Nonlinear Phenomena, 237(14-17):1982–1988, 2008.
- [27] G. Buttazzo, L. De Pascale, and P. Gori-Giorgi. Optimal-transport formulation of electronic density-functional theory. Physical Review A, 85(6):062502, 2012.
- [28] G. Carlier and I. Ekeland. Matching for teams. Economic Theory, 42(2):397–418, 2010.
- [29] G. Carlier, A. Oberman, and E. Oudet. Numerical methods for matching for teams and Wasserstein barycenters. ESAIM: Mathematical Modelling and Numerical Analysis, 49(6):1621–1642, 2015.
- [30] L. Chen, W. Ma, K. Natarajan, D. Simchi-Levi, and Z. Yan. Distributionally robust linear and discrete optimization with marginals. Operations Research, 2022.
- [31] P.-A. Chiappori, R. J. McCann, and L. P. Nesheim. Hedonic price equilibria, stable matching, and optimal transport: equivalence, topology, and uniqueness. Economic Theory, 42(2):317–354, 2010.
- [32] C. Cotar, G. Friesecke, and C. Klüppelberg. Density functional theory and optimal transportation with Coulomb cost. Communications on Pure and Applied Mathematics, 66(4):548–599, 2013.
- [33] N. Courty, R. Flamary, D. Tuia, and A. Rakotomamonjy. Optimal transport for domain adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1, 2016.
- [34] T. M. Cover and J. A. Thomas. Elements of information theory. John Wiley & Sons, 2012.
- [35] M. Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, 2013.
- [36] M. Cuturi and A. Doucet. Fast computation of Wasserstein barycenters. In International Conference on Machine Learning, 2014.
- [37] M. Dyer, L. A. Goldberg, C. Greenhill, and M. Jerrum. On the relative complexity of approximate counting problems. In International Workshop on Approximation Algorithms for Combinatorial Optimization, pages 108–119. Springer, 2000.
- [38] F. Elvander, I. Haasler, A. Jakobsson, and J. Karlsson. Multi-marginal optimal transport using partial information with applications in robust localization and sensor fusion. Signal Processing, 171:107474, 2020.
- [39] J. Feldman, R. O’Donnell, and R. A. Servedio. Learning mixtures of product distributions over discrete domains. SIAM Journal on Computing, 37(5):1536–1564, 2008.
- [40] S. Friedland. Optimal transport, distance between sets of measures and tensor scaling. Preprint at arXiv:2005.00945, 2020.
- [41] I. Gertsbakh and Y. Shpungin. Network reliability and resilience. Springer Science & Business Media, 2011.
- [42] L. A. Goldberg and M. Jerrum. The complexity of ferromagnetic Ising with local fields. Combinatorics, Probability and Computing, 16(1):43–61, 2007.
- [43] M. Grötschel, L. Lovász, and A. Schrijver. The ellipsoid method and its consequences in combinatorial optimization. Combinatorica, 1(2):169–197, 1981.
- [44] M. Grötschel, L. Lovász, and A. Schrijver. Geometric algorithms and combinatorial optimization, volume 2. Springer Science & Business Media, 2012.
- [45] H. Guo and M. Jerrum. A polynomial-time approximation algorithm for all-terminal network reliability. SIAM Journal on Computing, 48(3):964–978, 2019.
- [46] I. Haasler, A. Ringh, Y. Chen, and J. Karlsson. Estimating ensemble flows on a hidden Markov chain. In IEEE Conference on Decision and Control, pages 1331–1338. IEEE, 2019.
- [47] I. Haasler, A. Ringh, Y. Chen, and J. Karlsson. Multi-marginal optimal transport and Schrödinger bridges on trees. Preprint at arXiv:2004.06909, 2020.
- [48] I. Haasler, A. Ringh, Y. Chen, and J. Karlsson. Scalable computation of dynamic flow problems via multi-marginal graph-structured optimal transport. Preprint at arXiv:2106.14485, 2021.
- [49] I. Haasler, R. Singh, Q. Zhang, J. Karlsson, and Y. Chen. Multi-marginal optimal transport and probabilistic graphical models. IEEE Transactions on Information Theory, 2021.
- [50] W. K. Haneveld. Robustness against dependence in PERT: An application of duality and distributions with known marginals. In Stochastic Programming, pages 153–182. Springer, 1986.
- [51] M. Jaggi. Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In International Conference on Machine Learning, pages 427–435, 2013.
- [52] D. R. Karger. A randomized fully polynomial time approximation scheme for the all-terminal network reliability problem. SIAM Review, 43(3):499–522, 2001.
- [53] L. G. Khachiyan. Polynomial algorithms in linear programming. USSR Computational Mathematics and Mathematical Physics, 20(1):53–72, 1980.
- [54] T. G. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.
- [55] D. Koller and N. Friedman. Probabilistic graphical models: principles and techniques. MIT press, 2009.
- [56] J. B. Kruskal. On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical society, 7(1):48–50, 1956.
- [57] C. Léonard. A survey of the Schrödinger problem and some of its connections with optimal transport. Preprint at arXiv:1308.0215, 2013.
- [58] T. Lin, N. Ho, M. Cuturi, and M. I. Jordan. On the complexity of approximating multimarginal optimal transport. Preprint at arXiv:1910.00152, 2019.
- [59] N. Linial, A. Samorodnitsky, and A. Wigderson. A deterministic strongly polynomial algorithm for matrix scaling and approximate permanents. In Symposium on Theory of Computing, pages 644–652, 1998.
- [60] G. Makarov. Estimates for the distribution function of a sum of two random variables when the marginal distributions are fixed. Theory of Probability and its Applications, 26(4):803–806, 1982.
- [61] I. Meilijson and A. Nádas. Convex majorization with an application to the length of critical paths. Journal of Applied Probability, 16(3):671–677, 1979.
- [62] V. K. Mishra, K. Natarajan, D. Padmanabhan, C.-P. Teo, and X. Li. On theoretical and empirical aspects of marginal distribution choice models. Management Science, 60(6):1511–1531, 2014.
- [63] E. F. Moore and C. E. Shannon. Reliable circuits using less reliable relays. Journal of the Franklin Institute, 262(3):191–208, 1956.
- [64] B. Muzellec, R. Nock, G. Patrini, and F. Nielsen. Tsallis regularized optimal transport and ecological inference. In AAAI Conference on Artificial Intelligence, volume 31, 2017.
- [65] A. Nadas. Probabilistic PERT. IBM Journal of Research and Development, 23(3):339–347, 1979.
- [66] K. Natarajan, M. Song, and C.-P. Teo. Persistency model and its applications in choice modeling. Management Science, 55(3):453–469, 2009.
- [67] L. Nenna. Numerical methods for multi-marginal optimal transportation. PhD thesis, 2016.
- [68] D. Padmanabhan, S. Damla Ahipasaoglu, A. Ramachandra, and K. Natarajan. Extremal probability bounds in combinatorial optimization. Preprint at arXiv:2109.01591, 2021.
- [69] C. H. Papadimitriou and T. Roughgarden. Computing correlated equilibria in multi-player games. Journal of the ACM, 55(3):1–29, 2008.
- [70] G. Peyré and M. Cuturi. Computational optimal transport. Foundations and Trends in Machine Learning, 2017.
- [71] F. Pitié, A. C. Kokaram, and R. Dahyot. Automated colour grading using colour distribution transfer. Computer Vision and Image Understanding, 107(1-2):123–137, 2007.
- [72] J. S. Provan and M. O. Ball. The complexity of counting cuts and of computing the probability that a graph is connected. SIAM Journal on Computing, 12(4):777–788, 1983.
- [73] K. Quanrud. Approximating optimal transport with linear programs. In Symposium on Simplicity in Algorithms, 2018.
- [74] J. Rabin, G. Peyré, J. Delon, and M. Bernot. Wasserstein barycenter and its application to texture mixing. In International Conference on Scale Space and Variational Methods in Computer Vision, pages 435–446. Springer, 2011.
- [75] L. Rüschendorf. Random variables with maximum sums. Advances in Applied Probability, pages 623–632, 1982.
- [76] L. Rüschendorf and L. Uckelmann. On the n-coupling problem. Journal of Multivariate Analysis, 81(2):242–258, 2002.
- [77] R. Singh, I. Haasler, Q. Zhang, J. Karlsson, and Y. Chen. Incremental inference of collective graphical models. IEEE Control Systems Letters, 2020.
- [78] R. Sinkhorn. Diagonal equivalence to matrices with prescribed row and column sums. The American Mathematical Monthly, 74(4):402–405, 1967.
- [79] J. Solomon, F. De Goes, G. Peyré, M. Cuturi, A. Butscher, A. Nguyen, T. Du, and L. Guibas. Convolutional Wasserstein distances: Efficient optimal transportation on geometric domains. ACM Transactions on Graphics, 34(4):1–11, 2015.
- [80] S. Srivastava, C. Li, and D. B. Dunson. Scalable Bayes via barycenter in Wasserstein space. Journal of Machine Learning Research, 19(1):312–346, 2018.
- [81] M. Stoer and F. Wagner. A simple min-cut algorithm. Journal of the ACM, 44(4):585–591, 1997.
- [82] Y. W. Teh and M. Welling. The unified propagation and scaling algorithm. In Advances in Neural Information Processing Systems, pages 953–960, 2002.
- [83] L. N. Trefethen. Approximation theory and approximation practice, volume 164. SIAM, 2019.
- [84] N. Tupitsa, P. Dvurechensky, A. Gasnikov, and C. A. Uribe. Multimarginal optimal transport by accelerated alternating minimization. In 2020 IEEE Conference on Decision and Control, pages 6132–6137. IEEE, 2020.
- [85] L. G. Valiant. The complexity of enumeration and reliability problems. SIAM Journal on Computing, 8(3):410–421, 1979.
- [86] C. Villani. Topics in optimal transportation. Number 58. American Mathematical Society, 2003.
- [87] M. J. Wainwright and M. I. Jordan. Variational inference in graphical models: The view from the marginal polytope. In Allerton Conference on Communication, Control, and Computing, volume 41, pages 961–971, 2003.
- [88] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Now Publishers Inc, 2008.
- [89] G. Weiss. Stochastic bounds on distributions of optimal value functions with applications to PERT, network flows and reliability. Operations Research, 34(4):595–605, 1986.
- [90] A. G. Wilson. The use of entropy maximising models, in the theory of trip distribution, mode split and route split. Journal of Transport Economics and Policy, pages 108–126, 1969.
- [91] N. E. Young. Sequential and parallel algorithms for mixed packing and covering. In Symposium on Foundations of Computer Science, pages 538–546. IEEE, 2001.
- [92] D. B. Yudin and A. S. Nemirovskii. Informational complexity and efficient methods for the solution of convex extremal problems. Matekon, 13(2):22–45, 1976.
- [93] E. Zemel. Polynomial algorithms for estimating network reliability. Networks, 12(4):439–452, 1982.