k-Sample inference
via Multimarginal Optimal Transport
Abstract
This paper proposes a Multimarginal Optimal Transport (MOT) approach for simultaneously comparing $k$ measures supported on finite subsets of $\mathbb{R}^d$, $d \ge 1$. We derive asymptotic distributions of the optimal value of the empirical MOT program under the null hypothesis that all $k$ measures are the same, and under the alternative hypothesis that at least two measures differ. We use these results to construct a test of the null hypothesis and provide consistency and power guarantees for this $k$-sample test. We consistently estimate the asymptotic distributions using the bootstrap, and propose a low complexity linear program to approximate the test cut-off. We demonstrate the advantages of our approach on synthetic and real datasets, including real data on cancers in the United States in 2004-2020.
1 Introduction
The $k$-sample inference problem concerns simultaneously comparing several probability measures. The classical question is to determine whether $k$ groups of observed data points have the same underlying probability distribution, i.e. to test
(1)\quad $H_0: \mu^{(1)} = \mu^{(2)} = \dots = \mu^{(k)}$ \quad versus \quad $H_1: \mu^{(i)} \neq \mu^{(j)}$ for some $1 \le i < j \le k$.
This testing problem has a long history in statistics, from classical rank-based tests for univariate data [15, 49, 67, 87] to a recent extension [20] using multivariate ranks [13, 38], to graph [56], distance [64] and kernel based [68, 50] methods. Direct applications of testing the hypotheses in (1) include simultaneously comparing gene expression profiles to assess presence of disease [40, 84], assessing differences in chronic disease levels based on quality of life [11, 76], analyzing associations between exercise and morphology of an animal [88], and comparing distributions of agents’ outcomes in reinforcement learning [61]. Moreover, the test of (1) is frequently viewed as a non-parametric version of ANOVA [12, 64] with a myriad of scientific applications, typically comparing treatment outcomes between multiple groups, e.g. in clinical trials [14] and cancer studies [43, 86]. Table 1 outlines additional instances of scientific applications of $k$-sample inference when the measures of interest have finite support, which is the case considered in this paper.
This paper proposes an Optimal Transport approach to $k$-sample inference for probability measures with finite supports in $\mathbb{R}^d$, $d \ge 1$. The method provides a powerful $k$-sample test of (1), but also allows comparison between different collections of measures in terms of their within-collection variability. Optimal Transport based approaches have proven successful in one-sample (goodness-of-fit) and two-sample problems on finite [6, 51, 72], countable [75], semidiscrete [39], and certain continuous spaces [57, 46]. The test statistics employ $p$-Wasserstein distances (or their regularized variants) to quantify differences between the measures of interest while respecting the metric structure of their supports [60].
Our test statistic employs a different functional, the Multimarginal Optimal Transport (MOT) program [62], which can be represented as a variance functional on the space of measures [10] and thus serves as a natural candidate for testing variability in a collection of measures. We show that despite well-documented differences in the solution structures of MOT and two-marginal OT problems (Section 1.7.4 of [66]), MOT shares the same benefits as OT when it comes to the limiting behavior of its optimal value.
Using MOT for $k$-sample inference brings several important advantages. The main advantage is that the asymptotic distributions of the empirical MOT value can be derived under both $H_0$ and $H_1$. To the best of our knowledge, the only multivariate $k$-sample test statistic with known limiting distribution is the one of [47], where the limit laws are known only for a specific subset of alternatives. Our laws cover all alternatives in (1), which allows us to explicitly derive a power function of the test and establish novel consistency results. The consistency analysis techniques developed in this paper can be further applied to one- and two-sample tests based on the asymptotic results in [46, 72, 75].
Another benefit of the limit under $H_1$ is the ability to estimate functionals of the distribution, e.g. Confidence Regions for the MOT value. This allows for a novel application of comparing several collections of measures using the overlap between their Confidence Regions (see Figure 8 for a concrete example). The procedure can be viewed as a distributional analogue of multiple comparisons (see [45] for a review), performed in a space of measures rather than in Euclidean space. To the best of our knowledge, this type of analysis is not available with other $k$-sample statistics considered in the literature.
Conceptually, our approach to $k$-sample inference is equivalent to viewing measures as points and considering the variability within their collection. It follows a general framework of Optimal Transport based distribution comparison: distributions are viewed as points in a Wasserstein space, with the Wasserstein distance indicating their closeness [60], leading to the development of distributional analogues of traditional methods such as regression and time series [85, 89, 33], synthetic controls [36], and clustering [77]. Below we provide a formulation of our approach (Sections 1.1, 1.2) and summarize our contributions (Section 1.3).
1.1 Multimarginal Optimal Transport (MOT) for k-sample inference
Let $\mu^{(1)}, \dots, \mu^{(k)}$ be Borel probability measures supported on $\mathbb{R}^d$, $d \ge 1$. The Multimarginal Optimal Transport (MOT) problem (equation (4.3) of [1]) is the optimization problem
(2)\quad $\mathrm{MOT}(\mu^{(1)}, \dots, \mu^{(k)}) = \min_{\pi \in \Pi(\mu^{(1)}, \dots, \mu^{(k)})} \int c(x_1, \dots, x_k)\, d\pi(x_1, \dots, x_k)$
where $\Pi(\mu^{(1)}, \dots, \mu^{(k)})$ is the set of Borel probability measures on the product space $(\mathbb{R}^d)^k$ with marginals $\mu^{(1)}, \dots, \mu^{(k)}$. Different choices for the cost function $c$ are possible; throughout the paper, we fix the choice to be
(3)\quad $c(x_1, \dots, x_k) = \frac{1}{k} \sum_{i=1}^{k} \| x_i - \bar{x} \|^2, \qquad \bar{x} = \frac{1}{k} \sum_{j=1}^{k} x_j$
Under this choice, the MOT problem is equivalent (Proposition 3.1.2 of [60]) to the Wasserstein barycenter problem (equation 2.2 of [1])
(4)\quad $\min_{\nu} \frac{1}{k} \sum_{i=1}^{k} W_2^2(\mu^{(i)}, \nu)$
where $W_2$ denotes the 2-Wasserstein distance. By equivalence here we mean that the optimal values of both programs are equal, and the optimal solutions $\pi^*$ of (2) and $\nu^*$ of (4) are related by $\nu^* = (A_k)_{\#} \pi^*$, where $A_k$ is the map that averages a given $k$-tuple of points from the supports of the measures ($\#$ stands for the pushforward of a measure by the map). We remark here that when the measures are discrete, the barycenter problem (4) generally has more than one optimal solution $\nu^*$. This presents challenges for statistical inference concerning barycenter solutions [52], but does not impede the analysis of the optimal value of the barycenter or MOT problems (recalling that all optimal solutions result in the same optimal value).
Let $\mathrm{MOT}(\mu^{(1)}, \dots, \mu^{(k)})$ denote the optimal value of the MOT program (2). Observe that this value is zero if and only if the measures are all the same. Indeed, if $\nu^*$ is (any) optimal solution to the barycenter problem (4), then a zero MOT value is equivalent to a zero optimal value in the barycenter program (4), i.e. $W_2(\mu^{(i)}, \nu^*) = 0$ for all $i$. Due to the metric properties of the 2-Wasserstein distance (Theorem 7.3 of [80]), this is equivalent to all the measures being the same (and equal to $\nu^*$).
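This zero-iff-equal property can be checked numerically in the simplest one-dimensional case, where for equal-size samples the comonotone (rank-sorted) multicoupling is optimal for the barycentric cost (3), so the MOT value reduces to an average within-tuple variance. The sketch below is purely illustrative (the helper `mot_value_1d` is our name, not the paper's software) and assumes $k$ samples of equal length:

```python
import numpy as np

def mot_value_1d(samples):
    """MOT value (barycentric cost (3)) for k equal-size samples on the real line.

    In 1-D the comonotone coupling (matching atoms by rank) is optimal,
    so the value is the average, over ranks, of the within-tuple variance.
    """
    sorted_rows = np.sort(np.asarray(samples, dtype=float), axis=1)  # shape (k, m)
    tuple_means = sorted_rows.mean(axis=0)                           # barycenter atoms
    # cost (3): average squared deviation of each tuple from its mean
    return float(((sorted_rows - tuple_means) ** 2).mean())

# MOT value is zero iff all measures coincide
same = [[1.0, 2.0, 3.0]] * 3
diff = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(mot_value_1d(same))  # 0.0
print(mot_value_1d(diff))  # 2.0 (strictly positive)
```

The second collection differs only in its third measure, yet the value is strictly positive, matching the discussion above.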
This observation suggests that testing $H_0$ in (1) can be addressed via testing whether the MOT value is zero. To this end, suppose that data consisting of $k$ samples of sizes $n_1, \dots, n_k$, respectively, is available to estimate the underlying measures by the empirical measures $\hat{\mu}^{(1)}_{n_1}, \dots, \hat{\mu}^{(k)}_{n_k}$. To test $H_0$ based on the data, we consider the asymptotic distribution of the empirical MOT estimator under $H_0$ and reject $H_0$ when the estimator value is large.
More generally, once the asymptotic distribution of the empirical MOT value is known, one can estimate various functionals of this distribution, such as Confidence Regions (CRs) for the true MOT value under either $H_0$ or $H_1$ in (1). Inference of this type requires knowledge of the asymptotic distribution of the empirical version of the optimal value in (2). Our derivation of these distributions leverages the rich literature on asymptotic theory for the Wasserstein distance, whose main results we briefly review below.
Table 1: Examples of applications with finitely supported measures.

| Variable(s) of interest | Support of measures | Ref. |
|---|---|---|
| Age of patients (in years) | | [25] |
| Tumor size (in mm) | | [73] |
| Number of positive lymph nodes | | [31] |
| Joint distributions of the above variables | finite subsets of $\mathbb{R}^2$ or $\mathbb{R}^3$ | [34] |
| Cell counts | counts for sites | [16] |
| Demand over locations | points (longitude, latitude) | [3] |
| Disease rates over locations | points on the map in $\mathbb{R}^2$ | [48] |
| Pixel/voxel intensity in microscopy images | the grid in $\mathbb{R}^2$ or $\mathbb{R}^3$ | [78] |
1.2 Existing results on weak limits for Optimal Transport
The squared 2-Wasserstein distance $W_2^2(\mu, \nu)$ is the optimal value of the problem
(5)\quad $W_2^2(\mu, \nu) = \min_{\pi \in \Pi(\mu, \nu)} \int \|x - y\|^2 \, d\pi(x, y)$
with $k = 2$ measures $\mu$ and $\nu$, which can be viewed as a particular case of the MOT problem (2). Being a true metric on the space of probability measures on a given metric space ([81]), the 2-Wasserstein distance (and, more generally, the $p$-Wasserstein distance $W_p$) provides a natural way to compare probability measures while respecting the geometry of the supporting metric space. Under this framework, the true measures are estimated by their empirical counterparts, and statistical inference is conducted using limiting laws for the empirical Wasserstein distance [59, 63].
The forms of the weak limits depend on two main factors: the dimensionality of the support and the nature of the measures (where the cases $\mu = \nu$ and $\mu \neq \nu$ may have different limits). Letting $T(\hat{\mu}_n, \hat{\nu}_m)$ denote the empirical optimal value in (5) (with possibly different costs), the limiting laws have the general form
(6)\quad $\sqrt{\frac{nm}{n+m}} \left( T(\hat{\mu}_n, \hat{\nu}_m) - T(\mu, \nu) \right) \rightsquigarrow \mathcal{L}$
as $n, m \to \infty$ (when only $\mu$ is estimated from the data while $\nu$ is not, the “one-sample” version with rate $\sqrt{n}$ is considered). When the measures are supported on $\mathbb{R}$, the limits can be Gaussian, with variance depending on the truth, under the “alternative” assumption $\mu \neq \nu$ [57, 22], and are non-Gaussian under the “null” assumption $\mu = \nu$ [22, 21]. When the measures are supported on $\mathbb{R}^d$ with $d > 1$ and are absolutely continuous, the curse of dimensionality takes place: the empirical Wasserstein distance converges in expectation to the true one too slowly [26, 30]. It is still possible, however, to obtain convergence statements similar to (6) in any dimension by replacing the centering true value with the expectation of the empirical value [23]. The limit is Gaussian when $\mu \neq \nu$ and is degenerate (i.e. the limiting random variable has zero variance) when $\mu = \nu$.
A favorable situation arises when the measures are supported on a finite space: [72] show that limit laws of the form (6) hold for the $W_2$ distance in any dimension, and use the resulting laws to construct statistical inference under $\mu = \nu$ and $\mu \neq \nu$ (the case of countable support is treated in [75]). The laws under either assumption are non-degenerate and given by
(7)\quad $\mathcal{L} = \max_{(u, v) \in \Phi^*} \left( \sqrt{\lambda}\, \langle G_1, u \rangle + \sqrt{1 - \lambda}\, \langle G_2, v \rangle \right)$
where $G_1$ and $G_2$ are the weak limits of the multinomial processes $\sqrt{n}(\hat{\mu}_n - \mu)$ and $\sqrt{m}(\hat{\nu}_m - \nu)$, and $\Phi^*$ is the set of optimal solutions to the dual of the program (5). The results are extended in [46] to general measures supported on $\mathbb{R}^d$ and general costs (with a discussion of limitations in higher dimensions), thus providing a unified approach to weak limits of empirical OT costs centered by the true population value.
The starting point for the theoretical results of this paper is the weak limit (7) of [72]. Inspired by this result, we establish limits of the form (6), where the statistic is now the optimal value of the MOT program (2) with measures supported on a finite space, for any number of measures $k \ge 2$. The implications of these results for $k$-sample inference, and further theoretical findings related to our limits, are summarized below.
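To make the shape of limits like (7) concrete, one can simulate from a law of this form once a finite stand-in for the dual optimal set is available. The sketch below is purely illustrative: the dual pairs in `duals` are hypothetical placeholders, not actual optimal solutions of any transport program; only the covariance structure of the multinomial limits is exact.

```python
import numpy as np

rng = np.random.default_rng(0)

def multinomial_limit(p, size):
    """Draws from the weak limit of sqrt(n)(p_hat - p): N(0, diag(p) - p p^T)."""
    p = np.asarray(p, dtype=float)
    cov = np.diag(p) - np.outer(p, p)
    return rng.multivariate_normal(np.zeros(len(p)), cov, size=size)

def limit_law_draws(p, q, duals, lam=0.5, size=1000):
    """Monte Carlo draws from a limit of the form (7): the maximum over a
    finite list of candidate dual pairs (u, v) of the Gaussian linear forms."""
    G1 = multinomial_limit(p, size)
    G2 = multinomial_limit(q, size)
    vals = [np.sqrt(lam) * G1 @ u + np.sqrt(1 - lam) * G2 @ v for (u, v) in duals]
    return np.max(vals, axis=0)

# toy example: two measures on 3 points, two hypothetical dual pairs
p = [0.2, 0.3, 0.5]
q = [0.4, 0.4, 0.2]
duals = [(np.array([0.0, 1.0, 2.0]), np.array([0.0, -1.0, -2.0])),
         (np.array([0.0, 2.0, 1.0]), np.array([0.0, -2.0, -1.0]))]
draws = limit_law_draws(p, q, duals)
```

Because the limit is a maximum of correlated Gaussians, its histogram is generally skewed to the right, in line with the non-Gaussian null limits discussed above.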
1.3 Summary of contributions and outline
Asymptotic distributions of MOT: We provide the asymptotic distribution of the MOT cost on finite spaces by establishing Hadamard directional differentiability of the MOT functional and combining it with the functional Delta method [9, 29, 46, 70, 69, 65, 72, 75]. The resulting limit is the Hadamard directional derivative of the MOT functional at the true measure vector in the direction of the limit of the empirical process, for a suitably defined rate (Theorem 2.2(a)). For $k = 2$ measures, our limit recovers the one for the Wasserstein distance on finite spaces obtained in [72]; for $k > 2$ measures, our limit allows us to construct novel inference procedures for the $k$-sample problem using MOT (Section 2.2). We specify the structure of the limit under $H_0$ and construct a low complexity stochastic upper bound on the null distribution (Theorem 2.2(b)) that is used to efficiently approximate the limit under $H_0$ (Section 3.2). The bound is tight for $k = 2$. We specify the structure of the limit under $H_1$ and provide sufficient conditions for the limit to be Gaussian by leveraging results on the geometry of multitransportation polytopes from [28]. When the limits are not Gaussian, we construct Normal lower bounds on the alternative limiting distribution (Theorem 2.2(c)). Our stochastic bounds on the null and alternative distributions provide an analytically tractable way to analyze the power of Optimal Transport based tests (Section 2.4), which, to the best of our knowledge, has not yet been considered in the literature.
Consistency and power: We provide a novel power analysis for Optimal Transport based tests of the hypotheses (1) that encompasses both our test (12) and the two-sample test in [72], and can potentially be applied to tests based on the limiting laws in [75] and [46]. We show consistency of the test under fixed alternatives (Proposition 2.8), as well as uniform consistency over a certain broad class of alternatives (Theorem 2.9 and Proposition 2.11). We illustrate the theoretical power results in the case $k = 2$ by providing a lower bound on the power function that explicitly relates the sample size and the effect size (Corollary 2.10, Figure 2). We also quantify how the population version of our statistic changes with the number of measures $k$ for certain sequences of alternatives (Lemmas 2.13 and 2.14), suggesting potential power advantages in these cases. For the case of small sample sizes, we provide a permutation version of the MOT based test (Section 3.3). Comparison with state-of-the-art tests of [47, 50, 64] shows strong finite sample power performance of our tests (Figure 5).
Computational complexity results: Leveraging a recent complexity result for the MOT/barycenter program [2], we prove polynomial time complexity of the derivative bootstrap that consistently estimates the asymptotic distribution of MOT under $H_0$ (Lemma 3.2); polynomial complexity of the m-out-of-n bootstrap and the permutation procedure follows directly from [2] (Table 2). We demonstrate that the null upper bound of Theorem 2.2(b) can efficiently approximate the null distribution when the cardinality of the support is large (Figure 4).
Applications to real data and software: We illustrate the performance of MOT based $k$-sample inference on two synthetic datasets, showing strong power performance when testing $H_0$ and the ability to produce meaningful and interpretable confidence regions under $H_1$ (Section 4.1). Further, we apply our methodology to real data on cancers in the United States population to confirm two claims in cancer studies that were previously established using different methodologies (Section 4.2, Figures 7 and 8). The current version of the software that implements our methods is available at https://github.com/kravtsova2/mot.
2 k-Sample inference on finite spaces using MOT
2.1 Notation and preliminary definitions
Denote the vector of true measures supported on the finite space $\mathcal{X}$ by $\boldsymbol{\mu} = (\mu^{(1)}, \dots, \mu^{(k)})$, and the vector of their empirical counterparts by $\hat{\boldsymbol{\mu}} = (\hat{\mu}^{(1)}_{n_1}, \dots, \hat{\mu}^{(k)}_{n_k})$, with sample sizes $\mathbf{n} = (n_1, \dots, n_k)$, where $\mathbf{n} \to \infty$ is to be interpreted as each sample size tending to infinity.
Let $\mathrm{MOT}(\boldsymbol{\mu})$ be the optimal value of the program (2), which on the finite space becomes the finite-dimensional linear program
(8)\quad $\mathrm{MOT}(\boldsymbol{\mu}) = \min_{\pi \ge 0} \; c^\top \pi \quad \text{subject to} \quad A\pi = \boldsymbol{\mu}$
The optimization variable $\pi$ is a column vector representing a joint probability distribution with marginals $\mu^{(1)}, \dots, \mu^{(k)}$ (frequently called a multicoupling), the matrix $A$ encodes the constraints for $\pi$ to be a multicoupling (i.e. that summing certain entries of $\pi$ gives the marginals), and the cost column vector $c$ contains the averaged squared Euclidean distances between measure support points and their averages, given by (3). (The linear program formulation of the MOT problem is discussed, for example, on p. 3 of [54].)
For the reader’s convenience, Example 1 below illustrates the structure of the linear program (8) in the case of three measures, each supported on two points in $\mathbb{R}$:
Example 1 (Illustration of optimization problem for three measures).
Consider a finite set $\mathcal{X} = \{x_1, x_2\}$, which could represent, for instance, two tumor sizes (in centimeters) of cancer patients, and consider three probability measures supported on $\mathcal{X}$ representing the probabilities of occurrence of $x_1$- and $x_2$-sized tumors in a given population of cancer patients, each recorded in a probability vector of length two. The multimarginal optimal transport problem (8) is to minimize a linear function of a measure $\pi$ on the product space $\mathcal{X} \times \mathcal{X} \times \mathcal{X}$ whose marginals are the three given measures. Technically, $\pi$ is an order-3 tensor, i.e. an array with three indices taking values in $\{1, 2\}$, but for notational convenience we represent it by a long vector. The cost in the objective of (8) associated with the entry $\pi_{ijl}$ is the average of squared differences (or squared norms of the differences in the higher-dimensional case) between the support points $x_i, x_j, x_l$ and their mean $\bar{x} = (x_i + x_j + x_l)/3$, i.e.
The objective is to minimize the total discrepancy weighted by $\pi$, which is given by $c^\top \pi$.
The multicoupling $\pi$ is subject to having non-negative entries and is constrained linearly via $A\pi$. The constraint matrix $A$ is responsible for making sure that the appropriate entries of $\pi$ sum to the given marginal probabilities, i.e.
(9)
This finishes the example.
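The example above can be solved directly as the linear program (8). The sketch below uses hypothetical numbers (support points 2 cm and 5 cm and three illustrative marginals; these are our choices, not values from the paper) and SciPy's `linprog`, assuming SciPy is available:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# hypothetical instance of Example 1: support {2, 5} (tumor sizes in cm)
# and three illustrative marginals
points = np.array([2.0, 5.0])
marginals = [np.array([0.5, 0.5]),
             np.array([0.3, 0.7]),
             np.array([0.5, 0.5])]
k, N = 3, 2

tuples = list(itertools.product(range(N), repeat=k))   # 8 index tuples (i, j, l)

# cost (3): average squared deviation of the tuple's points from their mean
cost = np.array([np.mean((points[list(t)] - points[list(t)].mean()) ** 2)
                 for t in tuples])

# marginal constraints: for each measure m and support point p,
# the entries of pi with t[m] == p must sum to marginals[m][p]
A_eq = np.zeros((k * N, len(tuples)))
b_eq = np.concatenate(marginals)
for m in range(k):
    for col, t in enumerate(tuples):
        A_eq[m * N + t[m], col] = 1.0

res = linprog(cost, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * len(tuples), method="highs")
print(round(res.fun, 6))  # 0.4
```

For these marginals, the optimal multicoupling puts as much mass as possible on the "diagonal" tuples (which have zero cost), and only the leftover 0.2 of mass pays the positive mixed-tuple cost of 2, giving the optimal value 0.4.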
The dual program of (8) is given by
(10)\quad $\max_{\mathbf{u}} \; \boldsymbol{\mu}^\top \mathbf{u} \quad \text{subject to} \quad A^\top \mathbf{u} \le c$
(the derivation of the dual follows from the standard theory of linear programming, e.g. Section 4.1 of [5]). The column vector $\mathbf{u}$ contains blocks of dual variables, one block for each measure, and the objective of (10) can be thought of as summing the contributions of the individual measures.
Let $\Phi^*$ denote the set of dual optimal solutions to (10). This set consists of all vectors $\mathbf{u}$ that result in the maximum value of the dual objective (which equals the minimum value of the primal objective by strong duality, e.g. Theorem 4.4 of [5]) and satisfy the dual constraints, i.e.
(11)\quad $\Phi^* = \left\{ \mathbf{u} : A^\top \mathbf{u} \le c, \;\; \boldsymbol{\mu}^\top \mathbf{u} = \mathrm{MOT}(\boldsymbol{\mu}) \right\}$
We consider the asymptotic behavior of the scaled and centered empirical estimator $\mathrm{MOT}(\hat{\boldsymbol{\mu}})$ by establishing the weak limit of $\rho_{\mathbf{n}} \left( \mathrm{MOT}(\hat{\boldsymbol{\mu}}) - \mathrm{MOT}(\boldsymbol{\mu}) \right)$ as $\mathbf{n} \to \infty$, for a suitable rate $\rho_{\mathbf{n}}$, where $\mathrm{MOT}(\boldsymbol{\mu}) = 0$ under $H_0$ and $\mathrm{MOT}(\boldsymbol{\mu}) > 0$ under $H_1$. The set $\Phi^*$ will be needed to define the limit.
2.2 Definitions of testing and inference procedures
Consider the statistic $\rho_{\mathbf{n}} \, \mathrm{MOT}(\hat{\boldsymbol{\mu}})$, recalling that the population value $\mathrm{MOT}(\boldsymbol{\mu})$ is zero under $H_0$ and positive under $H_1$.
An $\alpha$-level test of $H_0$ would reject if the statistic is large, i.e. if it exceeds the $(1-\alpha)$-th quantile of its null distribution. However, as Theorem 2.2 shows, the null distribution depends on the unknown true $\boldsymbol{\mu}$, and hence care must be taken to ensure that the estimated cut-off used for the test still results in the (asymptotic) level $\alpha$.
To this end, we consider a consistent bootstrap estimator of the null distribution given in Proposition 3.1 and denote its $(1-\alpha)$-th quantile by $\hat{c}_{1-\alpha}$. Consistency of the bootstrap is shown using the results of [29], and by Corollary 3.2 of the same work such a bootstrap based cut-off gives an asymptotic level-$\alpha$ test of $H_0$. Using this cut-off, we define the asymptotic test of $H_0$ as the map
(12)\quad $\varphi(\hat{\boldsymbol{\mu}}) = \mathbb{1} \left\{ \rho_{\mathbf{n}} \, \mathrm{MOT}(\hat{\boldsymbol{\mu}}) > \hat{c}_{1-\alpha} \right\}$
Similarly, the distribution of the statistic under $H_1$ is consistently estimated by the bootstrap in Proposition 3.1, with the resulting $(\alpha/2)$-th and $(1-\alpha/2)$-th quantiles denoted by $\hat{q}_{\alpha/2}$ and $\hat{q}_{1-\alpha/2}$, respectively. The asymptotic Confidence Region for the MOT value under $H_1$ is given by
(13)\quad $\mathrm{CR}_{1-\alpha} = \left[ \mathrm{MOT}(\hat{\boldsymbol{\mu}}) - \frac{\hat{q}_{1-\alpha/2}}{\rho_{\mathbf{n}}}, \;\; \mathrm{MOT}(\hat{\boldsymbol{\mu}}) - \frac{\hat{q}_{\alpha/2}}{\rho_{\mathbf{n}}} \right]$
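Given bootstrap draws approximating the law of the scaled and centered statistic, a region of the form (13) is obtained by subtracting scaled bootstrap quantiles from the plug-in value. A generic sketch (the function name and the synthetic draws are ours, for illustration only):

```python
import numpy as np

def bootstrap_cr(stat_hat, boot_draws, rate, alpha=0.05):
    """Two-sided confidence region of the form (13).

    boot_draws approximate the law of rate * (stat_hat - stat_true);
    inverting gives [stat_hat - q_{1-a/2}/rate, stat_hat - q_{a/2}/rate].
    """
    q_lo, q_hi = np.quantile(boot_draws, [alpha / 2, 1 - alpha / 2])
    return stat_hat - q_hi / rate, stat_hat - q_lo / rate

# usage with synthetic standard normal draws standing in for a bootstrap sample
rng = np.random.default_rng(1)
draws = rng.normal(0.0, 1.0, size=5000)
lo, hi = bootstrap_cr(stat_hat=2.0, boot_draws=draws, rate=10.0)
```

Note the quantile reversal: the upper bootstrap quantile produces the lower endpoint of the region, a standard consequence of inverting the centered limit.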
2.3 Asymptotic distributions of MOT under $H_0$ and $H_1$

Theorem 2.2(a) provides the general form of the asymptotic distribution of MOT on finite spaces. This distribution is given by the Hadamard directional derivative of the MOT functional, which is the optimal value of a linear program whose feasible set consists of the dual optimal solutions (11). If this set is a singleton, the limit in Theorem 2.2(a) is a linear combination of Gaussians, and hence is also Gaussian. If not, the limit is the maximum (taken over the feasible set) of such linear combinations.
By the theory of linear programming, it is possible to assess whether the set of dual optimal solutions is a singleton based on the corresponding set of basic optimal solutions to the primal program (we use Theorem 5.6.1 of [71], and Chapters 4 and 5 of [5], for the general linear programming results employed below). In our case, the basic optimal solutions to the primal program (8) are the vertices of the multitransportation polytope of multicouplings. A non-degenerate vertex contains the maximal possible number of positive entries, and a vertex is termed degenerate if it contains strictly fewer (p. 366 of [28]).
The dual optimal set cannot be a singleton if an optimal solution to the primal program is unique and degenerate. This is always the case under $H_0$: the unique optimal solution is given by the “identity” multicoupling (with the common marginal probabilities in the entries whose tuple indices all coincide and zeros otherwise, so it is degenerate). Hence, the asymptotic distribution of MOT under $H_0$ is never Gaussian (Theorem 2.2(b)).
The dual optimal set is a singleton if there exists a non-degenerate primal optimal vertex, i.e. an optimal multicoupling with the maximal number of positive entries. This can happen under certain alternatives; in particular, it is possible if the multitransportation polytope contains no degenerate vertices. In this case, the dual optimal set is a singleton, and the corresponding asymptotic distribution of MOT is Gaussian (Theorem 2.2(c)). We use a result in discrete geometry from [28] to provide a sufficient condition (A1) that leads to Gaussian limits under $H_1$:
Condition (A1).
Remark 2.1.
For $k = 2$ measures, condition (A1) is implied by the following No Subset Sum condition, which ensures that a transportation polytope has no degenerate vertices (this condition is mentioned by [74] for uniqueness of Kantorovich potentials for finitely supported measures): there are no proper subsets of support indices over which the masses of the two measures sum to the same value. This condition is both necessary and sufficient to exclude degenerate vertices in the case $k = 2$ (see, e.g., Theorem 1.2 in Chapter 6 of [28]).
Theorem 2.2 (Asymptotic distribution of MOT on finite spaces).
Assume that the sample sizes satisfy $n_i / (n_1 + \dots + n_k) \to \lambda_i \in (0, 1)$ as $\mathbf{n} \to \infty$, and denote the corresponding scaling rate by $\rho_{\mathbf{n}}$. Then,
- (a)
- (b)
Furthermore, there exists a random variable defined on the same probability space as the null limit, dominating it everywhere, and given by
(16) where the feasible set is defined by a subset of the constraints from (15).
- (c)
Under $H_1$,
where the limit is given by (14). Furthermore, for every dual optimal solution in (11), there exists a random variable on the same probability space as the limit, lower bounding it everywhere, with
(17) If Condition (A1) holds, then the dual optimal set is a singleton, and the limit itself is Gaussian.
Proof summary for Theorem 2.2.
The proof of part (a) is outlined below, with details in Appendix A.1. In what follows, we view the measures and the dual vectors as finite-dimensional vectors; the weak convergence and Hadamard directional differentiability are with respect to the Euclidean norm.
- Step 1: Establish, for a suitable scaling, the weak limit of the empirical process.
- Step 2: Confirm that the MOT functional is Hadamard directionally differentiable at the true measure vector, and identify the derivative.
The proof of part (b) is given in Appendix A.2. It provides the exact form of the proposed upper bound by reporting the constraints retained in the bounding program. To construct the bound, we consider how the inequality constraints behave on the kernel of the constraint map. The resulting upper bound has only polynomially many constraints and can be sampled efficiently to approximate the null distribution in Section 3.2. We remark that the proposed bound is not unique: in particular, it can be strengthened by including more constraints (Section 5.2). The proposed bound is tight when $k = 2$. (We remark here that the null distribution program (15) can be written with fewer dual variables due to the equality constraint; this is what [72] refer to in the $k = 2$ case (p. 227). We choose to keep the form (15) for notational convenience.)
For part (c), if the limit is not Gaussian (i.e., the feasible set is not a singleton), one can take any fixed dual optimal solution and consider the (random) objective (14) evaluated at it. The resulting value lower bounds the value of the maximization program (14), and it is distributed according to (17).
∎
Observation 2.3.
The entries of the cost vector indexed by tuples with coinciding index values can be written in terms of the distance between the two points with distinct indices, scaled by a constant depending on $k$. The details are provided in Appendix A.3.
Lemma 2.4 (Bounds on the dual variables).
Fix a measure index. Let $\mathbf{u}$ be an optimal solution to the dual program (10), chosen such that its first block of entries is normalized to zero. (Recall that adding a constant to any dual variable and subtracting the same constant from any other dual variable does not change the dual objective and does not violate the dual constraints in (10). Such normalization is frequently done to avoid redundant solutions; see, e.g., the definition of the dual transportation polyhedron in [4].) Then each entry of $\mathbf{u}$ is bounded as
where the bounding quantity is the maximal squared distance on the ground metric space. It follows that the magnitudes of the dual solutions are bounded by a constant depending only on the ground metric space.
Remark 2.5.
The assumption holds for those alternatives for which the primal optimal solution (the multicoupling) assigns positive mass to the “diagonal” tuples, i.e. those with all indices equal. By the complementary slackness result in linear programming (see, e.g., Theorem 4.5 of [5]), in this case the corresponding constraints of the dual hold with equality. This always holds under $H_0$ and frequently happens under $H_1$.
Using the above results, Proposition 2.6 defines an upper bound on all test cut-offs, which is independent of the nature and the number of measures in the collection. This bound is used to prove consistency of the test (12) uniformly over broad classes of alternatives with two measures (Theorem 2.9) and with a general number of measures (Proposition 2.11).
Proposition 2.6 (Bound for test cut-offs).
Fix the test level $\alpha$. There exists a constant depending only on the ground metric space such that
where . In particular, for any ,
For equal sample sizes, this gives
for all .
Proof.
For any given collection of measures, consider the null distribution given by the linear program (15) and bound its objective (everywhere) as
Thus the optimal value of (15) is also bounded by the same quantity.
The desired cut-off is obtained using a cut-off for the bounding distribution. To define it, we use the following concentration result from [53]:
Concentration of a Gaussian random variable (Equation (3.5) from [53]): for a centered Gaussian random variable $Z$ with variance $\sigma^2$ and any $t > 0$,
(18)\quad $\mathbb{P}\left( |Z| \ge t \right) \le 2 \exp\left( -\frac{t^2}{2\sigma^2} \right)$
Recalling that with where , we get that
Hence, , and choosing gives . Note that the result holds for any .
We let which ensures . Thus,
which implies .
∎
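The cut-off construction in the proof can be traced numerically: a tail bound of the form $\mathbb{P}(|Z| \ge t) \le 2\exp(-t^2/(2\sigma^2))$ yields the conservative cut-off $t = \sigma\sqrt{2\log(2/\alpha)}$. The exact constants from [53] may differ; the sketch below assumes this standard sub-Gaussian tail bound:

```python
import math

def gaussian_cutoff(sigma, alpha):
    """Conservative cut-off t with P(|Z| >= t) <= alpha for Z ~ N(0, sigma^2),
    obtained by solving 2 * exp(-t**2 / (2 * sigma**2)) <= alpha for t."""
    return sigma * math.sqrt(2.0 * math.log(2.0 / alpha))

# e.g. alpha = 0.05: the bound gives t ~ 2.716 * sigma, slightly more
# conservative than the exact two-sided Gaussian quantile 1.96 * sigma
t = gaussian_cutoff(sigma=1.0, alpha=0.05)
```

The gap between 2.716 and 1.96 quantifies the conservativeness of the tail-bound cut-off relative to the exact Gaussian quantile.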
2.4 Consistency and power

We start by establishing the basic requirement for tests comparing measures, namely consistency under any fixed alternative, by showing that the power tends to one with increasing sample sizes.
The power of the test (12) is
(19)\quad $\beta(\boldsymbol{\mu}) = \mathbb{P}\left( \rho_{\mathbf{n}} \, \mathrm{MOT}(\hat{\boldsymbol{\mu}}) > \hat{c}_{1-\alpha} \right)$
Remark 2.7.
Proposition 2.8 below proves consistency under fixed alternatives for any number of measures $k$. The proof utilizes the Normal Lower Bound guaranteed by Theorem 2.2(c) to lower bound the power of the test.
Proposition 2.8 (Consistency under fixed alternatives, $k$ measures).
Under any given fixed alternative in $H_1$, the test in (12) has power tending to one as $\mathbf{n} \to \infty$.
Proof.
Given an alternative, consider the set of dual optimal solutions (11). Choose any one of them (treated as fixed after the choice), and consider the corresponding Normal Lower Bound guaranteed by Theorem 2.2(c). Using (19), we have that
Since the scaled true MOT value grows without bound while the test’s cut-off and the true value itself do not change with the sample sizes, the threshold that the Gaussian lower bound must exceed tends to $-\infty$, and hence the Gaussian random variable exceeds it with probability tending to one. ∎
Next, we prove uniform consistency of the test (12) over a broad class of alternatives. We start with the case of $k = 2$ measures in Theorem 2.9, which proves uniform consistency of the test proposed by [72]. We then move to general collections (where the number of measures is allowed to change) in Proposition 2.11, concluding uniform consistency of tests of this type.
Our results are proved under assumption (B1) below, which is discussed in Remark 2.5. This assumption is expected to hold when the measures are not too far from each other, and the power without this assumption is expected to be higher. Removing (B1) poses the difficulty of bounding dual solutions uniformly over the alternative polytopes; we use such a bound to control the alternative variances. Under (B1), the condition allows us to bound the dual solutions (and hence the alternative variances) explicitly and uniformly over the class of alternatives.
Assumption (B1).
There exist dual optimal solutions satisfying the normalization discussed in Remark 2.5.
For any fixed metric space with finitely many points and any separation level, define the class of alternatives
Theorem 2.9 (Uniform consistency, $k = 2$ measures).
Proof.
Fix the test level $\alpha$. The goal is to show that, independently of the nature of the alternative, the probability that the test rejects, given by
the power expression above, tends to one as the sample sizes grow (the scaling factor comes from the two-sample rate).
Assume for simplicity of notation that the sample sizes are equal. (The proof for unequal sample sizes would be similar, by considering the appropriate rate.) Note first that the null cut-offs can be bounded above uniformly over the class using Proposition 2.6, which for two measures gives the bound
Hence,
This expression represents the “worst” (over the class) value that any given statistic must exceed in order to give the test power.
Next we show that any alternative in the class will exceed this bound with probability tending to one as the sample size grows. To this end, fix an alternative in the class, and consider the corresponding alternative distribution. Consider the dual solutions satisfying Assumption (B1) and the corresponding Normal Lower Bound
guaranteed by Theorem 2.2(c). Note that its variance
where $\lambda_{\max}(\cdot)$ denotes the largest eigenvalue of its matrix argument.
Using Theorem 1 of [8], the eigenvalues of the covariance matrices are upper bounded in terms of the entries of the underlying probability vectors, with a uniform upper bound over all instances in the class. (In fact, the bound holds for any pair of measures and is not restricted to the class; see [8].) Hence,
providing a uniform upper bound on the eigenvalue part. Further, the dual solutions are bounded by Lemma 2.4. Thus, letting
(20)
uniformly bounds the variances for all dual solutions chosen as above for alternatives arising from the class.
The final step is to combine the above uniform bounding arguments to lower bound the power. Note that for large enough sample sizes the cut-off term is dominated, while the centering term grows without bound. Hence, we have that, for any alternative in the class, for sample sizes large enough depending only on the class parameters but not on the nature of the alternative,
(21)
which tends to one as the sample size grows. This gives a uniform lower bound on the power over the class, proving the uniform consistency of the test over this broad class of alternatives. ∎
Using Theorem 2.9, one can provide a practical lower bound on the power as a function of the sample size and/or the effect size when the measures are supported on a known metric space. Note that the dual bound from Lemma 2.4 is rather conservative; it can be replaced by a bound computed for the polytope at hand in every specific case. To find such bounds, one could solve auxiliary linear programs, one for each entry of the dual vector, to estimate the magnitudes of the dual variables over a given polytope. Denoting the resulting bound accordingly, we have
Corollary 2.10 (Lower bound on the power of the two-sample test).
Using techniques similar to those in the proof of Theorem 2.9, it is possible to prove uniform consistency of the test (12) for alternatives in the class
defined for any number of measures and any fixed separation level under Assumption (B1). This gives consistency of the test (12) uniformly over alternatives with $k$ measures:
Proposition 2.11 (Uniform consistency in the class ).
The test in (12) satisfies
Proof.
Similarly to the proof of Theorem 2.9, the null cut-offs are uniformly bounded using Proposition 2.6. Recall that (taking equal sample sizes for simplicity) the scaled separation grows with the sample size, and hence the bound holds for sample sizes large enough to ensure this for all alternatives in the class.
Similarly to the proof of Theorem 2.9, each alternative random variable arising from has a Normal Lower Bound
with the variance bounded above by
(22)
which decreases with the number of measures. Hence, it is bounded uniformly by the bound from the two-measure case (equation (20)), and the rest of the argument carries over in exactly the same way as in the proof of Theorem 2.9. ∎
Remark 2.12 (Large $k$ and connection with [50]).
In practice, the number of measures $k$ cannot be too large relative to the sample sizes. This is due to the rate requirement, which forces very large sample sizes that may not be practically plausible. While this limitation is natural for asymptotic $k$-sample tests that work with fixed $k$ (as discussed, e.g., in [84]), recent results of [50] show that the permutation approach for certain test statistics allows $k$ and the sample sizes to grow simultaneously. More precisely, in the class of alternatives where only a few measures differ from the rest of the collection, the permutation kernel based test is uniformly powerful if the population version of the test statistic sufficiently exceeds a threshold. Below we discuss sequences of alternatives whose population MOT value does not decrease with $k$, and hence the MOT statistic is expected to perform well in a permutation procedure. We leave a theoretical power analysis concerning the permutation test for future work.
The “clustered” alternatives are collections that separate into two groups (or “clusters”), with the measures within each cluster all being the same. Such a situation might arise, for example, if an applied treatment causes different types of responses. For instance, for , define
(see Figure 5B for illustration). The classes with are defined analogously (in each case, is assumed to be divisible by ).
Lemma 2.13 (MOT values for “clustered” alternatives).
For , we have . More generally, for , , where is a collection consisting of one measure from each cluster.
The proof is provided in Appendix A.5. Note that the true values for “clustered” alternatives do not decrease with an increasing number of measures and thus may serve as a suitable test statistic in permutation tests against alternatives in .
Finally, we comment on the values in a “sparse” alternative class when only one measure is different from the rest (alternatives of this type are considered in both [50] and [84]):
While the values do decrease with in this sequence of alternatives, we can state precisely how the rate of this decrease is controlled (proved in Appendix A.6):
Lemma 2.14 (MOT values for “sparse” alternatives).
For , .
3 Sampling from null and alternative distributions


3.1 Bootstrap: -out-of- and derivative
We recall that the limiting laws in Theorem 2.2 depend on the true measures , similarly to the -sample cases considered in [72] and [46]. More precisely, the laws are of the form
where is the map with Hadamard directional derivative at in the direction of and . The classical bootstrap estimator of in the sense of [27] would be constructed by sampling from the conditional (given the data) law of , where is obtained by taking samples from the vector of empirical measures . By Theorem 3.1 of [29], this estimator is not consistent when is non-Gaussian, which is always the case under and frequently under .
In place of the inconsistent classical bootstrap, [29] proposes a consistent bootstrap procedure to estimate the law of . The approach of [29] is to ensure consistency of an estimator of the map uniformly in the argument , assuming that the law of the argument is estimated by (some) consistent bootstrap scheme. Two different choices for then lead to bootstrap schemes frequently termed m-out-of-n and derivative bootstrap methods, respectively (see Section 1 of [29] for historical notes on these methods).
The work of [72] outlines the consistency results for these two schemes in the -sample cases. For completeness, we describe these schemes in the general case of (proved in Appendix A.7):
Proposition 3.1 (Consistency of bootstrap from [29]).
The results of parts (a) and (b) concern two estimators of the map .
(a) The estimator given by , composed with the estimator of given by , results in a consistent bootstrap estimator of under both and . Here, is obtained by resampling m out of n observations from with , and such that .
(b) The estimator given by , composed with the estimator of given by , where each is as above, results in a consistent bootstrap estimator of under .
Note: This estimator is frequently termed the derivative bootstrap estimator of and is considered in [72] for the Wasserstein distance map .
Pseudocodes 1 and 2 describe sampling from the limiting laws of under and using the bootstrap schemes in Proposition 3.1.
Pseudocode 1 (m-out-of-n bootstrap to obtain one sample from or limiting law).
Given the data ,
1. Let , .
2. For each , sample under , or sample under .
3. Compute by solving the program (8).
4. Report under , or under , where .
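For concreteness, the scheme of Pseudocode 1 can be sketched in Python for the special case of k = 2 measures, with a generic finite-support optimal transport value standing in for the MOT program (8). The cost matrix `C`, the count vectors, and all helper names are illustrative assumptions, not the paper's notation.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

def ot_value(p, q, C):
    """Optimal transport value between discrete measures p, q with cost matrix C
    (the k = 2 special case of the multimarginal program)."""
    r, s = len(p), len(q)
    A_eq = np.zeros((r + s, r * s))        # marginal constraints on the coupling
    for i in range(r):
        A_eq[i, i * s:(i + 1) * s] = 1.0   # row sums equal p
    for j in range(s):
        A_eq[r + j, j::s] = 1.0            # column sums equal q
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([p, q]), bounds=(0, None))
    return res.fun

def m_out_of_n_draw(x_counts, y_counts, C, m):
    """One m-out-of-n bootstrap draw: resample m observations from each empirical
    measure and rescale by sqrt(m); recentering at the empirical value corresponds
    to the alternative-hypothesis form of the statistic."""
    p_hat = x_counts / x_counts.sum()
    q_hat = y_counts / y_counts.sum()
    p_star = rng.multinomial(m, p_hat) / m
    q_star = rng.multinomial(m, q_hat) / m
    return np.sqrt(m) * (ot_value(p_star, q_star, C) - ot_value(p_hat, q_hat, C))
```

Repeating `m_out_of_n_draw` many times yields an empirical approximation of the limiting law, whose quantiles give the test cut-off.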
Pseudocode 2 (Derivative bootstrap to obtain one sample from limiting law).
Given the data ,
1. Sample .
2. For each , sample . Let , where .
3. Solve the program (15) with in place of .
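The derivative bootstrap draw can be illustrated in a simplified two-sample null setting: draw a Gaussian direction with the multinomial limiting covariance and maximize the corresponding linear functional over a polytope of dual potentials. The specific polytope (differences of potentials bounded by the cost) and all names below are our illustrative choices, not the exact program (15).

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)

def multinomial_gaussian(p):
    """Draw from N(0, diag(p) - p p^T), the limit of sqrt(n) (p_hat - p)."""
    cov = np.diag(p) - np.outer(p, p)
    return rng.multivariate_normal(np.zeros(len(p)), cov, check_valid="ignore")

def null_derivative_draw(p, C):
    """Two-sample null sketch: maximize <G, u> over potentials u with
    u_i - u_j <= C_ij (a dual feasible set when both measures equal p and the
    optimal value is zero); u_0 is pinned to 0 to remove the shift degeneracy."""
    G = multinomial_gaussian(p) - multinomial_gaussian(p)  # two independent draws
    r = len(p)
    rows, rhs = [], []
    for i in range(r):
        for j in range(r):
            if i != j:
                row = np.zeros(r)
                row[i], row[j] = 1.0, -1.0
                rows.append(row)
                rhs.append(C[i, j])
    res = linprog(-G, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(0.0, 0.0)] + [(None, None)] * (r - 1))
    return -res.fun
```

Each call returns one draw; since u = 0 is always feasible, the draws are nonnegative, matching the non-Gaussian (supremum-type) character of the null limit.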
3.1.1 Computational complexity of bootstrap
For the m-out-of-n bootstrap, computation in Step 3 requires solving the primal program (8), which is a linear program with variables, i.e. exponentially many in terms of the cardinality of the support space . By strong duality, the optimal value of primal MOT program is the same as that of the dual MOT (10), which is a linear program
with polynomially many variables but exponentially many constraints. It is well known that a linear program with exponentially many constraints can be shown to be solvable in polynomial time via the ellipsoid method provided its feasible set has a polynomial time computable separation oracle (see, e.g., Section 8.5 of [5]). Such an oracle is a procedure that accepts a proposal point and either confirms that , or outputs a violated constraint. A polynomial separation oracle for the dual problem is constructed in [2] (Proposition 12), resulting in a polynomial time algorithm to solve the problem with quadratic cost (3) (Theorem 2 of [2]). (Besides the optimal value, which agrees for the primal and the dual, [2] are also interested in the primal vertex solution; for that reason, they also discuss how to obtain a primal solution in polynomial time in their Proposition 11.)
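A simple way to convey the oracle-based idea in code is delayed constraint generation: repeatedly solve a relaxed linear program over the constraints found so far and ask a separation oracle for a violated constraint. The loop below is an illustrative sketch of this idea (the ellipsoid method, not this loop, provides the polynomial-time guarantee), with a toy oracle of our own making.

```python
import numpy as np
from scipy.optimize import linprog

def cutting_plane_lp(c, oracle, n, box=1e6, max_iter=1000):
    """Minimize c^T x over a polytope known only through a separation oracle.

    oracle(x) returns None if x is feasible, or a violated constraint (a, b)
    with a @ x > b; violated constraints are accumulated and the relaxed LP
    is re-solved (delayed constraint generation)."""
    A, rhs = [], []
    for _ in range(max_iter):
        res = linprog(c,
                      A_ub=np.array(A) if A else None,
                      b_ub=np.array(rhs) if rhs else None,
                      bounds=[(-box, box)] * n)  # box keeps relaxations bounded
        cut = oracle(res.x)
        if cut is None:
            return res.x, res.fun
        A.append(cut[0])
        rhs.append(cut[1])
    raise RuntimeError("cutting plane loop did not converge")

# example: maximize x + y (minimize -x - y) subject to x <= 1 and y <= 1,
# with the constraints revealed only through the oracle
def oracle(x):
    for a, b in [(np.array([1.0, 0.0]), 1.0), (np.array([0.0, 1.0]), 1.0)]:
        if a @ x > b + 1e-9:
            return a, b
    return None

x_opt, f_opt = cutting_plane_lp(np.array([-1.0, -1.0]), oracle, n=2)
```

Only the constraints actually violated along the way are ever materialized, which is what makes exponentially large constraint sets manageable.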
For the derivative bootstrap, computation in Step 3 requires solving program (15), which is similar to the dual MOT linear program and is given by
where is the matrix of the linear map and is a realization of (the nature of the coefficient vector does not affect the complexity). Note that there are only polynomially many constraints in (namely, of them); hence, given a polynomial separation oracle for , the rest of the constraints in can be checked in polynomial time, giving the following theoretical complexity result for the derivative bootstrap:
Lemma 3.2 (Polynomial complexity of derivative bootstrap).
The derivative bootstrap linear program (Step 3 in Pseudocode 2) has computational complexity , where is an upper bound on the bits of precision used to represent the coefficient vector .
Proof details missing from the above discussion are provided in Appendix A.8.
3.2 Fast approximation of the null distribution by
A fast alternative to bootstrap sampling from the null distribution is to utilize the upper bound on the null random variable provided by equation (16) in Theorem 2.2(b). As the proof of the theorem shows, a stochastic upper bound can be constructed to have constraints in place of the constraints in by exploiting the constraint structure under (see Appendix A.2 for details). Note that, with only quadratically many constraints, the linear program for with any realization of the coefficient vector can be solved quickly by modern linear programming solvers.
Sampling from can be viewed as obtaining an upper bound on the derivative bootstrap sampling distribution of , via the following algorithm:
Pseudocode 3 (Sampling ).
Given the data ,
1. Sample .
2. For each , sample . Let , where .
3. Solve the program (16) with in place of .
The computational complexity of sampling is included in Table 2. The performance of for testing on all real datasets considered in the paper is illustrated in Figure 4. Note that the low computational complexity of makes it possible to approximate the distribution on large datasets within a few minutes on a standard laptop.
3.3 Permutation approach
An alternative to the asymptotic test (12) is a permutation test. The permutation approach in k-sample testing is frequently used when the asymptotic distribution is difficult to sample from due to, for example, an infinite number of parameters and/or difficulties in their estimation (the cases of [64], [47], and [50]). Moreover, permutation procedures are applicable when the sample sizes are small (and hence the asymptotic distribution may not be valid), giving exact level permutation tests (Section 15.2 of [27]).
A permutation test accepts a set of data points with group labels, randomly permutes the labels, and computes the test statistic of interest on the permuted data to compare with the original value. The number of random permutations is usually taken to be between and out of the total (typically very large) number of possible permutations (p. 158 of [17]).
The permutation test is described in Pseudocode 4.
Pseudocode 4 (MOT based permutation test).
Given the data :
1. Compute .
2. Convert to a matrix of support points, where each support point belongs to the th group, , and is repeated according to the counts in . Collect the group labels in the vector .
3. For each , sample a random permutation , permute the support points according to , and construct measures based on the frequencies of the support points in the new groups. Compute the permuted test statistic by solving the program (8).
4. Compute the approximate p-value (p. 158 of [17]) as
Computation of the permuted test statistic in Step 3 requires solving the program (8), similarly to the case of the m-out-of-n bootstrap in Pseudocode 1. Hence, both algorithms have the same complexity, as shown in Table 2. The empirical performance of the permutation test of Pseudocode 4 is illustrated in Figure 5.
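The permutation procedure of Pseudocode 4 can be sketched generically as follows; the `statistic` argument is a placeholder for the MOT value obtained by solving the program (8), and the (count + 1)/(B + 1) convention matches the approximate p-value of [17]. All names are our illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def permutation_pvalue(samples, statistic, B=999):
    """Generic k-sample permutation test.

    samples: list of 1-D arrays (one per group); statistic: a function of such
    a list (a stand-in for the MOT value computed by a linear program).
    Returns the approximate p-value with the (count + 1) / (B + 1) convention."""
    pooled = np.concatenate(samples)
    sizes = [len(s) for s in samples]
    t_obs = statistic(samples)
    count = 0
    for _ in range(B):
        perm = rng.permutation(pooled)               # permute the group labels
        groups = np.split(perm, np.cumsum(sizes)[:-1])
        if statistic(groups) >= t_obs:
            count += 1
    return (count + 1) / (B + 1)
```

For clearly separated groups the p-value is near its minimum 1/(B + 1), while under the null it is approximately uniform.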
| distribution to sample | permut. or (m-out-of-n) | (deriv.) | |
|---|---|---|---|
| hypothesis | null, alternative | null | null |
| optimization program | equation (8) | equation (15) | equation (16) |
| variables | | | |
| equality constraints | | | none |
| inequality constraints | none | | |
| theoretical complexity | | | |
| reference | Theorem 2 of [2] | Lemma 3.2 here | Theorem 6 of [2] |
| algorithm (theory) | AB-A [2] | AB-A [2] | Ellipsoid |
| algorithm (practice) | AB-A [2], Simplex, Interior point | AB-A [2], Simplex, Interior point | Simplex, Interior point |
| software | Github for [2] (d=2), GUROBI [37], RSymphony [41] | GUROBI [37], RSymphony [41] | GUROBI [37], RSymphony [41] |
4 Applications
Sections 4.1.1 and 4.1.2 illustrate basic properties of based inference on synthetic datasets with measure supports on finite subsets of , . The structures of these datasets emulate potential issues in the real data settings while providing convenient models to demonstrate the advantages of based procedures over existing methods (Figures 5 and 6).
Sections 4.2.1 and 4.2.2 illustrate how based inference can be used in real biomedical settings where the measures of interest are naturally finitely supported on a given metric space. We use the Surveillance, Epidemiology, and End Results (SEER) database (https://seer.cancer.gov), a large database on cancers in the United States routinely used in the biomedical literature. A detailed description of the data used, including information on the sample sizes, is provided in the Supplementary Material (Kravtsova (2024)).
4.1 Illustrations on synthetic data
4.1.1 3D Experiment dataset: testing
We construct the dataset 3D Experiment, which aims to emulate the experimental setting of counting the number of induced cells in response to a treatment. The model organism frequently used in such experiments is the nematode worm C. elegans. The goal of the experiment is to determine whether a certain genetic modification interferes with normal organ development, resulting in abnormal cell behavior observed in diseases [16, 83].
The abnormality is measured by the number of induced cells that emerge after the genetic modification. There could be induced cells in each worm; a total of worms are examined, giving a measure supported on . (In the real experiment, the measure is supported on , but we simplify it for the purpose of this synthetic dataset.) When constructing the 3D Experiment dataset, we assume that counting is simultaneously performed in two more sites of an animal, where the number of induced cells can be . This results in the 3-dimensional support with points (Figure 5A).
Recent results in the biological literature report differences in the number of induced cells between the worm species C. elegans and C. briggsae [18, 55]. Inspired by these results, we construct four measures in the 3D Experiment dataset: the first two correspond to two C. briggsae worm strains, and the last two to C. elegans worm strains (Figure 5A). We use this setup to demonstrate the power of MOT asymptotic and permutation tests for testing (Figure 5B).

4.1.2 Anderes et al. 2016 dataset: inference
We consider the data constructed by [3], which demonstrate the properties of a barycenter of finitely supported measures. Each measure represents a demand distribution (for some hypothetical product) over nine locations on the map (these locations are cities in California, and they constitute a finite support for the demand distributions). There are measures in [3], each giving the demand distribution during a particular month (Figure 6A,B).
We use these measures as the ground truth and construct empirical measures by sampling multinomial counts based on this truth. We note that all underlying true measures are different, i.e., holds. Moreover, the differences are more drastic between months with different temperatures, since [3] allows the temperature to influence the demand. Our inference under confirms this claim by examining sub-collections of measures with months from the same season versus months from different seasons and comparing Confidence Regions for under these settings (Figure 6C).

4.2 Applications to real data
4.2.1 SEER Tumor size dataset: testing
An important question in cancer studies is to determine what factors are associated with the development of metastases. In the case of breast cancer, [73] showed that metastatic risk increases with tumor size in intermediate and some of the large tumors ( cm), but does not increase in small tumors ( cm). The study used the SEER database and considered the correlation between tumor size and prevalence of metastases. Here we confirm these results via k-sample testing, as described below. Further, we observe a similar trend in three more cancer types: prostate cancer, lung cancer in males, and lung cancer in females (Figure 7).
We use the SEER database to extract the data on distributions of tumor size and term this dataset SEER Tumor size. We consider three groups of patients with different disease progression status, giving measures: patients with no metastases present at diagnosis and alive at the end of the study (), patients with metastases at diagnosis and alive at the end of the study (), and patients dead by the end of the study with death caused by the diagnosed cancer ().
First, we test for size distributions in the small tumor range ( cm); we find no difference between the groups, which holds for breast, prostate, and both lung cancer types (Figure 7A). In contrast, the groups are found to be different for tumors in the larger range ( cm), which again holds for all considered cancer types (Figure 7B). The analysis confirms the significance of metastatic status for the tumor size distribution in intermediate/large tumors, but not in small tumors.

4.2.2 SEER Year of diagnosis dataset: inference
Our final example concerns potential differences in the distributions of characteristics of patients diagnosed at different times. Such differences are discussed in the case of early stage lung cancer, possibly due to improvements in diagnostic technologies [58]. Here we compare these distributions in the framework of -sample inference to confirm the differences in diagnosis results over time, and show that the trend is similar in both male and female patients.
We use the SEER database to extract joint distributions of tumor size and patients’ age for lung cancer in males and females and term this dataset SEER Year of diagnosis. We consider four time periods, giving measures: 2004 - 2006 (), 2009 - 2011 (), 2014 - 2016 (), 2019 - 2020 (). The distributions are found to be different by the test in both male and female lung cancer cases, and we are interested in comparing the differences between the male and female collections of measures (Figure 8).
We observe visually that the differences between measures are of a similar nature in the male and female cases: later diagnostic years appear to have more small-size tumors diagnosed in comparison to earlier years (Figure 8A). The similarities between the male and female cases are reflected in overlapping Confidence Regions. The reported plan also highlights this finding by coupling the small-size support points from the later periods with the larger-size support points from the earlier period for patients of the same age (Figure 8B).

5 Discussion and Conclusions
5.1 Summary of results
In this paper, we proposed an Optimal Transport approach to -sample inference. We used the optimal value of the Multimarginal Optimal Transport program () to quantify the difference in a given collection of measures supported on finite subsets of , .
We derived limit laws for the empirical version of under the assumptions of (all measures are the same) and (some measures may differ). We established that the limit cannot be Gaussian under , and provided sufficient conditions for the limit to be Gaussian under . Based on these results, we derived an expression for the power function of the test of ; using this function, we proved consistency of the test against any fixed alternative and uniform consistency in certain broad classes of alternatives.
To sample from the limit laws, we confirmed that the derivative and m-out-of-n bootstrap methods are consistent under , and that the m-out-of-n bootstrap is consistent under . We proved polynomial complexity of sampling via the derivative bootstrap, and defined a low complexity upper bound to approximate the test cut-off under . As an alternative to sampling from the limit laws, we defined a permutation test that is suitable if sample sizes are not large enough to validate convergence to the limit.
We empirically showed that the based test of has strong finite sample power performance when compared with state-of-the-art methods. We also showed how to construct Confidence Regions for the true value under the assumptions of , and how to use this procedure to compare variability between collections of measures. Finally, we demonstrated the use of our methodology on several real biomedical datasets.
5.2 Limitations and future directions
Extensions to continuous measures: One of the main benefits of working on finite spaces is the ability to obtain a non-degenerate limit law under (i.e. a law with non-zero variance), which allows one to quantify fluctuations of the value when all measures are the same and to test . In the case, non-degeneracy may fail for continuous measures (see the discussion in Section 1.2), but holds for discrete measures with limit laws of the form [72]. When extending [72] to continuous measures in the case, [46] show that non-degenerate limit laws are possible provided that there exist dual variables (the Kantorovich potentials) which are not constant almost everywhere (Theorem 4.2 of [46]). While constant potentials are always present under (Corollary 4.6 of [46]), in the discrete case there are other potentials that are not constant (this holds for our case of ; see the discussion preceding Theorem 2.2). Lemma 11 of [74] shows that in the case, it is possible to get non-constant potentials for continuous measures by requiring the support to be disconnected (intuitively, this resembles the discrete situation). It is an interesting future direction to analyze how the potentials would behave if our limit laws are extended to continuous measures and possibly a different ground cost .
Improving upper bounds on the null distribution: While the proposed null upper bound is computationally tractable and tight for measures, it may be too large to provide good power for testing for larger . The main reason for this weakness is the “independent” nature of the optimization over the dual variables recorded in the constraints. Indeed, the constraints that relate different entries of different dual vectors are omitted, and hence the dual vectors interact only via when solving the program. The bound can be strengthened by introducing additional constraints from , which will decrease the value of the program and provide a tighter bound on the null distribution. Two possible choices for these extra constraints are (1) including constraints that involve diverse entries from different dual vectors (e.g., ), and/or (2) sampling constraints at random (such constraint sampling techniques are widely applicable when solving large linear programs arising, for instance, in Markov Decision Processes [19]).
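The constraint-sampling idea in (2) can be illustrated as follows: since dropping constraints from a maximization can only increase its optimal value, solving the program over a random subset of constraints yields a stochastic upper bound on the full-program value. The function and toy data below are illustrative, not the paper's programs.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)

def sampled_constraint_max(c, A, b, n_sample, box=1e6):
    """Maximize c^T x subject to a random subset of the constraints A x <= b.

    Removing constraints enlarges the feasible set, so the sampled value
    upper-bounds the optimum of the fully constrained program."""
    idx = rng.choice(len(b), size=min(n_sample, len(b)), replace=False)
    res = linprog(-c, A_ub=A[idx], b_ub=b[idx], bounds=[(-box, box)] * len(c))
    return -res.fun

# toy program: maximize x + y subject to x <= 1, y <= 1, x + y <= 1.5
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 1.0, 1.5])
full = sampled_constraint_max(np.array([1.0, 1.0]), A, b, n_sample=3)     # all constraints
relaxed = sampled_constraint_max(np.array([1.0, 1.0]), A, b, n_sample=1)  # never smaller
```

Conversely, *adding* sampled constraints to the relaxed null bound decreases its value, which is the tightening direction discussed above.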
Faster computation of the /barycenter value and the permutation test: Our empirical power results suggest that could serve as a suitable statistic for a permutation test, which can be performed if sample sizes are not large enough to validate asymptotic tests. In that case, the (or, equivalently, the barycenter) value has to be computed for each permutation. While direct computation of the barycenter cost is not feasible for a large cardinality of the support and/or number of measures , recently proposed sampling techniques can be used to speed up the computation while preserving some statistical guarantees [42]. These methods are especially applicable when measures have different supports, and hence can be applied in extensions to continuous measures.
Appendix A Proofs of main results omitted in the main text
A.1 Details on the proof of Theorem 2.2(a)
Step 1 (Weak convergence of measures) For every , the empirical process converges weakly (Theorem 14.3 - 4 of [7]) as
where . Since the processes are independent in , by Theorem 1.4.8 of [79] we can view them jointly as
with respect to norm on . Using Slutsky’s Theorem (e.g., Example 1.4.7 of [79])
which is of the form
with .
Step 2 (Hadamard directional differentiability of ) Consider the functional given by , where is the optimal value of the primal program (8), or, equivalently, the dual program (10). The map is Gâteaux directionally differentiable at tangentially to a certain set (Theorem 3.1 of [32]) and locally Lipschitz (Remark 2.1 of [9]; the locally Lipschitz property can also be shown directly, see Supplementary Material (Kravtsova (2024))). Hence, it is Hadamard directionally differentiable at tangentially to , and the two derivatives coincide (see Proposition 3.5 of [69] and also the discussion in Section 2.1 of [9]). The derivative is given by
(23) |
for , , .
A.2 Details on the proof of Theorem 2.2(b)
Consider the null distribution of in (15) given by
s.t. | |||
Note that for any given realization of random coefficients , this linear program has dual vectors of length each, i.e. variables in total. There are equality constraints, each for summing one of the entries of these vectors to zero. The large number of inequality constraints comes from the size of the primal constraint matrix , as discussed in Section 2.1.
To construct with smaller complexity than , we take the same objective function as in program above, but relax some of the inequality constraints subject to . Formally, we represent the equality constraints of as
where represents the linear operator with matrix whose th row picks the th element from the vectors and sums them up.
As detailed in the proof of Lemma 2.4 given in A.4, for , there are constraints in of the form
which can be written as
Similarly, for , the first dual vector satisfies
Thus, for ,
and the same constraints are satisfied by . Combining these constraints, we obtain
for all . This gives us with constraints that we choose to be the constraint set for the linear program (16) defining (which now has no equality constraints).
A.3 Details for Observation 2.3
Consider a cost vector in the program (8) with entries , where each index takes values in . Suppose that the indices have the same value, e.g. . Then,
where . The first term gives
and the second term gives
Combining the two and multiplying by gives the result.
A.4 Proof of Lemma 2.4
For notational clarity, we start with the case of measures, and assume for simplicity that they are supported on the metric space with only two points. Let be solutions to the dual program (10) satisfying , i.e. . Recall that the constraint matrix in the dual constraints is
Applying it to gives the constraints on as
The middle two constraints give
Recalling that , we get that
If the number of support points were , a similar argument using the constraints and would give
Recall from Observation 2.3 that , which finishes the proof for measures.
To see the result for measures, note that contains constraints
and by Observation 2.3, giving
and, by similar reasoning,
Similarly, we conclude the same property for all dual variables indexed by , i.e.
concluding the proof.
A.5 Proof of Lemma 2.13
A.6 Proof of Lemma 2.14
Let be dual optimal solutions to the problem . We will show that are dual optimal for , and hence
By optimality of for , the dual constraints hold with equality
for all , and hold with equality
on for some pairs , with the set indexing the pairs that support the optimal Wasserstein coupling . Consider a multicoupling that agrees with on tuples , , and is zero otherwise (so the set of such tuples has full multicoupling measure); this is the candidate for the primal optimal solution to . Further, the above equality implies, for ,
Moreover, is no larger than the value of if some indices are not repeated, making the candidate dual feasible for . By complementary slackness (e.g., Lemma 1.1 of [35], which specifically addresses the multimarginal problem), our dual candidate is optimal.
A.7 Proof of Proposition 3.1
(a) By Theorem 3.1 of [44], the numerical directional derivative estimator is consistent for the directional derivative under mild measurability conditions on . The choice with (or, more generally, , with ) ensures that the assumptions of the theorem are satisfied, i.e. that and , and allows us to conclude consistency of this estimator for . Note that consistency does not depend on the form of , and hence holds under both and .
(b) We will check that the estimator of the directional derivative map given by is uniformly consistent in the sense of Assumption 4 of [29]. Note that under , the estimator is given by
The expression is independent of , and hence the assumption is trivially satisfied. Thus, the proposed bootstrap is consistent by Theorem 3.2 of [29].
A.8 Details on the proof of Lemma 3.2
Consider the linear program
where is a vector of objective coefficients, and matrices with entries in , and a cost vector from primal MOT problem where the measures and support points in are represented with bits of precision.
Recall that a linear program over a polytope with exponentially many constraints can be solved in polynomial time by the ellipsoid method if there exists a polynomial time separation oracle for the polytope (Theorem 8.5 of [5]). Here, we construct a separation oracle as follows. Given any , check if the constraints are satisfied; if not, output a violated constraint (this is done with complexity). If , check (and output a violated constraint if needed) in by employing the polynomial time oracle in Definition 10 of [2], which is done with complexity (Proposition 12 of [2]).
Acknowledgments. The author would like to thank the anonymous referees, an Associate Editor and the Editor for their constructive comments that greatly improved the quality of this paper. The author is indebted to Ilmun Kim for sharing the codes for the tests used for comparisons in Figure 5. The author sincerely thanks Adriana Dawes for the advice and support and Florian Gunsilius for the comments that greatly improved the paper. Helpful discussions with Helen Chamberlin, Jun Kitagawa, Facundo Mémoli, and Dustin Mixon are gratefully acknowledged. The efforts of the SEER Program in the creation and maintenance of the SEER database are gratefully acknowledged. The author is solely responsible for all the mistakes.
The author was supported by the National Institute of General Medical Science of the National Institutes of Health under award number R01GM132651 to Adriana Dawes.
References
- [1] {barticle}[author] \bauthor\bsnmAgueh, \bfnmMartial\binitsM. and \bauthor\bsnmCarlier, \bfnmGuillaume\binitsG. (\byear2011). \btitleBarycenters in the Wasserstein space. \bjournalSIAM Journal on Mathematical Analysis \bvolume43 \bpages904–924. \endbibitem
- [2] {barticle}[author] \bauthor\bsnmAltschuler, \bfnmJason M\binitsJ. M. and \bauthor\bsnmBoix-Adsera, \bfnmEnric\binitsE. (\byear2021). \btitleWasserstein barycenters can be computed in polynomial time in fixed dimension. \bjournalJournal of Machine Learning Research \bvolume22 \bpages1–19. \endbibitem
- [3] {barticle}[author] \bauthor\bsnmAnderes, \bfnmEthan\binitsE., \bauthor\bsnmBorgwardt, \bfnmSteffen\binitsS. and \bauthor\bsnmMiller, \bfnmJacob\binitsJ. (\byear2016). \btitleDiscrete Wasserstein barycenters: Optimal transport for discrete data. \bjournalMathematical Methods of Operations Research \bvolume84 \bpages389–409. \endbibitem
- [4] {bbook}[author] \bauthor\bsnmBalinski, \bfnmMichel L\binitsM. L. and \bauthor\bsnmRussakoff, \bfnmAndrew\binitsA. (\byear1984). \btitleFaces of dual transportation polyhedra. \bpublisherSpringer. \endbibitem
- [5] {bbook}[author] \bauthor\bsnmBertsimas, \bfnmDimitris\binitsD. and \bauthor\bsnmTsitsiklis, \bfnmJohn N\binitsJ. N. (\byear1997). \btitleIntroduction to linear optimization \bvolume6. \bpublisherAthena scientific Belmont, MA. \endbibitem
- [6] {barticle}[author] \bauthor\bsnmBigot, \bfnmJérémie\binitsJ., \bauthor\bsnmCazelles, \bfnmElsa\binitsE. and \bauthor\bsnmPapadakis, \bfnmNicolas\binitsN. (\byear2019). \btitleCentral limit theorems for entropy-regularized optimal transport on finite spaces and statistical applications. \bjournalElectronic Journal of Statistics \bvolume13 \bpages5120 – 5150. \endbibitem
- [7] {bbook}[author] \bauthor\bsnmBishop, \bfnmYvonne M\binitsY. M., \bauthor\bsnmFienberg, \bfnmStephen E\binitsS. E. and \bauthor\bsnmHolland, \bfnmPaul W\binitsP. W. (\byear2007). \btitleDiscrete multivariate analysis: Theory and practice. \bpublisherSpringer Science & Business Media. \endbibitem
- [8] {barticle}[author] \bauthor\bsnmBénasséni, \bfnmJacques\binitsJ. (\byear2012). \btitleA new derivation of eigenvalue inequalities for the multinomial distribution. \bjournalJournal of Mathematical Analysis and Applications \bvolume393 \bpages697-698. \bdoihttps://doi.org/10.1016/j.jmaa.2012.03.029 \endbibitem
- [9] {barticle}[author] \bauthor\bsnmCárcamo, \bfnmJavier\binitsJ., \bauthor\bsnmCuevas, \bfnmAntonio\binitsA. and \bauthor\bsnmRodríguez, \bfnmLuis-Alberto\binitsL.-A. (\byear2020). \btitleDirectional differentiability for supremum-type functionals: Statistical applications. \bjournalBernoulli \bvolume26 \bpages2143 – 2175. \bdoi10.3150/19-BEJ1188 \endbibitem
- [10] {barticle}[author] \bauthor\bsnmCarlier, \bfnmGuillaume\binitsG., \bauthor\bsnmDelalande, \bfnmAlex\binitsA. and \bauthor\bsnmMerigot, \bfnmQuentin\binitsQ. (\byear2024). \btitleQuantitative stability of barycenters in the Wasserstein space. \bjournalProbability Theory and Related Fields \bvolume188 \bpages1257–1286. \endbibitem
- [11] {barticle}[author] \bauthor\bsnmChen, \bfnmSu\binitsS. (\byear2020). \btitleA new distribution-free k-sample test: Analysis of kernel density functionals. \bjournalCanadian Journal of Statistics \bvolume48 \bpages167–186. \endbibitem
- [12] {barticle}[author] \bauthor\bsnmChen, \bfnmSu\binitsS. and \bauthor\bsnmPokojovy, \bfnmMichael\binitsM. (\byear2018). \btitleModern and classical k-sample omnibus tests. \bjournalWiley Interdisciplinary Reviews: Computational Statistics \bvolume10 \bpagese1418. \endbibitem
- [13] {barticle}[author] \bauthor\bsnmChernozhukov, \bfnmVictor\binitsV., \bauthor\bsnmGalichon, \bfnmAlfred\binitsA., \bauthor\bsnmHallin, \bfnmMarc\binitsM. and \bauthor\bsnmHenry, \bfnmMarc\binitsM. (\byear2017). \btitleMonge–Kantorovich depth, quantiles, ranks and signs. \bjournalThe Annals of Statistics \bvolume45 \bpages223 – 256. \bdoi10.1214/16-AOS1450 \endbibitem
- [14] {bbook}[author] \bauthor\bsnmCleophas, \bfnmTon J\binitsT. J., \bauthor\bsnmZwinderman, \bfnmAeilko H\binitsA. H., \bauthor\bsnmCleophas, \bfnmToine F\binitsT. F. and \bauthor\bsnmCleophas, \bfnmEugene P\binitsE. P. (\byear2009). \btitleStatistics applied to clinical trials. \bpublisherSpringer. \endbibitem
- [15] {barticle}[author] \bauthor\bsnmConover, \bfnmWJ175230\binitsW. (\byear1965). \btitleSeveral k-sample Kolmogorov-Smirnov tests. \bjournalThe Annals of Mathematical Statistics \bvolume36 \bpages1019–1026. \endbibitem
- [16] {barticle}[author] \bauthor\bsnmCorchado-Sonera, \bfnmMarcos\binitsM., \bauthor\bsnmRambani, \bfnmKomal\binitsK., \bauthor\bsnmNavarro, \bfnmKristen\binitsK., \bauthor\bsnmKladney, \bfnmRaleigh\binitsR., \bauthor\bsnmDowdle, \bfnmJames\binitsJ., \bauthor\bsnmLeone, \bfnmGustavo\binitsG. and \bauthor\bsnmChamberlin, \bfnmHelen M\binitsH. M. (\byear2022). \btitleDiscovery of nonautonomous modulators of activated Ras. \bjournalG3 Genes|Genomes|Genetics \bvolume12 \bpagesjkac200. \endbibitem
- [17] {bbook}[author] \bauthor\bsnmDavison, \bfnmAnthony Christopher\binitsA. C. and \bauthor\bsnmHinkley, \bfnmDavid Victor\binitsD. V. (\byear1997). \btitleBootstrap methods and their application \bvolume1. \bpublisherCambridge University Press. \endbibitem
- [18] {barticle}[author] \bauthor\bsnmDawes, \bfnmAdriana T\binitsA. T., \bauthor\bsnmWu, \bfnmDavid\binitsD., \bauthor\bsnmMahalak, \bfnmKarley K\binitsK. K., \bauthor\bsnmZitnik, \bfnmEdward M\binitsE. M., \bauthor\bsnmKravtsova, \bfnmNatalia\binitsN., \bauthor\bsnmSu, \bfnmHaiwei\binitsH. and \bauthor\bsnmChamberlin, \bfnmHelen M\binitsH. M. (\byear2017). \btitleA computational model predicts genetic nodes that allow switching between species-specific responses in a conserved signaling network. \bjournalIntegrative Biology \bvolume9 \bpages156–166. \endbibitem
- [19] {barticle}[author] \bauthor\bsnmDe Farias, \bfnmDaniela Pucci\binitsD. P. and \bauthor\bsnmVan Roy, \bfnmBenjamin\binitsB. (\byear2004). \btitleOn constraint sampling in the linear programming approach to approximate dynamic programming. \bjournalMathematics of Operations Research \bvolume29 \bpages462–478. \endbibitem
- [20] {barticle}[author] \bauthor\bsnmDeb, \bfnmNabarun\binitsN. and \bauthor\bsnmSen, \bfnmBodhisattva\binitsB. (\byear2023). \btitleMultivariate rank-based distribution-free nonparametric testing using measure transportation. \bjournalJournal of the American Statistical Association \bvolume118 \bpages192–207. \endbibitem
- [21] {barticle}[author] \bauthor\bsnmDel Barrio, \bfnmEustasio\binitsE., \bauthor\bsnmGiné, \bfnmEvarist\binitsE. and \bauthor\bsnmUtzet, \bfnmFrederic\binitsF. (\byear2005). \btitleAsymptotics for L2 functionals of the empirical quantile process, with applications to tests of fit based on weighted Wasserstein distances. \bjournalBernoulli \bvolume11 \bpages131–189. \endbibitem
- [22] {barticle}[author] \bauthor\bsnmDel Barrio, \bfnmEustasio\binitsE., \bauthor\bsnmGordaliza, \bfnmPaula\binitsP. and \bauthor\bsnmLoubes, \bfnmJean-Michel\binitsJ.-M. (\byear2019). \btitleA central limit theorem for Lp transportation cost on the real line with application to fairness assessment in machine learning. \bjournalInformation and Inference: A Journal of the IMA \bvolume8 \bpages817–849. \endbibitem
- [23] {barticle}[author] \bauthor\bparticledel \bsnmBarrio, \bfnmEustasio\binitsE. and \bauthor\bsnmLoubes, \bfnmJean-Michel\binitsJ.-M. (\byear2019). \btitleCentral limit theorems for empirical transportation cost in general dimension. \bjournalThe Annals of Probability \bvolume47 \bpages926–951. \bdoi10.1214/18-AOP1275 \endbibitem
- [24] {barticle}[author] \bauthor\bsnmDeng, \bfnmLi\binitsL. (\byear2012). \btitleThe MNIST database of handwritten digit images for machine learning research. \bjournalIEEE Signal Processing Magazine \bvolume29 \bpages141–142. \endbibitem
- [25] {barticle}[author] \bauthor\bsnmDesai, \bfnmShreya\binitsS. and \bauthor\bsnmGuddati, \bfnmAchuta K\binitsA. K. (\byear2022). \btitleBimodal Age Distribution in Cancer Incidence. \bjournalWorld Journal of Oncology \bvolume13 \bpages329. \endbibitem
- [26] {barticle}[author] \bauthor\bsnmDudley, \bfnmRichard Mansfield\binitsR. M. (\byear1969). \btitleThe speed of mean Glivenko-Cantelli convergence. \bjournalThe Annals of Mathematical Statistics \bvolume40 \bpages40–50. \endbibitem
- [27] {barticle}[author] \bauthor\bsnmEfron, \bfnmB.\binitsB. (\byear1979). \btitleBootstrap Methods: Another Look at the Jackknife. \bjournalThe Annals of Statistics \bvolume7 \bpages1–26. \endbibitem
- [28] {bbook}[author] \bauthor\bsnmEmelichev, \bfnmVladimir Alekseevich\binitsV. A., \bauthor\bsnmKovalev, \bfnmMikhail Mikhailovich\binitsM. M. and \bauthor\bsnmKravtsov, \bfnmMikhail Konstantinovich\binitsM. K. (\byear1984). \btitlePolytopes, graphs and optimisation. \bpublisherCambridge University Press. \endbibitem
- [29] {barticle}[author] \bauthor\bsnmFang, \bfnmZheng\binitsZ. and \bauthor\bsnmSantos, \bfnmAndres\binitsA. (\byear2019). \btitleInference on directionally differentiable functions. \bjournalThe Review of Economic Studies \bvolume86 \bpages377–412. \endbibitem
- [30] {barticle}[author] \bauthor\bsnmFournier, \bfnmNicolas\binitsN. and \bauthor\bsnmGuillin, \bfnmArnaud\binitsA. (\byear2015). \btitleOn the rate of convergence in Wasserstein distance of the empirical measure. \bjournalProbability Theory and Related Fields \bvolume162 \bpages707–738. \endbibitem
- [31] {barticle}[author] \bauthor\bsnmFukui, \bfnmTakayuki\binitsT., \bauthor\bsnmMori, \bfnmShoichi\binitsS., \bauthor\bsnmYokoi, \bfnmKohei\binitsK. and \bauthor\bsnmMitsudomi, \bfnmTetsuya\binitsT. (\byear2006). \btitleSignificance of the number of positive lymph nodes in resected non-small cell lung cancer. \bjournalJournal of Thoracic Oncology \bvolume1 \bpages120–125. \endbibitem
- [32] {bincollection}[author] \bauthor\bsnmGal, \bfnmTomas\binitsT. (\byear1997). \btitleA historical sketch on sensitivity analysis and parametric programming. In \bbooktitleAdvances in Sensitivity Analysis and Parametric Programming \bpages1–10. \bpublisherSpringer. \endbibitem
- [33] {barticle}[author] \bauthor\bsnmGhodrati, \bfnmLaya\binitsL. and \bauthor\bsnmPanaretos, \bfnmVictor M\binitsV. M. (\byear2022). \btitleDistribution-on-distribution regression via optimal transport maps. \bjournalBiometrika \bvolume109 \bpages957–974. \endbibitem
- [34] {barticle}[author] \bauthor\bsnmGiordano, \bfnmSharon H\binitsS. H., \bauthor\bsnmCohen, \bfnmDeborah S\binitsD. S., \bauthor\bsnmBuzdar, \bfnmAman U\binitsA. U., \bauthor\bsnmPerkins, \bfnmGeorge\binitsG. and \bauthor\bsnmHortobagyi, \bfnmGabriel N\binitsG. N. (\byear2004). \btitleBreast carcinoma in men: a population-based study. \bjournalCancer: Interdisciplinary International Journal of the American Cancer Society \bvolume101 \bpages51–57. \endbibitem
- [35] {barticle}[author] \bauthor\bsnmGladkov, \bfnmNikita A.\binitsN. A. and \bauthor\bsnmZimin, \bfnmAlexander P.\binitsA. P. (\byear2020). \btitleAn Explicit Solution for a Multimarginal Mass Transportation Problem. \bjournalSIAM Journal on Mathematical Analysis \bvolume52 \bpages3666-3696. \bdoi10.1137/18M122707X \endbibitem
- [36] {barticle}[author] \bauthor\bsnmGunsilius, \bfnmFlorian F\binitsF. F. (\byear2023). \btitleDistributional synthetic controls. \bjournalEconometrica \bvolume91 \bpages1105–1117. \endbibitem
- [37] {bmisc}[author] \bauthor\bsnmGurobi Optimization, LLC (\byear2023). \btitleGurobi Optimizer Reference Manual. \endbibitem
- [38] {barticle}[author] \bauthor\bsnmHallin, \bfnmMarc\binitsM., \bauthor\bsnmDel Barrio, \bfnmEustasio\binitsE., \bauthor\bsnmCuesta-Albertos, \bfnmJuan\binitsJ. and \bauthor\bsnmMatrán, \bfnmCarlos\binitsC. (\byear2021). \btitleDistribution and quantile functions, ranks and signs in dimension d: A measure transportation approach. \bjournalThe Annals of Statistics \bvolume49 \bpages1139–1165. \endbibitem
- [39] {barticle}[author] \bauthor\bsnmHallin, \bfnmM.\binitsM., \bauthor\bsnmMordant, \bfnmG.\binitsG. and \bauthor\bsnmSegers, \bfnmJ.\binitsJ. (\byear2021). \btitleMultivariate goodness-of-fit tests based on Wasserstein distance. \bjournalElectronic Journal of Statistics \bvolume15 \bpages1328–1371. \endbibitem
- [40] {barticle}[author] \bauthor\bsnmHart, \bfnmJeffrey D.\binitsJ. D. and \bauthor\bsnmCañette, \bfnmIsabel\binitsI. (\byear2011). \btitleNonparametric Estimation of Distributions in Random Effects Models. \bjournalJournal of Computational and Graphical Statistics \bvolume20 \bpages461–478. \endbibitem
- [41] {bmanual}[author] \bauthor\bsnmHarter, \bfnmReinhard\binitsR., \bauthor\bsnmHornik, \bfnmKurt\binitsK. and \bauthor\bsnmTheussl, \bfnmStefan\binitsS. (\byear2021). \btitleRsymphony: SYMPHONY in R \bnoteR package version 0.1-33. \endbibitem
- [42] {barticle}[author] \bauthor\bsnmHeinemann, \bfnmFlorian\binitsF., \bauthor\bsnmMunk, \bfnmAxel\binitsA. and \bauthor\bsnmZemel, \bfnmYoav\binitsY. (\byear2022). \btitleRandomized Wasserstein Barycenter Computation: Resampling with Statistical Guarantees. \bjournalSIAM Journal on Mathematics of Data Science \bvolume4 \bpages229-259. \endbibitem
- [43] {barticle}[author] \bauthor\bsnmHeitjan, \bfnmDaniel F\binitsD. F., \bauthor\bsnmManni, \bfnmAndrea\binitsA. and \bauthor\bsnmSanten, \bfnmRichard J\binitsR. J. (\byear1993). \btitleStatistical analysis of in vivo tumor growth experiments. \bjournalCancer Research \bvolume53 \bpages6042–6050. \endbibitem
- [44] {barticle}[author] \bauthor\bsnmHong, \bfnmHan\binitsH. and \bauthor\bsnmLi, \bfnmJessie\binitsJ. (\byear2018). \btitleThe numerical delta method. \bjournalJournal of Econometrics \bvolume206 \bpages379–394. \endbibitem
- [45] {bbook}[author] \bauthor\bsnmHsu, \bfnmJason\binitsJ. (\byear1996). \btitleMultiple comparisons: theory and methods. \bpublisherCRC Press. \endbibitem
- [46] {barticle}[author] \bauthor\bsnmHundrieser, \bfnmShayan\binitsS., \bauthor\bsnmKlatt, \bfnmMarcel\binitsM., \bauthor\bsnmMunk, \bfnmAxel\binitsA. and \bauthor\bsnmStaudt, \bfnmThomas\binitsT. (\byear2024). \btitleA unifying approach to distributional limits for empirical optimal transport. \bjournalBernoulli \bvolume30 \bpages2846–2877. \endbibitem
- [47] {barticle}[author] \bauthor\bsnmHušková, \bfnmMarie\binitsM. and \bauthor\bsnmMeintanis, \bfnmSimos G.\binitsS. G. (\byear2008). \btitleTests for the multivariate k-sample problem based on the empirical characteristic function. \bjournalJournal of Nonparametric Statistics \bvolume20 \bpages263–277. \bdoi10.1080/10485250801948294 \endbibitem
- [48] {barticle}[author] \bauthor\bsnmKhan, \bfnmMd Marufuzzaman\binitsM. M., \bauthor\bsnmOdoi, \bfnmAgricola\binitsA. and \bauthor\bsnmOdoi, \bfnmEvah W\binitsE. W. (\byear2023). \btitleGeographic disparities in COVID-19 testing and outcomes in Florida. \bjournalBMC Public Health \bvolume23 \bpages79. \endbibitem
- [49] {barticle}[author] \bauthor\bsnmKiefer, \bfnmJ\binitsJ. (\byear1959). \btitleK-sample analogues of the Kolmogorov-Smirnov and Cramér-von Mises tests. \bjournalThe Annals of Mathematical Statistics \bvolume30 \bpages420–447. \endbibitem
- [50] {barticle}[author] \bauthor\bsnmKim, \bfnmIlmun\binitsI. (\byear2021). \btitleComparing a large number of multivariate distributions. \bjournalBernoulli \bvolume27 \bpages419–441. \bdoi10.3150/20-BEJ1244 \endbibitem
- [51] {barticle}[author] \bauthor\bsnmKlatt, \bfnmMarcel\binitsM., \bauthor\bsnmTameling, \bfnmCarla\binitsC. and \bauthor\bsnmMunk, \bfnmAxel\binitsA. (\byear2020). \btitleEmpirical regularized optimal transport: Statistical theory and applications. \bjournalSIAM Journal on Mathematics of Data Science \bvolume2 \bpages419–443. \endbibitem
- [52] {barticle}[author] \bauthor\bsnmLe Gouic, \bfnmThibaut\binitsT. and \bauthor\bsnmLoubes, \bfnmJean-Michel\binitsJ.-M. (\byear2017). \btitleExistence and consistency of Wasserstein barycenters. \bjournalProbability Theory and Related Fields \bvolume168 \bpages901–917. \endbibitem
- [53] {bbook}[author] \bauthor\bsnmLedoux, \bfnmMichel\binitsM. and \bauthor\bsnmTalagrand, \bfnmMichel\binitsM. (\byear2013). \btitleProbability in Banach Spaces: isoperimetry and processes. \bpublisherSpringer Science & Business Media. \endbibitem
- [54] {barticle}[author] \bauthor\bsnmLin, \bfnmTianyi\binitsT., \bauthor\bsnmHo, \bfnmNhat\binitsN., \bauthor\bsnmCuturi, \bfnmMarco\binitsM. and \bauthor\bsnmJordan, \bfnmMichael I\binitsM. I. (\byear2022). \btitleOn the complexity of approximating multimarginal optimal transport. \bjournalThe Journal of Machine Learning Research \bvolume23 \bpages2835–2877. \endbibitem
- [55] {barticle}[author] \bauthor\bsnmMahalak, \bfnmKarley K\binitsK. K., \bauthor\bsnmJama, \bfnmAbdulrahman M\binitsA. M., \bauthor\bsnmBillups, \bfnmSteven J\binitsS. J., \bauthor\bsnmDawes, \bfnmAdriana T\binitsA. T. and \bauthor\bsnmChamberlin, \bfnmHelen M\binitsH. M. (\byear2017). \btitleDiffering roles for sur-2/MED23 in C. elegans and C. briggsae vulval development. \bjournalDevelopment Genes and Evolution \bvolume227 \bpages213–218. \endbibitem
- [56] {barticle}[author] \bauthor\bsnmMukhopadhyay, \bfnmSubhadeep\binitsS. and \bauthor\bsnmWang, \bfnmKaijun\binitsK. (\byear2020). \btitleA nonparametric approach to high-dimensional k-sample comparison problems. \bjournalBiometrika \bvolume107 \bpages555–572. \endbibitem
- [58] {barticle}[author] \bauthor\bsnmNations, \bfnmJoel A\binitsJ. A., \bauthor\bsnmBrown, \bfnmDerek W\binitsD. W., \bauthor\bsnmShao, \bfnmStephanie\binitsS., \bauthor\bsnmShriver, \bfnmCraig D\binitsC. D. and \bauthor\bsnmZhu, \bfnmKangmin\binitsK. (\byear2020). \btitleComparative trends in the distribution of lung cancer stage at diagnosis in the Department of Defense Cancer Registry and the Surveillance, Epidemiology, and End Results data, 1989–2012. \bjournalMilitary Medicine \bvolume185 \bpagese2044–e2048. \endbibitem
- [58] {barticle}[author] \bauthor\bsnmNations, \bfnmJoel A\binitsJ. A., \bauthor\bsnmBrown, \bfnmDerek W\binitsD. W., \bauthor\bsnmShao, \bfnmStephanie\binitsS., \bauthor\bsnmShriver, \bfnmCraig D\binitsC. D. and \bauthor\bsnmZhu, \bfnmKangmin\binitsK. (\byear2020). \btitleComparative trends in the distribution of lung cancer stage at diagnosis in the Department of Defense Cancer Registry and the Surveillance, Epidemiology, and End Results data, 1989–2012. \bjournalMilitary medicine \bvolume185 \bpagese2044–e2048. \endbibitem
- [59] {barticle}[author] \bauthor\bsnmPanaretos, \bfnmVictor M\binitsV. M. and \bauthor\bsnmZemel, \bfnmYoav\binitsY. (\byear2019). \btitleStatistical aspects of Wasserstein distances. \bjournalAnnual Review of Statistics and Its Application \bvolume6 \bpages405–431. \endbibitem
- [60] {bbook}[author] \bauthor\bsnmPanaretos, \bfnmVictor M\binitsV. M. and \bauthor\bsnmZemel, \bfnmYoav\binitsY. (\byear2020). \btitleAn invitation to statistics in Wasserstein space. \bpublisherSpringer Nature. \endbibitem
- [61] {binproceedings}[author] \bauthor\bsnmPark, \bfnmJoon Sung\binitsJ. S., \bauthor\bsnmO’Brien, \bfnmJoseph\binitsJ., \bauthor\bsnmCai, \bfnmCarrie Jun\binitsC. J., \bauthor\bsnmMorris, \bfnmMeredith Ringel\binitsM. R., \bauthor\bsnmLiang, \bfnmPercy\binitsP. and \bauthor\bsnmBernstein, \bfnmMichael S\binitsM. S. (\byear2023). \btitleGenerative agents: Interactive simulacra of human behavior. In \bbooktitleProceedings of the 36th Annual ACM Symposium on User Interface Software and Technology \bpages1–22. \endbibitem
- [62] {barticle}[author] \bauthor\bsnmPass, \bfnmBrendan\binitsB. (\byear2015). \btitleMulti-marginal optimal transport: theory and applications. \bjournalESAIM: Mathematical Modelling and Numerical Analysis-Modélisation Mathématique et Analyse Numérique \bvolume49 \bpages1771–1790. \endbibitem
- [63] {barticle}[author] \bauthor\bsnmRamdas, \bfnmAaditya\binitsA., \bauthor\bsnmGarcía Trillos, \bfnmNicolás\binitsN. and \bauthor\bsnmCuturi, \bfnmMarco\binitsM. (\byear2017). \btitleOn Wasserstein two-sample testing and related families of nonparametric tests. \bjournalEntropy \bvolume19 \bpages47. \endbibitem
- [64] {barticle}[author] \bauthor\bsnmRizzo, \bfnmMaria L.\binitsM. L. and \bauthor\bsnmSzékely, \bfnmGábor J.\binitsG. J. (\byear2010). \btitleDISCO analysis: A nonparametric extension of analysis of variance. \bjournalThe Annals of Applied Statistics \bvolume4 \bpages1034–1055. \endbibitem
- [65] {binproceedings}[author] \bauthor\bsnmRömisch, \bfnmWerner\binitsW. (\byear2004). \btitleDelta method, infinite dimensional. In \bbooktitleEncyclopedia of Statistical Sciences. \bpublisherNew York: Wiley. \endbibitem
- [66] {barticle}[author] \bauthor\bsnmSantambrogio, \bfnmFilippo\binitsF. (\byear2015). \btitleOptimal transport for applied mathematicians. \bjournalBirkhäuser, NY \bvolume55 \bpages94. \endbibitem
- [67] {barticle}[author] \bauthor\bsnmScholz, \bfnmFritz W\binitsF. W. and \bauthor\bsnmStephens, \bfnmMichael A\binitsM. A. (\byear1987). \btitleK-sample Anderson–Darling tests. \bjournalJournal of the American Statistical Association \bvolume82 \bpages918–924. \endbibitem
- [68] {barticle}[author] \bauthor\bsnmSejdinovic, \bfnmDino\binitsD., \bauthor\bsnmSriperumbudur, \bfnmBharath\binitsB., \bauthor\bsnmGretton, \bfnmArthur\binitsA. and \bauthor\bsnmFukumizu, \bfnmKenji\binitsK. (\byear2013). \btitleEquivalence of distance-based and RKHS-based statistics in hypothesis testing. \bjournalThe Annals of Statistics \bpages2263–2291. \endbibitem
- [69] {barticle}[author] \bauthor\bsnmShapiro, \bfnmAlexander\binitsA. (\byear1990). \btitleOn concepts of directional differentiability. \bjournalJournal of optimization theory and applications \bvolume66 \bpages477–487. \endbibitem
- [70] {barticle}[author] \bauthor\bsnmShapiro, \bfnmAlexander\binitsA. (\byear1991). \btitleAsymptotic analysis of stochastic programs. \bjournalAnnals of Operations Research \bvolume30 \bpages169–186. \endbibitem
- [71] {bbook}[author] \bauthor\bsnmSierksma, \bfnmGerard\binitsG. (\byear2001). \btitleLinear and integer programming: theory and practice. \bpublisherCRC Press. \endbibitem
- [72] {barticle}[author] \bauthor\bsnmSommerfeld, \bfnmMax\binitsM. and \bauthor\bsnmMunk, \bfnmAxel\binitsA. (\byear2018). \btitleInference for empirical Wasserstein distances on finite spaces. \bjournalJournal of the Royal Statistical Society Series B: Statistical Methodology \bvolume80 \bpages219–238. \endbibitem
- [73] {barticle}[author] \bauthor\bsnmSopik, \bfnmVictoria\binitsV. and \bauthor\bsnmNarod, \bfnmSteven A\binitsS. A. (\byear2018). \btitleThe relationship between tumour size, nodal status and distant metastases: on the origins of breast cancer. \bjournalBreast cancer research and treatment \bvolume170 \bpages647–656. \endbibitem
- [74] {barticle}[author] \bauthor\bsnmStaudt, \bfnmThomas\binitsT., \bauthor\bsnmHundrieser, \bfnmShayan\binitsS. and \bauthor\bsnmMunk, \bfnmAxel\binitsA. (\byear2022). \btitleOn the uniqueness of Kantorovich potentials. \bjournalarXiv preprint arXiv:2201.08316. \endbibitem
- [75] {barticle}[author] \bauthor\bsnmTameling, \bfnmCarla\binitsC., \bauthor\bsnmSommerfeld, \bfnmMax\binitsM. and \bauthor\bsnmMunk, \bfnmAxel\binitsA. (\byear2019). \btitleEmpirical optimal transport on countable metric spaces: Distributional limits and statistical applications. \bjournalThe Annals of Applied Probability \bvolume29 \bpages2744–2781. \endbibitem
- [76] {barticle}[author] \bauthor\bsnmTrick, \bfnmWilliam\binitsW. (\byear2011). \btitleComputer Assisted Quality of Life and Symptom Assessment of Complex Patients from April 2011-August 2012: Chicago, Illinois. \bjournalComputer \bvolume2012. \endbibitem
- [78] {barticle}[author] \bauthor\bsnmUchida, \bfnmSeiichi\binitsS. (\byear2013). \btitleImage processing and recognition for biological images. \bjournalDevelopment, Growth & Differentiation \bvolume55 \bpages523–549. \endbibitem
- [78] {barticle}[author] \bauthor\bsnmUchida, \bfnmSeiichi\binitsS. (\byear2013). \btitleImage processing and recognition for biological images. \bjournalDevelopment, growth & differentiation \bvolume55 \bpages523–549. \endbibitem
- [79] {bbook}[author] \bauthor\bparticlevan der \bsnmVaart, \bfnmAad W\binitsA. W. and \bauthor\bsnmWellner, \bfnmJon A\binitsJ. A. (\byear1996). \btitleWeak convergence and empirical processes. \bpublisherSpringer. \endbibitem
- [80] {bbook}[author] \bauthor\bsnmVillani, \bfnmCédric\binitsC. (\byear2021). \btitleTopics in optimal transportation \bvolume58. \bpublisherAmerican Mathematical Soc. \endbibitem
- [81] {bbook}[author] \bauthor\bsnmVillani, \bfnmCédric\binitsC. \betalet al. (\byear2009). \btitleOptimal transport: old and new \bvolume338. \bpublisherSpringer. \endbibitem
- [82] {bbook}[author] \bauthor\bsnmWickham, \bfnmHadley\binitsH. (\byear2016). \btitleggplot2: Elegant Graphics for Data Analysis. \bpublisherSpringer-Verlag New York. \endbibitem
- [83] {barticle}[author] \bauthor\bsnmZand, \bfnmTanya P\binitsT. P., \bauthor\bsnmReiner, \bfnmDavid J\binitsD. J. and \bauthor\bsnmDer, \bfnmChanning J\binitsC. J. (\byear2011). \btitleRas effector switching promotes divergent cell fates in C. elegans vulval patterning. \bjournalDevelopmental cell \bvolume20 \bpages84–96. \endbibitem
- [84] {barticle}[author] \bauthor\bsnmZhan, \bfnmD\binitsD. and \bauthor\bsnmHart, \bfnmJD\binitsJ. (\byear2014). \btitleTesting equality of a large number of densities. \bjournalBiometrika \bvolume101 \bpages449–464. \endbibitem
- [85] {barticle}[author] \bauthor\bsnmZhang, \bfnmChao\binitsC., \bauthor\bsnmKokoszka, \bfnmPiotr\binitsP. and \bauthor\bsnmPetersen, \bfnmAlexander\binitsA. (\byear2022). \btitleWasserstein autoregressive models for density time series. \bjournalJournal of Time Series Analysis \bvolume43 \bpages30–52. \endbibitem
- [86] {barticle}[author] \bauthor\bsnmZhang, \bfnmHai-Liang\binitsH.-L., \bauthor\bsnmYang, \bfnmLi-Feng\binitsL.-F., \bauthor\bsnmZhu, \bfnmYao\binitsY., \bauthor\bsnmYao, \bfnmXu-Dong\binitsX.-D., \bauthor\bsnmZhang, \bfnmShi-Lin\binitsS.-L., \bauthor\bsnmDai, \bfnmBo\binitsB., \bauthor\bsnmZhu, \bfnmYi-Ping\binitsY.-P., \bauthor\bsnmShen, \bfnmYi-Jun\binitsY.-J., \bauthor\bsnmShi, \bfnmGuo-Hai\binitsG.-H. and \bauthor\bsnmYe, \bfnmDing-Wei\binitsD.-W. (\byear2011). \btitleSerum miRNA-21: Elevated levels in patients with metastatic hormone-refractory prostate cancer and potential predictive factor for the efficacy of docetaxel-based chemotherapy. \bjournalThe Prostate \bvolume71 \bpages326–331. \endbibitem
- [87] {barticle}[author] \bauthor\bsnmZhang, \bfnmJin\binitsJ. and \bauthor\bsnmWu, \bfnmYuehua\binitsY. (\byear2007). \btitlek-Sample tests based on the likelihood ratio. \bjournalComputational Statistics & Data Analysis \bvolume51 \bpages4682–4691. \endbibitem
- [88] {barticle}[author] \bauthor\bsnmZhang, \bfnmRuiyi\binitsR., \bauthor\bsnmOgden, \bfnmR Todd\binitsR. T., \bauthor\bsnmPicard, \bfnmMartin\binitsM. and \bauthor\bsnmSrivastava, \bfnmAnuj\binitsA. (\byear2022). \btitleNonparametric k-sample test on shape spaces with applications to mitochondrial shape analysis. \bjournalJournal of the Royal Statistical Society Series C: Applied Statistics \bvolume71 \bpages51–69. \endbibitem
- [89] {barticle}[author] \bauthor\bsnmZhu, \bfnmChangbo\binitsC. and \bauthor\bsnmMüller, \bfnmHans-Georg\binitsH.-G. (\byear2023). \btitleAutoregressive optimal transport models. \bjournalJournal of the Royal Statistical Society Series B: Statistical Methodology \bvolume85 \bpages1012–1033. \endbibitem