On the Wasserstein Distance Between k-Step Probability Measures on Finite Graphs
Abstract
We consider two random walks on a finite graph, with respective lazinesses, and study the Wasserstein distance between their k-step transition probability measures for general k. We consider the sequence formed by the Wasserstein distance at odd values of k and the sequence formed by the Wasserstein distance at even values of k. We first establish that these sequences always converge, and then we characterize the possible values to which they can converge. We further show that each of these sequences is either eventually constant or converges at an exponential rate. By analyzing the cases of different convergence values separately, we are able to partially characterize when the Wasserstein distance is constant for sufficiently large k.
Keywords— Wasserstein distance, transportation plan, Guvab, random walk, k-step probability distribution, laziness, convergence, finite graph
1 Introduction
Optimal transport theory concerns the minimum cost, called the transportation distance, of moving mass from one configuration to another. In this paper, the notion of transportation distance that we are concerned with is the 1-Wasserstein transportation distance, which we refer to simply as the Wasserstein distance. The Wasserstein distance has applications in fields such as image processing, where a goal is to efficiently transform one image into another (e.g., [RTG00]), and machine learning, where a goal is to minimize some transport-related cost (e.g., [FZM+15]).
The application of Wasserstein distance that motivates this paper is the definition of α-Ricci curvature on graphs introduced by Lin, Lu, and Yau in [LLY11]. This curvature is defined in terms of the graph distance between two vertices, the 1-step transition probability measures of random walks starting at those two vertices with laziness α, and the Wasserstein distance between those two measures.
The α-Ricci curvature is a generalization of classical Ricci curvature, an object from Riemannian geometry that captures how volumes change as they flow along geodesics ([Oll11]). In [Oll09], Ollivier introduced the Ollivier-Ricci curvature to generalize the idea of Ricci curvature to discrete spaces, such as graphs. The Ollivier-Ricci curvature between two vertices is defined via the Wasserstein distance between the 1-step transition probability measures of random walks starting at those vertices. It captures, roughly, whether the neighborhoods of the two vertices are closer together than the vertices themselves. The Ollivier-Ricci curvature is well-studied in geometry and graph theory ([JK21], [CK19], [BCL+18], [CKK+20], [vdHCL+21]), and is also used to study economic risk, cancer networks, and drug design, among other applications ([SGR+15], [SGT16], [SJB19], [WJB16], [WX21], [JK21]). Lin, Lu, and Yau further generalized the Ollivier-Ricci curvature to α-Ricci curvature ([LLY11]), allowing for the laziness of the random walks considered to be greater than zero.
In [Oll09], Ollivier suggested exploring Ollivier-Ricci curvature on graphs at “larger and larger scales.” Thus, in this paper, we study the Wasserstein distance between k-step probability measures of random walks with potentially nonzero laziness as k gets larger and larger. Since the 1-step probability distributions of random walks were used to study the initial “small-scale” α-Ricci curvature, these k-step probability distributions are a natural way to understand curvature at “larger and larger scales.” Jiradilok and Kamtue ([JK21]) study these k-step distributions for larger and larger k on infinite regular trees; in this paper, we study them instead on finite graphs.
Given a finite, connected, simple graph, we consider a random walk with a given starting vertex and laziness. The random walk is defined to be a Markov chain where at each step, we either stay at the current vertex with probability equal to the laziness or pick a neighboring vertex uniformly at random and move there. We then consider the probability distribution encoding the likelihood of being at each possible vertex after k steps of this random walk, which is called a k-step probability distribution, or k-step probability measure.
Given two such random walks on one graph, starting at vertices u and v with respective lazinesses a and b, we define the Wasserstein distance between their two k-step probability measures to be the minimum cost of moving between the two distributions. Here, moving 1 unit of mass across 1 edge costs 1 unit.
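To make these objects concrete, the following Python sketch (ours, not from the paper; the adjacency-list representation and the names adj, laziness, and trials are illustrative) estimates a k-step probability measure by directly simulating the lazy random walk just described.

```python
import random
from collections import Counter

def k_step_distribution(adj, start, laziness, k, trials=100_000):
    """Monte Carlo estimate of the k-step probability measure of a lazy
    random walk on a graph given as an adjacency list (vertex -> list of
    neighbors).  At each step the walk stays put with probability
    `laziness` and otherwise moves to a uniformly random neighbor."""
    counts = Counter()
    for _ in range(trials):
        v = start
        for _ in range(k):
            if random.random() >= laziness:
                v = random.choice(adj[v])
        counts[v] += 1
    return {v: c / trials for v, c in counts.items()}

# Example: a path on three vertices, with the walk started at the middle vertex.
adj = {0: [1], 1: [0, 2], 2: [1]}
print(k_step_distribution(adj, start=1, laziness=0.25, k=5))
```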
We can ask many questions about the Wasserstein distance at “larger and larger scales.” For instance, does the Wasserstein distance between the two k-step probability distributions always converge as k grows? Also, what does it converge to in different cases? Even more interestingly, what can we say about the rate of convergence? In particular, when does the distance eventually remain constant, and how long could it take to reach constancy?
In this paper, we show in all cases that either the Wasserstein distance converges or the Wasserstein distance at every other step converges. We also classify what the distance converges to in all cases, addressing the first and second questions.
We then seek to understand the rate of convergence of the Wasserstein distance. We reach two main results. First, addressing the third question, we show that unless the Wasserstein distance at every other step is eventually constant, its rate of convergence is exponential (Theorem 8.1). We also address the fourth question by providing a partial characterization of exactly when the Wasserstein distance is eventually constant (Theorem 8.2).
In Section 2, we provide formal definitions of key concepts used throughout the paper. In particular, we recall the definition of the Wasserstein distance and introduce the notion of a Guvab. A Guvab refers to a pair of random walks on a finite connected simple graph, and these Guvabs are the primary objects we study in this paper. In Section 3, we classify for all possible Guvabs the limiting behavior of the Wasserstein distance: when the distance converges, and what the distance converges to. This characterization provides a natural way to classify the Guvabs into four categories based on their limiting behavior. In each of Sections 4, 5, 6, and 7, we consider one of these four categories of Guvabs and determine when the Wasserstein distance is eventually constant, as well as examine the rate of convergence if the Wasserstein distance is not constant. Along the way, we encounter various interesting results about the different cases. Finally, in Section 8, we present our main results about constancy and rate of convergence in general, obtained by considering each of these four cases individually.
2 Preliminaries
We begin with several formal definitions that we use in the remainder of the paper. We start by recalling graph theory terminology and the definition of Wasserstein distance on graphs. Then, we review random walks on graphs and define Guvabs. Finally, we briefly discuss terminology used to describe convergence.
In this paper, all graphs we consider are finite, connected, simple graphs. For a graph , let be the vertex set of and be the edge set of , i.e., the set of unordered pairs where are adjacent vertices in . Further, for any , let be the neighbor set of . Finally, denote by the graph distance between vertices and .
Definition 2.1.
Define a distribution on the graph to be a function . We say is a nonnegative distribution if, for all , we have . A nonnegative distribution is a probability distribution if .
For convenience, we will denote by 0 the distribution with value 0 at all vertices. In addition, we will refer to a distribution whose values sum to 0 as a zero-sum distribution.
Given a graph , let be an infinite sequence of distributions. Suppose that is a strictly increasing function such that for all vertices , exists. Then denote by the pointwise limit. Namely, for all , let be .
For a given graph , let be the set of all ordered pairs of distributions on that satisfy . Further, let be the set of all ordered pairs with nonnegative distributions.
We now introduce some terminology from optimal transport theory. We follow definitions equivalent to those in the book of Peyré and Cuturi [PC19].
In Definitions 2.2, 2.3, and 2.4, we let be a graph with two nonnegative distributions on such that .
Definition 2.2 (cf. [PC19]).
Define a transportation plan from to for to be a function such that
• for any vertices , we have that ,
• for all vertices , we have that ,
• for all vertices , we have that .
Denote by the set of all transportation plans from to .
Following [Kan06], we can intuitively visualize a transportation plan as a way to move mass distributed over the vertices of according to along the edges of to an arrangement according to . We now consider the cost of a given transportation plan : if moving 1 unit of mass across 1 edge has a cost of 1, how much does it cost to move the mass distribution of to that of according to ?
Definition 2.3 (cf. [PC19]).
Define the cost function to take any transportation plan to its cost
Definition 2.4 (cf. [PC19]).
Define the Wasserstein distance by
We can thus interpret the Wasserstein distance as the minimum cost of transporting mass from its arrangement in distribution to an arrangement in distribution .
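As an illustration of Definitions 2.2–2.4 (a minimal sketch of ours, not taken from the paper), the Wasserstein distance on a graph can be computed by solving the transportation linear program directly. The sketch assumes the graph is given as an adjacency list and the two distributions as dictionaries with equal total mass; scipy's linear-programming routine performs the optimization.

```python
import numpy as np
from collections import deque
from scipy.optimize import linprog

def graph_distances(adj):
    """All-pairs graph distances via breadth-first search."""
    verts = list(adj)
    dist = {u: {} for u in verts}
    for s in verts:
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist[s]:
                    dist[s][y] = dist[s][x] + 1
                    queue.append(y)
    return verts, dist

def wasserstein(adj, mu, nu):
    """Minimum cost of a transportation plan T(x, y) moving mu to nu,
    where moving one unit of mass across one edge costs one unit."""
    verts, dist = graph_distances(adj)
    n = len(verts)
    cost = np.array([dist[x][y] for x in verts for y in verts], dtype=float)
    A_eq, b_eq = [], []
    for i, x in enumerate(verts):       # mass leaving x equals mu(x)
        row = np.zeros(n * n); row[i * n:(i + 1) * n] = 1
        A_eq.append(row); b_eq.append(mu.get(x, 0.0))
    for j, y in enumerate(verts):       # mass arriving at y equals nu(y)
        col = np.zeros(n * n); col[j::n] = 1
        A_eq.append(col); b_eq.append(nu.get(y, 0.0))
    res = linprog(cost, A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=(0, None))
    return res.fun

# Example: on the path 0-1-2, moving one unit of mass from vertex 0 to vertex 2 costs 2.
print(wasserstein({0: [1], 1: [0, 2], 2: [1]}, {0: 1.0}, {2: 1.0}))
```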
Remark 2.5.
Note that for any distribution on , if , , , and are all nonnegative, then (for a proof, see for example [JK21], which notes that the Wasserstein distance between and can be defined in terms of ).
Let be a graph with two distributions on such that
Let be a distribution such that and are both nonnegative. We extend the domain of the Wasserstein distance to include distributions with negative entries by defining to be . By Remark 2.5, is well-defined.
Even if and have negative entries, we can interpret as the cost of some optimal “transportation plan” that moves mass from distribution to distribution .
Thus, in the rest of the paper, “transportation plans” between distributions and allow for negative entries in and . In this case, a transportation plan rigorously refers to a transportation plan from to for some large enough that and are both nonnegative. In particular, the movement of mass between and from a vertex to a different vertex actually refers to that same movement of mass from to between the distributions and .
We now discuss a different way of calculating the Wasserstein distance.
Definition 2.6 (cf. [PC19]).
Given a graph , a 1-Lipschitz function is a function on the vertices of G where for any , we have that . Let be the set of all 1-Lipschitz functions on .
Theorem 2.7 (Kantorovich Duality, cf. [PC19]).
Let be a graph with two distributions on such that . Then
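For reference, the standard finite-graph form of this duality, written in our own notation (with μ and ν the two distributions of equal total mass and f ranging over the 1-Lipschitz functions of Definition 2.6), is the following.

```latex
W(\mu, \nu) \;=\; \max_{f \in \mathrm{Lip}_1(G)} \; \sum_{x \in V(G)} f(x)\,\bigl(\mu(x) - \nu(x)\bigr)
```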
We now seek a way to refer to a pair of random walks on a graph, as these pairs of random walks are the objects we study. The information needed to define such a pair consists of the graph G, the starting vertices u and v of the two random walks, and the respective lazinesses a and b of the random walks. We thus define a Guvab comprising this information.
Definition 2.8.
We define a Guvab to be a tuple where is a finite, connected, simple graph, , and with .
Definition 2.9.
Consider a graph . For any starting vertex and laziness , consider the random walk such that , and, for , we have with probability , and with probability for any . We say the probability distribution for is a k-step probability measure.
Consider some Guvab . We let be the Markov chain corresponding to a random walk with laziness starting from vertex and we let be the Markov chain corresponding to a random walk with laziness starting from vertex . When it is clear which Guvab we are referring to, we write instead of , respectively.
Consider some Guvab . For all we let be the k-step probability measures of respectively. We let and . When it is clear which Guvab we are referring to, we write instead of , respectively.
Given a Guvab , we define and to be the transition probability matrices of and , respectively. In particular, for all , we have that and , where the distributions are row vectors. We also define to be the transition probability matrix of a random walk with zero laziness on (note that does not depend on the starting vertex of the random walk). We note that and only depend on and , not and . In particular, and .
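For concreteness, the following numpy sketch (ours) builds these matrices under the assumption, suggested by the relation just stated, that the lazy transition matrices are obtained from the laziness-0 matrix P by M_u = a·I + (1−a)·P and M_v = b·I + (1−b)·P, where a and b denote the two lazinesses.

```python
import numpy as np

def transition_matrices(adj, verts, a, b):
    """Row-stochastic transition matrices of the two lazy walks of a Guvab.
    P is the laziness-0 walk (uniform over neighbors); the lazy walks then
    use a*I + (1-a)*P and b*I + (1-b)*P."""
    n = len(verts)
    idx = {v: i for i, v in enumerate(verts)}
    P = np.zeros((n, n))
    for x in verts:
        for y in adj[x]:
            P[idx[x], idx[y]] = 1.0 / len(adj[x])
    identity = np.eye(n)
    return a * identity + (1 - a) * P, b * identity + (1 - b) * P

# The k-step measure of the first walk, as a row vector, is then
#   mu_k = mu_0 @ np.linalg.matrix_power(M_u, k),
# where mu_0 is the indicator row vector of the starting vertex u.
M_u, M_v = transition_matrices({0: [1], 1: [0, 2], 2: [1]}, [0, 1, 2], a=0.3, b=0.0)
```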
Lemma 2.10.
Let be the union of the set of eigenvalues of and the set of eigenvalues of . For all vertices , there exist some constants such that for all , we have .
Proof.
This follows from the fact that and are diagonalizable (since random walks are reversible ([LP17]) and thus have diagonalizable matrices ([LP17], Chapter 12)). Say has eigenvalues and has eigenvalues . Since is diagonalizable, we can write it as for invertible matrix and diagonal matrix with diagonal entries .
Then , so for all there exist constants such that for all we have . By similar reasoning, for all there exist constants such that for all we have . Therefore, for all , there exist some constants such that for all , we have that . If for any and we have , we can collect these like terms and thus create a list of distinct eigenvalues and constants such that for all , we have . In particular, will be exactly the elements of the union of the set of eigenvalues of and the set of eigenvalues of . ∎
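As a small illustration of Lemma 2.10 (a numpy sketch of ours, for a single walk with transition matrix M and initial row vector mu0), diagonalizing the transpose of M expresses every coordinate of the k-step distribution as a sum of terms of the form c·λ^k.

```python
import numpy as np

def spectral_coefficients(M, mu0):
    """Return eigenvalues lam and a matrix C with C[x, i] = c_{x, i} such that
    (mu0 @ M^k)[x] = sum_i C[x, i] * lam[i]**k, as in Lemma 2.10.
    (Numerically the eigenvalues may come out complex; for the reversible
    walks considered here they are real.)"""
    lam, V = np.linalg.eig(M.T)          # columns of V are eigenvectors of M^T
    coeffs = np.linalg.solve(V, mu0)     # expand mu0 in that eigenbasis
    return lam, V * coeffs               # broadcasting scales column i by coeffs[i]
```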
In the next section, we discuss when and how the Wasserstein distance converges, which is related to the convergence of probability distributions of random walks. Since random walks can be viewed as Markov chains, we reference some classical Markov chain theory, using the same definitions as in [LP17]. We also use the following well-known Markov chain theorem.
Theorem 2.11 (cf. [LP17]).
Suppose that a Markov chain is aperiodic and irreducible with probability distributions and stationary distribution . Then .
Finally, in our discussion of convergence, we encounter cases where the Wasserstein distance is eventually constant. To quantify this precisely, we provide the following definition.
Definition 2.12.
We call an infinite sequence for eventually constant if there exists such that for all , we have that .
3 Classifying End Behavior of the Wasserstein Distance
In this section, we seek to enumerate the possible end behaviors of the Wasserstein distance for a Guvab. In particular, we prove results about when the Wasserstein distance converges and what it converges to for different Guvabs. The classification of Guvabs by end behavior paves the way for our later discussion of the rate of convergence of the Wasserstein distance.
We begin with a technical lemma showing that the limit of the Wasserstein distance is the Wasserstein distance of the limit, as we expect.
Lemma 3.1.
Let be a strictly increasing function. If and (and, in particular, both limits exist), then
Proof.
Note that, by the triangle inequality,
and
This implies that
However,
(and similarly for ). The above inequality implies that
as desired. ∎
Due to classical Markov chain theory, we expect that in most cases, the probability distributions of both random walks converge to the same stationary distribution, and thus . The following definition and lemma quantify the stationary distribution that most random walks converge to. The subsequent theorem specifies what the “most cases” in which the distance goes to zero are.
Definition 3.2.
For any graph , we define the distribution to be such that for any , we have
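The standard formula for this stationary distribution, consistent with Remark 3.11 below (where the mass of the stationary distribution at a vertex is proportional to its degree), is recorded here in our own notation as the presumed definition.

```latex
\pi(x) \;=\; \frac{\deg(x)}{2\,\lvert E(G) \rvert} \qquad \text{for every vertex } x \in V(G)
```

Note that laziness does not change this distribution: if πP = π for the laziness-0 transition matrix P, then π(aI + (1−a)P) = aπ + (1−a)π = π for any laziness a.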
Lemma 3.3.
When , the k-step probability measure converges to the stationary distribution .
Proof.
Recall that is the Markov chain of the random walk. We have that is aperiodic (we can return from a vertex to itself in one step) and irreducible ( is connected). We have that for any vertex ,
Thus, is a stationary distribution of . Hence, by Theorem 2.11, is a limiting distribution for and thus . ∎
Theorem 3.4.
The value converges to as if and only if one of the following conditions is true:
• ,
• and ,
• is not bipartite and ,
• and there exists a path from to with an even number of steps.
Proof.
We now consider the cases where or . If , then stays at forever. Thus, in order to have , we need and . This is sufficient to imply .
It remains to look at the case where and , which we break into subcases based on whether is bipartite.
We first tackle the subcase where is not bipartite, i.e., contains an odd cycle. Since , both are aperiodic (there is a path from any vertex to itself in both an odd number of steps and an even number of steps via the odd cycle) and irreducible ( is connected). Thus, and as before.
Finally, we address the subcase where is bipartite with sides . Here, is periodic (with period 2), so must be periodic as well to have . Thus, . If are on different sides of , then and will never be on the same side, so we cannot have . Otherwise, without loss of generality let . Consider the Markov chains and with vertex set . Since and are aperiodic (we can get from a vertex to itself in one step of or by moving back and forth along the same edge) and irreducible ( is connected), and they both have the same transition matrix, the Markov chains converge to the same stationary distribution. Similar reasoning applies for and . This finishes the proof for this case, hence completing the proof of Theorem 3.4. ∎
In the next part of this section, we specify what the stationary distributions look like for any possible Guvab, particularly considering Guvabs with more than one stationary distribution . We show that all Guvabs either have one set of end behaviors they converge to or switch back and forth between two sets of end behaviors.
Suppose . Let be bipartite with sides , and without loss of generality let . Let and . Let denote restricted to for . For , let be a distribution on such that
for . Further, for , let be a distribution on that is on and has value 0 elsewhere.
Lemma 3.5.
For , the distribution is the limiting distribution of .
Proof.
First, we claim that and , where is the transition matrix of . Note that for , we have
This is because for all , we have , which implies . For , we have . This is because for all , we have , which implies . Hence, and, by similar reasoning, . Thus, and as desired.
Also, we note that
We now see that is a stationary distribution of for . Since and are irreducible and aperiodic (as shown in the proof of Theorem 3.4), we have that is a limiting distribution of for . ∎
Corollary 3.6.
If , then as , we have converges to and converges to . Analogously, if then as , we have converges to and converges to .
Proof.
Suppose ; the proof will proceed analogously if . Then, will always be 0 on and, by Lemma 3.5, it will converge to on because is the probability distribution of on . Similarly, will always be 0 on and it will converge to on . Thus, converges to and converges to . ∎
Corollary 3.7.
For any Guvab, and are well-defined.
Proof.
We show that for any , we have that and are well-defined; this implies the statement of the corollary. When is bipartite and , we know that and (assuming, without loss of generality, that ). When and is not bipartite or when , we have
Finally, when , we know . This covers all possible cases for and , so we are done. ∎
For any Guvab, we refer to as and as .
The following corollary is quite important for the rest of this section and the remainder of this paper. Its relevance to this section is that will be well-defined unless . The corollary is pertinent to the rest of the paper because it indicates that the rates of convergence of and are always well-defined. Thus, for any possible Guvab, we can study and state results about the rates of convergence of and .
Corollary 3.8.
We have that and are always well-defined.
Proof.
We know that and . ∎
We soon discuss many cases where exists, so we designate a way to refer to this limit. For any Guvab where exists, we denote by the limit .
We can now state and prove our main theorems about whether the Wasserstein distance converges and the values it converges to. For any possible Guvab, Theorem 3.10 allows us to determine whether the Wasserstein distance converges. Furthermore, Theorem 3.9 allows us to, in most cases, quickly and easily determine what value the Wasserstein distance will converge to. Finally, these theorems provide a framework for us to classify the Guvabs into four categories so we can use casework to understand the rate of convergence.
Theorem 3.9.
Unless is bipartite, , and , we have that is always well-defined, and furthermore
• under the conditions specified in Theorem 3.4,
• if and ,
• if and is bipartite.
Proof.
The first condition is clear by Theorem 3.4. Next, we look at the case where and . By Theorem 3.4, this corresponds to the case where is bipartite and are on opposite sides of . Without loss of generality, let and . Then, as , we have converges to and converges to by Corollary 3.6. Analogously, converges to and converges to . Thus, . We have that because to get from to , we must move all the mass from across at least one edge to . Also, because we can achieve a distance of 1 by, for any given edge with and , moving a mass of from to .
We now consider the case when and is bipartite. Without loss of generality, let . Since , we have that and . Since , we have that . Thus, we have that and . If we show that , we will have shown the third condition. We know that will have half its mass on and half its mass on because
Thus, half the mass must move from to , so . We can also achieve a distance of exactly from to by, for any given edge with and , moving mass from to . Thus, and by an analogous argument, .
We have now considered all cases where and where . The only case left is where and . Here, and where is the distribution with 1 at and 0 elsewhere. Thus, , which is a constant. ∎
Theorem 3.10.
The distance does not converge as if and only if is bipartite, and , and
Proof.
By Theorem 3.9, we know that the only case where it is possible for not to converge is when is bipartite, , and . In this case, . Additionally, assuming without loss of generality that , we have that
Thus, converges as if and only if .
To calculate , we note that we must move all the mass of to vertex . To move all the mass at some vertex to , we necessarily move a mass of over a distance of . Thus the total transportation cost, and thus the total Wasserstein distance , is given by
By the same reasoning, we have that
Given that is bipartite, , and , we know that the Wasserstein distance converges if and only if , which is true if and only if
since the parity of depends only on the side of that is on. Thus, the theorem statement follows. ∎
We now present a table summarizing much of the information about convergence discussed in this section.
Conditions on | does not converge | ||||
---|---|---|---|---|---|
bipartite, | |||||
bipartite, | |||||
non-bipartite, | |||||
non-bipartite, |
Remark 3.11.
We know that the case of bipartite, , and is possible by considering a star with at the center and . We know that the case of non-bipartite, , and is impossible because in order for it to be possible, would need half of its mass to be at . Since mass of is proportional to degree, every edge would have to be incident to , making the graph bipartite.
The following corollary provides a categorization of the Guvabs into four types. In the next four sections of this paper, we examine each of these categories in turn.
Corollary 3.12.
Each Guvab satisfies exactly one of the following four conditions:
• and ,
• and ,
• and ,
• .
Proof.
If we understand the convergence of the Wasserstein distance in all four of these cases, then we understand the convergence for all Guvabs. The subsequent four sections each discuss the convergence of the Wasserstein distance in one of these cases. Our two main convergence theorems, presented in Section 8, put together the general results obtained by examining these four cases individually.
4 Convergence when
In this section we consider Guvabs with and . Recall that these are exactly the Guvabs for which is bipartite, and are on different sides of the bipartite graph, and . We show that all such Guvabs have a Wasserstein distance that is eventually constant. We also begin to understand how long it takes for the Wasserstein distance to reach constancy.
We first recall that the Wasserstein distance between two distributions and with potentially negative entries is the cost of an optimal transportation plan for moving mass (as discussed in Section 2, the mass of a distribution at a vertex is the value of the distribution at that vertex) from to . Thus, to prove the eventual constancy of the Wasserstein distance, we construct an algorithm that produces a transportation plan between any two distributions. Then, we show that when certain inequalities are satisfied, this transportation plan has a cost of exactly 1 and is optimal. Finally, we prove that when is eventually sufficiently close to either of the stationary distributions or , these inequalities are satisfied.
We start by constructing the algorithm. Pick a spanning tree of and let be the set of leaves of . Define a function such that .
For any finite set , let denote the set of all permutations of . We say that an -monotone ordering is a permutation of such that is a non-decreasing sequence.
Definition 4.1.
Given a graph , a spanning tree of , an -monotone ordering and zero-sum distribution , we define the tree-based transport algorithm, which transports mass from to , to be an -step algorithm in which at the th step,
• if the current mass at is nonnegative, we distribute it evenly among all with indices greater than ,
• if the current mass at is negative, we take an equal amount of mass to vertex from all with indices greater than , so that the mass at is now .
In Lemma 4.2, we see that this algorithm produces a valid transportation plan from to . We refer to this tree-based transportation plan as . Given and , we let denote the distribution of mass on the vertices of after steps of the algorithm.
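The following Python sketch (ours) implements one concrete reading of Definition 4.1. Since the definition is stated abstractly, we simply assume an f-monotone ordering with the property that every vertex except the last has at least one tree-neighbor appearing later in the ordering (ordering vertices by decreasing distance from a fixed root of T is one such choice), and we record the individual unit-distance moves the algorithm makes.

```python
def tree_based_transport(tree_adj, order, mass):
    """Sketch of the tree-based transport algorithm on a spanning tree.
    `tree_adj` is the adjacency list of the tree, `order` lists all vertices
    so that every vertex except the last has a tree-neighbor later in the
    list, and `mass` is a zero-sum distribution (vertex -> value).
    Returns the list of moves (from_vertex, to_vertex, amount) performed."""
    mass = dict(mass)
    position = {v: i for i, v in enumerate(order)}
    moves = []
    for i, v in enumerate(order[:-1]):
        later = [w for w in tree_adj[v] if position[w] > i]
        m = mass[v]
        if m == 0 or not later:
            continue
        share = m / len(later)
        for w in later:
            # A positive share pushes mass from v to w; a negative share pulls it from w to v.
            moves.append((v, w, share) if share > 0 else (w, v, -share))
            mass[w] += share
        mass[v] = 0.0
    return moves
```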
Lemma 4.2.
The tree-based transport algorithm on always produces a valid transportation plan from to .
Proof.
After the th step of the tree-based transport algorithm, the mass at each of the vertices is , since the mass at becomes zero at the th step, and thereafter no mass is moved to or from . Thus, after the th step, the only vertices of with nonzero mass will be and . Since the total mass sums to and is adjacent to , the th step of the algorithm simply moves the positive mass to the negative mass so that all vertices have mass . ∎
We now prove a useful property of this algorithm.
Lemma 4.3.
Given a graph , tree and -monotone ordering , for all , we have that is a linear function on the space of zero-sum distributions.
Proof.
It suffices to show that for any two zero-sum distributions and , we have . We prove this by induction on .
Base case: When , we have that .
Inductive step: For the inductive hypothesis, we assume that . We want to show that . For any distribution , if denotes the number of neighbors of with indices greater than , then all of the following are true:
• ,
• for with , we have ,
• for all other vertices , we have .
Thus, . For with , we have that
Finally, for all other vertices , we have that
We have shown for all the vertices, so we have proven the inductive step and thus the lemma. ∎
In Definition 4.4 and the subsequent results, we define the inequalities used in conjunction with the tree-based transport algorithm and show that when these inequalities are satisfied, the Wasserstein distance between and will be 1.
Definition 4.4.
For any graph , zero-sum distribution , spanning tree , and -monotone ordering on , define the tree-based transport inequalities to be the union of the following two sets of inequalities:
• : the set of inequalities of the form for all and ,
• : the set of inequalities of the form for all
Lemma 4.5.
If the tree-based transport inequalities are satisfied, then the cost of the transportation plan is at most the sum of positive mass in , i.e.,
Proof.
We note that the inequalities in mean that for any vertex , the sign of stays the same until the mass becomes 0 at the th step, at which point it remains 0 for the rest of the algorithm.
Since only positive mass moves, it suffices to show that all the positive mass of moves a distance of at most 1. At each step of the tree-based transport algorithm, any mass that moves must move a distance of exactly 1. Thus, it suffices to show that all mass moves at most one time in .
To show this, we demonstrate that the mass that moves at each step of the algorithm has not moved before, since this means all mass moves at most once overall. We begin by demonstrating that every time positive mass moves, it moves from a vertex for which .
The only way for mass to move is via the th step of the tree-based transport algorithm, which starts from the distribution . Suppose the vertices are . If is zero, then no mass moves on the th step. If is positive, then at the th step, mass moves away from . In addition, by the inequalities in , if is positive then is positive and if is negative then is negative. Thus, by the inequalities in , we have for all . If is negative, mass moves from these to , so all mass movements are from a vertex for which . Thus, in all three of these cases, every time positive mass moves, it moves from a vertex for which .
Thus, consider the th vertex, call this , and suppose that . Then, by the inequalities in , we know has negative mass at the neighbors of , so throughout all steps of the algorithm, the mass at the neighbors of was nonpositive. This means that anytime executing a step for one of the neighbors of changed the mass at , mass moved from to its neighbors. Because no mass moved from another vertex to , any remaining positive mass at has not yet moved. We also know that the remaining mass at is always nonnegative by the inequalities in . Thus, whenever we execute a step of the algorithm for one of the neighbors of , the nonnegative mass that moves from has not yet moved.
Furthermore, during the th step of the algorithm, all the remaining nonnegative mass at moves away from it, and this mass has not yet moved. Mass movements due to steps of the algorithm for neighbors of and due to the th step, which is for , make up all possible movements of the mass initially at . This argument holds for all vertices for which , so all possible movements of positive mass move mass that has not been moved before. Thus, we are done. ∎
Corollary 4.6.
For any graph and zero-sum distribution , if for some and the tree-based transport inequalities are satisfied, then
Proof.
By Lemma 4.5, we have that , the sum of positive mass, is the upper bound. For the lower bound, we note that all positive mass must move because we only move positive mass. Thus, all positive mass must move at least a distance of 1, so will be at least the sum of positive mass. ∎
Corollary 4.7.
For a Guvab where and , suppose that there exists some spanning tree of and -monotone ordering such that satisfies the tree-based transport inequalities . Then .
Proof.
Recall that when and , we must have that is bipartite, and are on different sides of the bipartite graph, and . Thus, for all we have that and are nonzero on disjoint sets of vertices, since at all times is nonzero only on one side and is nonzero only on the other side. Thus , so by Corollary 4.6 we have that
∎
Now all that remains to be shown is that once is sufficiently close to either of the stationary distributions or , the tree-based transport inequalities will be satisfied. To prove this, we will first show that and lie on the interior of the region of distributions that satisfy the inequalities. The next lemma helps show that and satisfy the inequalities.
Lemma 4.8.
Suppose we have a bipartite graph with sides and and a distribution such that for we have and for we have . Then pick an arbitrary spanning tree T and -monotone ordering on . Consider the tree-based transport plan . After each step, for each for , we have that for with and that for with .
Proof.
We know by Lemma 4.3 that for all and for all , we have . Thus, it suffices to show that after all steps for , we have that for with and that for with .
To prove this, for all , we define the graph to consist of the vertex set and all the edges of that have both endpoints in . It suffices to show by induction on that for , we have for with and we have for with .
Base case: When , we note that . When , by the definition of , we have that for with and that for with .
Inductive step: The inductive hypothesis is that for with and that for with . Given that this is true for , we want to show that it is true for .
We suppose that ; the case where will proceed analogously. After steps, has a mass of . During the th step, this mass is distributed evenly among with ; we note that there are exactly of these neighbors. Thus, each will receive mass. By the inductive hypothesis we have that before step , each of these neighbors had mass, since each of the neighbors of is in , the opposite side of the bipartite graph. Then, after step , each has mass , and the remaining vertices with indices greater than have the same mass as before. We note that for all , if then because the edge is being removed, and otherwise . We have just shown that this is exactly the mass at all vertices with indices greater than after the th step of the algorithm. Thus, at each vertex with , we have that for , the mass at after steps is and for , the mass at after steps is . We proceed analogously in the case where . This proves the inductive hypothesis, and therefore proves the lemma. ∎
We are now ready to show that and lie on the interior of the region of distributions that satisfy the inequalities.
Corollary 4.9.
For any Guvab where and , we have that and lie strictly on the interior of the region of distributions that satisfy the tree-based transport inequalities .
Proof.
We prove this for ; by symmetry it will hold for as well since . If the sides of are and , with and , then for , we have that
and for we have that
Then for all such that , we have that . We also have that by Lemma 4.8, for all and . Thus, and lie strictly on the interior of the region of distributions that satisfy the tree-based transport inequalities . ∎
Using these results, we are now ready to prove the main claim that the Wasserstein distance is eventually constant when and .
We first define a variable that corresponds to how long takes to reach constancy. Note that this variable can be infinity if is not eventually constant.
Definition 4.10.
For any Guvab where , define to be
Theorem 4.11.
For any Guvab with and , we have .
Proof.
Pick an arbitrary spanning tree of and -monotone ordering . By Corollary 4.9, and are on the interior of the region of distributions that satisfy the tree-based transport inequalities . We note that all the inequalities in can be written in the form , where is a continuous function. Thus, by the definition of a continuous function, there exists some such that for all that satisfy for all or satisfy for all , we have that . We also know, by the formal definition of a limit, that there exists some such that for all and all , we have and . Thus, for all , we have . By Corollary 4.7, for all , we have that . Hence . ∎
We next hope to characterize how long it takes the Wasserstein distance of these Guvabs with and to become constant. In particular, we prove upper and lower bounds for . We start with the upper bound. To prove this upper bound, we first prove a lemma quantifying exactly how close to or a distribution must be in order for the tree-based transport inequalities to be satisfied.
Lemma 4.12.
Consider a Guvab with and . Pick an arbitrary spanning tree and -monotone ordering . Let . If for a distribution it is true that for all vertices we have that or it is true that for all vertices we have that , then satisfies the tree-based transport inequalities .
Proof.
We prove this for , and an analogous argument will hold for .
We note that by Lemma 4.8 we have that if we start with , then at any point in the tree-based transport algorithm through step , the absolute value of the mass at any vertex is at least . Thus, if at any point in the algorithm through step the mass at a vertex differs by at most from , then the tree-based transport inequalities are satisfied because mass is never the wrong sign.
It thus suffices to show that for all and for all , we have . To prove this, we note that is a zero-sum distribution, and by Lemma 4.3 for all and for all , we have . We consider the quantity . This will be nonincreasing as gets larger, since at step of the algorithm the absolute value of the mass at decreases by exactly while the sum of absolute values at ’s neighbors cannot increase by more than . The maximum value of this sum is (since this is an upper bound for the value at the beginning). We know that so , which is exactly what we wanted to show, so we are done. ∎
With this lemma established, we can now prove our upper bound for .
Lemma 4.13.
Let be where L is the set of all eigenvalues of and . Then for a Guvab where and , we have .
Proof.
We use [DS91, Prop. 3]. The Markov chains and are both converging to their even stationary distributions and . For convenience, denote by and by . Once at all vertices, and are both less than or equal to away from their respective stationary distributions, will satisfy the tree-based transport inequalities by Lemma 4.12. Since and are both Markov chains with limiting distributions, we use notation analogous to that of [Sin92] and say that . Similarly, . Then for and we let be the minimum nonnegative integer such that for all . Thus, by [DS91, Prop. 3], the time it takes for to eventually have distance satisfies
Then we just need to bound the right-hand side above. This gives
By similar reasoning, the same bound works for , the time it takes for to eventually have distance . Thus, is an upper bound for . ∎
We now establish a lower bound for .
Lemma 4.14.
For a Guvab where and , we have .
Proof.
We note that for if . Similarly, for if . Suppose and consider any pair of vertices such that and . Then and , so . Therefore all mass will have to move a distance of at least 2 to get from to , so . ∎
5 Convergence when
In this section, we consider Guvabs where and . Recall that these are exactly the Guvabs for which is bipartite and . As in the previous section, and for similar reasons, the Wasserstein distance will eventually be the sum of positive mass. In this case, however, the Wasserstein distance is not eventually constant but rather an exponential that we can express explicitly. To prove this, we proceed by a similar strategy as in the case. In particular, we show that the tree-based transport inequalities will eventually be satisfied, and compute the Wasserstein distance when these inequalities are satisfied.
In the next three results, we show that the tree-based transport inequalities will eventually be satisfied, and provide an initial expression for what the Wasserstein distance will be when the tree-based transport inequalities are satisfied. Later, we will calculate exactly what this expression for the Wasserstein distance evaluates to.
We begin by showing in the next two results that, analogously to before, and lie on the interior of the region of distributions that satisfy the inequalities.
Lemma 5.1.
Suppose we have a bipartite graph with sides and and a distribution such that for and for . Then pick an arbitrary spanning tree T and -monotone ordering on . Consider the tree-based transport plan . We have that after each step for , for with , we have that and for with we have that .
Proof.
We note that this is nearly the same as Lemma 4.8, but differs by a constant factor of . Given the distribution , we know by Lemma 4.8 that after all steps for , for with , we have that and for with we have that . We also know that by Lemma 4.3 so this means for with , we have that and for with , we have that . Dividing both sides by 2, we get that after all steps for , for with , we have that and for with , we have that . ∎
Corollary 5.2.
For any Guvab where and , we have that and lie strictly on the interior of the region of distributions that satisfy the tree-based transport inequalities .
Proof.
We note that when and , we have that so for all , we have that . We also know that if has sides and with , for all we have that and for all we have that . Similarly, for all we have that and for all we have that . Thus for all we have that and for all we have that . Also .
We prove the claim for - by symmetry it will hold for as well since . If the sides of are and , then for and for . Then we have that for all such that , the product . We also have that by Lemma 5.1, holds for all and for all . ∎
We now know that and are on the interior of the region satisfying the inequalities. We can hence proceed similarly to section 4 to show that will eventually satisfy the inequalities and thus will be the sum of positive mass.
Corollary 5.3.
For any Guvab where and , there exists such that for all ,
Proof.
We know by Corollary 5.2 that and lie on the interior of . Therefore, as in the proof of Theorem 4.11, by the formal definition of a limit there exists some such that for all , we have that and thus are satisfied. We note that Corollary 4.6 holds for any Guvab, including the ones we are currently inspecting, so if the tree-based transport inequalities are satisfied, . Thus for all , we have that ∎
We now know that eventually, the Wasserstein distance will be the sum of positive mass, so it remains to calculate the sum of positive mass. To do this, we will first need to define an auxiliary Markov chain and prove some properties of this Markov chain.
Definition 5.4.
Let be a two-state Markov chain with states and , where we start at , and at all times we have an chance of staying at our current state and a chance of switching to the other state. Then define to be the probability distribution after steps of this Markov chain.
Lemma 5.5.
For the Markov chain defined above, and .
Proof.
We will proceed by induction on , using the transition probabilities to go from to .
Base case: When , we know that, since the Markov chain starts at , we have and .
Inductive step: Suppose and . We know that
Similarly,
∎
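For reference, the induction above yields the following closed form, written in our own notation with a denoting the probability of staying at the current state, s1 the starting state, and s2 the other state.

```latex
\mu_k(s_1) \;=\; \tfrac{1}{2}\bigl(1 + (2a - 1)^k\bigr),
\qquad
\mu_k(s_2) \;=\; \tfrac{1}{2}\bigl(1 - (2a - 1)^k\bigr)
```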
We now have all the tools we need to explicitly calculate the sum of positive mass. The next lemma tells us what the sum of positive mass will be.
Lemma 5.6.
When and , there exists some such that for all , we either have that
or that
Proof.
We know is bipartite; say it has sides and . We know . Assume without loss of generality that . If is on side , then eventually for , we have that gets arbitrarily close to and for , we have that gets arbitrarily close to . In particular, for some , for all we have if and only if . Then for some , for all , when , the total positive mass of is . Similarly, for some , for all we have if and only if . Thus, for some , for all , when , the total positive mass of is .
By an analogous argument, when , for some , for all , the total positive mass of is . Similarly, for some , for all , the total positive mass of is .
Thus, to calculate what the sum of positive mass eventually equals, we simply consider how much mass of and is on each side of the bipartite graph so that we know how much mass of is on each side. We note that for any random walk with laziness , at all steps the mass on a given side has a probability of staying on that side and a probability of moving to the other side, since any mass that moves along an edge moves to the other side. Thus the mass of on and behaves identically to the mass of on and . In other words, the amount of mass of on is and the amount of mass of on is by Lemma 5.5. Similarly, if , then the amount of mass of on is and the amount of mass of on is . By symmetry, if , then the amount of mass of on is and the amount of mass of on is .
This means that if ,
and
Then the total positive mass of is
and the total positive mass of is
Thus, for some , the sum of the positive mass of is for all .
If , then we have that and we have that . By calculating this out analogously to above, we see that if there exists some such that the sum of positive mass of is for all . ∎
We now know that the Wasserstein distance will be the sum of positive mass, and we know exactly what the sum of positive mass will eventually be. Thus, we know exactly what the Wasserstein distance will eventually be. The next theorem therefore states explicitly the rate of convergence of the Wasserstein distance when and .
Theorem 5.7.
For any Guvab where and , for some it will be true that for all , we have that .
Proof.
This follows by combining Corollary 5.3, which shows that the Wasserstein distance is eventually the sum of the positive mass, with Lemma 5.6, which computes that sum explicitly. ∎
Finally, we want to characterize when the Wasserstein distance is eventually constant when and . This will fit into our larger characterization of eventual constancy for all Guvabs with .
Corollary 5.8.
When and , we have that if and only if .
Proof.
This follows directly from Theorem 5.7. ∎
6 Convergence when
In this section we consider the case of Guvabs where and . Recall that these are exactly the Guvabs enumerated in Theorem 3.4 for which . We start by showing that the rate of convergence of is exponential when it is not eventually constant. By an analogous argument, the rate of convergence of is exponential when it is not eventually constant. We will then investigate exactly when is eventually constant.
Theorem 6.3 states that unless it is eventually constant, the rate of convergence of is exponential, and in particular . We go about proving this by showing in the next two lemmas that must be one of finitely many expressions, all of which are approximately some exponential.
The next lemma shows that must be one of finitely many expressions.
Lemma 6.1.
For any Guvab , there exists a finite set of 1-Lipschitz functions such that for all there exists such that
Proof.
We consider the set of possible 1-Lipschitz functions on the graph such that (any other 1-Lipschitz function can be transformed into such a 1-Lipschitz function by adding some value to all entries). The criteria for a function to be a 1-Lipschitz function are that for each pair of vertices and we have that and . We also have that . Each of these constraints defines a hyperplane in . Additionally, from these criteria we know that none of the entries of can be more than , because then, since the maximum distance between any two vertices is , there would be no negative entries. Thus, the set of 1-Lipschitz functions forms a closed set bounded by a polytope in . For any cost function on the graph , we have that is a linear function on . Thus is one of the corners of the polytope. There are finitely many of these corners, corresponding to finitely many 1-Lipschitz functions . We also know that , so it maximizes the cost function , and thus for all there exists such that . ∎
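The argument above treats the Wasserstein distance as a linear function maximized over the polytope of normalized 1-Lipschitz functions. As an illustration (a sketch of ours, not the paper's method), the same maximum can be computed numerically with a linear-programming solver, encoding the 1-Lipschitz condition by the edge constraints |f(x) − f(y)| ≤ 1, which on a connected graph is equivalent to being 1-Lipschitz with respect to the graph distance.

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein_dual(adj, mu, nu):
    """Maximize sum_x f(x) * (mu(x) - nu(x)) over 1-Lipschitz functions f,
    i.e. the Kantorovich dual of the transportation problem."""
    verts = list(adj)
    idx = {v: i for i, v in enumerate(verts)}
    diff = np.array([mu.get(v, 0.0) - nu.get(v, 0.0) for v in verts])
    A_ub, b_ub = [], []
    for x in verts:
        for y in adj[x]:
            row = np.zeros(len(verts))
            row[idx[x]], row[idx[y]] = 1, -1   # f(x) - f(y) <= 1 (both orders appear)
            A_ub.append(row); b_ub.append(1.0)
    res = linprog(-diff, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=(None, None))
    return -res.fun
```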
We now know that will be one of finitely many expressions. The next lemma shows that each of these expressions is approximately exponential.
Lemma 6.2.
For any Guvab for which and any 1-Lipschitz function , there exists some and some constant such that
unless there exists some such that for all we have that .
Proof.
Assume that there does not exist any such that for all we have that .
We know by Lemma 2.10 that for all vertices , there exist some constants such that for all ,
where in the last sum the are all distinct positive constants (by combining like terms in the sum with terms to get a sum with terms). Then
Thus, there exist constants such that Let (this is well-defined since if it wasn’t well defined we would have for all ). Let be the constant corresponding to this . Then
where . Thus we have that
∎
We now have all the pieces we need to show that is approximately some exponential. The following theorem finishes off the proof.
Theorem 6.3.
For any Guvab for which and is not eventually constant, we have that there exists some and some , such that .
Proof.
By Lemma 6.1, there exists some set of 1-Lipschitz functions such that for all there exists such that . Furthermore, by Lemma 6.2, for each of these there exists some and some positive constant such that , unless there exists some such that for all we have that . If for all , there exists some such that for all we have , then we have that is eventually constant at 0. Otherwise, let be the set of functions for which is well-defined. Then let be . Let be the set of such that , and let be . Finally, let be the set of such that and . Then for all such that , there exists some such that for all we have
Thus, since must be the output of some 1-Lipschitz function, there exists some such that for all we have for some , since it cannot be the output of any 1-Lipschitz function . However, for all , we have that . Thus for all , we have that . Hence . ∎
Remark 6.4.
Analogously, for any Guvab for which and is not eventually constant, we have that there exists some and some , such that .
We now seek to explicitly characterize all the cases where and is eventually constant. We start by understanding why we only need to consider the first few terms of to characterize all of these cases.
Lemma 6.5.
When , if there exists some such that is a constant sequence, then is also a constant sequence.
Proof.
By Lemma 2.10, if we let the distinct eigenvalues of the transition matrices be , then for any vertex and for any we can write for some constants . Note that if is a constant sequence, for all . Thus for any vertex , we will have for all .
Suppose that for some , we have that and are nonzero. Then let be the set of all for which and are nonzero. Then let . If there is only one such that , then for some , for all we will have that so the left-hand-side term will dominate and will be nonzero. Then . If there is more than one such that , then those two s will be and , since those are the only two numbers with absolute value . We know that will stay the same sign regardless of , while will switch sign with parity. Thus, for one of the parities, and will have the same sign. Thus, for some , either for all even or for all odd , we will have that , so the left-hand-side term will dominate and will be nonzero. Then . Thus, we must have for all that either or is 0.
This means that for all , either or is 0. Thus, for all , we have that . Therefore , so will be 0 at all vertices, so for all . ∎
With this lemma established, we proceed to characterize all the cases when the Wasserstein distance is eventually constant in the case where .
Theorem 6.6.
When , we have that is eventually constant if and only if one of the following holds:
• and ,
• , the edge , and if the edge were removed from then would have ,
• and .
Proof.
We know by Lemma 6.5 that if and is eventually always 0, then and . Let be . Recall that is the transition matrix for and is the transition matrix for . Further recall that and . Then we have , so . Then .
If then dividing out by , we get . If such a exists, it must be the stationary distribution . Then . However, by definition of , and . Thus, we cannot have , so we cannot have .
This means we have that .
We also know that, given that , if and , then for all so is eventually always 0.
It therefore suffices to characterize the cases where and and . We first note that if and , we are done. Otherwise, we assume that and casework on the values of to determine which cases yield and .
If , we need that and have the same neighbor set, since if had some neighbor that was not adjacent to then would have nonzero mass at and would not. We will also show that this is a sufficient condition. If and have the same neighbor set then . For each neighbor of and , we have that and for all other vertices , we have that . Thus . We also know that by Theorem 3.4 since and for any neighbor of , the path has an even number of steps.
If , we first note that we need and to be adjacent, since and if and are not adjacent. When and are adjacent we have that , so since we have , which yields . We also note that, similarly to before, aside from the edge we have that and need to have the same set of neighbors because if there was some vertex such that and , then would have nonzero mass at and would not. We will finish by showing that if satisfy these conditions, then and .
Suppose that the conditions are satisfied. We know that , so and similarly . We also know that for all such that and , we have that and for all other vertices , we have that . Thus . Also, by Theorem 3.4 since . ∎
7 Convergence when
We next consider the case of Guvabs where . Similarly to the case, we show that the rate of convergence is exponential unless the distance is eventually constant. Furthermore, when the Wasserstein distance is eventually constant, it is constant after exactly 1 step.
We first show that the rate of convergence is exponential unless the distance is eventually constant.
Lemma 7.1.
Consider a Guvab where . Either is eventually constant, or for some and some , we have that . Also, either is eventually constant, or for some and some , we have that .
Proof.
When , we know that for some constants and . Using the same reasoning as in the proof of Lemma 6.2, we know that (unless is eventually constant) for some . We also know that (unless is eventually constant) for some . Thus, we attain the desired result. ∎
We now show that if the distance is eventually constant, it is constant after 1 step.
Lemma 7.2.
When , if there exists some such that is a constant sequence, then is also a constant sequence.
Proof.
When , we have that is for some constants and . Thus, for similar reasons as in the proof of Lemma 6.5, all the for are 0 so is constant. ∎
When , using a lemma similar to Lemma 7.2 we were able to explicitly characterize exactly when was eventually constant. Lemma 7.2 provides an important step towards making a similar characterization when . To exemplify how a characterization could be made when , we provide a family of examples of Guvabs where is eventually constant.
Definition 7.3.
We define a Gluvab to be a Guvab that satisfies all of the following conditions:
• ,
• ,
• if , then for all we have that ,
• if , then for exactly half of the neighbors we have that , and for exactly the other half we have that .
Example 7.4.
Consider a Guvab with (where is the path graph with vertices), is the vertex of with degree 2, is either of the other two vertices, , and . One can check that this Guvab is a Gluvab.
Lemma 7.5.
Any Gluvab satisfies .
Proof.
We aim to prove this lemma by essentially reducing each Gluvab to a random walk on a path graph. In particular, each vertex in the path corresponds to the set of vertices at a given distance from . After this, the desired result follows without much difficulty.
Construct the Markov chain that is simply a random walk with laziness on a path of length with vertices . We let the starting point of this Markov chain be . It suffices to show that for all , we have that , because that would mean that the distribution is always symmetric about so always has the same average distance .
We will show by induction on that for all ,
Base case: At , we have that is only nonzero at and that is only nonzero at , so the claim holds.
Inductive step: We suppose that this claim holds for . We will show that it holds for . We know the following facts about :
• ,
• ,
• ,
• ,
• for , we have that
We now examine , and in particular the amount of mass of at each level. We let denote the mass of at the th level; in other words,
For all , we can calculate by considering the th level and considering how much mass from each level from goes to the th level. This is possible because all vertices at the same level will have indistinguishable behavior with respect to their contribution to the th level. By calculating the contribution of each different level to the th level, we can check that
• ,
• ,
• ,
• ,
• for , we have that
This lines up exactly with our characterization of , so for all we have
∎
8 Main Convergence Theorems
Since we have shown that all Guvabs have or or or , and we have some understanding of the rate of convergence of the Wasserstein distance in each of these cases, we make some general statements about convergence that apply to all Guvabs. The following theorems sum up the general convergence results obtained from considering each of the cases , , and in the previous sections.
The first theorem states that the rate of convergence of and is exponential unless it is eventually constant.
Theorem 8.1.
For any Guvab, we have that
• either is eventually constant, or there exists a constant and a positive constant such that ,
• either is eventually constant, or there exists a constant and a positive constant such that
Proof.
To begin, note that when , we have that converges and by Corollary 3.12. Further, when , Lemma 7.1 implies the desired result. Thus, it suffices to consider the cases , , and separately.
First, when , Theorem 4.11 implies that is eventually constant (and hence the same holds for and ). This gives the desired result in the case .
When , Theorem 5.7 implies that either and is eventually constant, or else (and hence the same holds for and ). This gives the desired result for . ∎
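The exponential behavior in Theorem 8.1 can also be observed numerically. The sketch below takes two lazy random walks on a path with 5 vertices, with lazinesses 1/4 and 1/3, started at the two endpoints; all of these choices are arbitrary illustrative assumptions, not data from the theorem. On a path, the Wasserstein distance with respect to the graph metric can be computed directly from the cumulative distribution functions. In this particular instance both walks share the same stationary distribution, so the limit happens to be 0, and the ratios of terms two steps apart settle near a constant below 1, consistent with an exponential rate.

```python
import numpy as np

# Illustrative parameters (not taken from the theorem): a path on 5 vertices,
# lazinesses 1/4 and 1/3, walks started at the two endpoints.
L = 5

def lazy_path_walk(L, laziness):
    """Transition matrix of a lazy random walk on the path v_0, ..., v_{L-1}."""
    P = np.zeros((L, L))
    for i in range(L):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < L]
        P[i, i] += laziness
        for j in nbrs:
            P[i, j] += (1.0 - laziness) / len(nbrs)
    return P

def w1_on_path(mu, nu):
    """On a path with unit edge lengths, W1(mu, nu) = sum_k |F_mu(k) - F_nu(k)|."""
    return float(np.abs(np.cumsum(mu - nu)[:-1]).sum())

Px = lazy_path_walk(L, 0.25)
Py = lazy_path_walk(L, 1.0 / 3.0)
mx = np.zeros(L); mx[0] = 1.0        # walk started at one endpoint
my = np.zeros(L); my[L - 1] = 1.0    # walk started at the other endpoint

ws = []
for n in range(1, 25):
    mx = mx @ Px
    my = my @ Py
    ws.append(w1_on_path(mx, my))

# Compare terms two steps apart, matching the odd/even subsequences in the
# theorem; the ratios settle near a constant below 1, i.e. roughly geometric
# decay (here toward 0, since both walks share the same stationary distribution).
for n in range(2, len(ws)):
    print(n + 1, round(ws[n], 10), round(ws[n] / ws[n - 2], 4))
```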
The second theorem characterizes when is eventually constant in the case .
Theorem 8.2.
When , we have that is eventually constant if and only if one of the following holds:
• , the graph is bipartite, and is odd,
• and , and is bipartite,
• and ,
• , the edge , and if the edge were removed from then would have ,
• and .
Proof.
To begin, note that when , we have that converges and by Corollary 3.12. Thus, it suffices to consider the cases , , and separately.
First, we look at the case where . In this case, Theorem 4.11 implies that is always eventually constant. By Theorem 3.9, this case is equivalent to and . Moreover, by Theorem 3.4, this case occurs exactly when , the graph is bipartite, and is odd (i.e., the first item of the theorem statement).
Next, when , Corollary 5.8 implies that is eventually constant exactly when . By Theorem 3.9, this case occurs exactly when and , and is bipartite (i.e., the second item of the theorem statement).
Finally, when , we see that Theorem 6.6 implies is eventually constant exactly when one of the following holds:
• and ,
• , the edge , and if the edge were removed from then would have ,
• and .
Note that each of these cases is indeed a case where by Theorem 3.4, so together they correspond to the final three items of the theorem statement.
Thus, considering each of these cases together, we obtain the desired result. ∎
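As a small numerical illustration in the spirit of the first item of Theorem 8.2, the sketch below takes the bipartite path on three vertices, starts the two walks at adjacent vertices (so the distance between the starting vertices is odd), and, purely as an illustrative assumption, gives both walks laziness 0; these choices are not a restatement of the theorem's hypotheses. The resulting sequence of Wasserstein distances is constant (equal to 1) from the first step onward.

```python
import numpy as np

# Illustrative choices (not a restatement of the theorem's hypotheses):
# the bipartite path 0 - 1 - 2, both walks with laziness 0, started at the
# adjacent vertices 0 and 1 (so the distance between the starting points is odd).
L = 3
P = np.zeros((L, L))
for i in range(L):
    nbrs = [j for j in (i - 1, i + 1) if 0 <= j < L]
    for j in nbrs:
        P[i, j] = 1.0 / len(nbrs)    # laziness 0: always move to a uniform neighbor

def w1_on_path(mu, nu):
    """On a path with unit edge lengths, W1(mu, nu) = sum_k |F_mu(k) - F_nu(k)|."""
    return float(np.abs(np.cumsum(mu - nu)[:-1]).sum())

mx = np.array([1.0, 0.0, 0.0])   # walk started at vertex 0
my = np.array([0.0, 1.0, 0.0])   # walk started at vertex 1
for n in range(1, 11):
    mx = mx @ P
    my = my @ P
    print(n, w1_on_path(mx, my))  # prints 1.0 at every step: a constant sequence
```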
9 Open Problems
The theorems presented in this paper open up several new questions and directions for further research, which the reader is invited to consider. Specifically, given Theorem 8.1, the remaining questions regarding the behavior of Guvabs can be broken into three main categories: 1) determining when and are eventually constant, 2) in cases where and are eventually constant, determining how long they take to become constant, and 3) determining and when and are not eventually constant. In this section, we break down what we have shown and what is left to be done regarding each of these questions.
By Theorem 8.2, we have characterized when is eventually constant in all cases where . Furthermore, in the cases of and , we know that is eventually constant if and only if is eventually constant, and similarly is eventually constant if and only if is eventually constant. In the case of , it remains to characterize when either or individually is eventually constant but is not. Further, in the case we lack a complete characterization of when is eventually constant.
Question remains largely unanswered and is a promising direction for future work. The progress so far in this paper is restricted to fairly weak upper and lower bounds when , and to characterizations of when is eventually constant when and . One interesting problem is to obtain tighter bounds in the case where , and similar bounds in the case when and is eventually constant. Also, depending on the answers to Question 1, there may be Guvabs where only one of and is eventually constant. If we find a specific Guvab with this property, it would be interesting to determine how long the eventually constant sequence takes to become constant.
Answering Question will require specific knowledge of eigenvectors and eigenvalues. In full generality this is difficult, so a potential direction for future work would be to address it in specific examples.
10 Acknowledgements
We would like to thank our mentor, Pakawut Jiradilok, for providing us with important knowledge, guidance, and assistance throughout our project. We would also like to thank Supanat Kamtue for the problem idea and helpful thoughts and guidance. Finally, we would like to thank the PRIMES-USA program for making this project possible.
References
- [BCL+18] David P Bourne, David Cushing, Shiping Liu, F Münch, and Norbert Peyerimhoff. Ollivier–Ricci idleness functions of graphs. SIAM J. Discrete Math., 32(2):1408–1424, 2018.
- [CK19] David Cushing and Supanat Kamtue. Long-scale Ollivier Ricci curvature of graphs. Anal. Geom. Metr. Spaces, 7(1):22–44, 2019.
- [CKK+20] David Cushing, Supanat Kamtue, Jack Koolen, Shiping Liu, Florentin Münch, and Norbert Peyerimhoff. Rigidity of the Bonnet–Myers inequality for graphs with respect to Ollivier Ricci curvature. Adv. Math., 369:107188, 2020.
- [DS91] Persi Diaconis and Daniel Stroock. Geometric bounds for eigenvalues of Markov chains. Ann. Appl. Probab., pages 36–61, 1991.
- [FZM+15] Charlie Frogner, Chiyuan Zhang, Hossein Mobahi, Mauricio Araya-Polo, and Tomaso Poggio. Learning with a Wasserstein loss. arXiv preprint arXiv:1506.05439, 2015.
- [JK21] Pakawut Jiradilok and Supanat Kamtue. Transportation distance between probability measures on the infinite regular tree. arXiv preprint arXiv:2107.09876, 2021.
- [Kan06] Leonid V Kantorovich. On the translocation of masses. J. Math. Sci., 133(4):1381–1382, 2006.
- [LLY11] Yong Lin, Linyuan Lu, and Shing-Tung Yau. Ricci curvature of graphs. Tohoku Math. J., 63(4):605–627, 2011.
- [LP17] David A Levin and Yuval Peres. Markov chains and mixing times, volume 107. American Mathematical Soc., 2017.
- [Oll09] Yann Ollivier. Ricci curvature of Markov chains on metric spaces. J. Funct. Anal., 256(3):810–864, 2009.
- [Oll11] Yann Ollivier. A visual introduction to Riemannian curvatures and some discrete generalizations. Anal. Geom. Metr. Spaces, 56:197–219, 2011.
- [PC19] Gabriel Peyré and Marco Cuturi. Computational optimal transport. Found. Trends Mach. Learn., 11(5-6):355–607, 2019.
- [RTG00] Yossi Rubner, Carlo Tomasi, and Leonidas J Guibas. The earth mover’s distance as a metric for image retrieval. Int. J. Comput. Vis., 40(2):99–121, 2000.
- [SGR+15] Romeil Sandhu, Tryphon Georgiou, Ed Reznik, Liangjia Zhu, Ivan Kolesov, Yasin Senbabaoglu, and Allen Tannenbaum. Graph curvature for differentiating cancer networks. Sci. Rep., 5(1):1–13, 2015.
- [SGT16] Romeil S Sandhu, Tryphon T Georgiou, and Allen R Tannenbaum. Ricci curvature: An economic indicator for market fragility and systemic risk. Sci. Adv., 2(5):e1501495, 2016.
- [Sin92] Alistair Sinclair. Improved bounds for mixing rates of Markov chains and multicommodity flow. Comb. Probab. Comput., 1:351–370, 1992.
- [SJB19] Jayson Sia, Edmond Jonckheere, and Paul Bogdan. Ollivier-Ricci curvature-based method to community detection in complex networks. Sci. Rep., 9(1):1–12, 2019.
- [vdHCL+21] Pim van der Hoorn, William J Cunningham, Gabor Lippner, Carlo Trugenberger, and Dmitri Krioukov. Ollivier-Ricci curvature convergence in random geometric graphs. Phys. Rev. Res., 3(1):013211, 2021.
- [WJB16] Chi Wang, Edmond Jonckheere, and Reza Banirazi. Interference constrained network control based on curvature. In 2016 American Control Conference (ACC), pages 6036–6041. IEEE, 2016.
- [WX21] JunJie Wee and Kelin Xia. Ollivier persistent Ricci curvature-based machine learning for the protein–ligand binding affinity prediction. J. Chem. Inf. Model., 61(4):1617–1626, 2021.