
On the Wasserstein Distance Between $k$-Step Probability Measures on Finite Graphs

Sophia Benjamin (North Carolina School of Science and Mathematics, Durham, NC; sophia.r.benjamin@gmail.com), Arushi Mantri (Jesuit High School, Portland, OR; arushi.mantri@gmail.com), Quinn Perian (Stanford Online High School, Palo Alto, CA; quinn.perian@outlook.com)
(October 19, 2021)
Abstract

We consider random walks $X, Y$ on a finite graph $G$ with respective lazinesses $\alpha, \beta \in [0,1]$. Let $\mu_k$ and $\nu_k$ be the $k$-step transition probability measures of $X$ and $Y$. In this paper, we study the Wasserstein distance between $\mu_k$ and $\nu_k$ for general $k$. We consider the sequence formed by the Wasserstein distance at odd values of $k$ and the sequence formed by the Wasserstein distance at even values of $k$. We first establish that these sequences always converge, and then we characterize the possible values for the sequences to converge to. We further show that each of these sequences is either eventually constant or converges at an exponential rate. By analyzing the cases of different convergence values separately, we are able to partially characterize when the Wasserstein distance is constant for sufficiently large $k$.

Keywords— Wasserstein distance, transportation plan, Guvab, random walk, $k$-step probability distribution, laziness, convergence, finite graph

1 Introduction

Optimal transport theory concerns the minimum cost, called the transportation distance, of moving mass from one configuration to another. In this paper, the notion of transportation distance that we are concerned with is the $L^1$ transportation distance, which we refer to as the Wasserstein distance. The Wasserstein distance has applications in fields such as image processing, where a goal is to efficiently transform one image into another (e.g., [RTG00]), and machine learning, where a goal is to minimize some transport-related cost (e.g., [FZM+15]).

The application of Wasserstein distance that motivates this paper is the definition of $\alpha$-Ricci curvature $\kappa_{\alpha}$ on graphs introduced by Lin, Lu, and Yau in [LLY11]:

$$\kappa_{\alpha} = 1 - \frac{W(m_{x}^{\alpha}, m_{y}^{\alpha})}{\mathrm{d}(x,y)}.$$

Here $\mathrm{d}(x,y)$ is the graph distance between vertices $x$ and $y$, while $m_{v}^{\alpha}$ is the 1-step transition probability measure of a random walk starting at vertex $v$ with laziness $\alpha$, and $W(m_{x}^{\alpha}, m_{y}^{\alpha})$ is the Wasserstein distance between $m_{x}^{\alpha}$ and $m_{y}^{\alpha}$.

The $\alpha$-Ricci curvature is a generalization of classical Ricci curvature, an object from Riemannian geometry that captures how volumes change as they flow along geodesics ([Oll11]). In [Oll09], Ollivier created the Ollivier-Ricci curvature to generalize the idea of Ricci curvature to discrete spaces, such as graphs. The Ollivier-Ricci curvature between $X$ and $Y$ is defined via the Wasserstein distance between the $1$-step transition probability measures of random walks starting at $X$ and $Y$. It captures roughly whether the neighborhoods of $X$ and $Y$ are closer together than $X$ and $Y$ themselves. The Ollivier-Ricci curvature is well-studied in geometry and graph theory ([JK21], [CK19], [BCL+18], [CKK+20], [vdHCL+21]), and is also used to study economic risk, cancer networks, and drug design, among other applications ([SGR+15], [SGT16], [SJB19], [WJB16], [WX21], [JK21]). Lin, Lu, and Yau further generalized the Ollivier-Ricci curvature to $\alpha$-Ricci curvature ([LLY11]), allowing the laziness $\alpha$ of the random walks considered to be greater than zero.

In [Oll09], Ollivier suggested exploring Ollivier-Ricci curvature on graphs at “larger and larger scales.” Thus, in this paper, we study the Wasserstein distance between $k$-step probability measures of random walks with potentially nonzero laziness as $k$ gets larger and larger. Since $1$-step probability distributions of random walks were used to study the initial “small-scale” $\alpha$-Ricci curvature, these $k$-step probability distributions are a natural way to understand curvature at “larger and larger scales.” Jiradilok and Kamtue ([JK21]) study these $k$-step distributions for larger and larger $k$ on infinite regular trees; in this paper, we study them instead on finite graphs.

Given a finite, connected, simple graph, we consider a random walk with starting vertex $w$ and laziness $\alpha$. The random walk is defined to be a Markov chain where at each step, we either stay at the current vertex with probability $\alpha$ or pick a neighboring vertex uniformly at random and move there. We then consider the probability distribution encoding the likelihood of being at each possible vertex after $k$ steps of this random walk, which is called a $k$-step probability distribution, or $k$-step probability measure.

Given two such random walks on one graph, starting at vertices $u, v$ and with respective lazinesses $\alpha, \beta$, we define the Wasserstein distance between their two $k$-step probability measures to be the minimum cost of moving between the two distributions. Here, moving 1 unit of mass across 1 edge costs 1 unit.

We can ask many questions about the Wasserstein distance at “larger and larger scales.” For instance, does the Wasserstein distance between the two $k$-step probability distributions always converge as $k \to \infty$? Also, what does it converge to in different cases? Even more interestingly, what can we say about the rate of convergence? In particular, when does the distance eventually remain constant, and how long could it take to reach constancy?

In this paper, we show that in all cases either the Wasserstein distance converges or the Wasserstein distance at every other step converges. We also classify what the distance converges to in all cases, addressing the first and second questions.

We then seek to understand the rate of convergence of the Wasserstein distance. We reach two main results. First, addressing the third question, we show that unless the Wasserstein distance at every other step is eventually constant, its rate of convergence is exponential (Theorem 8.1). We also address the fourth question by providing a partial characterization of exactly when the Wasserstein distance is eventually constant (Theorem 8.2).

In Section 2, we provide formal definitions of key concepts used throughout the paper. In particular, we recall the definition of the Wasserstein distance and introduce the notion of a Guvab. A Guvab refers to a pair of random walks on a finite connected simple graph, and these Guvabs are the primary object we study in this paper. In Section 3, we classify, for all possible Guvabs, the limiting behavior of the Wasserstein distance: whether the distance converges and, if so, what it converges to. This characterization provides a natural way to classify the Guvabs into four categories based on their limiting behavior: $W=1$; $W=0$; $W=\frac{1}{2}$; and $\beta=1$. In each of Sections 4, 5, 6, and 7, we consider one of these four categories of Guvabs and determine when the Wasserstein distance is eventually constant, as well as examine the rate of convergence when the Wasserstein distance is not constant. Along the way, we encounter various interesting results about the different cases. Finally, in Section 8, we present main results about constancy and rate of convergence in general, obtained by considering each of these four cases individually.

2 Preliminaries

We begin with several formal definitions that we use in the remainder of the paper. We start by recalling graph theory terminology and the definition of Wasserstein distance on graphs. Then, we review random walks on graphs and define Guvabs. Finally, we briefly discuss terminology used to describe convergence.

In this paper, all graphs we consider are finite, connected, simple graphs. For a graph $G$, let $V(G)$ be the vertex set of $G$ and $E(G)$ be the edge set of $G$, i.e., the set of unordered pairs $\{v_1, v_2\}$ where $v_1, v_2$ are adjacent vertices in $G$. Further, for any $v \in V(G)$, let $N(v)$ be the neighbor set of $v$. Finally, denote by $\mathrm{d}(w_1, w_2)$ the graph distance between vertices $w_1$ and $w_2$.

Definition 2.1.

Define a distribution on the graph $G$ to be a function $\mu: V(G) \to \mathbb{R}$. We say $\mu$ is a nonnegative distribution if, for all $v \in V(G)$, we have $\mu(v) \geq 0$. A nonnegative distribution $\mu$ is a probability distribution if $\sum_{w\in V(G)}\mu(w) = 1$.

For convenience, we will denote by $\tilde{\mathbf{0}}$ the distribution with value 0 at all vertices (i.e., for all $v \in V(G)$, we have $\tilde{\mathbf{0}}(v) = 0$). In addition, we will refer to a distribution $\mu$ for which $\sum_{w\in V(G)}\mu(w) = 0$ as a zero-sum distribution.

Given a graph $G$, let $\{\mu_i\}_{i=0}^{\infty}$ be an infinite sequence of distributions. Suppose that $f: \mathbb{Z}_{\geq 0} \to \mathbb{Z}_{\geq 0}$ is a strictly increasing function such that for all vertices $w \in G$, the limit $\lim_{k\to\infty}\mu_{f(k)}(w)$ exists. Then denote by $\lim_{k\to\infty}\mu_{f(k)}$ the pointwise limit; namely, for all $w \in V(G)$, let $\left(\lim_{k\to\infty}\mu_{f(k)}\right)(w)$ be $\lim_{k\to\infty}(\mu_{f(k)}(w))$.

For a given graph $G$, let $D = D(G)$ be the set of all ordered pairs $(\mu, \nu)$ of distributions on $V(G)$ that satisfy $\sum_{w\in V(G)}\mu(w) = \sum_{w\in V(G)}\nu(w)$. Further, let $D_{\geq 0}$ be the set of all ordered pairs $(\mu, \nu) \in D$ with $\mu, \nu$ nonnegative distributions.

We now introduce some terminology from optimal transport theory. We follow definitions equivalent to those in the book of Peyré and Cuturi [PC19].

In Definitions 2.2, 2.3, and 2.4, we let $G$ be a graph with two nonnegative distributions $\mu, \nu$ on $V(G)$ such that $(\mu, \nu) \in D_{\geq 0}$.

Definition 2.2 (cf. [PC19]).

Define a transportation plan from $\mu$ to $\nu$ for $(\mu, \nu) \in D_{\geq 0}$ to be a function $T_{\mu,\nu}: V(G) \times V(G) \to \mathbb{R}$ such that

  • for any vertices $w_1, w_2 \in V(G)$, we have that $T_{\mu,\nu}(w_1, w_2) \geq 0$,

  • for all vertices $w \in V(G)$, we have that $\sum_{i\in V(G)} T_{\mu,\nu}(w, i) = \mu(w)$,

  • for all vertices $w \in V(G)$, we have that $\sum_{i\in V(G)} T_{\mu,\nu}(i, w) = \nu(w)$.

Denote by $\mathcal{T}_{\mu,\nu}$ the set of all transportation plans from $\mu$ to $\nu$.

Following [Kan06], we can intuitively visualize a transportation plan $T_{\mu,\nu}$ as a way to move mass distributed over the vertices of $G$ according to $\mu$ along the edges of $G$ to an arrangement according to $\nu$. We now consider the cost of a given transportation plan $T_{\mu,\nu}$: if moving 1 unit of mass across 1 edge has a cost of 1, how much does it cost to move the mass distribution of $\mu$ to that of $\nu$ according to $T_{\mu,\nu}$?

Definition 2.3 (cf. [PC19]).

Define the cost function $C: \mathcal{T}_{\mu,\nu} \to \mathbb{R}$ to take any transportation plan $T$ to its cost

$$C(T) = \sum_{(w_1, w_2)\in V(G)\times V(G)} \mathrm{d}(w_1, w_2)\cdot T(w_1, w_2).$$
Definition 2.4 (cf. [PC19]).

Define the Wasserstein distance $W_{\geq 0}: D_{\geq 0} \to \mathbb{R}_{\geq 0}$ by $W_{\geq 0}(\mu, \nu) := \min_{T\in\mathcal{T}_{\mu,\nu}} C(T)$.

We can thus interpret the Wasserstein distance as the minimum cost of transporting mass from its arrangement in distribution μ\mu to an arrangement in distribution ν\nu.
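Definition 2.4 is a finite linear program, so on small graphs the Wasserstein distance can be computed directly. The sketch below is our illustration, not code from the paper; it assumes the numpy, networkx, and scipy libraries, and simply minimizes the cost of Definition 2.3 over all transportation plans.

```python
# Sketch (ours): W_{>=0}(mu, nu) as a linear program over transportation
# plans T(w1, w2) >= 0 with row sums mu and column sums nu.
import networkx as nx
import numpy as np
from scipy.optimize import linprog

def wasserstein(G, mu, nu):
    nodes = list(G.nodes)
    n = len(nodes)
    dist = dict(nx.all_pairs_shortest_path_length(G))   # graph distances d(w1, w2)
    cost = np.array([dist[a][b] for a in nodes for b in nodes], dtype=float)
    A_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        for j in range(n):
            A_eq[i, i * n + j] = 1.0      # sum_j T(w_i, w_j) = mu(w_i)
            A_eq[n + j, i * n + j] = 1.0  # sum_i T(w_i, w_j) = nu(w_j)
    res = linprog(cost, A_eq=A_eq, b_eq=np.concatenate([mu, nu]), bounds=(0, None))
    return res.fun

# On the path 0-1-2, moving a unit of mass from one end to the other costs 2.
print(wasserstein(nx.path_graph(3), np.array([1.0, 0, 0]), np.array([0, 0, 1.0])))  # 2.0
```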

Remark 2.5.

Note that for any distribution $\psi$ on $V(G)$, if $\mu$, $\nu$, $\mu+\psi$, and $\nu+\psi$ are all nonnegative, then $W_{\geq 0}(\mu, \nu) = W_{\geq 0}(\mu+\psi, \nu+\psi)$ (for a proof, see for example [JK21], which notes that the Wasserstein distance between $\mu$ and $\nu$ can be defined in terms of $\mu - \nu$).

Let $G$ be a graph with two distributions $\mu, \nu$ on $V(G)$ such that

$$\sum_{w\in V(G)}\mu(w) = \sum_{w\in V(G)}\nu(w).$$

Let $\psi$ be a distribution such that $\mu+\psi$ and $\nu+\psi$ are both nonnegative. We extend the domain of the Wasserstein distance to distributions $\mu, \nu$ with possibly negative entries by defining $W: D \to \mathbb{R}_{\geq 0}$ with $W(\mu, \nu) = W_{\geq 0}(\mu+\psi, \nu+\psi)$. By Remark 2.5, $W(\mu, \nu)$ is well-defined (it does not depend on the choice of $\psi$).

Even if $\mu$ and $\nu$ have negative entries, we can interpret $W(\mu, \nu)$ as the cost of some optimal “transportation plan” that moves mass from distribution $\mu$ to distribution $\nu$.

Thus, in the rest of the paper, “transportation plans” between distributions $\mu$ and $\nu$ allow for negative entries in $\mu$ and $\nu$. In this case, a transportation plan rigorously refers to a transportation plan from $\mu+\psi$ to $\nu+\psi$ for some $\psi$ large enough that $\mu+\psi$ and $\nu+\psi$ are both nonnegative. In particular, the movement of mass between $\mu$ and $\nu$ from a vertex $w_1$ to a different vertex $w_2$ actually refers to that same movement of mass from $w_1$ to $w_2$ between the distributions $\mu+\psi$ and $\nu+\psi$.

We now discuss a different way of calculating the Wasserstein distance.

Definition 2.6 (cf. [PC19]).

Given a graph $G$, a 1-Lipschitz function $\ell: V(G) \to \mathbb{R}$ is a function on the vertices of $G$ where for any $w_1, w_2 \in V(G)$, we have that $|\ell(w_1) - \ell(w_2)| \leq \mathrm{d}(w_1, w_2)$. Let $L(G)$ be the set of all 1-Lipschitz functions on $G$.

Theorem 2.7 (Kantorovich Duality, cf. [PC19]).

Let $G$ be a graph with two distributions $\mu, \nu$ on $V(G)$ such that $\sum_{w\in V(G)}\mu(w) = \sum_{w\in V(G)}\nu(w)$. Then

$$W(\mu, \nu) = \max_{\ell\in L(G)} \sum_{w\in G} \ell(w)(\mu(w) - \nu(w)).$$
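As a small check of the duality (our example, not from the paper): on the path $a$–$b$–$c$ with $\mu = \mathbbm{1}_a$ and $\nu = \mathbbm{1}_c$, every transportation plan carries one unit of mass a distance of $\mathrm{d}(a,c) = 2$, so $W(\mu, \nu) = 2$; on the dual side, the 1-Lipschitz function $\ell(a) = 2$, $\ell(b) = 1$, $\ell(c) = 0$ attains $\sum_w \ell(w)(\mu(w) - \nu(w)) = \ell(a) - \ell(c) = 2$, matching the primal value.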

We now seek a way to refer to a pair of random walks on a graph, as these pairs of random walks are the objects we study. The information needed to define such a pair consists of the graph $G$, the starting vertices $u$ and $v$ of the two random walks, and the respective lazinesses $\alpha$ and $\beta$ of the random walks. We thus define a Guvab, comprising exactly this information.

Definition 2.8.

We define a Guvab to be a tuple $(G, u, v, \alpha, \beta)$ where $G$ is a finite, connected, simple graph, $u, v \in V(G)$, and $\alpha, \beta \in [0,1]$ with $\alpha \leq \beta$.

Definition 2.9.

Consider a graph $G$. For any starting vertex $u \in V(G)$ and laziness $\alpha \in [0,1]$, consider the random walk $R = \{R_k\}_{k=0}^{\infty}$ such that $R_0 = u$ and, for $i \geq 1$, we have $R_i = R_{i-1}$ with probability $\alpha$, and $R_i = t$ with probability $\frac{1-\alpha}{\deg(R_{i-1})}$ for any $t \in N(R_{i-1})$. We say the probability distribution $\mu_k$ of $R_k$ is a $k$-step probability measure.

Consider some Guvab $\mathcal{G} = (G, u, v, \alpha, \beta)$. We let $X(\mathcal{G}) = \{X_k\}_{k=0}^{\infty}$ be the Markov chain corresponding to a random walk with laziness $\alpha$ starting from vertex $u$, and we let $Y(\mathcal{G}) = \{Y_k\}_{k=0}^{\infty}$ be the Markov chain corresponding to a random walk with laziness $\beta$ starting from vertex $v$. When it is clear which Guvab $\mathcal{G}$ we are referring to, we write $X, Y$ instead of $X(\mathcal{G}), Y(\mathcal{G})$, respectively.

Consider some Guvab $\mathcal{G} = (G, u, v, \alpha, \beta)$. For all $k \geq 0$, we let $\mu_k(\mathcal{G}), \nu_k(\mathcal{G})$ be the $k$-step probability measures of $X(\mathcal{G}), Y(\mathcal{G})$, respectively. We let $\xi_k(\mathcal{G}) = \mu_k(\mathcal{G}) - \nu_k(\mathcal{G})$ and $W_k(\mathcal{G}) = W(\mu_k(\mathcal{G}), \nu_k(\mathcal{G}))$. When it is clear which Guvab $\mathcal{G}$ we are referring to, we write $\mu_k, \nu_k, \xi_k, W_k$ instead of $\mu_k(\mathcal{G}), \nu_k(\mathcal{G}), \xi_k(\mathcal{G}), W_k(\mathcal{G})$, respectively.

Given a Guvab $\mathcal{G}$, we define $P_\alpha$ and $P_\beta$ to be the transition probability matrices of $X$ and $Y$, respectively. In particular, for all $k$, we have that $\mu_k = \mu_0 P_\alpha^k$ and $\nu_k = \nu_0 P_\beta^k$, where the distributions are row vectors. We also define $P$ to be the transition probability matrix of a random walk with zero laziness on $G$ (note that $P$ does not depend on the starting vertex of the random walk). We note that $P_\alpha$ and $P_\beta$ depend only on $\alpha$ and $\beta$, not on $u$ and $v$. In particular, $P_\alpha = \alpha I + (1-\alpha)P$ and $P_\beta = \beta I + (1-\beta)P$.
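These matrices are easy to build explicitly; the following sketch (ours, assuming numpy and networkx) constructs $P_\alpha$ and the $k$-step measure $\mu_k = \mu_0 P_\alpha^k$.

```python
# Sketch (ours): the lazy transition matrix P_alpha = alpha*I + (1-alpha)*P
# and the k-step measure mu_k = mu_0 P_alpha^k, with distributions as row vectors.
import networkx as nx
import numpy as np

def lazy_transition_matrix(G, alpha):
    A = nx.to_numpy_array(G)                  # adjacency matrix of G
    P = A / A.sum(axis=1, keepdims=True)      # zero-laziness walk: uniform over neighbors
    return alpha * np.eye(len(A)) + (1 - alpha) * P

def k_step_measure(G, start, alpha, k):
    P_alpha = lazy_transition_matrix(G, alpha)
    mu0 = np.zeros(P_alpha.shape[0])
    mu0[start] = 1.0                          # the walk starts at `start` with certainty
    return mu0 @ np.linalg.matrix_power(P_alpha, k)

print(k_step_measure(nx.cycle_graph(5), 0, 0.3, 50))  # approaches pi = (1/5, ..., 1/5)
```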

Lemma 2.10.

Let $\{\lambda_1, \ldots, \lambda_n\}$ be the union of the set of eigenvalues of $P_\alpha$ and the set of eigenvalues of $P_\beta$. For all vertices $w$, there exist some constants $c_i^w$ such that for all $k \geq 1$, we have $\xi_k(w) = \sum_{i=1}^n c_i^w \lambda_i^k$.

Proof.

This follows from the fact that $P_\alpha$ and $P_\beta$ are diagonalizable (since random walks are reversible ([LP17]) and thus have diagonalizable matrices ([LP17], Chapter 12)). Say $P_\alpha$ has eigenvalues $\lambda_1, \ldots, \lambda_m$ and $P_\beta$ has eigenvalues $\lambda_{m+1}, \ldots, \lambda_{m'}$. Since $P_\alpha$ is diagonalizable, we can write it as $ADA^{-1}$ for an invertible matrix $A$ and a diagonal matrix $D$ with diagonal entries $\lambda_1, \ldots, \lambda_m$.

Then $\mu_k = \mu_0 P_\alpha^k = \mu_0 A D^k A^{-1}$, so for all $w$ there exist constants $x_1^w, \ldots, x_m^w$ such that for all $k \geq 1$ we have $\mu_k(w) = \sum_{i=1}^m x_i^w \lambda_i^k$. By similar reasoning, for all $w$ there exist constants $y_{m+1}^w, \ldots, y_{m'}^w$ such that for all $k \geq 1$ we have $\nu_k(w) = \sum_{i=m+1}^{m'} y_i^w \lambda_i^k$. Therefore, for all $w$, there exist some constants $c_i^w$ such that for all $k \geq 1$, we have that $\xi_k(w) = \sum_{i=1}^{m'} c_i^w \lambda_i^k$. If for any $i$ and $j$ we have $\lambda_i = \lambda_j$, we can collect these like terms and thus create a list of distinct eigenvalues $\lambda_1, \ldots, \lambda_n$ and constants $c_1^w, \ldots, c_n^w$ such that for all $k \geq 1$, we have $\xi_k(w) = \sum_{i=1}^n c_i^w \lambda_i^k$. In particular, $\lambda_1, \ldots, \lambda_n$ will be exactly the elements of the union of the set of eigenvalues of $P_\alpha$ and the set of eigenvalues of $P_\beta$. ∎
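A quick numeric illustration of Lemma 2.10 (ours, reusing `lazy_transition_matrix` from the previous sketch): on a non-bipartite graph with $0 < \alpha \leq \beta < 1$, every eigenvalue of $P_\alpha$ and $P_\beta$ other than 1 has modulus less than 1, and the coefficients on the eigenvalue 1 cancel in $\xi_k$, so $\max_w |\xi_k(w)|$ decays roughly geometrically.

```python
# Sketch (ours): xi_k(w) = sum_i c_i^w lambda_i^k decays like the largest
# modulus among the contributing eigenvalues lambda_i != 1, as in Lemma 2.10.
import networkx as nx
import numpy as np

G = nx.cycle_graph(5)                       # non-bipartite, so both walks mix
Pa = lazy_transition_matrix(G, 0.2)
Pb = lazy_transition_matrix(G, 0.5)
mu0, nu0 = np.eye(5)[0], np.eye(5)[2]       # X starts at vertex 0, Y at vertex 2
for k in (5, 10, 20, 40):
    xi_k = mu0 @ np.linalg.matrix_power(Pa, k) - nu0 @ np.linalg.matrix_power(Pb, k)
    print(k, np.max(np.abs(xi_k)))          # roughly geometric decay in k
```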

In the next section, we discuss when and how the Wasserstein distance converges, which is related to the convergence of probability distributions of random walks. Since random walks can be viewed as Markov chains, we reference some classical Markov chain theory, using the same definitions as in [LP17]. We also use the following well-known Markov chain theorem.

Theorem 2.11 (cf. [LP17]).

Suppose that a Markov chain $X$ is aperiodic and irreducible with probability distributions $(\mu_0, \mu_1, \ldots)$ and stationary distribution $\pi$. Then $\lim_{k\to\infty}\mu_k = \pi$.

Finally, in our discussion of convergence, we encounter cases where the Wasserstein distance is eventually constant. To quantify this precisely, we provide the following definition.

Definition 2.12.

We call an infinite sequence $\{S_i\}_{i=0}^{\infty}$ with $S_i \in \mathbb{R}$ eventually constant if there exists $N \geq 0$ such that for all $k \geq N$, we have that $S_k = S_N$.

3 Classifying End Behavior of $W_k$

In this section, we seek to enumerate the possible end behaviors of the Wasserstein distance for a Guvab. In particular, we prove results about when the Wasserstein distance converges and what it converges to for different Guvabs. The classification of Guvabs by end behavior paves the way for our later discussion of the rate of convergence of the Wasserstein distance.

We begin with a technical lemma showing that the limit of the Wasserstein distance is the Wasserstein distance of the limit, as we expect.

Lemma 3.1.

Let $f: \mathbb{Z}_{\geq 0} \to \mathbb{Z}_{\geq 0}$ be a strictly increasing function. If $\lim_{k\to\infty}\mu_{f(k)} = \mu$ and $\lim_{k\to\infty}\nu_{f(k)} = \nu$ (and, in particular, both limits exist), then

$$\lim_{k\to\infty} W(\mu_{f(k)}, \nu_{f(k)}) = W(\mu, \nu).$$
Proof.

Note that, by the triangle inequality,

$$W(\mu_{f(k)}, \nu_{f(k)}) \leq W(\mu_{f(k)}, \mu) + W(\mu, \nu) + W(\nu, \nu_{f(k)})$$

and

$$W(\mu, \mu_{f(k)}) + W(\mu_{f(k)}, \nu_{f(k)}) + W(\nu_{f(k)}, \nu) \geq W(\mu, \nu).$$

This implies that

$$W(\mu, \nu) - W(\mu, \mu_{f(k)}) - W(\nu_{f(k)}, \nu) \leq W(\mu_{f(k)}, \nu_{f(k)}) \leq W(\mu_{f(k)}, \mu) + W(\mu, \nu) + W(\nu, \nu_{f(k)}).$$

However,

$$\lim_{k\to\infty} W(\mu, \mu_{f(k)}) = \lim_{k\to\infty} W(\mu - \mu_{f(k)}, \tilde{\mathbf{0}}) = 0$$

(and similarly for $W(\nu, \nu_{f(k)})$). The above inequality implies that

$$\lim_{k\to\infty} W(\mu_{f(k)}, \nu_{f(k)}) = W(\mu, \nu),$$

as desired. ∎

Due to classical Markov chain theory, we expect that in most cases, the probability distributions of both random walks converge to the same stationary distribution, and thus $\lim_{k\to\infty} W_k = 0$. The following definition and lemma identify the stationary distribution that most random walks converge to. The subsequent theorem specifies exactly which “most cases” make the distance go to zero.

Definition 3.2.

For any graph $G$, we define the distribution $\pi$ to be such that for any $i \in G$, we have $\pi_i = \frac{\deg(i)}{\sum_{j\in G}\deg(j)}$.

Lemma 3.3.

When $0 < \alpha < 1$, the $k$-step probability measure $\mu_k$ converges to the stationary distribution $\pi$.

Proof.

Recall that $X$ is the Markov chain of the random walk. We have that $X$ is aperiodic (we can return from a vertex to itself in one step) and irreducible ($G$ is connected). We have that for any vertex $w \in G$,

$$\pi_w = \sum_{i\sim w}\pi_i\frac{1}{\deg(i)} = \alpha\pi_w + \sum_{i\sim w}\pi_i\frac{1-\alpha}{\deg(i)}.$$

Thus, $\pi$ is a stationary distribution of $X$. Hence, by Theorem 2.11, $\pi$ is a limiting distribution for $X$, and thus $\lim_{k\to\infty}\mu_k = \pi$. ∎
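This stationarity is easy to confirm numerically; here is a quick sanity check (ours, using the helper from the earlier sketch).

```python
# Sketch (ours): pi = deg / sum(deg) satisfies pi P_alpha = pi, per Lemma 3.3.
import networkx as nx
import numpy as np

G = nx.petersen_graph()                     # any finite connected simple graph works
deg = np.array([d for _, d in G.degree()])
pi = deg / deg.sum()
print(np.allclose(pi @ lazy_transition_matrix(G, 0.4), pi))  # True
```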

Theorem 3.4.

The value $W(\mu_k, \nu_k)$ converges to 0 as $k\to\infty$ if and only if one of the following conditions is true:

  • $0 < \alpha \leq \beta < 1$,

  • $\alpha = \beta = 1$ and $u = v$,

  • $G$ is not bipartite and $0 = \alpha \leq \beta < 1$,

  • $\alpha = \beta = 0$ and there exists a path from $u$ to $v$ with an even number of steps.

Proof.

Note that for $0 < \alpha \leq \beta < 1$, we have by Lemma 3.3 that

$$\lim_{k\to\infty}\mu_k = \lim_{k\to\infty}\nu_k = \pi.$$

Thus, $\lim_{k\to\infty} W(\mu_k, \nu_k) = 0$ in this case.

We now consider the cases where $\alpha = 0$ or $\beta = 1$. If $\beta = 1$, then $Y$ stays at $v$ forever. Thus, in order to have $\lim_{k\to\infty} W(\mu_k, \nu_k) = 0$, we need $\alpha = 1$ and $u = v$. This is sufficient to imply $\lim_{k\to\infty} W(\mu_k, \nu_k) = 0$.

It remains to look at the case where $\alpha = 0$ and $\beta < 1$, which we break into subcases based on whether $G$ is bipartite.

We first tackle the subcase where $G$ is not bipartite, i.e., $G$ contains an odd cycle. Since $\alpha, \beta < 1$, both $X, Y$ are aperiodic (there is a path from any vertex to itself in both an odd number of steps and an even number of steps via the odd cycle) and irreducible ($G$ is connected). Thus, $\lim_{k\to\infty}\mu_k = \lim_{k\to\infty}\nu_k = \pi$ and $\lim_{k\to\infty} W(\mu_k, \nu_k) = 0$ as before.

Finally, we address the subcase where $G$ is bipartite with sides $S_1, S_2$. Here, $X$ is periodic (with period 2), so $Y$ must be periodic as well to have $\lim_{k\to\infty} W(\mu_k, \nu_k) = 0$. Thus, $\beta = 0$. If $u, v$ are on different sides of $G$, then $X$ and $Y$ will never be on the same side, so we cannot have $\lim_{k\to\infty} W(\mu_k, \nu_k) = 0$. Otherwise, without loss of generality let $u, v \in S_1$. Consider the Markov chains $X' = \{X_{2k}\}_{k=0}^{\infty}$ and $Y' = \{Y_{2k}\}_{k=0}^{\infty}$ with vertex set $S_1$. Since $X'$ and $Y'$ are aperiodic (we can get from a vertex to itself in one step of $X'$ or $Y'$ by moving back and forth along the same edge) and irreducible ($G$ is connected), and they both have the same transition matrix, the two Markov chains converge to the same stationary distribution. Similar reasoning applies for $\{X_{2k+1}\}$ and $\{Y_{2k+1}\}$. This finishes the proof for this case, hence completing the proof of Theorem 3.4. ∎

In the next part of this section, we specify what the limiting distributions look like for any possible Guvab, particularly considering Guvabs with more than one stationary distribution. We show that every Guvab either converges to a single set of end behaviors or alternates back and forth between two sets of end behaviors.

Suppose $\alpha = 0$. Let $G$ be bipartite with sides $S_1, S_2$, and without loss of generality let $u \in S_1$. Let $X_1 = \{X_{2k}\}_{k=0}^{\infty}$ and $X_2 = \{X_{2k+1}\}_{k=0}^{\infty}$. Let $X_i'$ denote $X_i$ restricted to $S_i$ for $i \in \{1,2\}$. For $i \in \{1,2\}$, let $\tau_i'$ be the distribution on $S_i$ such that

$$(\tau_i')_w = \frac{2\deg(w)}{\sum_{j\in G}\deg(j)}$$

for $w \in S_i$. Further, for $i \in \{1,2\}$, let $\tau_i$ be the distribution on $G$ that is $\tau_i'$ on $S_i$ and has value 0 elsewhere.

Lemma 3.5.

For $i \in \{1,2\}$, the distribution $\tau_i'$ is the limiting distribution of $X_i'$.

Proof.

First, we claim that $\tau_1 P^2 = \tau_1$ and $\tau_2 P^2 = \tau_2$, where $P$ is the transition matrix of $X$. Note that for $w \in S_2$, we have

$$(\tau_1 P)_w = \sum_{i\sim w}\frac{1}{\deg(i)}(\tau_1)_i = \sum_{i\sim w}\left(\frac{1}{\deg(i)}\right)\left(\frac{2\deg(i)}{\sum_{j\in G}\deg(j)}\right) = \frac{2\deg(w)}{\sum_{j\in G}\deg(j)} = (\tau_2)_w.$$

This is because for all $i \sim w$, we have $i \in S_1$, which implies $(\tau_1)_i = \frac{2\deg(i)}{\sum_{j\in G}\deg(j)}$. For $w \in S_1$, we have $(\tau_1 P)_w = \sum_{i\sim w}\frac{1}{\deg(i)}(\tau_1)_i = 0 = (\tau_2)_w$. This is because for all $i \sim w$, we have $i \in S_2$, which implies $(\tau_1)_i = 0$. Hence, $\tau_1 P = \tau_2$ and, by similar reasoning, $\tau_2 P = \tau_1$. Thus, $\tau_1 P^2 = \tau_1$ and $\tau_2 P^2 = \tau_2$, as desired.

Also, we note that $\sum_{w\in S_i}(\tau_i')_w = \sum_{w\in S_i}\frac{2\deg(w)}{\sum_{j\in G}\deg(j)} = \frac{2|E(G)|}{\sum_{j\in G}\deg(j)} = 1$.

We now see that $\tau_i'$ is a stationary distribution of $X_i'$ for $i \in \{1,2\}$. Since $X_1'$ and $X_2'$ are irreducible and aperiodic (as shown in the proof of Theorem 3.4), we have that $\tau_i'$ is a limiting distribution of $X_i'$ for $i \in \{1,2\}$. ∎
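The identities $\tau_1 P = \tau_2$ and $\tau_2 P = \tau_1$ from this proof can also be checked numerically; the sketch below is ours, under the same library assumptions as the earlier code.

```python
# Sketch (ours): on a bipartite graph with alpha = 0, the limiting
# distributions tau_1 and tau_2 swap under one step of the walk.
import networkx as nx
import numpy as np

G = nx.path_graph(4)                        # bipartite with sides {0, 2} and {1, 3}
deg = np.array([d for _, d in G.degree()])
P = nx.to_numpy_array(G) / deg[:, None]     # zero-laziness transition matrix
tau = np.zeros((2, 4))
for w in G.nodes:
    tau[w % 2, w] = 2 * deg[w] / deg.sum()  # tau_1 on one side, tau_2 on the other
print(np.allclose(tau[0] @ P, tau[1]), np.allclose(tau[1] @ P, tau[0]))  # True True
```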

Corollary 3.6.

If $u \in S_1$, then as $k\to\infty$, we have that $\mu_{2k}$ converges to $\tau_1$ and $\mu_{2k+1}$ converges to $\tau_2$. Analogously, if $u \in S_2$, then as $k\to\infty$, we have that $\mu_{2k}$ converges to $\tau_2$ and $\mu_{2k+1}$ converges to $\tau_1$.

Proof.

Suppose $u \in S_1$; the proof proceeds analogously if $u \in S_2$. Then $\mu_{2k}$ will always be 0 on $S_2$ and, by Lemma 3.5, it will converge to $\tau_1'$ on $S_1$, because $\mu_{2k}$ is the probability distribution of $X_1'$ on $S_1$. Similarly, $\mu_{2k+1}$ will always be 0 on $S_1$ and will converge to $\tau_2'$ on $S_2$. Thus, $\mu_{2k}$ converges to $\tau_1$ and $\mu_{2k+1}$ converges to $\tau_2$. ∎

Corollary 3.7.

For any Guvab, the limits $\lim_{k\to\infty}\xi_{2k}$ and $\lim_{k\to\infty}\xi_{2k+1}$ are well-defined.

Proof.

We show that for any $\mu$, the limits $\lim_{k\to\infty}\mu_{2k}$ and $\lim_{k\to\infty}\mu_{2k+1}$ are well-defined; this implies the statement of the corollary. When $G$ is bipartite and $\alpha = 0$, we know that $\lim_{k\to\infty}\mu_{2k} = \tau_1$ and $\lim_{k\to\infty}\mu_{2k+1} = \tau_2$ (assuming, without loss of generality, that $u \in S_1$). When $\alpha = 0$ and $G$ is not bipartite, or when $0 < \alpha < 1$, we have

$$\lim_{k\to\infty}\mu_{2k} = \lim_{k\to\infty}\mu_{2k+1} = \pi.$$

Finally, when $\alpha = 1$, we know $\lim_{k\to\infty}\mu_{2k} = \lim_{k\to\infty}\mu_{2k+1} = \mathbbm{1}_u$, where $\mathbbm{1}_u$ is the distribution with value 1 at $u$ and 0 elsewhere. This covers all possible cases for $\alpha$ and $G$, so we are done. ∎

For any Guvab, we refer to $\lim_{k\to\infty}\xi_{2k}$ as $\xi^0$ and to $\lim_{k\to\infty}\xi_{2k+1}$ as $\xi^1$.

The following corollary is quite important for the rest of this section and the remainder of this paper. Its relevance to this section is that $\lim_{k\to\infty} W_k$ will be well-defined unless $\lim_{k\to\infty} W_{2k} \neq \lim_{k\to\infty} W_{2k+1}$. The corollary is pertinent to the rest of the paper because it indicates that the rates of convergence of $\{W_{2k}\}$ and $\{W_{2k+1}\}$ are always well-defined. Thus, for any possible Guvab, we can study and state results about the rates of convergence of $\{W_{2k}\}$ and $\{W_{2k+1}\}$.

Corollary 3.8.

We have that $\lim_{k\to\infty} W_{2k}$ and $\lim_{k\to\infty} W_{2k+1}$ are always well-defined.

Proof.

By Corollary 3.7 and Lemma 3.1, we know that $\lim_{k\to\infty} W_{2k} = W(\xi^0, \tilde{\mathbf{0}})$ and $\lim_{k\to\infty} W_{2k+1} = W(\xi^1, \tilde{\mathbf{0}})$. ∎

We soon discuss many cases where $\lim_{k\to\infty} W_k$ exists, so we designate a way to refer to this limit. For any Guvab $\mathcal{G}$ where $\lim_{k\to\infty} W_k$ exists, we denote this limit by $W$.

We can now state and prove our main theorems about whether the Wasserstein distance converges and the values it converges to. For any possible Guvab, Theorem 3.10 allows us to determine whether the Wasserstein distance converges. Furthermore, Theorem 3.9 allows us to, in most cases, quickly and easily determine what value the Wasserstein distance will converge to. Finally, these theorems provide a framework for us to classify the Guvabs into four categories so we can use casework to understand the rate of convergence.

Theorem 3.9.

Unless $G$ is bipartite, $\alpha = 0$, and $\beta = 1$, we have that $W = \lim_{k\to\infty} W(\mu_k, \nu_k)$ is always well-defined, and furthermore:

  • $W = 0$ under the conditions specified in Theorem 3.4,

  • $W = 1$ if $\alpha = \beta = 0$ and $W \neq 0$,

  • $W = \frac{1}{2}$ if $0 = \alpha < \beta < 1$ and $G$ is bipartite.

Proof.

The first condition is clear by Theorem 3.4. Next, we look at the case where $\alpha = \beta = 0$ and $W \neq 0$. By Theorem 3.4, this corresponds to the case where $G$ is bipartite and $u, v$ are on opposite sides of $G$. Without loss of generality, let $u \in S_1$ and $v \in S_2$. Then, as $k\to\infty$, we have that $\mu_{2k}$ converges to $\tau_1$ and $\nu_{2k}$ converges to $\tau_2$ by Corollary 3.6. Analogously, $\mu_{2k+1}$ converges to $\tau_2$ and $\nu_{2k+1}$ converges to $\tau_1$. Thus, $\lim_{k\to\infty} W(\mu_k, \nu_k) = W(\tau_1, \tau_2)$. We have that $W(\tau_1, \tau_2) \geq 1$ because to get from $\tau_1$ to $\tau_2$, we must move all the mass from $S_1$ across at least one edge to $S_2$. Also, $W(\tau_1, \tau_2) \leq 1$ because we can achieve a cost of 1 by, for each edge $ab$ with $a \in S_1$ and $b \in S_2$, moving a mass of $\frac{2}{\sum_{j\in G}\deg(j)}$ from $a$ to $b$.

We now consider the case where $0 = \alpha < \beta < 1$ and $G$ is bipartite. Without loss of generality, let $u \in S_1$. Since $\alpha = 0$, we have that $\lim_{k\to\infty}\mu_{2k} = \tau_1$ and $\lim_{k\to\infty}\mu_{2k+1} = \tau_2$. Since $\beta > 0$, we have that $\lim_{k\to\infty}\nu_k = \pi$. Thus, we have that $\lim_{k\to\infty} W(\mu_{2k}, \nu_{2k}) = W(\tau_1, \pi)$ and $\lim_{k\to\infty} W(\mu_{2k+1}, \nu_{2k+1}) = W(\tau_2, \pi)$. If we show that $W(\tau_1, \pi) = W(\tau_2, \pi) = \frac{1}{2}$, we will have shown the third condition. We know that $\pi$ has half its mass on $S_1$ and half its mass on $S_2$ because

$$\sum_{v\in S_1}\pi_v = \sum_{v\in S_1}\frac{\deg(v)}{\sum_{j\in G}\deg(j)} = \frac{|E(G)|}{\sum_{j\in G}\deg(j)} = \frac{1}{2}.$$

Thus, half the mass must move from $S_2$ to $S_1$, so $W(\pi, \tau_1) \geq \frac{1}{2}$. We can also achieve a cost of exactly $\frac{1}{2}$ from $\pi$ to $\tau_1$ by, for each edge $ab$ with $a \in S_1$ and $b \in S_2$, moving a mass of $\frac{1}{\sum_{j\in G}\deg(j)}$ from $b$ to $a$. Thus, $W(\pi, \tau_1) = \frac{1}{2}$ and, by an analogous argument, $W(\pi, \tau_2) = \frac{1}{2}$.

We have now considered all cases where $\alpha, \beta < 1$, and the case where $\alpha = \beta = 1$. The only case left is where $0 < \alpha < 1$ and $\beta = 1$. Here, $\lim_{k\to\infty}\mu_k = \pi$ and $\nu_k = \mathbbm{1}_v$, where $\mathbbm{1}_v$ is the distribution with value 1 at $v$ and 0 elsewhere. Thus, $\lim_{k\to\infty} W(\mu_k, \nu_k) = W(\pi, \mathbbm{1}_v)$, which is a constant. ∎

Theorem 3.10.

The distance $W(\mu_k, \nu_k)$ does not converge as $k\to\infty$ if and only if $G$ is bipartite, $\alpha = 0$, $\beta = 1$, and

$$\sum_{w\in V(G)}(-1)^{\mathrm{d}(v,w)}\,\mathrm{d}(v,w)\deg(w) \neq 0.$$
Proof.

By Theorem 3.9, we know that the only case in which it is possible for $W(\mu_k, \nu_k)$ not to converge is when $G$ is bipartite, $\alpha = 0$, and $\beta = 1$. In this case, $\nu_k = \mathbbm{1}_v$. Additionally, assuming without loss of generality that $u \in S_1$, we have that

$$\lim_{k\to\infty}\mu_{2k} = \tau_1 \text{ and } \lim_{k\to\infty}\mu_{2k+1} = \tau_2.$$

Thus, $W(\mu_k, \nu_k)$ converges as $k\to\infty$ if and only if $W(\mathbbm{1}_v, \tau_1) = W(\mathbbm{1}_v, \tau_2)$.

To calculate $W(\mathbbm{1}_v, \tau_1)$, we note that we must move all the mass of $\tau_1$ to vertex $v$. To move all the mass at some vertex $w$ to $v$, we necessarily move a mass of $(\tau_1)_w$ over a distance of $\mathrm{d}(w,v)$. Thus the total transportation cost, and thus the total Wasserstein distance $W(\mathbbm{1}_v, \tau_1)$, is given by

$$\sum_{w\in G}\mathrm{d}(v,w)(\tau_1)_w = \sum_{w\in S_1}\mathrm{d}(v,w)\frac{2\deg(w)}{\sum_{j\in G}\deg(j)} + \sum_{w\in S_2}\mathrm{d}(v,w)\cdot 0 = \frac{2}{\sum_{j\in G}\deg(j)}\sum_{w\in S_1}\mathrm{d}(v,w)\deg(w).$$

By the same reasoning, we have that

$$W(\mathbbm{1}_v, \tau_2) = \frac{2}{\sum_{j\in G}\deg(j)}\sum_{w\in S_2}\mathrm{d}(v,w)\deg(w).$$

Given that $G$ is bipartite, $\alpha = 0$, and $\beta = 1$, we know that the Wasserstein distance converges if and only if $W(\mathbbm{1}_v, \tau_1) - W(\mathbbm{1}_v, \tau_2) = 0$, which is true if and only if

$$\sum_{w\in V(G)}(-1)^{\mathrm{d}(v,w)}\,\mathrm{d}(v,w)\deg(w) = 0,$$

since the parity of $\mathrm{d}(v,w)$ depends only on the side of $G$ that $w$ is on. Thus, the theorem statement follows. ∎
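The criterion is straightforward to evaluate on examples; the following sketch is ours (assuming networkx).

```python
# Sketch (ours): with G bipartite, alpha = 0, beta = 1, the distance W_k
# converges iff sum_w (-1)^d(v,w) * d(v,w) * deg(w) = 0, per Theorem 3.10.
import networkx as nx

def wasserstein_converges(G, v):
    dist = nx.single_source_shortest_path_length(G, v)
    return sum((-1) ** dist[w] * dist[w] * G.degree(w) for w in G.nodes) == 0

print(wasserstein_converges(nx.cycle_graph(4), 0))  # True: the sum vanishes by symmetry
print(wasserstein_converges(nx.path_graph(4), 0))   # False: W_k alternates between two limits
```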

We now present a table summarizing much of the information about convergence discussed in this section.

Conditions on $\mathcal{G}$ | $W=0$ | $W=\frac{1}{2}$ | $W=1$ | $W=C\neq 0,\frac{1}{2},1$ | $W_k$ does not converge
$G$ bipartite, $\beta=1$ | ✓ | ✓ | ✓ | ✓ | ✓
$G$ bipartite, $\beta<1$ | ✓ | ✓ | ✓ | × | ×
$G$ non-bipartite, $\beta=1$ | ✓ | × | ✓ | ✓ | ×
$G$ non-bipartite, $\beta<1$ | ✓ | × | × | × | ×

Table 1: Is it possible for the Wasserstein distance to converge to particular limits in different cases of conditions on $\mathcal{G}$?
Remark 3.11.

We know that the case of $G$ bipartite, $\beta = 1$, and $W = \frac{1}{2}$ is possible by considering a star with $v$ at the center and $0 < \alpha < 1$. We know that the case of $G$ non-bipartite, $\beta = 1$, and $W = \frac{1}{2}$ is impossible because in order for it to be possible, $\lim_{k\to\infty}\mu_k$ would need half of its mass to be at $v$. Since the mass of $\lim_{k\to\infty}\mu_k$ at a vertex is proportional to its degree, every edge would have to be incident to $v$, making the graph bipartite.

The following corollary provides a categorization of the Guvabs into four types. In the next four sections of this paper, we examine each of these categories in turn.

Corollary 3.12.

Each Guvab satisfies exactly one of the following four conditions:

  • $W = 1$ and $\beta < 1$,

  • $W = \frac{1}{2}$ and $\beta < 1$,

  • $W = 0$ and $\beta < 1$,

  • $\beta = 1$.

Proof.

If $\beta < 1$, we have $W = 0$ under the conditions in Theorem 3.4, and $W = 1$ or $W = \frac{1}{2}$ otherwise, since the conditions in Theorem 3.9 cover all possible cases where $\beta < 1$ and $W \neq 0$. ∎

If we understand the convergence of the Wasserstein distance in all four of these cases, then we understand the convergence for all Guvabs. The subsequent four sections each discuss the convergence of the Wasserstein distance in one of these cases. Our two main convergence theorems, presented in Section 8, put together the general results obtained by examining these four cases individually.

4 Convergence when $W=1$

In this section we consider Guvabs with $W = 1$ and $\beta < 1$. Recall that these are exactly the Guvabs for which $G$ is bipartite, $u$ and $v$ are on different sides of the bipartite graph, and $\alpha = \beta = 0$. We show that all such Guvabs have a Wasserstein distance that is eventually constant. We also begin to understand how long it takes for the Wasserstein distance to reach constancy.

We first recall that the Wasserstein distance between two distributions $\mu$ and $\nu$ with potentially negative entries is the cost of an optimal transportation plan for moving mass (as discussed in Section 2, the mass of a distribution $\mu$ at a vertex $w$ is $\mu(w)$, the value of the distribution at that vertex) from $\mu$ to $\nu$. Thus, to prove the eventual constancy of the Wasserstein distance, we construct an algorithm that produces a transportation plan between any two distributions. Then, we show that when certain inequalities are satisfied, this transportation plan has a cost of exactly 1 and is optimal. Finally, we prove that when $\xi_k$ is sufficiently close to either of the limiting distributions $\xi^0$ or $\xi^1$, as it eventually is, these inequalities are satisfied.

We start by constructing the algorithm. Pick a spanning tree $T$ of $G$ and let $L$ be the set of leaves of $T$. Define a function $r: V(G) \to \mathbb{Z}$ by $r(w) = \min_{\ell\in L}\mathrm{d}(w, \ell)$.

For any finite set $S$, let $\mathrm{Perm}(S)$ denote the set of all permutations of $S$. We say that an $r$-monotone ordering $\mathcal{O} = (w_1, \ldots, w_n) \in \mathrm{Perm}(V(G))$ is a permutation of $V(G)$ such that $r(w_1), \ldots, r(w_n)$ is a non-decreasing sequence.

Definition 4.1.

Given a graph $G$, a spanning tree $T$ of $G$, an $r$-monotone ordering $\mathcal{O}$, and a zero-sum distribution $\xi$, we define the tree-based transport algorithm, which transports mass from $\xi$ to $\tilde{\mathbf{0}}$, to be an $(n-1)$-step algorithm in which at the $i$th step,

  • if the current mass at $w_i$ is nonnegative, we distribute it evenly among all $v \sim w_i$ with indices greater than $i$,

  • if the current mass at $w_i$ is negative, we take an equal amount of mass to vertex $w_i$ from all $v \in N(w_i)$ with indices greater than $i$, so that the mass at $w_i$ is now 0.

In Lemma 4.2, we see that this algorithm produces a valid transportation plan from $\xi$ to $\tilde{\mathbf{0}}$. We refer to this tree-based transportation plan as $A(G, T, \mathcal{O}, \xi)$. Given $G, T$, and $\mathcal{O}$, we let $A_i(\xi)$ denote the distribution of mass on the vertices of $G$ after $i$ steps of the algorithm.
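A direct transcription of Definition 4.1 into code may help make the algorithm concrete. The sketch below is ours (assuming networkx); it returns the snapshots $A_0(\xi), \ldots, A_{n-1}(\xi)$ together with the cost of the resulting plan, and it takes for granted, as the definition implicitly does, that each vertex processed before the final step has a neighbor of larger index.

```python
# Sketch (ours): the tree-based transport algorithm of Definition 4.1.
# `order` is an r-monotone ordering of V(G); `xi` maps vertices to masses.
import networkx as nx

def tree_transport(G, order, xi):
    mass = dict(xi)                       # current distribution, updated in place
    snapshots = [dict(mass)]              # snapshots[i] plays the role of A_i(xi)
    cost = 0.0
    index = {w: i for i, w in enumerate(order)}
    for i, w in enumerate(order[:-1]):
        later = [v for v in G[w] if index[v] > i]  # neighbors with larger index
        m = mass[w]
        if m != 0.0:
            for v in later:
                mass[v] += m / len(later)  # positive m pushes mass out; negative m pulls it in
        mass[w] = 0.0
        cost += abs(m)                     # |m| units of mass each cross exactly one edge
        snapshots.append(dict(mass))
    return snapshots, cost

# On the path 0-1-2-3 (its own spanning tree, leaves 0 and 3), the ordering
# [0, 3, 1, 2] is r-monotone; this plan moves each unit of positive mass once.
snaps, cost = tree_transport(nx.path_graph(4), [0, 3, 1, 2],
                             {0: 0.5, 1: -0.5, 2: 0.5, 3: -0.5})
print(cost)  # 1.0
```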

Lemma 4.2.

The tree-based transport algorithm on $G, T, \mathcal{O}, \xi$ always produces a valid transportation plan from $\xi$ to $\tilde{\mathbf{0}}$.

Proof.

After the $i$th step of the tree-based transport algorithm, the mass at each of the vertices $w_1, \ldots, w_i$ is 0, since the mass at $w_j$ becomes zero at the $j$th step, and thereafter no mass is moved to or from $w_j$. Thus, after the $(n-2)$th step, the only vertices of $G$ with nonzero mass will be $w_{n-1}$ and $w_n$. Since the total mass sums to 0 and $w_{n-1}$ is adjacent to $w_n$, the $(n-1)$th step of the algorithm simply moves the positive mass to the negative mass so that all vertices have mass 0. ∎

We now prove a useful property of this algorithm.

Lemma 4.3.

Given a graph $G$, tree $T$, and $r$-monotone ordering $\mathcal{O}$, for all $i$, we have that $A_i$ is a linear function on the space of zero-sum distributions.

Proof.

It suffices to show that for any two zero-sum distributions $\xi$ and $\xi'$, we have $A_i(\xi + \xi') = A_i(\xi) + A_i(\xi')$. We prove this by induction on $i$.

Base case: When $i = 0$, we have that $A_0(\xi + \xi') = \xi + \xi' = A_0(\xi) + A_0(\xi')$.

Inductive step: For the inductive hypothesis, we assume that $A_i(\xi + \xi') = A_i(\xi) + A_i(\xi')$. We want to show that $A_{i+1}(\xi + \xi') = A_{i+1}(\xi) + A_{i+1}(\xi')$. For any distribution $\xi$, if $n'$ denotes the number of neighbors of $w_{i+1}$ with indices greater than $i+1$, then all of the following are true:

  • $A_{i+1}(\xi)(w_{i+1}) = 0$,

  • for $w_j \in N(w_{i+1})$ with $j > i+1$, we have $A_{i+1}(\xi)(w_j) = A_i(\xi)(w_j) + \frac{1}{n'}A_i(\xi)(w_{i+1})$,

  • for all other vertices $w$, we have $A_{i+1}(\xi)(w) = A_i(\xi)(w)$.

Thus, $A_{i+1}(\xi+\xi')(w_{i+1}) = 0 = 0 + 0 = A_{i+1}(\xi)(w_{i+1}) + A_{i+1}(\xi')(w_{i+1})$. For $w_j \in N(w_{i+1})$ with $j > i+1$, we have that

$$A_{i+1}(\xi+\xi')(w_j) = A_i(\xi+\xi')(w_j) + \frac{1}{n'}A_i(\xi+\xi')(w_{i+1}) = A_i(\xi)(w_j) + \frac{1}{n'}A_i(\xi)(w_{i+1}) + A_i(\xi')(w_j) + \frac{1}{n'}A_i(\xi')(w_{i+1}) = A_{i+1}(\xi)(w_j) + A_{i+1}(\xi')(w_j).$$

Finally, for all other vertices $w$, we have that

$$A_{i+1}(\xi+\xi')(w) = A_i(\xi+\xi')(w) = A_i(\xi)(w) + A_i(\xi')(w) = A_{i+1}(\xi)(w) + A_{i+1}(\xi')(w).$$

We have shown $A_{i+1}(\xi+\xi') = A_{i+1}(\xi) + A_{i+1}(\xi')$ at all vertices, so we have proven the inductive step and thus the lemma. ∎

In Definition 4.4 and the subsequent results, we define the inequalities used in conjunction with the tree-based transport algorithm and show that when these inequalities are satisfied, the Wasserstein distance between $\xi$ and $\tilde{\mathbf{0}}$ is exactly the total positive mass of $\xi$, which is 1 in our setting.

Definition 4.4.

For any graph $G$, zero-sum distribution $\xi$, spanning tree $T$, and $r$-monotone ordering $\mathcal{O} = (w_1, \ldots, w_n)$ on $V(G)$, define the tree-based transport inequalities $\mathcal{I}(G, T, \mathcal{O}, \xi)$ to be the union of the following two sets of inequalities (a sketch for checking them numerically follows this definition):

  • $I_1$: the set of inequalities of the form $\xi(w_j)A_i(\xi)(w_j) > 0$ for all $0 \leq i \leq |V(G)| - 2$ and $i < j \leq |V(G)|$,

  • $I_2$: the set of inequalities of the form $\xi(t)\xi(w) < 0$ for all $t \sim w$.
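These inequalities can be checked mechanically from the snapshots produced by the tree_transport sketch above (again our code, with 0-based indices in place of the definition's 1-based ones):

```python
# Sketch (ours): check the tree-based transport inequalities I(G, T, O, xi)
# using the snapshots A_i(xi) returned by tree_transport above.
import networkx as nx

def satisfies_inequalities(G, order, xi):
    snaps, _ = tree_transport(G, order, xi)
    n = len(order)
    # I_1: xi and A_i(xi) have the same strict sign at every not-yet-zeroed vertex.
    I1 = all(xi[order[j]] * snaps[i][order[j]] > 0
             for i in range(n - 1) for j in range(i, n))
    # I_2: adjacent vertices carry strictly opposite signs in xi.
    I2 = all(xi[t] * xi[w] < 0 for t, w in G.edges)
    return I1 and I2

# xi^0 for the path 0-1-2-3 (deg(w)/|E| with alternating signs) satisfies
# the inequalities, as Lemma 4.8 and Corollary 4.9 below imply.
xi0 = {0: 1/3, 1: -2/3, 2: 2/3, 3: -1/3}
print(satisfies_inequalities(nx.path_graph(4), [0, 3, 1, 2], xi0))  # True
```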

Lemma 4.5.

If the tree-based transport inequalities $\mathcal{I}(G, T, \mathcal{O}, \xi)$ are satisfied, then the cost of the transportation plan is at most the sum of the positive mass in $\xi$; i.e., $C(A(G, T, \mathcal{O}, \xi)) \leq \frac{1}{2}\sum_{w\in G}|\xi(w)|$.

Proof.

We note that the inequalities in $I_1$ mean that for any vertex $w_j$, the sign of $A_i(\xi)(w_j)$ stays the same until the mass there becomes 0 at the $j$th step, at which point it remains 0 for the rest of the algorithm.

Since only positive mass moves, it suffices to show that all the positive mass of $\xi$ moves a distance of at most 1. At each step of the tree-based transport algorithm, any mass that moves must move a distance of exactly 1. Thus, it suffices to show that all mass moves at most one time in $A(G, T, \mathcal{O}, \xi)$.

To show this, we demonstrate that the mass that moves at each step of the algorithm has not moved before, since this means all mass moves at most once overall. We begin by demonstrating that every time positive mass moves, it moves from a vertex $w$ for which $\xi(w) > 0$.

The only way for mass to move is via the $i$th step of the tree-based transport algorithm, which starts from the distribution $A_{i-1}(\xi)$. Suppose the vertices are $w_1, \ldots, w_n$. If $A_{i-1}(\xi)(w_i)$ is zero, then no mass moves at the $i$th step. If $A_{i-1}(\xi)(w_i)$ is positive, then at the $i$th step, mass moves away from $w_i$. Moreover, by the inequalities in $I_1$, if $A_{i-1}(\xi)(w_i)$ is positive then $\xi(w_i)$ is positive, and if $A_{i-1}(\xi)(w_i)$ is negative then $\xi(w_i)$ is negative. In the negative case, the inequalities in $I_2$ give $\xi(t) > 0$ for all $t \sim w_i$, and the mass that moves comes from these neighbors $t \sim w_i$; so these movements, too, are from vertices $t$ with $\xi(t) > 0$. Thus, in all three of these cases, every time positive mass moves, it moves from a vertex $w$ for which $\xi(w) > 0$.

Now consider the $i$th vertex, call it $w$, and suppose that $\xi(w) > 0$. Then, by the inequalities in $I_2$, we know $\xi$ has negative mass at the neighbors of $w$, so throughout all steps of the algorithm, the mass at the neighbors of $w$ is nonpositive. This means that any time executing a step for one of the neighbors of $w$ changed the mass at $w$, mass moved from $w$ to its neighbors. Because no mass ever moved from another vertex to $w$, any remaining positive mass at $w$ has not yet moved. We also know that the remaining mass at $w$ is always nonnegative by the inequalities in $I_1$. Thus, whenever we execute a step of the algorithm for one of the neighbors of $w$, the nonnegative mass that moves from $w$ has not yet moved.

Furthermore, during the $i$th step of the algorithm, all the remaining nonnegative mass at $w$ moves away from it, and this mass has not yet moved. Mass movements due to steps of the algorithm for the neighbors of $w$, and due to the $i$th step itself, make up all possible movements of the mass initially at $w$. This argument holds for all vertices $w$ with $\xi(w) > 0$, so every movement of positive mass moves mass that has not been moved before. Thus, we are done. ∎

Corollary 4.6.

For any graph $G$ and zero-sum distribution $\xi$, if for some $T$ and $\mathcal{O}$ the tree-based transport inequalities $\mathcal{I}(G, T, \mathcal{O}, \xi)$ are satisfied, then

$$W(\xi, \tilde{\mathbf{0}}) = \frac{1}{2}\sum_{w\in G}|\xi(w)|.$$
Proof.

By Lemma 4.5, we have that $\frac{1}{2}\sum_{w\in G}|\xi(w)|$, the total positive mass, is an upper bound. For the lower bound, we note that all of the positive mass must be transported to vertices where the mass is negative, and hence must move a distance of at least 1, so $W(\xi, \tilde{\mathbf{0}})$ will be at least the total positive mass. ∎

Corollary 4.7.

For a Guvab $\mathcal{G}$ with $W = 1$ and $\beta < 1$, suppose that there exist a spanning tree $T$ of $G$ and an $r$-monotone ordering $\mathcal{O}$ such that $\xi_k$ satisfies the tree-based transport inequalities $\mathcal{I}(G, T, \mathcal{O}, \xi_k)$. Then $W_k(\mathcal{G}) = 1$.

Proof.

Recall that when $W = 1$ and $\beta < 1$, we must have that $G$ is bipartite, $u$ and $v$ are on different sides of the bipartite graph, and $\alpha = \beta = 0$. Thus, for all $k$ we have that $\mu_k$ and $\nu_k$ are nonzero on disjoint sets of vertices, since at all times $\mu_k$ is nonzero only on one side and $\nu_k$ is nonzero only on the other side. Thus $\sum_{w\in G}|\xi_k(w)| = 2$, so by Corollary 4.6 we have that

$$W_k(\mathcal{G}) = W(\xi_k, \tilde{\mathbf{0}}) = \frac{1}{2}\sum_{w\in G}|\xi_k(w)| = 1.$$

∎

Now all that remains to be shown is that once $\xi_k$ is sufficiently close to either of the limiting distributions $\xi^0$ or $\xi^1$, the tree-based transport inequalities $\mathcal{I}(G, T, \mathcal{O}, \xi_k)$ will be satisfied. To prove this, we will first show that $\xi^0$ and $\xi^1$ lie in the interior of the region of distributions that satisfy the inequalities. The next lemma helps show that $\xi^0$ and $\xi^1$ satisfy the inequalities.

Lemma 4.8.

Suppose we have a bipartite graph $G$ with sides $S_0$ and $S_1$, and a distribution $\xi$ such that $\xi(w) = \frac{\deg(w)}{|E(G)|}$ for $w \in S_0$ and $\xi(w) = -\frac{\deg(w)}{|E(G)|}$ for $w \in S_1$. Pick an arbitrary spanning tree $T$ and $r$-monotone ordering $\mathcal{O}$ on $V(G)$, and consider the tree-based transport plan $A(G, T, \mathcal{O}, \xi)$. Then after each step $i$ with $i \leq n-2$, we have that $A_i(\xi)(w_j) \geq \frac{1}{|E(G)|}$ for $w_j \in S_0$ with $j > i$, and that $A_i(\xi)(w_j) \leq -\frac{1}{|E(G)|}$ for $w_j \in S_1$ with $j > i$.

Proof.

We know by Lemma 4.3 that for all $w \in G$ and for all $0 \leq i \leq n-2$, we have $A_i(|E(G)|\xi)(w) = |E(G)|\cdot A_i(\xi)(w)$. Thus, it suffices to show that after each step $i$ with $i \leq n-2$, we have that $A_i(|E(G)|\xi)(w_j) \geq 1$ for $w_j \in S_0$ with $j > i$ and that $A_i(|E(G)|\xi)(w_j) \leq -1$ for $w_j \in S_1$ with $j > i$.

To prove this, for all $0 \leq i \leq n-2$, we define the graph $G_i$ to consist of the vertex set $V(G_i) = \{w_{i+1}, \ldots, w_n\}$ and all the edges of $E(G)$ that have both endpoints in $V(G_i)$. It suffices to show by induction on $i$ that for $i \leq n-2$, we have $A_i(|E(G)|\xi)(w_j) = \deg_{G_i} w_j$ for $w_j \in S_0$ with $j > i$, and $A_i(|E(G)|\xi)(w_j) = -\deg_{G_i} w_j$ for $w_j \in S_1$ with $j > i$.

Base case: When $i = 0$, we note that $G_0 = G$. By the definition of $\xi$, we have that $A_0(|E(G)|\xi)(w_j) = \deg_G w_j$ for $w_j \in S_0$ and that $A_0(|E(G)|\xi)(w_j) = -\deg_G w_j$ for $w_j \in S_1$.

Inductive step: The inductive hypothesis is that $A_{i-1}(|E(G)|\xi)(w_j) = \deg_{G_{i-1}} w_j$ for $w_j \in S_0$ with $j > i-1$, and that $A_{i-1}(|E(G)|\xi)(w_j) = -\deg_{G_{i-1}} w_j$ for $w_j \in S_1$ with $j > i-1$. Given this, we want to show the corresponding statement for $i$.

We suppose that $w_i \in S_0$; the case where $w_i \in S_1$ proceeds analogously. After $i-1$ steps, $w_i$ has a mass of $\deg_{G_{i-1}} w_i$. During the $i$th step, this mass is distributed evenly among the $w_j \sim w_i$ with $j > i$; we note that there are exactly $\deg_{G_{i-1}} w_i$ of these neighbors. Thus, each such $w_j$ receives $+1$ mass. By the inductive hypothesis, before step $i$ each of these neighbors $w_j$ had mass $-\deg_{G_{i-1}} w_j$, since each neighbor of $w_i$ is in $S_1$, the opposite side of the bipartite graph. Then, after step $i$, each such $w_j$ has mass $-(\deg_{G_{i-1}} w_j - 1)$, and the remaining vertices with indices greater than $i$ have the same mass as before. We note that for all $\ell > i$, if $w_\ell \sim w_i$ then $\deg_{G_i} w_\ell = \deg_{G_{i-1}} w_\ell - 1$, because the edge $\{w_\ell, w_i\}$ is removed, and otherwise $\deg_{G_i} w_\ell = \deg_{G_{i-1}} w_\ell$. We have just shown that this is exactly the mass at all vertices with indices greater than $i$ after the $i$th step of the algorithm. Thus, at each vertex $w_\ell$ with $\ell > i$, for $w_\ell \in S_0$ the mass after $i$ steps is $\deg_{G_i} w_\ell$, and for $w_\ell \in S_1$ the mass after $i$ steps is $-\deg_{G_i} w_\ell$. This proves the inductive step, and therefore proves the lemma. ∎

We are now ready to show that $\xi^0$ and $\xi^1$ lie in the interior of the region of distributions that satisfy the inequalities.

Corollary 4.9.

For any Guvab $\mathcal{G}$ with $W = 1$ and $\beta < 1$, we have that $\xi^0$ and $\xi^1$ lie strictly in the interior of the region $R \subset \mathbb{R}^{|V(G)|}$ of distributions $\xi$ that satisfy the tree-based transport inequalities $\mathcal{I}(G, T, \mathcal{O}, \xi)$.

Proof.

We prove this for $\xi^0$; by symmetry it will hold for $\xi^1$ as well, since $\xi^1 = -\xi^0$. If the sides of $G$ are $S_0$ and $S_1$, with $u \in S_0$ and $v \in S_1$, then for $w \in S_0$ we have that

$$\xi^0(w) = \lim_{k\to\infty}\mu_{2k}(w) - \lim_{k\to\infty}\nu_{2k}(w) = \frac{\deg(w)}{|E(G)|} - 0 = \frac{\deg(w)}{|E(G)|},$$

and for $w \in S_1$ we have that

$$\xi^0(w) = \lim_{k\to\infty}\mu_{2k}(w) - \lim_{k\to\infty}\nu_{2k}(w) = 0 - \frac{\deg(w)}{|E(G)|} = -\frac{\deg(w)}{|E(G)|}.$$

Then for all $t, w \in G$ such that $t \sim w$, we have that $\xi^0(t)\xi^0(w) < 0$. We also have, by Lemma 4.8, that $\xi^0(w_j)A_i(\xi^0)(w_j) \geq \frac{1}{|E(G)|^2} > 0$ for all $0 \leq i \leq |V(G)| - 2$ and $i < j \leq |V(G)|$. Since these inequalities hold with a positive margin, $\xi^0$ and $\xi^1$ lie strictly in the interior of the region $R \subset \mathbb{R}^{|V(G)|}$ of distributions $\xi$ that satisfy the tree-based transport inequalities $\mathcal{I}(G, T, \mathcal{O}, \xi)$. ∎

Using these results, we are now ready to prove the main claim that the Wasserstein distance is eventually constant when $W = 1$ and $\beta < 1$.

We first define a quantity that corresponds to how long $\{W_k\}$ takes to reach constancy. Note that this quantity can be infinite if $\{W_k\}$ is not eventually constant.

Definition 4.10.

For any Guvab $\mathcal{G}$ where $W_k \to 1$, define $\rho(\mathcal{G})$ to be

$$\inf\{N \in \mathbb{Z} : \{W(\mu_k, \nu_k)\}_{k\geq N} = (1, 1, 1, \ldots)\}.$$
Theorem 4.11.

For any Guvab 𝒢\mathcal{G} with W=1W=1 and β<1\beta<1, we have ρ(𝒢)<\rho(\mathcal{G})<\infty.

Proof.

Pick an arbitrary spanning tree T of G and r-monotone ordering 𝒪. By Corollary 4.9, ξ^0 and ξ^1 lie in the interior of the region R ⊂ ℝ^{|V(G)|} of distributions ξ that satisfy the tree-based transport inequalities ℐ(G,T,𝒪,ξ). We note that all the inequalities in ℐ(G,T,𝒪,ξ) can be written in the form f(ξ) > 0, where f: ℝ^{|V(G)|} → ℝ is a continuous function. Since each such f is continuous and strictly positive at ξ^0 and ξ^1, there exists some ε > 0 such that every ξ ∈ ℝ^{|V(G)|} with |ξ(w) − ξ^0(w)| < ε for all w ∈ G, or with |ξ(w) − ξ^1(w)| < ε for all w ∈ G, lies in R. By the formal definition of a limit, there exists some N such that for all k ≥ N and all w ∈ G, we have |ξ_{2k}(w) − ξ^0(w)| < ε and |ξ_{2k+1}(w) − ξ^1(w)| < ε. Thus, for all k ≥ 2N, we have ξ_k ∈ R. By Corollary 4.7, for all k ≥ 2N, we have W_k = 1. Hence ρ(𝒢) ≤ 2N < ∞. ∎

We next hope to characterize how long it takes the Wasserstein distance of these Guvabs with W=1W=1 and β<1\beta<1 to become constant. In particular, we prove upper and lower bounds for ρ(𝒢)\rho(\mathcal{G}). We start with the upper bound. To prove this upper bound, we first prove a lemma quantifying exactly how close to ξ0\xi^{0} or ξ1\xi^{1} a distribution must be in order for the tree-based transport inequalities (G,T,𝒪,ξ)\mathcal{I}(G,T,\mathcal{O},\xi) to be satisfied.

Lemma 4.12.

Consider a Guvab with W = 1 and β < 1. Pick an arbitrary spanning tree T and r-monotone ordering 𝒪, and let ε(G) = 1/(|V||E|). If a distribution ξ satisfies |ξ(w) − ξ^0(w)| < ε(G) for all vertices w, or satisfies |ξ(w) − ξ^1(w)| < ε(G) for all vertices w, then ξ satisfies the tree-based transport inequalities ℐ(G,T,𝒪,ξ).

Proof.

We prove this for ξ0\xi^{0}, and an analogous argument will hold for ξ1\xi^{1}.

We note that by Lemma 4.8, if we start with ξ^0, then at every point in the tree-based transport algorithm through step |V|−2, the absolute value of the mass at any remaining vertex is at least 1/|E|. Thus, if at every point in the algorithm through step |V|−2 the mass at each vertex differs from A_i(ξ^0) by less than 1/|E|, then the tree-based transport inequalities ℐ are satisfied, because no mass ever has the wrong sign.

It thus suffices to show that for all 0 ≤ i ≤ |V|−2 and for all w ∈ G, we have |A_i(ξ)(w) − A_i(ξ^0)(w)| < 1/|E|. To prove this, we note that ξ − ξ^0 is a zero-sum distribution, and by Lemma 4.3, for all i and all w we have A_i(ξ)(w) = A_i(ξ^0)(w) + A_i(ξ − ξ^0)(w). We consider the quantity Σ_{w∈G} |A_i(ξ)(w) − A_i(ξ^0)(w)| = Σ_{w∈G} |A_i(ξ − ξ^0)(w)|. This quantity is nonincreasing in i, since at step i of the algorithm the absolute value of the mass at w_i decreases by exactly |A_{i−1}(ξ − ξ^0)(w_i)|, while the sum of absolute values at the neighbors of w_i cannot increase by more than |A_{i−1}(ξ − ξ^0)(w_i)|. At the beginning this sum is strictly less than |V|ε(G), so the bound holds for all i. Since max_{w∈G} |A_i(ξ)(w) − A_i(ξ^0)(w)| ≤ Σ_{w∈G} |A_i(ξ)(w) − A_i(ξ^0)(w)|, we conclude that max_{w∈G} |A_i(ξ)(w) − A_i(ξ^0)(w)| < |V|ε(G) = 1/|E|, which is exactly what we wanted to show, so we are done. ∎

With this lemma established, we can now prove our upper bound for ρ(𝒢)\rho(\mathcal{G}).

Lemma 4.13.

Let λ_max = max{|λ| : λ ∈ L, |λ| < 1}, where L is the set of all eigenvalues of the transition matrices of X and Y. Then for a Guvab 𝒢 where W = 1 and β < 1, we have ρ(𝒢) ≤ 10 ln|V| / (1 − λ_max²).

Proof.

We use [DS91, Prop. 3]. The Markov chains X_{2k} and Y_{2k} both converge to their even limiting distributions lim_{k→∞} μ_{2k} and lim_{k→∞} ν_{2k}; for convenience, denote these by γ_u and γ_v, respectively. Once μ_{2k} and ν_{2k} are each within 1/(2|V||E|) of their respective limiting distributions at every vertex, ξ_{2k} satisfies the tree-based transport inequalities ℐ(G,T,𝒪,ξ_{2k}) by Lemma 4.12. Since X_{2k} and Y_{2k} are both Markov chains with limiting distributions, we use notation analogous to that of [Sin92] and set Δ_even u(k) = (1/2) Σ_{w∈G} |μ_{2k}(w) − γ_u(w)| and Δ_even v(k) = (1/2) Σ_{w∈G} |ν_{2k}(w) − γ_v(w)|. Then for ε > 0 and x ∈ {u,v}, we let τ_even x(ε) be the minimum nonnegative integer k such that Δ_even x(k′) ≤ ε for all k′ ≥ k. Thus, by [DS91, Prop. 3], the time ρ_even after which W_{2k} permanently equals 1 satisfies

ρ_even ≤ max_{x∈{u,v}} 2τ_even x(1/(2|V||E|)) ≤ max_{x∈{u,v}} (2/(1 − λ_max²)) (ln(1/γ_x(x)) + ln(2|V||E|)).

It remains to bound the right-hand side. Since γ_x(x) = deg(x)/|E| ≥ 1/|E|, this gives

ρeven\displaystyle\displaystyle\rho_{\textrm{even}} 21λmax2(ln|E|+ln2|V||E|)=21λmax2(ln2|V||E|2)\displaystyle\leq\frac{2}{1-\lambda_{\max}^{2}}(\ln{|E|}+\ln{2|V||E|})=\frac{2}{1-\lambda_{\max}^{2}}(\ln{2|V||E|^{2}})
21λmax2(ln|V|3(|V|1)2)\displaystyle\leq\frac{2}{1-\lambda_{\max}^{2}}(\ln{|V|^{3}(|V|-1)^{2}})
<21λmax25ln(|V|)=10ln|V|1λmax2.\displaystyle<\frac{2}{1-\lambda_{\max}^{2}}\cdot 5\ln(|V|)=\frac{10\ln|V|}{1-\lambda_{\max}^{2}}.

By similar reasoning, the same bound works for ρodd\rho_{\textrm{odd}}, the time it takes for W2k+1W_{2k+1} to eventually have distance 11. Thus, 10ln|V|1λmax2\displaystyle\frac{10\ln|V|}{1-\lambda_{\max}^{2}} is an upper bound for ρ(𝒢)\rho(\mathcal{G}). ∎
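For concreteness, the quantities in Lemma 4.13 are easy to evaluate numerically. The following Python sketch (our illustration, not part of the formal development; the 6-cycle and the choice α = β = 0 are arbitrary) computes λ_max from the spectrum of the transition matrix and prints the resulting upper bound on ρ(𝒢). With α = β = 0, the transition matrices of X and Y coincide with the simple random walk matrix P, so it suffices to examine the eigenvalues of P.

```python
import numpy as np

# Evaluate the bound of Lemma 4.13 on the 6-cycle with alpha = beta = 0,
# so both walks use the simple random walk matrix P (illustrative setup).
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[i, (i - 1) % n] = 1  # adjacency matrix of C_6
P = A / A.sum(axis=1, keepdims=True)           # simple random walk

eigvals = np.linalg.eigvals(P)
# lambda_max: the largest |lambda| among eigenvalues with |lambda| < 1
lam_max = max(abs(l) for l in eigvals if abs(l) < 1 - 1e-9)
bound = 10 * np.log(n) / (1 - lam_max**2)
print(f"lambda_max = {lam_max:.4f}, upper bound on rho = {bound:.1f}")
```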

We now establish a lower bound for ρ(𝒢)\rho(\mathcal{G}).

Lemma 4.14.

For a Guvab 𝒢 where W = 1 and β < 1, we have ρ(𝒢) ≥ d(u,v)/2 − 1.

Proof.

We note that μ_k(t) = 0 for t ∈ V if d(t,u) > k. Similarly, ν_k(w) = 0 for w ∈ V if d(w,v) > k. Suppose k < d(u,v)/2 − 1 and consider any pair of vertices t, w such that μ_k(t) > 0 and ν_k(w) > 0. Then d(t,u) ≤ k < d(u,v)/2 − 1 and d(w,v) ≤ k < d(u,v)/2 − 1, so d(t,w) ≥ d(u,v) − (d(t,u) + d(w,v)) > 2. Therefore all mass must move a distance of at least 2 to get from μ_k to ν_k, so W_k ≥ 2 > 1. Hence W_k ≠ 1 for every k < d(u,v)/2 − 1, which gives ρ(𝒢) ≥ d(u,v)/2 − 1. ∎
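The support argument above can also be watched numerically. In the sketch below (ours; the path P_8 with u = 0, v = 7 and α = β = 0 is an arbitrary Guvab with W = 1, since P_8 is bipartite and d(u,v) = 7 is odd), the supports of μ_k and ν_k stay more than distance 2 apart while k < d(u,v)/2 − 1 = 2.5:

```python
import numpy as np

# Supports of mu_k and nu_k on the path P_8 with u = 0, v = 7, alpha = beta = 0.
# Lemma 4.14 needs the supports to be more than 2 apart for k < d(u,v)/2 - 1.
n = 8
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1
P = A / A.sum(axis=1, keepdims=True)
mu, nu = np.eye(n)[0], np.eye(n)[n - 1]
for k in range(1, 4):
    mu, nu = mu @ P, nu @ P
    gap = min(abs(s - t) for s in np.nonzero(mu)[0] for t in np.nonzero(nu)[0])
    print(k, gap)  # prints gaps 5, 3, 1: the gap exceeds 2 exactly for k < 2.5
```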

5 Convergence when W=12W=\frac{1}{2}

In this section, we consider Guvabs where W = 1/2 and β < 1. Recall that these are exactly the Guvabs for which G is bipartite and 0 = α < β < 1. As in the previous section, and for similar reasons, the Wasserstein distance will eventually be the sum of the positive mass of ξ_k. In this case, however, the Wasserstein distance is in general not eventually constant; instead it converges at an exponential rate that we can express explicitly. To prove this, we proceed by a strategy similar to the W = 1 case. In particular, we show that the tree-based transport inequalities will eventually be satisfied, and compute the Wasserstein distance when these inequalities are satisfied.

In the next three results, we show that the tree-based transport inequalities will eventually be satisfied, and provide an initial expression for what the Wasserstein distance will be when the tree-based transport inequalities are satisfied. Later, we will calculate exactly what this expression for the Wasserstein distance evaluates to.

We begin by showing in the next two results that, analogously to before, ξ0\xi^{0} and ξ1\xi^{1} lie on the interior of the region of distributions that satisfy the inequalities.

Lemma 5.1.

Suppose we have a bipartite graph G with sides S_0 and S_1 and a distribution ξ such that ξ(w) = deg(w)/(2|E(G)|) for w ∈ S_0 and ξ(w) = −deg(w)/(2|E(G)|) for w ∈ S_1. Pick an arbitrary spanning tree T and r-monotone ordering 𝒪 on V(G), and consider the tree-based transport plan A(G,T,𝒪,ξ). Then after each step i with i ≤ |V(G)|−2, for w_j ∈ S_0 with j > i we have A_i(ξ)(w_j) ≥ 1/(2|E(G)|), and for w_j ∈ S_1 with j > i we have A_i(ξ)(w_j) ≤ −1/(2|E(G)|).

Proof.

We note that this is nearly the same as Lemma 4.8, differing only by a constant factor of 1/2. Given the distribution ξ, we know by Lemma 4.8 that after each step i with i ≤ |V(G)|−2, for w_j ∈ S_0 with j > i we have A_i(2ξ)(w_j) ≥ 1/|E(G)|, and for w_j ∈ S_1 with j > i we have A_i(2ξ)(w_j) ≤ −1/|E(G)|. Since A_i(2ξ) = 2A_i(ξ) by Lemma 4.3, dividing by 2 gives exactly the claimed bounds: for w_j ∈ S_0 with j > i we have A_i(ξ)(w_j) ≥ 1/(2|E(G)|), and for w_j ∈ S_1 with j > i we have A_i(ξ)(w_j) ≤ −1/(2|E(G)|). ∎

Corollary 5.2.

For any Guvab 𝒢\mathcal{G} where W=12W=\frac{1}{2} and β<1\beta<1, we have that ξ0\xi^{0} and ξ1\xi^{1} lie strictly on the interior of the region R|V(G)|R\subset\mathbb{R}^{|V(G)|} of distributions ξ\xi that satisfy the tree-based transport inequalities (G,T,𝒪,ξ)\mathcal{I}(G,T,\mathcal{O},\xi).

Proof.

We note that when W=12W=\frac{1}{2} and β<1\beta<1, we have that β>0\beta>0 so for all wGw\in G, we have that limkνk(w)=deg(w)2|E(G)|\lim_{k\to\infty}\nu_{k}(w)=\frac{\deg(w)}{2|E(G)|}. We also know that if GG has sides S0S_{0} and S1S_{1} with uS0u\in S_{0}, for all wS0w\in S_{0} we have that limkμ2k(w)=deg(w)|E(G)|\displaystyle\lim_{k\to\infty}\mu_{2k}(w)=\frac{\deg(w)}{|E(G)|} and for all wS1w\in S_{1} we have that limkμ2k(w)=0\displaystyle\lim_{k\to\infty}\mu_{2k}(w)=0. Similarly, for all wS1w\in S_{1} we have that limkμ2k+1(w)=deg(w)|E(G)|\displaystyle\lim_{k\to\infty}\mu_{2k+1}(w)=\frac{\deg(w)}{|E(G)|} and for all wS0w\in S_{0} we have that limkμ2k+1(w)=0\displaystyle\lim_{k\to\infty}\mu_{2k+1}(w)=0. Thus for all wS0w\in S_{0} we have that ξ0(w)=deg(w)2|E(G)|\xi^{0}(w)=\frac{\deg(w)}{2|E(G)|} and for all wS1w\in S_{1} we have that ξ0(w)=deg(w)2|E(G)|\xi^{0}(w)=\frac{-\deg(w)}{2|E(G)|}. Also ξ1=ξ0\xi^{1}=-\xi^{0}.

We prove the claim for ξ^0; by symmetry it will hold for ξ^1 as well since ξ^1 = −ξ^0. If the sides of G are S_0 and S_1, then ξ^0(w) = deg(w)/(2|E(G)|) for w ∈ S_0 and ξ^0(w) = −deg(w)/(2|E(G)|) for w ∈ S_1. Then for all t, w ∈ G such that t ∼ w, the product ξ^0(t)ξ^0(w) < 0. By Lemma 5.1, we also have ξ^0(w_j)A_i(ξ^0)(w_j) ≥ 1/(4|E(G)|²) > 0 for all 0 ≤ i ≤ |V(G)|−2 and i < j ≤ |V(G)|. ∎

We now know that ξ^0 and ξ^1 are in the interior of the region satisfying the inequalities. We can hence proceed as in Section 4 to show that ξ_k will eventually satisfy the inequalities, and thus that W_k will be the sum of positive mass.

Corollary 5.3.

For any Guvab where W=12W=\frac{1}{2} and β<1\beta<1, there exists NN such that for all kNk\geq N,

Wk=12wG|ξk(w)|.W_{k}=\frac{1}{2}\sum_{w\in G}|\xi_{k}(w)|.
Proof.

We know by Corollary 5.2 that ξ^0 and ξ^1 lie in the interior of R. Therefore, as in the proof of Theorem 4.11, by the formal definition of a limit there exists some N such that for all k ≥ N we have ξ_k ∈ R, and thus the inequalities ℐ(G,T,𝒪,ξ_k) are satisfied. We note that Corollary 4.6 holds for any Guvab, including the ones we are currently inspecting, so if the tree-based transport inequalities ℐ(G,T,𝒪,ξ_k) are satisfied, then W_k = (1/2) Σ_{w∈G} |ξ_k(w)|. Thus for all k ≥ N, we have W_k = (1/2) Σ_{w∈G} |ξ_k(w)|. ∎

We now know that eventually, the Wasserstein distance will be the sum of positive mass, so it remains to calculate the sum of positive mass. To do this, we will first need to define an auxiliary Markov chain and prove some properties of this Markov chain.

Definition 5.4.

Let s(α) be the two-state Markov chain with states s_0 and s_1 that starts at s_0 and, at each step, stays at its current state with probability α and switches to the other state with probability 1 − α. Define (σ_α)_k to be the probability distribution after k steps of this Markov chain.

Lemma 5.5.

For the Markov chain defined above, (σα)k(s0)=0.5+0.5(2α1)k(\sigma_{\alpha})_{k}(s_{0})=0.5+0.5(2\alpha-1)^{k} and (σα)k(s1)=0.50.5(2α1)k(\sigma_{\alpha})_{k}(s_{1})=0.5-0.5(2\alpha-1)^{k}.

Proof.

We will proceed by induction on kk, using the transition probabilities to go from (σα)k(\sigma_{\alpha})_{k} to (σα)k+1(\sigma_{\alpha})_{k+1}.

Base case: When k=0k=0, we know that, since the Markov chain starts at s0s_{0}, we have (σα)0(s0)=1=0.5+0.5(2α1)0(\sigma_{\alpha})_{0}(s_{0})=1=0.5+0.5(2\alpha-1)^{0} and (σα)0(s1)=0=0.50.5(2α1)0(\sigma_{\alpha})_{0}(s_{1})=0=0.5-0.5(2\alpha-1)^{0}.

Inductive step: Suppose (σα)k(s0)=0.5+0.5(2α1)k(\sigma_{\alpha})_{k}(s_{0})=0.5+0.5(2\alpha-1)^{k} and (σα)k(s1)=0.50.5(2α1)k(\sigma_{\alpha})_{k}(s_{1})=0.5-0.5(2\alpha-1)^{k}. We know that

(σα)k+1(s0)\displaystyle(\sigma_{\alpha})_{k+1}(s_{0}) =α(σα)k(s0)+(1α)(σα)k(s1)\displaystyle=\alpha(\sigma_{\alpha})_{k}(s_{0})+(1-\alpha)(\sigma_{\alpha})_{k}(s_{1})
=α(0.5+0.5(2α1)k)+(1α)(0.50.5(2α1)k)\displaystyle=\alpha(0.5+0.5(2\alpha-1)^{k})+(1-\alpha)(0.5-0.5(2\alpha-1)^{k})
=0.5+(2α1)0.5(2α1)k\displaystyle=0.5+(2\alpha-1)\cdot 0.5(2\alpha-1)^{k}
=0.5+0.5(2α1)k+1.\displaystyle=0.5+0.5(2\alpha-1)^{k+1}.

Similarly,

(σα)k+1(s1)\displaystyle(\sigma_{\alpha})_{k+1}(s_{1}) =(1α)(σα)k(s0)+α(σα)k(s1)\displaystyle=(1-\alpha)(\sigma_{\alpha})_{k}(s_{0})+\alpha(\sigma_{\alpha})_{k}(s_{1})
=(1α)(0.5+0.5(2α1)k)+α(0.50.5(2α1)k)\displaystyle=(1-\alpha)(0.5+0.5(2\alpha-1)^{k})+\alpha(0.5-0.5(2\alpha-1)^{k})
=0.5+(2α1)(0.5)(2α1)k\displaystyle=0.5+(2\alpha-1)\cdot(-0.5)(2\alpha-1)^{k}
= 0.5 − 0.5(2α−1)^{k+1}. ∎
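As a quick sanity check of Lemma 5.5 (our own; the value α = 0.3 is arbitrary), one can iterate the two-state chain numerically and compare against the closed form:

```python
import numpy as np

# Iterate the two-state chain s(alpha) and compare with the closed form
# (sigma_alpha)_k(s_0) = 0.5 + 0.5 * (2*alpha - 1)**k from Lemma 5.5.
alpha = 0.3
T = np.array([[alpha, 1 - alpha],   # rows index the current state (s_0, s_1)
              [1 - alpha, alpha]])
sigma = np.array([1.0, 0.0])        # the chain starts at s_0
for k in range(1, 9):
    sigma = sigma @ T
    closed = 0.5 + 0.5 * (2 * alpha - 1) ** k
    assert abs(sigma[0] - closed) < 1e-12
print("closed form verified for k = 1, ..., 8")
```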

We now have all the tools we need to explicitly calculate the sum of positive mass. The next lemma tells us what the sum of positive mass will be.

Lemma 5.6.

When β<1\beta<1 and W=12W=\frac{1}{2}, there exists some NN such that for all kNk\geq N, we either have that

12wG|ξk(w)|=0.5+0.5(12β)k\frac{1}{2}\sum_{w\in G}|\xi_{k}(w)|=0.5+0.5(1-2\beta)^{k}

or that

12wG|ξk(w)|=0.50.5(12β)k.\frac{1}{2}\sum_{w\in G}|\xi_{k}(w)|=0.5-0.5(1-2\beta)^{k}.
Proof.

We know GG is bipartite; say it has sides S0S_{0} and S1S_{1}. We know 0=α<β<10=\alpha<\beta<1. Assume without loss of generality that vS0v\in S_{0}. If uu is on side S0S_{0}, then eventually for wS0w\in S_{0}, we have that ξ2k(w)\xi_{2k}(w) gets arbitrarily close to deg(w)2|E(G)|\displaystyle\frac{\deg(w)}{2|E(G)|} and for wS1w\in S_{1}, we have that ξ2k(w)\xi_{2k}(w) gets arbitrarily close to deg(w)2|E(G)|\displaystyle-\frac{\deg(w)}{2|E(G)|}. In particular, for some NN, for all kNk\geq N we have ξ2k(w)>0\xi_{2k}(w)>0 if and only if wS0w\in S_{0}. Then for some NN, for all kNk\geq N, when uS0u\in S_{0}, the total positive mass of ξ2k\xi_{2k} is wS0ξ2k(w)\displaystyle\sum_{w\in S_{0}}\xi_{2k}(w). Similarly, for some NN, for all kNk\geq N we have ξ2k+1(w)>0\xi_{2k+1}(w)>0 if and only if wS1w\in S_{1}. Thus, for some NN, for all kNk\geq N, when uS0u\in S_{0}, the total positive mass of ξ2k+1\xi_{2k+1} is wS1ξ2k+1(w)\displaystyle\sum_{w\in S_{1}}\xi_{2k+1}(w).

By an analogous argument, when uS1u\in S_{1}, for some NN, for all kNk\geq N, the total positive mass of ξ2k\xi_{2k} is wS0ξ2k(w)\displaystyle\sum_{w\in S_{0}}\xi_{2k}(w). Similarly, for some NN, for all kNk\geq N, the total positive mass of ξ2k+1\xi_{2k+1} is wS1ξ2k+1(w)\displaystyle\sum_{w\in S_{1}}\xi_{2k+1}(w).

Thus, to calculate what the sum of positive mass eventually equals, we simply consider how much mass of μ\mu and ν\nu is on each side of the bipartite graph so that we know how much mass of ξ\xi is on each side. We note that for any random walk with laziness β\beta, at all steps the mass on a given side has a probability β\beta of staying on that side and a probability 1β1-\beta of moving to the other side, since any mass that moves along an edge moves to the other side. Thus the mass of νk\nu_{k} on S0S_{0} and S1S_{1} behaves identically to the mass of s(β)s(\beta) on s0s_{0} and s1s_{1}. In other words, the amount of mass of νk\nu_{k} on S0S_{0} is (σβ)k(s0)=0.5+0.5(2β1)k(\sigma_{\beta})_{k}(s_{0})=0.5+0.5(2\beta-1)^{k} and the amount of mass of νk\nu_{k} on S1S_{1} is (σβ)k(s1)=0.50.5(2β1)k(\sigma_{\beta})_{k}(s_{1})=0.5-0.5(2\beta-1)^{k} by Lemma 5.5. Similarly, if uS0u\in S_{0}, then the amount of mass of μk\mu_{k} on S0S_{0} is (σ0)k(s0)=0.5+0.5(1)k(\sigma_{0})_{k}(s_{0})=0.5+0.5(-1)^{k} and the amount of mass of μk\mu_{k} on S1S_{1} is (σ0)k(s1)=0.50.5(1)k(\sigma_{0})_{k}(s_{1})=0.5-0.5(-1)^{k}. By symmetry, if uS1u\in S_{1}, then the amount of mass of μk\mu_{k} on S0S_{0} is (σ0)k(s1)=0.50.5(1)k(\sigma_{0})_{k}(s_{1})=0.5-0.5(-1)^{k} and the amount of mass of μk\mu_{k} on S1S_{1} is (σ0)k(s0)=0.5+0.5(1)k(\sigma_{0})_{k}(s_{0})=0.5+0.5(-1)^{k}.

This means that if uS0u\in S_{0},

wS0ξk(w)\displaystyle\sum_{w\in S_{0}}\xi_{k}(w) =wS0μk(w)wS0νk(w)\displaystyle=\sum_{w\in S_{0}}\mu_{k}(w)-\sum_{w\in S_{0}}\nu_{k}(w)
=(σ0)k(s0)(σβ)k(s0)\displaystyle=(\sigma_{0})_{k}(s_{0})-(\sigma_{\beta})_{k}(s_{0})
=0.5+0.5(1)k(0.5+0.5(2β1)k)\displaystyle=0.5+0.5(-1)^{k}-(0.5+0.5(2\beta-1)^{k})

and

wS1ξk(w)\displaystyle\sum_{w\in S_{1}}\xi_{k}(w) =wS1μk(w)wS1νk(w)\displaystyle=\sum_{w\in S_{1}}\mu_{k}(w)-\sum_{w\in S_{1}}\nu_{k}(w)
=(σ0)k(s1)(σβ)k(s1)\displaystyle=(\sigma_{0})_{k}(s_{1})-(\sigma_{\beta})_{k}(s_{1})
=0.50.5(1)k(0.50.5(2β1)k).\displaystyle=0.5-0.5(-1)^{k}-(0.5-0.5(2\beta-1)^{k}).

Then the total positive mass of ξ2k\xi_{2k} is

wS0ξ2k(w)=0.5+0.5(1)2k(0.5+0.5(2β1)2k)=0.50.5(12β)2k\sum_{w\in S_{0}}\xi_{2k}(w)=0.5+0.5(-1)^{2k}-(0.5+0.5(2\beta-1)^{2k})=0.5-0.5(1-2\beta)^{2k}

and the total positive mass of ξ2k+1\xi_{2k+1} is

wS1ξ2k+1(w)=0.50.5(1)2k+1(0.50.5(2β1)2k+1)=0.50.5(12β)2k+1.\sum_{w\in S_{1}}\xi_{2k+1}(w)=0.5-0.5(-1)^{2k+1}-(0.5-0.5(2\beta-1)^{2k+1})=0.5-0.5(1-2\beta)^{2k+1}.

Thus, for some NN, the sum of the positive mass of ξk\xi_{k} is 0.50.5(12β)k0.5-0.5(1-2\beta)^{k} for all kNk\geq N.

If uS1u\in S_{1}, then we have that wS0ξk(w)=(σ0)k(s1)(σβ)k(s0)\displaystyle\sum_{w\in S_{0}}\xi_{k}(w)=(\sigma_{0})_{k}(s_{1})-(\sigma_{\beta})_{k}(s_{0}) and we have that wS1ξk(w)=(σ0)k(s0)(σβ)k(s1)\displaystyle\sum_{w\in S_{1}}\xi_{k}(w)=(\sigma_{0})_{k}(s_{0})-(\sigma_{\beta})_{k}(s_{1}). By calculating this out analogously to above, we see that if uS1u\in S_{1} there exists some NN such that the sum of positive mass of ξk\xi_{k} is 0.5+0.5(12β)k0.5+0.5(1-2\beta)^{k} for all kNk\geq N. ∎

We now know that the Wasserstein distance will be the sum of positive mass, and we know exactly what the sum of positive mass will eventually be. Thus, we know exactly what the Wasserstein distance will eventually be. The next theorem therefore states explicitly the rate of convergence of the Wasserstein distance when β<1\beta<1 and W=12W=\frac{1}{2}.

Theorem 5.7.

For any Guvab where W=12W=\frac{1}{2} and β<1\beta<1, for some NN it will be true that for all kNk\geq N, we have that |Wk12|=0.5|12β|k|W_{k}-\frac{1}{2}|=0.5|1-2\beta|^{k}.

Proof.

Corollary 5.3 tells us that for some N1N_{1}, we will have Wk=12wG|ξk(w)|W_{k}=\frac{1}{2}\sum_{w\in G}|\xi_{k}(w)| for all kN1k\geq N_{1}. Lemma 5.6 tells us that for some N2N_{2}, we will have 12wG|ξk(w)|=0.5+0.5(12β)k\frac{1}{2}\sum_{w\in G}|\xi_{k}(w)|=0.5+0.5(1-2\beta)^{k} for all kN2k\geq N_{2} or we will have 12wG|ξk(w)|=0.50.5(12β)k\frac{1}{2}\sum_{w\in G}|\xi_{k}(w)|=0.5-0.5(1-2\beta)^{k} for all kN2k\geq N_{2}. This means that for all kN2k\geq N_{2}, we have |(12wG|ξk(w)|)12|=0.5|12β|k|(\frac{1}{2}\sum_{w\in G}|\xi_{k}(w)|)-\frac{1}{2}|=0.5|1-2\beta|^{k}. Thus, for all kmax(N1,N2)k\geq\max(N_{1},N_{2}), we have

|W_k − 1/2| = |(1/2) Σ_{w∈G} |ξ_k(w)| − 1/2| = 0.5|1−2β|^k. ∎
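To see Theorem 5.7 numerically, the following sketch of ours compares the positive mass of ξ_k, which by Corollary 5.3 eventually equals W_k, against the closed form of Lemma 5.6. The choice of the 4-cycle with u = 0, v = 1, α = 0, and β = 1/4 is arbitrary; since u and v lie on opposite sides of the bipartition, Lemma 5.6 predicts the value 0.5 + 0.5(1 − 2β)^k.

```python
import numpy as np

# On the 4-cycle with u = 0, v = 1, alpha = 0, beta = 0.25, compare the
# positive mass of xi_k = mu_k - nu_k with 0.5 + 0.5 * (1 - 2*beta)**k.
n, beta = 4, 0.25
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[i, (i - 1) % n] = 1
P = A / A.sum(axis=1, keepdims=True)
P_beta = beta * np.eye(n) + (1 - beta) * P

mu = np.eye(n)[0]   # walk X starts at u = 0 with laziness alpha = 0
nu = np.eye(n)[1]   # walk Y starts at v = 1 with laziness beta
for k in range(1, 11):
    mu, nu = mu @ P, nu @ P_beta
    pos_mass = (mu - nu).clip(min=0).sum()
    closed = 0.5 + 0.5 * (1 - 2 * beta) ** k
    print(k, pos_mass, closed)  # columns agree once signs settle (here immediately)
```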

Finally, we want to characterize when the Wasserstein distance is eventually constant when W=12W=\frac{1}{2} and β<1\beta<1. This will fit into our larger characterization of eventual constancy for all Guvabs with β<1\beta<1.

Corollary 5.8.

When W = 1/2 and β < 1, we have that ρ(𝒢) < ∞ if and only if β = 1/2.

Proof.

This follows directly from Theorem 5.7: the quantity 0.5|1−2β|^k is identically zero when β = 1/2 and is strictly positive for every k otherwise, so {W_k} is eventually constant exactly when β = 1/2. ∎

6 Convergence when W=0W=0

In this section we consider the case of Guvabs where W=0W=0 and β<1\beta<1. Recall that these are exactly the Guvabs enumerated in Theorem 3.4 for which β<1\beta<1. We start by showing that the rate of convergence of {W2k}\{W_{2k}\} is exponential when it is not eventually constant. By an analogous argument, the rate of convergence of {W2k+1}\{W_{2k+1}\} is exponential when it is not eventually constant. We will then investigate exactly when {Wk}\{W_{k}\} is eventually constant.

Theorem 6.3 states that unless it is eventually constant, the rate of convergence of {W2k}\{W_{2k}\} is exponential, and in particular W2kcλeven2kW_{2k}\sim c\cdot\lambda_{\textrm{even}}^{2k}. We go about proving this by showing in the next two lemmas that W2kW_{2k} must be one of finitely many expressions, all of which are approximately some exponential.

The next lemma shows that W2kW_{2k} must be one of finitely many expressions.

Lemma 6.1.

For any Guvab 𝒢, there exists a finite set F = {f_1, f_2, …, f_m} of 1-Lipschitz functions f_i: V(G) → ℝ such that for all k there exists f ∈ F with W_k = Σ_{w∈G} f(w)ξ_k(w).

Proof.

We consider the set L of 1-Lipschitz functions ℓ on G such that Σ_{w∈V(G)} ℓ(w) = 0 (any other 1-Lipschitz function can be translated into such a function by adding a constant to all entries, which does not change Σ_w ℓ(w)ξ_k(w) since ξ_k is zero-sum). The conditions for ℓ to lie in L are that ℓ(w_1) − ℓ(w_2) ≤ d(w_1,w_2) and ℓ(w_2) − ℓ(w_1) ≤ d(w_1,w_2) for each pair of vertices w_1, w_2, together with Σ_{w∈G} ℓ(w) = 0; the inequalities define half-spaces of ℝ^{|V(G)|}, and the equality defines a hyperplane. Additionally, no entry of ℓ can exceed |V(G)|: the distance between any two vertices is at most |V(G)|, so if some entry exceeded |V(G)| then every entry would be positive, contradicting the zero-sum condition. Thus L is a bounded intersection of finitely many half-spaces and a hyperplane, that is, a polytope in ℝ^{|V(G)|}. For any cost function C on G, the map ℓ ↦ Σ_{w∈V(G)} C(w)ℓ(w) is linear on L, so argmax_{ℓ∈L} Σ_{w∈V(G)} C(w)ℓ(w) can be taken to be one of the corners of the polytope. There are finitely many such corners, corresponding to finitely many 1-Lipschitz functions {f_1, f_2, …, f_m}. Since W_k = max_{ℓ∈L} Σ_{w∈V(G)} ξ_k(w)ℓ(w), taking C = ξ_k shows that for all k there exists f ∈ F such that W_k = Σ_{w∈G} f(w)ξ_k(w). ∎
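Lemma 6.1 is, in effect, the dual linear-programming formulation of the Wasserstein distance. The sketch below (ours, using SciPy; the helper names and the P_3 demonstration are our own choices) computes W(μ,ν) by maximizing Σ_w ξ(w)ℓ(w) subject to the Lipschitz constraints; we pin ℓ at one vertex to 0, which is harmless because ξ is zero-sum.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def graph_distances(A):
    """All-pairs graph distances by breadth-first search on adjacency matrix A."""
    n = len(A)
    d = np.full((n, n), np.inf)
    for s in range(n):
        d[s, s], frontier = 0, [s]
        while frontier:
            nxt = []
            for x in frontier:
                for y in range(n):
                    if A[x, y] and d[s, y] == np.inf:
                        d[s, y] = d[s, x] + 1
                        nxt.append(y)
            frontier = nxt
    return d

def wasserstein(A, xi):
    """W(mu, nu) for xi = mu - nu, via the dual LP over 1-Lipschitz functions."""
    n, d = len(A), graph_distances(A)
    rows, rhs = [], []
    for s, t in itertools.permutations(range(n), 2):
        row = np.zeros(n)
        row[s], row[t] = 1, -1          # constraint f(s) - f(t) <= d(s, t)
        rows.append(row)
        rhs.append(d[s, t])
    # maximize xi . f  <=>  minimize (-xi) . f; fixing f(0) = 0 removes the
    # translation direction without changing the objective (xi sums to 0)
    bounds = [(0, 0)] + [(None, None)] * (n - 1)
    res = linprog(-np.asarray(xi), A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=bounds)
    return -res.fun

# demo: P_3 with u, v the two endpoints; W(1_u, 1_v) = d(u, v) = 2, and after
# one step with alpha = beta = 0 both walks sit at the middle vertex, so W_1 = 0
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)
P = A / A.sum(axis=1, keepdims=True)
mu1, nu1 = np.eye(3)[0] @ P, np.eye(3)[2] @ P
print(wasserstein(A, np.eye(3)[0] - np.eye(3)[2]), wasserstein(A, mu1 - nu1))
```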

We now know that W2kW_{2k} will be one of finitely many expressions. The next lemma shows that each of these expressions is approximately exponential.

Lemma 6.2.

For any Guvab 𝒢\mathcal{G} for which W=0W=0 and any 1-Lipschitz function ff, there exists some 0<λf<10<\lambda_{f}<1 and some constant cfc_{f} such that

wGf(w)ξ2k(w)cfλf2k\sum_{w\in G}f(w)\xi_{2k}(w)\sim c_{f}\cdot\lambda_{f}^{2k}

unless there exists some NN such that for all k>Nk>N we have that wGf(w)ξ2k(w)=0\sum_{w\in G}f(w)\xi_{2k}(w)=0.

Proof.

Assume that there does not exist any NN such that for all k>Nk>N we have that wGf(w)ξ2k(w)=0\sum_{w\in G}f(w)\xi_{2k}(w)=0.

We know by Lemma 2.10 that for all vertices ww, there exist some constants ciwc^{w}_{i} such that for all k1k\geq 1,

ξ2k(w)=i=1mciwλi2k=i=1mciw(λi2)k=i=1nciw(λi2)k\xi_{2k}(w)=\sum_{i=1}^{m}c^{w}_{i}\lambda_{i}^{2k}=\sum_{i=1}^{m}c^{w}_{i}(\lambda_{i}^{2})^{k}=\sum_{i=1}^{n}c^{w}_{i}(\lambda_{i}^{2})^{k}

where in the last sum the λi2\lambda_{i}^{2} are all distinct positive constants (by combining like terms in the sum with mm terms to get a sum with nn terms). Then

wGf(w)ξ2k(w)=wGf(w)i=1nciw(λi2)k.\sum_{w\in G}f(w)\xi_{2k}(w)=\sum_{w\in G}f(w)\sum_{i=1}^{n}c^{w}_{i}(\lambda_{i}^{2})^{k}.

Thus, there exist constants c_f^1, …, c_f^n such that Σ_{w∈G} f(w)ξ_{2k}(w) = Σ_{i=1}^n c_f^i (λ_i²)^k. Let λ_f² = max_{i: c_f^i ≠ 0} λ_i²; this is well-defined, since if every c_f^i were zero we would have Σ_{w∈G} f(w)ξ_{2k}(w) = 0 for all k ≥ 1, contradicting our assumption. Let c_f be the constant corresponding to this λ_f². Then

wGf(w)ξ2k(w)cfλf2k=i=1ncfi(λi2)kcfλf2k=1+O(c2k),\frac{\sum_{w\in G}f(w)\xi_{2k}(w)}{c_{f}\cdot\lambda_{f}^{2k}}=\frac{\sum_{i=1}^{n}c_{f}^{i}(\lambda_{i}^{2})^{k}}{c_{f}\cdot\lambda_{f}^{2k}}=1+O(c^{2k}),

where 0<c<10<c<1. Thus we have that

Σ_{w∈G} f(w)ξ_{2k}(w) ∼ c_f · λ_f^{2k}. ∎
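The eigenvalue expansion from Lemma 2.10 that drives this proof can be observed directly. In the sketch below (ours; for simplicity we take α = β = 1/4 on the path P_3 with u, v its endpoints, so that a single matrix P_α drives ξ_k, and this Guvab has W = 0), diagonalizing P_α writes ξ_k(w) as Σ_i c_i^w λ_i^k:

```python
import numpy as np

# Expand xi_k = (1_u - 1_v) P_alpha^k in the eigenbasis of P_alpha on P_3
# with u, v the two endpoints and alpha = beta = 0.25.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)
P = A / A.sum(axis=1, keepdims=True)
alpha = 0.25
Pa = alpha * np.eye(3) + (1 - alpha) * P

lam, V = np.linalg.eig(Pa)               # Pa = V diag(lam) V^{-1}
Vinv = np.linalg.inv(V)
xi0 = np.eye(3)[0] - np.eye(3)[2]
a = xi0 @ V                              # coordinates of xi_0 in the eigenbasis
for k in range(1, 7):
    xi_k = xi0 @ np.linalg.matrix_power(Pa, k)
    expansion = (a * lam**k) @ Vinv      # sum_i c_i^w lambda_i^k, vertex by vertex
    assert np.allclose(xi_k, expansion)
print("decay governed by |lambda| =",
      max(abs(l) for l, c in zip(lam, a) if abs(c) > 1e-12 and abs(l) < 1))
```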

We now have all the pieces we need to show that W2kW_{2k} is approximately some exponential. The following theorem finishes off the proof.

Theorem 6.3.

For any Guvab 𝒢\mathcal{G} for which W=0W=0 and {W2k}\{W_{2k}\} is not eventually constant, we have that there exists some 0<λeven<10<\lambda_{\emph{{even}}}<1 and some c>0c>0, such that W2kcλeven2kW_{2k}\sim c\cdot\lambda_{\emph{{even}}}^{2k}.

Proof.

By Lemma 6.1, there exists some set F = {f_1, …, f_n} of 1-Lipschitz functions f_i: V(G) → ℝ such that for all k there exists f ∈ F with W_{2k} = Σ_{w∈G} f(w)ξ_{2k}(w). Furthermore, by Lemma 6.2, for each f ∈ F there exist some λ_f and some positive constant c_f such that Σ_{w∈G} f(w)ξ_{2k}(w) ∼ c_f · λ_f^{2k}, unless there exists some N such that Σ_{w∈G} f(w)ξ_{2k}(w) = 0 for all k > N. If the latter holds for every f ∈ F, then {W_{2k}} is eventually constant at 0, contrary to assumption. Otherwise, let F̃ be the set of functions f for which λ_f is well-defined, and let λ_even = max_{f∈F̃} λ_f. Let F′ ⊂ F̃ be the set of f such that λ_f = λ_even, and let c = max_{f∈F′} c_f. Finally, let ℱ ⊂ F̃ be the set of f ∈ F̃ such that λ_f = λ_even and c_f = c. Then for all f ∈ F with f ∉ ℱ, there exists some N such that for all k ≥ N we have

wGf(w)ξ2k(w)<maxfwGf(w)ξ2k(w)W2k.\sum_{w\in G}f(w)\xi_{2k}(w)<\max_{f\in\mathcal{F}}\sum_{w\in G}f(w)\xi_{2k}(w)\leq W_{2k}.

Thus, since W_{2k} must equal Σ_{w∈G} f(w)ξ_{2k}(w) for some 1-Lipschitz f ∈ F, there exists some N such that for all k ≥ N we have W_{2k} = Σ_{w∈G} f(w)ξ_{2k}(w) for some f ∈ ℱ, since W_{2k} eventually exceeds the value attained by every f ∉ ℱ. Moreover, every f ∈ ℱ satisfies Σ_{w∈G} f(w)ξ_{2k}(w) ∼ c · λ_even^{2k}. We conclude that W_{2k} ∼ c · λ_even^{2k}. ∎

Remark 6.4.

Analogously, for any Guvab 𝒢\mathcal{G} for which W=0W=0 and {W2k+1}\{W_{2k+1}\} is not eventually constant, we have that there exists some 0<λodd<10<\lambda_{\emph{{odd}}}<1 and some c>0c>0, such that W2k+1cλodd2k+1W_{2k+1}\sim c\cdot\lambda_{\emph{{odd}}}^{2k+1}.

We now seek to explicitly characterize all the cases where W=0W=0 and WkW_{k} is eventually constant. We start by understanding why we only need to consider the first few terms of {Wk}\{W_{k}\} to characterize all of these cases.

Lemma 6.5.

When limkWk=0\displaystyle\lim_{k\to\infty}W_{k}=0, if there exists some N0N\geq 0 such that {W(μk,νk)}kN\{W(\mu_{k},\nu_{k})\}_{k\geq N} is a constant sequence, then {W(μk,νk)}k1\displaystyle\{W(\mu_{k},\nu_{k})\}_{k\geq 1} is also a constant sequence.

Proof.

By Lemma 2.10, if we let the distinct eigenvalues of the transition matrices be λ1,,λn\lambda_{1},\ldots,\lambda_{n}, then for any vertex ww and for any k1k\geq 1 we can write (μkνk)w=i=1nciwλik(\mu_{k}-\nu_{k})_{w}=\sum_{i=1}^{n}c^{w}_{i}\lambda_{i}^{k} for some constants c1w,,cnwc^{w}_{1},\ldots,c^{w}_{n}. Note that if {W(μk,νk)}kN\{W(\mu_{k},\nu_{k})\}_{k\geq N} is a constant sequence, 0=limkWk=W(μk,νk)0=\lim_{k\to\infty}W_{k}=W(\mu_{k},\nu_{k}) for all kNk\geq N. Thus for any vertex ww, we will have i=1nciwλik=0\sum_{i=1}^{n}c^{w}_{i}\lambda_{i}^{k}=0 for all kNk\geq N.

Suppose that for some ii, we have that ciwc^{w}_{i} and λi\lambda_{i} are nonzero. Then let Λ\Lambda be the set of all λi\lambda_{i} for which ciwc^{w}_{i} and λi\lambda_{i} are nonzero. Then let λm=maxλΛ|λ|\lambda_{m}=\max_{\lambda\in\Lambda}|\lambda|. If there is only one λiΛ\lambda_{i}\in\Lambda such that |λi|=λm|\lambda_{i}|=\lambda_{m}, then for some NN, for all k>Nk>N we will have that |ciwλik|>ji|cjwλjk||c^{w}_{i}\lambda_{i}^{k}|>\sum_{j\neq i}|c^{w}_{j}\lambda_{j}^{k}| so the left-hand-side term will dominate and i=1nciwλik\sum_{i=1}^{n}c^{w}_{i}\lambda_{i}^{k} will be nonzero. Then 0limkWk0\neq\lim_{k\to\infty}W_{k}. If there is more than one λΛ\lambda\in\Lambda such that |λ|=λm|\lambda|=\lambda_{m}, then those two λ\lambdas will be λm\lambda_{m} and λm=λm\lambda_{m^{\prime}}=-\lambda_{m}, since those are the only two numbers with absolute value λm\lambda_{m}. We know that cmwλmkc^{w}_{m}\lambda_{m}^{k} will stay the same sign regardless of kk, while cmwλmkc^{w}_{m^{\prime}}\lambda_{m^{\prime}}^{k} will switch sign with parity. Thus, for one of the parities, cmwλmkc^{w}_{m}\lambda_{m}^{k} and cmwλmkc^{w}_{m^{\prime}}\lambda_{m^{\prime}}^{k} will have the same sign. Thus, for some NN, either for all even k>Nk>N or for all odd k>Nk>N, we will have that |cmwλmk+cmwλmk|>jm,m|cjwλjk||c^{w}_{m}\lambda_{m}^{k}+c^{w}_{m^{\prime}}\lambda_{m^{\prime}}^{k}|>\sum_{j\neq m,m^{\prime}}|c^{w}_{j}\lambda_{j}^{k}|, so the left-hand-side term will dominate and i=1nciwλik\sum_{i=1}^{n}c^{w}_{i}\lambda_{i}^{k} will be nonzero. Then 0limkWk0\neq\lim_{k\to\infty}W_{k}. Thus, we must have for all 1in1\leq i\leq n that either ciwc^{w}_{i} or λi\lambda_{i} is 0.

Thus, for all k ≥ 1 and all 1 ≤ i ≤ n, we have c_i^w λ_i^k = 0, and hence (μ_k − ν_k)_w = Σ_{i=1}^n c_i^w λ_i^k = 0. Since w was arbitrary, μ_k − ν_k is 0 at all vertices, so W(μ_k, ν_k) = 0 for all k ≥ 1. ∎

With this lemma established, we proceed to characterize all the cases when the Wasserstein distance is eventually constant in the case where W=0W=0.

Theorem 6.6.

When limkWk=0\lim_{k\to\infty}W_{k}=0, we have that WkW_{k} is eventually constant if and only if one of the following holds:

  • α=β=0\alpha=\beta=0 and N(u)=N(v)N(u)=N(v),

  • α=β=1degu+1\displaystyle\alpha=\beta=\frac{1}{\deg u+1}, the edge {u,v}E(G)\{u,v\}\in E(G), and if the edge {u,v}\{u,v\} were removed from E(G)E(G) then u,vu,v would have N(u)=N(v)N(u)=N(v),

  • α=β\alpha=\beta and u=vu=v.

Proof.

We know by Lemma 6.5 that if limkWk=0\lim_{k\to\infty}W_{k}=0 and WkW_{k} is eventually always 0, then μ1=ν1\mu_{1}=\nu_{1} and μ2=ν2\mu_{2}=\nu_{2}. Let μ1=ν1\mu_{1}=\nu_{1} be ϕ\phi. Recall that PαP_{\alpha} is the transition matrix for XX and PβP_{\beta} is the transition matrix for YY. Further recall that Pα=αI+(1α)PP_{\alpha}=\alpha I+(1-\alpha)P and Pβ=βI+(1β)PP_{\beta}=\beta I+(1-\beta)P. Then we have ϕPα=ϕPβ\phi P_{\alpha}=\phi P_{\beta}, so ϕ(αI+(1α)P)=ϕ(βI+(1β)P)\phi(\alpha I+(1-\alpha)P)=\phi(\beta I+(1-\beta)P). Then ϕ((βα)I+(αβ)P)=0\phi((\beta-\alpha)I+(\alpha-\beta)P)=0.

If α ≠ β, then dividing out by (α − β) gives φP = φ. Since G is connected, the only probability distribution fixed by P is the stationary distribution π, so φ = π, which is strictly positive at every vertex. We may assume u ≠ v, since if u = v then μ_1(u) = α and ν_1(u) = β force α = β immediately (note μ_1(u) = α because P_{u,u} = 0 by definition of P). Now φ = μ_1 = 𝟙_u P_α vanishes outside N(u) ∪ {u}, so u must be adjacent to every other vertex; likewise v, and in particular u ∼ v. Evaluating φ at u and v in two ways gives α = μ_1(u) = ν_1(u) = (1−β)/deg(v) and β = ν_1(v) = μ_1(v) = (1−α)/deg(u), while φ = π gives α = deg(u)/(2|E(G)|) and β = deg(v)/(2|E(G)|). Substituting the latter equations into the former yields deg(v)(deg(u)+1) = 2|E(G)| = deg(u)(deg(v)+1), so deg(u) = deg(v) and hence α = β, a contradiction.

This means we have that α=β\alpha=\beta.

We also know that, given that limkWk=0\lim_{k\to\infty}W_{k}=0, if μ1=ν1\mu_{1}=\nu_{1} and α=β\alpha=\beta, then μk=μ1(Pα)k1=ν1(Pα)k1=νk\mu_{k}=\mu_{1}(P_{\alpha})^{k-1}=\nu_{1}(P_{\alpha})^{k-1}=\nu_{k} for all k1k\geq 1 so WkW_{k} is eventually always 0.

It therefore suffices to characterize the cases where limkWk=0\lim_{k\to\infty}W_{k}=0 and μ1=ν1\mu_{1}=\nu_{1} and α=β\alpha=\beta. We first note that if u=vu=v and α=β\alpha=\beta, we are done. Otherwise, we assume that α=β\alpha=\beta and casework on the values of α\alpha to determine which cases yield limkWk=0\lim_{k\to\infty}W_{k}=0 and μ1=ν1\mu_{1}=\nu_{1}.

If α=0\alpha=0, we need that uu and vv have the same neighbor set, since if uu had some neighbor nn that was not adjacent to vv then μ1\mu_{1} would have nonzero mass at nn and ν1\nu_{1} would not. We will also show that this is a sufficient condition. If uu and vv have the same neighbor set then deg(u)=deg(v)\deg(u)=\deg(v). For each neighbor nn of uu and vv, we have that μ1(n)=1αdeg(u)=1βdeg(v)=ν1(n)\mu_{1}(n)=\frac{1-\alpha}{\deg(u)}=\frac{1-\beta}{\deg(v)}=\nu_{1}(n) and for all other vertices ww, we have that μ1(w)=ν1(w)=0\mu_{1}(w)=\nu_{1}(w)=0. Thus μ1=ν1\mu_{1}=\nu_{1}. We also know that limkWk=0\lim_{k\to\infty}W_{k}=0 by Theorem 3.4 since α=β=0\alpha=\beta=0 and for any neighbor nn of uu, the path unvu\to n\to v has an even number of steps.

If 0 < α < 1, we first note that we need u and v to be adjacent, since μ_1(u) = α > 0 while ν_1(u) = 0 if u and v are not adjacent. When u and v are adjacent, we have ν_1(u) = (1−α)/deg(v), so μ_1(u) = ν_1(u) gives α = (1−α)/deg(v), i.e., α = 1/(deg(v)+1); symmetrically, μ_1(v) = ν_1(v) gives α = 1/(deg(u)+1), so deg(u) = deg(v) and α = 1/(deg u + 1). We also note that, similarly to before, aside from the edge {u,v}, u and v need to have the same set of neighbors, because if there were some vertex n ≠ u, v such that n ∼ u and n ≁ v, then μ_1 would have nonzero mass at n and ν_1 would not. We will finish by showing that if α, β, u, v satisfy these conditions, then μ_1 = ν_1 and lim_{k→∞} W_k = 0.

Suppose that the conditions are satisfied. We know that deg(u)=deg(v)\deg(u)=\deg(v), so μ1(u)=α=1αdeg(u)=ν1(u)\mu_{1}(u)=\alpha=\frac{1-\alpha}{\deg(u)}=\nu_{1}(u) and similarly μ1(v)=ν1(v)\mu_{1}(v)=\nu_{1}(v). We also know that for all nu,vn\neq u,v such that nun\sim u and nvn\sim v, we have that μ1(n)=1αdeg(u)=1βdeg(v)=ν1(n)\mu_{1}(n)=\frac{1-\alpha}{\deg(u)}=\frac{1-\beta}{\deg(v)}=\nu_{1}(n) and for all other vertices ww, we have that μ1(w)=ν1(w)=0\mu_{1}(w)=\nu_{1}(w)=0. Thus μ1=ν1\mu_{1}=\nu_{1}. Also, limkWk=0\lim_{k\to\infty}W_{k}=0 by Theorem 3.4 since 0<αβ<10<\alpha\leq\beta<1. ∎
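As a quick numerical check of the second case of Theorem 6.6 (ours; K_4 with u = 0 and v = 1 is an arbitrary instance, since removing the edge {u,v} from K_4 leaves N(u) = N(v)):

```python
import numpy as np

# On K_4 with u = 0, v = 1 and alpha = beta = 1/(deg(u) + 1) = 1/4, the
# one-step distributions mu_1 and nu_1 coincide, so W_k = 0 for all k >= 1.
n = 4
A = np.ones((n, n)) - np.eye(n)          # complete graph K_4
P = A / A.sum(axis=1, keepdims=True)
alpha = 1 / (A[0].sum() + 1)             # 1/(deg(u) + 1) = 1/4
Pa = alpha * np.eye(n) + (1 - alpha) * P
mu1 = np.eye(n)[0] @ Pa
nu1 = np.eye(n)[1] @ Pa
print(mu1, nu1, np.allclose(mu1, nu1))   # both are uniform; prints True
```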

7 Convergence when β=1\beta=1

We next consider the case of Guvabs where β = 1. Similarly to the W = 0 case, we show that the rate of convergence is exponential unless the distance is eventually constant. Furthermore, when the Wasserstein distance is eventually constant, it is constant from the first step onward.

We first show that the rate of convergence is exponential unless the distance is eventually constant.

Lemma 7.1.

Consider a Guvab where β=1\beta=1. Either {W2k}\{W_{2k}\} is eventually constant, or for some cec_{e} and some λe\lambda_{e}, we have that |W2klimkW2k|ceλe2k|W_{2k}-\lim_{k\to\infty}W_{2k}|\sim c_{e}\cdot\lambda_{e}^{2k}. Also, either {W2k+1}\{W_{2k+1}\} is eventually constant, or for some coc_{o} and some λo\lambda_{o}, we have that |W2k+1limkW2k+1|coλo2k+1|W_{2k+1}-\lim_{k\to\infty}W_{2k+1}|\sim c_{o}\cdot\lambda_{o}^{2k+1}.

Proof.

When β=1\beta=1, we know that Wk=wGμk(w)d(w,v)=ciλikW_{k}=\sum_{w\in G}\mu_{k}(w)\textrm{d}(w,v)=\sum c_{i}\cdot\lambda_{i}^{k} for some constants cic_{i} and λi\lambda_{i}. Using the same reasoning as in the proof of Lemma 6.2, we know that (unless ciλi2k\sum c_{i}\cdot\lambda_{i}^{2k} is eventually constant) ciλi2kceλe2k\sum c_{i}\cdot\lambda_{i}^{2k}\sim c_{e}\cdot\lambda_{e}^{2k} for some ce,λec_{e},\lambda_{e}. We also know that (unless ciλi2k+1\sum c_{i}\cdot\lambda_{i}^{2k+1} is eventually constant) ciλi2k+1coλo2k+1\sum c_{i}\cdot\lambda_{i}^{2k+1}\sim c_{o}\cdot\lambda_{o}^{2k+1} for some co,λoc_{o},\lambda_{o}. Thus, we attain the desired result. ∎

We now show that if the distance is eventually constant, it is constant after 1 step.

Lemma 7.2.

When β=1\beta=1, if there exists some N0N\geq 0 such that {W(μn,νn)}nN\{W(\mu_{n},\nu_{n})\}_{n\geq N} is a constant sequence, then {W(μn,νn)}n1\{W(\mu_{n},\nu_{n})\}_{n\geq 1} is also a constant sequence.

Proof.

When β=1\beta=1, we have that WkW_{k} is wGμk(w)d(w,v)=ciλik\sum_{w\in G}\mu_{k}(w)\textrm{d}(w,v)=\sum c_{i}\cdot\lambda_{i}^{k} for some constants cic_{i} and λi\lambda_{i}. Thus, for similar reasons as in the proof of Lemma 6.5, all the cic_{i} for λi0,1\lambda_{i}\neq 0,1 are 0 so {W(μn,νn)}n1\{W(\mu_{n},\nu_{n})\}_{n\geq 1} is constant. ∎

When W=0W=0, using a lemma similar to Lemma 7.2 we were able to explicitly characterize exactly when WkW_{k} was eventually constant. Lemma 7.2 provides an important step towards making a similar characterization when β=1\beta=1. To exemplify how a characterization could be made when β=1\beta=1, we provide a family of examples of Guvabs where WkW_{k} is eventually constant.

Definition 7.3.

We define a Gluvab 𝒥\mathcal{J} to be a Guvab that satisfies all of the following conditions:

  • β=1\beta=1,

  • 2d(u,v)=maxwGd(w,v)2\textrm{d}(u,v)=\max_{w\in G}\textrm{d}(w,v),

  • if d(x,v)=maxwGd(w,v)\textrm{d}(x,v)=\max_{w\in G}\textrm{d}(w,v), then for all nxn\sim x we have that d(n,v)<d(x,v)\textrm{d}(n,v)<\textrm{d}(x,v),

  • if 0<d(x,v)<maxwGd(w,v)0<\textrm{d}(x,v)<\max_{w\in G}\textrm{d}(w,v), then for exactly half of the neighbors nxn\sim x we have that d(n,v)<d(x,v)\textrm{d}(n,v)<\textrm{d}(x,v), and for exactly the other half we have that d(n,v)>d(x,v)\textrm{d}(n,v)>\textrm{d}(x,v).

Example 7.4.

Consider a Guvab with G = P_3 (where P_3 is the path graph with 3 vertices), v either of the two vertices of P_3 with degree 1, u the vertex of P_3 with degree 2, α = 1/3, and β = 1. One can check that this Guvab is a Gluvab: here d(u,v) = 1 and max_w d(w,v) = 2 = 2d(u,v).
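The following sketch of ours verifies this example numerically. Since β = 1, the walk Y never moves, so W_k = Σ_w μ_k(w) d(w,v), and the printout stays at d(u,v) = 1:

```python
import numpy as np

# G = P_3 with v an endpoint (vertex 0), u the middle vertex (vertex 1),
# alpha = 1/3, beta = 1; W_k = sum_w mu_k(w) d(w, v) stays at d(u, v) = 1.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)
P = A / A.sum(axis=1, keepdims=True)
alpha = 1 / 3
Pa = alpha * np.eye(3) + (1 - alpha) * P
d_to_v = np.array([0, 1, 2])   # graph distances to v = vertex 0
mu = np.eye(3)[1]              # X starts at u = vertex 1
for k in range(8):
    print(k, round(mu @ d_to_v, 12))  # prints 1.0 at every step
    mu = mu @ Pa
```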

Lemma 7.5.

Any Gluvab 𝒥\mathcal{J} satisfies W0=W1=W_{0}=W_{1}=\cdots.

Proof.

We aim to prove this lemma by essentially reducing each Gluvab to a random walk on a path graph. In particular, each vertex mim_{i} in the path corresponds to the set of vertices {wG:d(w,v)=i}\{w\in G:d(w,v)=i\} at a given distance ii from vv. After this, the desired result follows without much difficulty.

Construct the Markov chain M that is simply a random walk with laziness α on a path of length 2d(u,v) with vertices m_0, m_1, …, m_{d(u,v)}, …, m_{2d(u,v)}, started at m_{d(u,v)}. It suffices to show that for all i we have Σ_{w∈G, d(w,v)=i} μ_k(w) = M_k(m_i): since β = 1, this would give W_k = Σ_{w∈G} μ_k(w) d(w,v) = Σ_i i · M_k(m_i), and the distribution of M is always symmetric about the center m_{d(u,v)}, so this average distance is always d(u,v).

We will show by induction on kk that for all ii,

wG,d(w,v)=iμk(w)=Mk(mi).\sum_{w\in G,\,\textrm{d}(w,v)=i}\mu_{k}(w)=M_{k}(m_{i}).

Base case: At k=0k=0, we have that μk\mu_{k} is only nonzero at uu and that MkM_{k} is only nonzero at md(u,v)m_{\textrm{d}(u,v)}, so the claim holds.

Inductive step: We suppose that this claim holds for kk. We will show that it holds for k+1k+1. We know the following facts about MM:

  • Mk+1(m0)=1α2Mk(m1)+αMk(m0)M_{k+1}(m_{0})=\frac{1-\alpha}{2}M_{k}(m_{1})+\alpha M_{k}(m_{0}),

  • Mk+1(m2d(u,v))=1α2Mk(m2d(u,v)1)+αMk(m2d(u,v))M_{k+1}(m_{2\textrm{d}(u,v)})=\frac{1-\alpha}{2}M_{k}(m_{2\textrm{d}(u,v)-1})+\alpha M_{k}(m_{2\textrm{d}(u,v)}),

  • Mk+1(m1)=1α2Mk(m2)+αMk(m1)+(1α)Mk(m0)M_{k+1}(m_{1})=\frac{1-\alpha}{2}M_{k}(m_{2})+\alpha M_{k}(m_{1})+(1-\alpha)M_{k}(m_{0}),

  • Mk+1(m2d(u,v)1)=1α2Mk(m2d(u,v)2)+αMk(m2d(u,v)1)+(1α)Mk(m2d(u,v))M_{k+1}(m_{2\textrm{d}(u,v)-1})=\frac{1-\alpha}{2}M_{k}(m_{2\textrm{d}(u,v)-2})+\alpha M_{k}(m_{2\textrm{d}(u,v)-1})+(1-\alpha)M_{k}(m_{2\textrm{d}(u,v)}),

  • for 1<i<2d(u,v)11<i<2\textrm{d}(u,v)-1, we have that Mk+1(mi)=αMk(mi)+1α2(Mk(mi1)+Mk(mi+1)).M_{k+1}(m_{i})=\alpha M_{k}(m_{i})+\frac{1-\alpha}{2}(M_{k}(m_{i-1})+M_{k}(m_{i+1})).

We now examine μk+1\mu_{k+1}, and in particular the amount of mass of μk+1\mu_{k+1} at each level. We let Sk(i)S_{k}(i) denote the mass of μk\mu_{k} at the iith level; in other words,

Sk(i)=wG,d(w,v)=iμk(w).S_{k}(i)=\displaystyle\sum_{w\in G,\,d(w,v)=i}\mu_{k}(w).

For all ii, we can calculate Sk+1(i)S_{k+1}(i) by considering the iith level and considering how much mass from each level from SkS_{k} goes to the iith level. This is possible because all vertices at the same level will have indistinguishable behavior with respect to their contribution to the iith level. By calculating the contribution of each different level to the iith level, we can check that

  • Sk+1(0)=1α2Sk(1)+αSk(0)S_{k+1}(0)=\frac{1-\alpha}{2}S_{k}(1)+\alpha S_{k}(0),

  • Sk+1(2d(u,v))=1α2Sk(2d(u,v)1)+αSk(2d(u,v))S_{k+1}(2\textrm{d}(u,v))=\frac{1-\alpha}{2}S_{k}(2\textrm{d}(u,v)-1)+\alpha S_{k}(2\textrm{d}(u,v)),

  • Sk+1(1)=1α2Sk(2)+αSk(1)+(1α)Sk(0)S_{k+1}(1)=\frac{1-\alpha}{2}S_{k}(2)+\alpha S_{k}(1)+(1-\alpha)S_{k}(0),

  • Sk+1(2d(u,v)1)=1α2Sk(2d(u,v)2)+αSk(2d(u,v)1)+(1α)Sk(2d(u,v))S_{k+1}(2\textrm{d}(u,v)-1)=\frac{1-\alpha}{2}S_{k}(2\textrm{d}(u,v)-2)+\alpha S_{k}(2\textrm{d}(u,v)-1)+(1-\alpha)S_{k}(2\textrm{d}(u,v)),

  • for 1<i<2d(u,v)11<i<2\textrm{d}(u,v)-1, we have that Sk+1(i)=αSk(i)+1α2(Sk(i1)+Sk(i+1)).S_{k+1}(i)=\alpha S_{k}(i)+\frac{1-\alpha}{2}(S_{k}(i-1)+S_{k}(i+1)).

This lines up exactly with our characterization of Mk+1M_{k+1}, so for all ii we have

Σ_{w∈G, d(w,v)=i} μ_{k+1}(w) = M_{k+1}(m_i). ∎
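The reduction in this proof is easy to check numerically on a small Gluvab. In the sketch below (ours; the 4-cycle with v = vertex 0 and u = vertex 1 satisfies the Gluvab conditions, and α = 0.3 is arbitrary), the mass of μ_k on each distance level matches the path chain M at every step:

```python
import numpy as np

# Gluvab check on the 4-cycle with v = 0, u = 1: the mass of mu_k on each
# level {w : d(w, v) = i} matches the lazy walk M on the path m_0 - m_1 - m_2.
alpha = 0.3
A = np.zeros((4, 4))
for i in range(4):
    A[i, (i + 1) % 4] = A[i, (i - 1) % 4] = 1
P = A / A.sum(axis=1, keepdims=True)
Pa = alpha * np.eye(4) + (1 - alpha) * P

Ap = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)  # path on m_0, m_1, m_2
Pm = alpha * np.eye(3) + (1 - alpha) * Ap / Ap.sum(axis=1, keepdims=True)

level = np.array([0, 1, 2, 1])           # d(w, v) for w = 0, 1, 2, 3
mu, M = np.eye(4)[1], np.eye(3)[1]       # start at u and at the center m_1
for k in range(8):
    sums = [mu[level == i].sum() for i in range(3)]
    assert np.allclose(sums, M)
    mu, M = mu @ Pa, M @ Pm
print("level masses match the path chain for k = 0, ..., 7")
```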

8 Main Convergence Theorems

Since we have shown that every Guvab has W = 1, W = 1/2, or W = 0, or else β = 1, and we have some understanding of the rate of convergence of the Wasserstein distance in each of these cases, we can make some general statements about convergence that apply to all Guvabs. The following theorems sum up the general convergence results obtained from considering each of the cases W = 1, W = 1/2, W = 0, and β = 1 in the previous sections.

The first theorem states that the rate of convergence of {W2k}\{W_{2k}\} and {W2k+1}\{W_{2k+1}\} is exponential unless it is eventually constant.

Theorem 8.1.

For any Guvab, we have that

  • either {W2k}\{W_{2k}\} is eventually constant, or there exists a constant λeven(1,1)\lambda_{\emph{{even}}}\in(-1,1) and a positive constant ceven>0c_{\emph{{even}}}>0 such that |W2klimkW2k|ceven|λeven|2k|W_{2k}-\lim_{k\to\infty}W_{2k}|\sim c_{\emph{{even}}}\cdot|\lambda_{\emph{{even}}}|^{2k},

  • either {W_{2k+1}} is eventually constant, or there exists a constant λ_odd ∈ (−1,1) and a positive constant c_odd > 0 such that |W_{2k+1} − lim_{k→∞} W_{2k+1}| ∼ c_odd · |λ_odd|^{2k+1}.

Proof.

To begin, note that when β<1\beta<1, we have that WkW_{k} converges and W{1,12,0}W\in\{1,\frac{1}{2},0\} by Corollary 3.12. Further, when β=1\beta=1, Lemma 7.1 implies exactly that the desired result holds. Thus, it suffices to consider each of these cases W=1W=1, W=12W=\frac{1}{2}, and W=0W=0 separately.

First, when W=1,W=1, Theorem 4.11 implies {Wk}\{W_{k}\} is eventually constant (and hence, we have the same for {W2k}\{W_{2k}\} and {W2k+1}\{W_{2k+1}\}). This gives the desired result in the case W=1W=1.

When W = 1/2, Theorem 5.7 implies that either β = 1/2 and {W_k} is eventually constant, or else |W_k − lim_{k→∞} W_k| = 0.5 · |1−2β|^k for all sufficiently large k, and hence the corresponding statements hold for {W_{2k}} and {W_{2k+1}} with λ_even = λ_odd = |1−2β|. This gives the desired result for W = 1/2.

Finally, we note that when W=0W=0, Theorem 6.3 (and Remark 6.4) gives exactly the desired result. Thus, having checked each case, we conclude the proof. ∎

The second theorem provides a characterization of when {Wk}\{W_{k}\} is eventually constant when β<1\beta<1.

Theorem 8.2.

When β<1\beta<1, we have that {Wk}\{W_{k}\} is eventually constant if and only if one of the following holds:

  • α=β=0\alpha=\beta=0, the graph GG is bipartite, and d(u,v)\textrm{d}(u,v) is odd,

  • α=0\alpha=0 and β=12\beta=\frac{1}{2}, and GG is bipartite,

  • α=β=0\alpha=\beta=0 and N(u)=N(v)N(u)=N(v),

  • α=β=1degu+1\displaystyle\alpha=\beta=\frac{1}{\deg u+1}, the edge {u,v}E(G)\{u,v\}\in E(G), and if the edge {u,v}\{u,v\} were removed from E(G)E(G) then u,vu,v would have N(u)=N(v)N(u)=N(v),

  • α=β\alpha=\beta and u=vu=v.

Proof.

To begin, note that when β<1\beta<1, we have that WkW_{k} converges and W{1,12,0}W\in\{1,\frac{1}{2},0\} by Corollary 3.12. Thus, it suffices to consider each of these cases where W=1W=1, W=12W=\frac{1}{2} and W=0W=0 separately.

First, we look at the case where W=1W=1. Note that, in this case, Theorem 4.11 implies that {Wk}\{W_{k}\} is always eventually constant. Further, by Theorem 3.9, we see this case is equivalent to α=β=0\alpha=\beta=0 and W0W\neq 0. Further, by Theorem 3.4, this case occurs exactly when α=β=0\alpha=\beta=0, the graph GG is bipartite, and d(u,v)\textrm{d}(u,v) is odd (i.e., the first item of the theorem statement).

Next, when W=12W=\frac{1}{2}, Corollary 5.8 implies {Wk}\{W_{k}\} is eventually constant exactly when β=12\beta=\frac{1}{2}. By Theorem 3.9, this case occurs exactly when α=0\alpha=0 and β=12\beta=\frac{1}{2}, and GG is bipartite (i.e., the second item of the theorem statement).

Finally, when W=0W=0, we see that Theorem 6.6 implies {Wk}\{W_{k}\} is eventually constant exactly when one of the following holds:

  • α=β=0\alpha=\beta=0 and N(u)=N(v)N(u)=N(v),

  • α=β=1degu+1\displaystyle\alpha=\beta=\frac{1}{\deg u+1}, the edge {u,v}E(G)\{u,v\}\in E(G), and if the edge {u,v}\{u,v\} were removed from E(G)E(G) then u,vu,v would have N(u)=N(v)N(u)=N(v),

  • α=β\alpha=\beta and u=vu=v.

Note that, each of these cases is indeed a case where W=0W=0 by Theorem 3.4, so this case is equivalent to the final three items of the theorem statement.

Thus, considering each of these cases together, we obtain the desired result. ∎

9 Open Problems

The theorems presented in this paper open up several new questions and directions for further research, which the reader is invited to consider. Specifically, given Theorem 8.1, the remaining questions regarding the behavior of Guvabs can be broken into three main categories: 1) determining when {W_{2k}} and {W_{2k+1}} are eventually constant, 2) in the cases where {W_{2k}} and {W_{2k+1}} are eventually constant, determining how long they take to become constant, and 3) determining c and λ when {W_{2k}} and {W_{2k+1}} are not eventually constant. In this section, we break down what we have shown and what is left to be done regarding each of these questions.

By Theorem 8.2, we have characterized when {W_k} is eventually constant in all cases where β < 1. Furthermore, in the cases of W = 1 and W = 1/2, we know that {W_{2k}} is eventually constant if and only if {W_k} is eventually constant, and similarly {W_{2k+1}} is eventually constant if and only if {W_k} is eventually constant. In the case of W = 0, it remains to characterize the cases where either {W_{2k}} or {W_{2k+1}} individually is eventually constant but {W_k} is not. Further, in the β = 1 case we lack a complete characterization of when {W_k} is eventually constant.

Question 2) remains largely unanswered and is a promising direction for future work. The progress so far in this paper is restricted to fairly weak upper and lower bounds when W = 1, and to characterizations of when {W_k} is eventually constant when W = 0 and β = 1. One interesting problem is that of tighter bounds for the case where W = 1, and of similar bounds for the case where W = 1/2 and {W_k} is eventually constant. Also, depending on the answers to Question 1), there may be Guvabs where only one of {W_{2k}} and {W_{2k+1}} is eventually constant. If a specific Guvab satisfying these criteria is found, it will be interesting to determine how long it takes for {W_{2k}} or {W_{2k+1}} to become constant.

Answering Question 3) will require specific knowledge of eigenvectors and eigenvalues. In full generality this is difficult, so a potential direction for future work would be addressing it in specific examples.

10 Acknowledgements

We would like to thank our mentor, Pakawut Jiradilok, for providing us with important knowledge, guidance, and assistance throughout our project. We would also like to thank Supanat Kamtue for the problem idea and helpful thoughts and guidance. Finally, we would like to thank the PRIMES-USA program for making this project possible.

References

  • [BCL+18] David P. Bourne, David Cushing, Shiping Liu, F. Münch, and Norbert Peyerimhoff. Ollivier–Ricci idleness functions of graphs. SIAM J. Discrete Math., 32(2):1408–1424, 2018.
  • [CK19] David Cushing and Supanat Kamtue. Long-scale Ollivier Ricci curvature of graphs. Anal. Geom. Metr. Spaces, 7(1):22–44, 2019.
  • [CKK+20] David Cushing, Supanat Kamtue, Jack Koolen, Shiping Liu, Florentin Münch, and Norbert Peyerimhoff. Rigidity of the Bonnet–Myers inequality for graphs with respect to Ollivier Ricci curvature. Adv. Math., 369:107188, 2020.
  • [DS91] Persi Diaconis and Daniel Stroock. Geometric bounds for eigenvalues of Markov chains. Ann. Appl. Probab., pages 36–61, 1991.
  • [FZM+15] Charlie Frogner, Chiyuan Zhang, Hossein Mobahi, Mauricio Araya-Polo, and Tomaso Poggio. Learning with a Wasserstein loss. arXiv preprint arXiv:1506.05439, 2015.
  • [JK21] Pakawut Jiradilok and Supanat Kamtue. Transportation distance between probability measures on the infinite regular tree. arXiv preprint arXiv:2107.09876, 2021.
  • [Kan06] Leonid V. Kantorovich. On the translocation of masses. J. Math. Sci., 133(4):1381–1382, 2006.
  • [LLY11] Yong Lin, Linyuan Lu, and Shing-Tung Yau. Ricci curvature of graphs. Tohoku Math. J., 63(4):605–627, 2011.
  • [LP17] David A. Levin and Yuval Peres. Markov Chains and Mixing Times, volume 107. American Mathematical Soc., 2017.
  • [Oll09] Yann Ollivier. Ricci curvature of Markov chains on metric spaces. J. Funct. Anal., 256(3):810–864, 2009.
  • [Oll11] Yann Ollivier. A visual introduction to Riemannian curvatures and some discrete generalizations. Anal. Geom. Metr. Spaces, 56:197–219, 2011.
  • [PC19] Gabriel Peyré and Marco Cuturi. Computational optimal transport. Found. Trends Mach. Learn., 11(5-6):355–607, 2019.
  • [RTG00] Yossi Rubner, Carlo Tomasi, and Leonidas J. Guibas. The earth mover's distance as a metric for image retrieval. Int. J. Comput. Vis., 40(2):99–121, 2000.
  • [SGR+15] Romeil Sandhu, Tryphon Georgiou, Ed Reznik, Liangjia Zhu, Ivan Kolesov, Yasin Senbabaoglu, and Allen Tannenbaum. Graph curvature for differentiating cancer networks. Sci. Rep., 5(1):1–13, 2015.
  • [SGT16] Romeil S. Sandhu, Tryphon T. Georgiou, and Allen R. Tannenbaum. Ricci curvature: An economic indicator for market fragility and systemic risk. Sci. Adv., 2(5):e1501495, 2016.
  • [Sin92] Alistair Sinclair. Improved bounds for mixing rates of Markov chains and multicommodity flow. Comb. Probab. Comput., 1:351–370, 1992.
  • [SJB19] Jayson Sia, Edmond Jonckheere, and Paul Bogdan. Ollivier–Ricci curvature-based method to community detection in complex networks. Sci. Rep., 9(1):1–12, 2019.
  • [vdHCL+21] Pim van der Hoorn, William J. Cunningham, Gabor Lippner, Carlo Trugenberger, and Dmitri Krioukov. Ollivier–Ricci curvature convergence in random geometric graphs. Phys. Rev. Res., 3(1):013211, 2021.
  • [WJB16] Chi Wang, Edmond Jonckheere, and Reza Banirazi. Interference constrained network control based on curvature. In 2016 American Control Conference (ACC), pages 6036–6041. IEEE, 2016.
  • [WX21] JunJie Wee and Kelin Xia. Ollivier persistent Ricci curvature-based machine learning for the protein–ligand binding affinity prediction. J. Chem. Inf. Model., 61(4):1617–1626, 2021.