
On the Wasserstein Distance Between $k$-Step Probability Measures on Finite Graphs

Sophia Benjamin (North Carolina School of Science and Mathematics, Durham, NC; sophia.r.benjamin@gmail.com), Arushi Mantri (Jesuit High School, Portland, OR; arushi.mantri@gmail.com), Quinn Perian (Stanford Online High School, Palo Alto, CA; quinn.perian@outlook.com)
(October 19, 2021)
Abstract

We consider random walks $X, Y$ on a finite graph $G$ with respective lazinesses $\alpha, \beta \in [0,1]$. Let $\mu_k$ and $\nu_k$ be the $k$-step transition probability measures of $X$ and $Y$. In this paper, we study the Wasserstein distance between $\mu_k$ and $\nu_k$ for general $k$. We consider the sequence formed by the Wasserstein distance at odd values of $k$ and the sequence formed by the Wasserstein distance at even values of $k$. We first establish that these sequences always converge, and then we characterize the possible values for the sequences to converge to. We further show that each of these sequences is either eventually constant or converges at an exponential rate. By analyzing the cases of different convergence values separately, we are able to partially characterize when the Wasserstein distance is constant for sufficiently large $k$.

Keywords— Wasserstein distance, transportation plan, Guvab, random walk, $k$-step probability distribution, laziness, convergence, finite graph

1 Introduction

Optimal transport theory concerns the minimum cost, called the transportation distance, of moving mass from one configuration to another. In this paper, the notion of transportation distance that we are concerned with is the $L^1$ transportation distance, which we refer to as the Wasserstein distance. The Wasserstein distance has applications in fields such as image processing, where a goal is to efficiently transform one image into another (e.g., [RTG00]), and machine learning, where a goal is to minimize some transport-related cost (e.g., [FZM+15]).

The application of Wasserstein distance that motivates this paper is the definition of $\alpha$-Ricci curvature $\kappa_{\alpha}$ on graphs introduced by Lin, Lu, and Yau in [LLY11]:

$$\kappa_{\alpha} = 1 - \frac{W(m_{x}^{\alpha}, m_{y}^{\alpha})}{\mathrm{d}(x,y)}.$$

Here $\mathrm{d}(x,y)$ is the graph distance between vertices $x$ and $y$, while $m_{v}^{\alpha}$ is the 1-step transition probability measure of a random walk starting at vertex $v$ with laziness $\alpha$, and $W(m_{x}^{\alpha}, m_{y}^{\alpha})$ is the Wasserstein distance between $m_{x}^{\alpha}$ and $m_{y}^{\alpha}$.

The $\alpha$-Ricci curvature is a generalization of classical Ricci curvature, an object from Riemannian geometry that captures how volumes change as they flow along geodesics ([Oll11]). In [Oll09], Ollivier created the Ollivier-Ricci curvature to generalize the idea of Ricci curvature to discrete spaces, such as graphs. The Ollivier-Ricci curvature between $X$ and $Y$ is defined via the Wasserstein distance between the $1$-step transition probability measures of random walks starting at $X$ and $Y$. It captures roughly whether the neighborhoods of $X$ and $Y$ are closer together than $X$ and $Y$ themselves. The Ollivier-Ricci curvature is well-studied in geometry and graph theory ([JK21], [CK19], [BCL+18], [CKK+20], [vdHCL+21]), and is also used to study economic risk, cancer networks, and drug design, among other applications ([SGR+15], [SGT16], [SJB19], [WJB16], [WX21], [JK21]). Lin, Lu, and Yau further generalized the Ollivier-Ricci curvature to $\alpha$-Ricci curvature ([LLY11]), allowing the laziness $\alpha$ of the random walks considered to be greater than zero.

In [Oll09], Ollivier suggested exploring Ollivier-Ricci curvature on graphs at “larger and larger scales.” Thus, in this paper, we study the Wasserstein distance between $k$-step probability measures of random walks with potentially nonzero laziness as $k$ gets larger and larger. Since $1$-step probability distributions of random walks were used to study the initial “small-scale” $\alpha$-Ricci curvature, these $k$-step probability distributions are a natural way to understand curvature at “larger and larger scales.” Jiradilok and Kamtue ([JK21]) study these $k$-step distributions for larger and larger $k$ on infinite regular trees; in this paper, we study them instead on finite graphs.

Given a finite, connected, simple graph, we consider a random walk with starting vertex $w$ and laziness $\alpha$. The random walk is defined to be a Markov chain where at each step, we either stay at the current vertex with probability $\alpha$ or pick a neighboring vertex uniformly at random and move there. We then consider the probability distribution encoding the likelihood of being at each possible vertex after $k$ steps of this random walk, which is called a $k$-step probability distribution, or $k$-step probability measure.

Given two such random walks on one graph, starting at vertices $u, v$ and with respective lazinesses $\alpha, \beta$, we define the Wasserstein distance between their two $k$-step probability measures to be the minimum cost of moving between the two distributions. Here, moving 1 unit of mass across 1 edge costs 1 unit.

We can ask many questions about the Wasserstein distance at “larger and larger scales.” For instance, does the Wasserstein distance between the two $k$-step probability distributions always converge as $k \to \infty$? Also, what does it converge to in different cases? Even more interestingly, what can we say about the rate of convergence? In particular, when does the distance eventually remain constant, and how long could it take to reach constancy?

In this paper, we show that in all cases either the Wasserstein distance converges or the Wasserstein distance at every other step converges. We also classify what the distance converges to in all cases, addressing the first and second questions.

We then seek to understand the rate of convergence of the Wasserstein distance. We reach two main results. First, addressing the third question, we show that unless the Wasserstein distance at every other step is eventually constant, its rate of convergence is exponential (Theorem 8.1). We also address the fourth question by providing a partial characterization of exactly when the Wasserstein distance is eventually constant (Theorem 8.2).

In Section 2, we provide formal definitions of key concepts used throughout the paper. In particular, we recall the definition of the Wasserstein distance and introduce the notion of a Guvab. A Guvab refers to a pair of random walks on a finite connected simple graph, and these Guvabs are the primary object we study in this paper. In Section 3, we classify, for all possible Guvabs, the limiting behavior of the Wasserstein distance: whether the distance converges and, if so, what it converges to. This characterization provides a natural way to classify the Guvabs into four categories based on their limiting behavior: $W=1$; $W=0$; $W=\frac{1}{2}$; and $\beta=1$. In each of Sections 4, 5, 6, and 7, we consider one of these four categories of Guvabs and determine when the Wasserstein distance is eventually constant, as well as examine the rate of convergence when the Wasserstein distance is not constant. Along the way, we encounter various interesting results about the different cases. Finally, in Section 8, we present main results about constancy and rate of convergence in general, obtained by considering each of these four cases individually.

2 Preliminaries

We begin with several formal definitions that we use in the remainder of the paper. We start by recalling graph theory terminology and the definition of Wasserstein distance on graphs. Then, we review random walks on graphs and define Guvabs. Finally, we briefly discuss terminology used to describe convergence.

In this paper, all graphs we consider are finite, connected, simple graphs. For a graph $G$, let $V(G)$ be the vertex set of $G$ and $E(G)$ be the edge set of $G$, i.e., the set of unordered pairs $\{v_1, v_2\}$ where $v_1, v_2$ are adjacent vertices in $G$. Further, for any $v \in V(G)$, let $N(v)$ be the neighbor set of $v$. Finally, denote by $\mathrm{d}(w_1, w_2)$ the graph distance between vertices $w_1$ and $w_2$.

Definition 2.1.

Define a distribution on the graph $G$ to be a function $\mu: V(G) \to \mathbb{R}$. We say $\mu$ is a nonnegative distribution if, for all $v \in V(G)$, we have $\mu(v) \geq 0$. A nonnegative distribution $\mu$ is a probability distribution if $\sum_{w\in V(G)}\mu(w) = 1$.

For convenience, we will denote by $\tilde{\mathbf{0}}$ the distribution with value 0 at all vertices (i.e., for all $v \in V(G)$, we have $\tilde{\mathbf{0}}(v) = 0$). In addition, we will refer to a distribution $\mu$ for which $\sum_{w\in V(G)}\mu(w) = 0$ as a zero-sum distribution.

Given a graph $G$, let $\{\mu_i\}_{i=0}^{\infty}$ be an infinite sequence of distributions. Suppose that $f: \mathbb{Z}_{\geq 0} \to \mathbb{Z}_{\geq 0}$ is a strictly increasing function such that for all vertices $w \in G$, the limit $\lim_{k\to\infty}\mu_{f(k)}(w)$ exists. Then denote by $\lim_{k\to\infty}\mu_{f(k)}$ the pointwise limit; namely, for all $w \in V(G)$, let $\left(\lim_{k\to\infty}\mu_{f(k)}\right)(w)$ be $\lim_{k\to\infty}(\mu_{f(k)}(w))$.

For a given graph $G$, let $D = D(G)$ be the set of all ordered pairs $(\mu, \nu)$ of distributions on $V(G)$ that satisfy $\sum_{w\in V(G)}\mu(w) = \sum_{w\in V(G)}\nu(w)$. Further, let $D_{\geq 0}$ be the set of all ordered pairs $(\mu, \nu) \in D$ with $\mu, \nu$ nonnegative distributions.

We now introduce some terminology from optimal transport theory. We follow definitions equivalent to those in the book of Peyré and Cuturi [PC19].

In Definitions 2.2, 2.3, and 2.4, we let $G$ be a graph with two nonnegative distributions $\mu, \nu$ on $V(G)$ such that $(\mu, \nu) \in D_{\geq 0}$.

Definition 2.2 (cf. [PC19]).

Define a transportation plan from $\mu$ to $\nu$ for $(\mu, \nu) \in D_{\geq 0}$ to be a function $T_{\mu,\nu}: V(G) \times V(G) \to \mathbb{R}$ such that

  • for any vertices $w_1, w_2 \in V(G)$, we have that $T_{\mu,\nu}(w_1, w_2) \geq 0$,

  • for all vertices $w \in V(G)$, we have that $\sum_{i\in V(G)} T_{\mu,\nu}(w, i) = \mu(w)$,

  • for all vertices $w \in V(G)$, we have that $\sum_{i\in V(G)} T_{\mu,\nu}(i, w) = \nu(w)$.

Denote by $\mathcal{T}_{\mu,\nu}$ the set of all transportation plans from $\mu$ to $\nu$.

Following [Kan06], we can intuitively visualize a transportation plan $T_{\mu,\nu}$ as a way to move mass distributed over the vertices of $G$ according to $\mu$ along the edges of $G$ to an arrangement according to $\nu$. We now consider the cost of a given transportation plan $T_{\mu,\nu}$: if moving 1 unit of mass across 1 edge has a cost of 1, how much does it cost to move the mass distribution of $\mu$ to that of $\nu$ according to $T_{\mu,\nu}$?

Definition 2.3 (cf. [PC19]).

Define the cost function $C: \mathcal{T}_{\mu,\nu} \to \mathbb{R}$ to take any transportation plan $T$ to its cost

$$C(T) = \sum_{(w_1, w_2)\in V(G)\times V(G)} \mathrm{d}(w_1, w_2)\cdot T(w_1, w_2).$$
Definition 2.4 (cf. [PC19]).

Define the Wasserstein distance $W_{\geq 0}: D_{\geq 0} \to \mathbb{R}_{\geq 0}$ by $W_{\geq 0}(\mu, \nu) := \min_{T\in\mathcal{T}_{\mu,\nu}} C(T)$.

We can thus interpret the Wasserstein distance as the minimum cost of transporting mass from its arrangement in distribution μ\mu to an arrangement in distribution ν\nu.
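Definition 2.4 is a finite linear program, so on small graphs the Wasserstein distance can be computed directly. The sketch below is our illustration, not code from the paper; it assumes the numpy, networkx, and scipy libraries, and simply minimizes the cost of Definition 2.3 over all transportation plans.

```python
# Sketch (ours): W_{>=0}(mu, nu) as a linear program over transportation
# plans T(w1, w2) >= 0 with row sums mu and column sums nu.
import networkx as nx
import numpy as np
from scipy.optimize import linprog

def wasserstein(G, mu, nu):
    nodes = list(G.nodes)
    n = len(nodes)
    dist = dict(nx.all_pairs_shortest_path_length(G))   # graph distances d(w1, w2)
    cost = np.array([dist[a][b] for a in nodes for b in nodes], dtype=float)
    A_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        for j in range(n):
            A_eq[i, i * n + j] = 1.0      # sum_j T(w_i, w_j) = mu(w_i)
            A_eq[n + j, i * n + j] = 1.0  # sum_i T(w_i, w_j) = nu(w_j)
    res = linprog(cost, A_eq=A_eq, b_eq=np.concatenate([mu, nu]), bounds=(0, None))
    return res.fun

# On the path 0-1-2, moving a unit of mass from one end to the other costs 2.
print(wasserstein(nx.path_graph(3), np.array([1.0, 0, 0]), np.array([0, 0, 1.0])))  # 2.0
```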

Remark 2.5.

Note that for any distribution $\psi$ on $V(G)$, if $\mu$, $\nu$, $\mu+\psi$, and $\nu+\psi$ are all nonnegative, then $W_{\geq 0}(\mu, \nu) = W_{\geq 0}(\mu+\psi, \nu+\psi)$ (for a proof, see for example [JK21], which notes that the Wasserstein distance between $\mu$ and $\nu$ can be defined in terms of $\mu - \nu$).

Let $G$ be a graph with two distributions $\mu, \nu$ on $V(G)$ such that

$$\sum_{w\in V(G)}\mu(w) = \sum_{w\in V(G)}\nu(w).$$

Let $\psi$ be a distribution such that $\mu+\psi$ and $\nu+\psi$ are both nonnegative. We extend the domain of the Wasserstein distance to distributions $\mu, \nu$ with possibly negative entries by defining $W: D \to \mathbb{R}_{\geq 0}$ with $W(\mu, \nu) = W_{\geq 0}(\mu+\psi, \nu+\psi)$. By Remark 2.5, $W(\mu, \nu)$ is well-defined (it does not depend on the choice of $\psi$).

Even if $\mu$ and $\nu$ have negative entries, we can interpret $W(\mu, \nu)$ as the cost of some optimal “transportation plan” that moves mass from distribution $\mu$ to distribution $\nu$.

Thus, in the rest of the paper, “transportation plans” between distributions $\mu$ and $\nu$ allow for negative entries in $\mu$ and $\nu$. In this case, a transportation plan rigorously refers to a transportation plan from $\mu+\psi$ to $\nu+\psi$ for some $\psi$ large enough that $\mu+\psi$ and $\nu+\psi$ are both nonnegative. In particular, the movement of mass between $\mu$ and $\nu$ from a vertex $w_1$ to a different vertex $w_2$ actually refers to that same movement of mass from $w_1$ to $w_2$ between the distributions $\mu+\psi$ and $\nu+\psi$.

We now discuss a different way of calculating the Wasserstein distance.

Definition 2.6 (cf. [PC19]).

Given a graph $G$, a 1-Lipschitz function $\ell: V(G) \to \mathbb{R}$ is a function on the vertices of $G$ where for any $w_1, w_2 \in V(G)$, we have that $|\ell(w_1) - \ell(w_2)| \leq \mathrm{d}(w_1, w_2)$. Let $L(G)$ be the set of all 1-Lipschitz functions on $G$.

Theorem 2.7 (Kantorovich Duality, cf. [PC19]).

Let $G$ be a graph with two distributions $\mu, \nu$ on $V(G)$ such that $\sum_{w\in V(G)}\mu(w) = \sum_{w\in V(G)}\nu(w)$. Then

$$W(\mu, \nu) = \max_{\ell\in L(G)} \sum_{w\in G} \ell(w)(\mu(w) - \nu(w)).$$
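As a small check of the duality (our example, not from the paper): on the path $a$–$b$–$c$ with $\mu = \mathbbm{1}_a$ and $\nu = \mathbbm{1}_c$, every transportation plan carries one unit of mass a distance of $\mathrm{d}(a,c) = 2$, so $W(\mu, \nu) = 2$; on the dual side, the 1-Lipschitz function $\ell(a) = 2$, $\ell(b) = 1$, $\ell(c) = 0$ attains $\sum_w \ell(w)(\mu(w) - \nu(w)) = \ell(a) - \ell(c) = 2$, matching the primal value.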

We now seek a way to refer to a pair of random walks on a graph, as these pairs of random walks are the objects we study. The information needed to define such a pair consists of the graph $G$, the starting vertices $u$ and $v$ of the two random walks, and the respective lazinesses $\alpha$ and $\beta$ of the random walks. We thus define a Guvab, comprising exactly this information.

Definition 2.8.

We define a Guvab to be a tuple $(G, u, v, \alpha, \beta)$ where $G$ is a finite, connected, simple graph, $u, v \in V(G)$, and $\alpha, \beta \in [0,1]$ with $\alpha \leq \beta$.

Definition 2.9.

Consider a graph $G$. For any starting vertex $u \in V(G)$ and laziness $\alpha \in [0,1]$, consider the random walk $R = \{R_k\}_{k=0}^{\infty}$ such that $R_0 = u$ and, for $i \geq 1$, we have $R_i = R_{i-1}$ with probability $\alpha$, and $R_i = t$ with probability $\frac{1-\alpha}{\deg(R_{i-1})}$ for any $t \in N(R_{i-1})$. We say the probability distribution $\mu_k$ of $R_k$ is a $k$-step probability measure.

Consider some Guvab $\mathcal{G} = (G, u, v, \alpha, \beta)$. We let $X(\mathcal{G}) = \{X_k\}_{k=0}^{\infty}$ be the Markov chain corresponding to a random walk with laziness $\alpha$ starting from vertex $u$, and we let $Y(\mathcal{G}) = \{Y_k\}_{k=0}^{\infty}$ be the Markov chain corresponding to a random walk with laziness $\beta$ starting from vertex $v$. When it is clear which Guvab $\mathcal{G}$ we are referring to, we write $X, Y$ instead of $X(\mathcal{G}), Y(\mathcal{G})$, respectively.

Consider some Guvab $\mathcal{G} = (G, u, v, \alpha, \beta)$. For all $k \geq 0$, we let $\mu_k(\mathcal{G}), \nu_k(\mathcal{G})$ be the $k$-step probability measures of $X(\mathcal{G}), Y(\mathcal{G})$, respectively. We let $\xi_k(\mathcal{G}) = \mu_k(\mathcal{G}) - \nu_k(\mathcal{G})$ and $W_k(\mathcal{G}) = W(\mu_k(\mathcal{G}), \nu_k(\mathcal{G}))$. When it is clear which Guvab $\mathcal{G}$ we are referring to, we write $\mu_k, \nu_k, \xi_k, W_k$ instead of $\mu_k(\mathcal{G}), \nu_k(\mathcal{G}), \xi_k(\mathcal{G}), W_k(\mathcal{G})$, respectively.

Given a Guvab $\mathcal{G}$, we define $P_\alpha$ and $P_\beta$ to be the transition probability matrices of $X$ and $Y$, respectively. In particular, for all $k$, we have that $\mu_k = \mu_0 P_\alpha^k$ and $\nu_k = \nu_0 P_\beta^k$, where the distributions are row vectors. We also define $P$ to be the transition probability matrix of a random walk with zero laziness on $G$ (note that $P$ does not depend on the starting vertex of the random walk). We note that $P_\alpha$ and $P_\beta$ depend only on $\alpha$ and $\beta$, not on $u$ and $v$. In particular, $P_\alpha = \alpha I + (1-\alpha)P$ and $P_\beta = \beta I + (1-\beta)P$.
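These matrices are easy to build explicitly; the following sketch (ours, assuming numpy and networkx) constructs $P_\alpha$ and the $k$-step measure $\mu_k = \mu_0 P_\alpha^k$.

```python
# Sketch (ours): the lazy transition matrix P_alpha = alpha*I + (1-alpha)*P
# and the k-step measure mu_k = mu_0 P_alpha^k, with distributions as row vectors.
import networkx as nx
import numpy as np

def lazy_transition_matrix(G, alpha):
    A = nx.to_numpy_array(G)                  # adjacency matrix of G
    P = A / A.sum(axis=1, keepdims=True)      # zero-laziness walk: uniform over neighbors
    return alpha * np.eye(len(A)) + (1 - alpha) * P

def k_step_measure(G, start, alpha, k):
    P_alpha = lazy_transition_matrix(G, alpha)
    mu0 = np.zeros(P_alpha.shape[0])
    mu0[start] = 1.0                          # the walk starts at `start` with certainty
    return mu0 @ np.linalg.matrix_power(P_alpha, k)

print(k_step_measure(nx.cycle_graph(5), 0, 0.3, 50))  # approaches pi = (1/5, ..., 1/5)
```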

Lemma 2.10.

Let $\{\lambda_1, \ldots, \lambda_n\}$ be the union of the set of eigenvalues of $P_\alpha$ and the set of eigenvalues of $P_\beta$. For all vertices $w$, there exist some constants $c_i^w$ such that for all $k \geq 1$, we have $\xi_k(w) = \sum_{i=1}^n c_i^w \lambda_i^k$.

Proof.

This follows from the fact that $P_\alpha$ and $P_\beta$ are diagonalizable (since random walks are reversible ([LP17]) and thus have diagonalizable matrices ([LP17], Chapter 12)). Say $P_\alpha$ has eigenvalues $\lambda_1, \ldots, \lambda_m$ and $P_\beta$ has eigenvalues $\lambda_{m+1}, \ldots, \lambda_{m'}$. Since $P_\alpha$ is diagonalizable, we can write it as $ADA^{-1}$ for an invertible matrix $A$ and a diagonal matrix $D$ with diagonal entries $\lambda_1, \ldots, \lambda_m$.

Then $\mu_k = \mu_0 P_\alpha^k = \mu_0 A D^k A^{-1}$, so for all $w$ there exist constants $x_1^w, \ldots, x_m^w$ such that for all $k \geq 1$ we have $\mu_k(w) = \sum_{i=1}^m x_i^w \lambda_i^k$. By similar reasoning, for all $w$ there exist constants $y_{m+1}^w, \ldots, y_{m'}^w$ such that for all $k \geq 1$ we have $\nu_k(w) = \sum_{i=m+1}^{m'} y_i^w \lambda_i^k$. Therefore, for all $w$, there exist some constants $c_i^w$ such that for all $k \geq 1$, we have that $\xi_k(w) = \sum_{i=1}^{m'} c_i^w \lambda_i^k$. If for any $i$ and $j$ we have $\lambda_i = \lambda_j$, we can collect these like terms and thus create a list of distinct eigenvalues $\lambda_1, \ldots, \lambda_n$ and constants $c_1^w, \ldots, c_n^w$ such that for all $k \geq 1$, we have $\xi_k(w) = \sum_{i=1}^n c_i^w \lambda_i^k$. In particular, $\lambda_1, \ldots, \lambda_n$ will be exactly the elements of the union of the set of eigenvalues of $P_\alpha$ and the set of eigenvalues of $P_\beta$. ∎
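A quick numeric illustration of Lemma 2.10 (ours, reusing `lazy_transition_matrix` from the previous sketch): on a non-bipartite graph with $0 < \alpha \leq \beta < 1$, every eigenvalue of $P_\alpha$ and $P_\beta$ other than 1 has modulus less than 1, and the coefficients on the eigenvalue 1 cancel in $\xi_k$, so $\max_w |\xi_k(w)|$ decays roughly geometrically.

```python
# Sketch (ours): xi_k(w) = sum_i c_i^w lambda_i^k decays like the largest
# modulus among the contributing eigenvalues lambda_i != 1, as in Lemma 2.10.
import networkx as nx
import numpy as np

G = nx.cycle_graph(5)                       # non-bipartite, so both walks mix
Pa = lazy_transition_matrix(G, 0.2)
Pb = lazy_transition_matrix(G, 0.5)
mu0, nu0 = np.eye(5)[0], np.eye(5)[2]       # X starts at vertex 0, Y at vertex 2
for k in (5, 10, 20, 40):
    xi_k = mu0 @ np.linalg.matrix_power(Pa, k) - nu0 @ np.linalg.matrix_power(Pb, k)
    print(k, np.max(np.abs(xi_k)))          # roughly geometric decay in k
```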

In the next section, we discuss when and how the Wasserstein distance converges, which is related to the convergence of probability distributions of random walks. Since random walks can be viewed as Markov chains, we reference some classical Markov chain theory, using the same definitions as in [LP17]. We also use the following well-known Markov chain theorem.

Theorem 2.11 (cf. [LP17]).

Suppose that a Markov chain $X$ is aperiodic and irreducible with probability distributions $(\mu_0, \mu_1, \ldots)$ and stationary distribution $\pi$. Then $\lim_{k\to\infty}\mu_k = \pi$.

Finally, in our discussion of convergence, we encounter cases where the Wasserstein distance is eventually constant. To quantify this precisely, we provide the following definition.

Definition 2.12.

We call an infinite sequence $\{S_i\}_{i=0}^{\infty}$ with $S_i \in \mathbb{R}$ eventually constant if there exists $N \geq 0$ such that for all $k \geq N$, we have that $S_k = S_N$.

3 Classifying End Behavior of $W_k$

In this section, we seek to enumerate the possible end behaviors of the Wasserstein distance for a Guvab. In particular, we prove results about when the Wasserstein distance converges and what it converges to for different Guvabs. The classification of Guvabs by end behavior paves the way for our later discussion of the rate of convergence of the Wasserstein distance.

We begin with a technical lemma showing that the limit of the Wasserstein distance is the Wasserstein distance of the limit, as we expect.

Lemma 3.1.

Let $f: \mathbb{Z}_{\geq 0} \to \mathbb{Z}_{\geq 0}$ be a strictly increasing function. If $\lim_{k\to\infty}\mu_{f(k)} = \mu$ and $\lim_{k\to\infty}\nu_{f(k)} = \nu$ (and, in particular, both limits exist), then

$$\lim_{k\to\infty} W(\mu_{f(k)}, \nu_{f(k)}) = W(\mu, \nu).$$
Proof.

Note that, by the triangle inequality,

$$W(\mu_{f(k)}, \nu_{f(k)}) \leq W(\mu_{f(k)}, \mu) + W(\mu, \nu) + W(\nu, \nu_{f(k)})$$

and

$$W(\mu, \mu_{f(k)}) + W(\mu_{f(k)}, \nu_{f(k)}) + W(\nu_{f(k)}, \nu) \geq W(\mu, \nu).$$

This implies that

$$W(\mu, \nu) - W(\mu, \mu_{f(k)}) - W(\nu_{f(k)}, \nu) \leq W(\mu_{f(k)}, \nu_{f(k)}) \leq W(\mu_{f(k)}, \mu) + W(\mu, \nu) + W(\nu, \nu_{f(k)}).$$

However,

$$\lim_{k\to\infty} W(\mu, \mu_{f(k)}) = \lim_{k\to\infty} W(\mu - \mu_{f(k)}, \tilde{\mathbf{0}}) = 0$$

(and similarly for $W(\nu, \nu_{f(k)})$). The above inequality implies that

$$\lim_{k\to\infty} W(\mu_{f(k)}, \nu_{f(k)}) = W(\mu, \nu),$$

as desired. ∎

Due to classical Markov chain theory, we expect that in most cases, the probability distributions of both random walks converge to the same stationary distribution, and thus $\lim_{k\to\infty} W_k = 0$. The following definition and lemma identify the stationary distribution that most random walks converge to. The subsequent theorem specifies exactly which “most cases” make the distance go to zero.

Definition 3.2.

For any graph $G$, we define the distribution $\pi$ to be such that for any $i \in G$, we have $\pi_i = \frac{\deg(i)}{\sum_{j\in G}\deg(j)}$.

Lemma 3.3.

When $0 < \alpha < 1$, the $k$-step probability measure $\mu_k$ converges to the stationary distribution $\pi$.

Proof.

Recall that $X$ is the Markov chain of the random walk. We have that $X$ is aperiodic (we can return from a vertex to itself in one step) and irreducible ($G$ is connected). We have that for any vertex $w \in G$,

$$\pi_w = \sum_{i\sim w}\pi_i\frac{1}{\deg(i)} = \alpha\pi_w + \sum_{i\sim w}\pi_i\frac{1-\alpha}{\deg(i)}.$$

Thus, $\pi$ is a stationary distribution of $X$. Hence, by Theorem 2.11, $\pi$ is a limiting distribution for $X$, and thus $\lim_{k\to\infty}\mu_k = \pi$. ∎
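This stationarity is easy to confirm numerically; here is a quick sanity check (ours, using the helper from the earlier sketch).

```python
# Sketch (ours): pi = deg / sum(deg) satisfies pi P_alpha = pi, per Lemma 3.3.
import networkx as nx
import numpy as np

G = nx.petersen_graph()                     # any finite connected simple graph works
deg = np.array([d for _, d in G.degree()])
pi = deg / deg.sum()
print(np.allclose(pi @ lazy_transition_matrix(G, 0.4), pi))  # True
```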

Theorem 3.4.

The value $W(\mu_k, \nu_k)$ converges to 0 as $k\to\infty$ if and only if one of the following conditions is true:

  • $0 < \alpha \leq \beta < 1$,

  • $\alpha = \beta = 1$ and $u = v$,

  • $G$ is not bipartite and $0 = \alpha \leq \beta < 1$,

  • $\alpha = \beta = 0$ and there exists a path from $u$ to $v$ with an even number of steps.

Proof.

Note that for $0 < \alpha \leq \beta < 1$, we have by Lemma 3.3 that

$$\lim_{k\to\infty}\mu_k = \lim_{k\to\infty}\nu_k = \pi.$$

Thus, $\lim_{k\to\infty} W(\mu_k, \nu_k) = 0$ in this case.

We now consider the cases where $\alpha = 0$ or $\beta = 1$. If $\beta = 1$, then $Y$ stays at $v$ forever. Thus, in order to have $\lim_{k\to\infty} W(\mu_k, \nu_k) = 0$, we need $\alpha = 1$ and $u = v$. This is sufficient to imply $\lim_{k\to\infty} W(\mu_k, \nu_k) = 0$.

It remains to look at the case where $\alpha = 0$ and $\beta < 1$, which we break into subcases based on whether $G$ is bipartite.

We first tackle the subcase where $G$ is not bipartite, i.e., $G$ contains an odd cycle. Since $\alpha, \beta < 1$, both $X, Y$ are aperiodic (there is a path from any vertex to itself in both an odd number of steps and an even number of steps via the odd cycle) and irreducible ($G$ is connected). Thus, $\lim_{k\to\infty}\mu_k = \lim_{k\to\infty}\nu_k = \pi$ and $\lim_{k\to\infty} W(\mu_k, \nu_k) = 0$ as before.

Finally, we address the subcase where $G$ is bipartite with sides $S_1, S_2$. Here, $X$ is periodic (with period 2), so $Y$ must be periodic as well to have $\lim_{k\to\infty} W(\mu_k, \nu_k) = 0$. Thus, $\beta = 0$. If $u, v$ are on different sides of $G$, then $X$ and $Y$ will never be on the same side, so we cannot have $\lim_{k\to\infty} W(\mu_k, \nu_k) = 0$. Otherwise, without loss of generality let $u, v \in S_1$. Consider the Markov chains $X' = \{X_{2k}\}_{k=0}^{\infty}$ and $Y' = \{Y_{2k}\}_{k=0}^{\infty}$ with vertex set $S_1$. Since $X'$ and $Y'$ are aperiodic (we can get from a vertex to itself in one step of $X'$ or $Y'$ by moving back and forth along the same edge) and irreducible ($G$ is connected), and they both have the same transition matrix, the two Markov chains converge to the same stationary distribution. Similar reasoning applies for $\{X_{2k+1}\}$ and $\{Y_{2k+1}\}$. This finishes the proof for this case, hence completing the proof of Theorem 3.4. ∎

In the next part of this section, we specify what the limiting distributions look like for any possible Guvab, particularly considering Guvabs with more than one stationary distribution. We show that every Guvab either converges to a single set of end behaviors or alternates back and forth between two sets of end behaviors.

Suppose $\alpha = 0$. Let $G$ be bipartite with sides $S_1, S_2$, and without loss of generality let $u \in S_1$. Let $X_1 = \{X_{2k}\}_{k=0}^{\infty}$ and $X_2 = \{X_{2k+1}\}_{k=0}^{\infty}$. Let $X_i'$ denote $X_i$ restricted to $S_i$ for $i \in \{1,2\}$. For $i \in \{1,2\}$, let $\tau_i'$ be the distribution on $S_i$ such that

$$(\tau_i')_w = \frac{2\deg(w)}{\sum_{j\in G}\deg(j)}$$

for $w \in S_i$. Further, for $i \in \{1,2\}$, let $\tau_i$ be the distribution on $G$ that is $\tau_i'$ on $S_i$ and has value 0 elsewhere.

Lemma 3.5.

For $i \in \{1,2\}$, the distribution $\tau_i'$ is the limiting distribution of $X_i'$.

Proof.

First, we claim that $\tau_1 P^2 = \tau_1$ and $\tau_2 P^2 = \tau_2$, where $P$ is the transition matrix of $X$. Note that for $w \in S_2$, we have

$$(\tau_1 P)_w = \sum_{i\sim w}\frac{1}{\deg(i)}(\tau_1)_i = \sum_{i\sim w}\left(\frac{1}{\deg(i)}\right)\left(\frac{2\deg(i)}{\sum_{j\in G}\deg(j)}\right) = \frac{2\deg(w)}{\sum_{j\in G}\deg(j)} = (\tau_2)_w.$$

This is because for all $i \sim w$, we have $i \in S_1$, which implies $(\tau_1)_i = \frac{2\deg(i)}{\sum_{j\in G}\deg(j)}$. For $w \in S_1$, we have $(\tau_1 P)_w = \sum_{i\sim w}\frac{1}{\deg(i)}(\tau_1)_i = 0 = (\tau_2)_w$. This is because for all $i \sim w$, we have $i \in S_2$, which implies $(\tau_1)_i = 0$. Hence, $\tau_1 P = \tau_2$ and, by similar reasoning, $\tau_2 P = \tau_1$. Thus, $\tau_1 P^2 = \tau_1$ and $\tau_2 P^2 = \tau_2$, as desired.

Also, we note that $\sum_{w\in S_i}(\tau_i')_w = \sum_{w\in S_i}\frac{2\deg(w)}{\sum_{j\in G}\deg(j)} = \frac{2|E(G)|}{\sum_{j\in G}\deg(j)} = 1$.

We now see that $\tau_i'$ is a stationary distribution of $X_i'$ for $i \in \{1,2\}$. Since $X_1'$ and $X_2'$ are irreducible and aperiodic (as shown in the proof of Theorem 3.4), we have that $\tau_i'$ is a limiting distribution of $X_i'$ for $i \in \{1,2\}$. ∎
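The identities $\tau_1 P = \tau_2$ and $\tau_2 P = \tau_1$ from this proof can also be checked numerically; the sketch below is ours, under the same library assumptions as the earlier code.

```python
# Sketch (ours): on a bipartite graph with alpha = 0, the limiting
# distributions tau_1 and tau_2 swap under one step of the walk.
import networkx as nx
import numpy as np

G = nx.path_graph(4)                        # bipartite with sides {0, 2} and {1, 3}
deg = np.array([d for _, d in G.degree()])
P = nx.to_numpy_array(G) / deg[:, None]     # zero-laziness transition matrix
tau = np.zeros((2, 4))
for w in G.nodes:
    tau[w % 2, w] = 2 * deg[w] / deg.sum()  # tau_1 on one side, tau_2 on the other
print(np.allclose(tau[0] @ P, tau[1]), np.allclose(tau[1] @ P, tau[0]))  # True True
```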

Corollary 3.6.

If $u \in S_1$, then as $k\to\infty$, we have that $\mu_{2k}$ converges to $\tau_1$ and $\mu_{2k+1}$ converges to $\tau_2$. Analogously, if $u \in S_2$, then as $k\to\infty$, we have that $\mu_{2k}$ converges to $\tau_2$ and $\mu_{2k+1}$ converges to $\tau_1$.

Proof.

Suppose $u \in S_1$; the proof proceeds analogously if $u \in S_2$. Then $\mu_{2k}$ will always be 0 on $S_2$ and, by Lemma 3.5, it will converge to $\tau_1'$ on $S_1$, because $\mu_{2k}$ is the probability distribution of $X_1'$ on $S_1$. Similarly, $\mu_{2k+1}$ will always be 0 on $S_1$ and will converge to $\tau_2'$ on $S_2$. Thus, $\mu_{2k}$ converges to $\tau_1$ and $\mu_{2k+1}$ converges to $\tau_2$. ∎

Corollary 3.7.

For any Guvab, the limits $\lim_{k\to\infty}\xi_{2k}$ and $\lim_{k\to\infty}\xi_{2k+1}$ are well-defined.

Proof.

We show that for any $\mu$, the limits $\lim_{k\to\infty}\mu_{2k}$ and $\lim_{k\to\infty}\mu_{2k+1}$ are well-defined; this implies the statement of the corollary. When $G$ is bipartite and $\alpha = 0$, we know that $\lim_{k\to\infty}\mu_{2k} = \tau_1$ and $\lim_{k\to\infty}\mu_{2k+1} = \tau_2$ (assuming, without loss of generality, that $u \in S_1$). When $\alpha = 0$ and $G$ is not bipartite, or when $0 < \alpha < 1$, we have

$$\lim_{k\to\infty}\mu_{2k} = \lim_{k\to\infty}\mu_{2k+1} = \pi.$$

Finally, when $\alpha = 1$, we know $\lim_{k\to\infty}\mu_{2k} = \lim_{k\to\infty}\mu_{2k+1} = \mathbbm{1}_u$, where $\mathbbm{1}_u$ is the distribution with value 1 at $u$ and 0 elsewhere. This covers all possible cases for $\alpha$ and $G$, so we are done. ∎

For any Guvab, we refer to $\lim_{k\to\infty}\xi_{2k}$ as $\xi^0$ and to $\lim_{k\to\infty}\xi_{2k+1}$ as $\xi^1$.

The following corollary is quite important for the rest of this section and the remainder of this paper. Its relevance to this section is that $\lim_{k\to\infty} W_k$ will be well-defined unless $\lim_{k\to\infty} W_{2k} \neq \lim_{k\to\infty} W_{2k+1}$. The corollary is pertinent to the rest of the paper because it indicates that the rates of convergence of $\{W_{2k}\}$ and $\{W_{2k+1}\}$ are always well-defined. Thus, for any possible Guvab, we can study and state results about the rates of convergence of $\{W_{2k}\}$ and $\{W_{2k+1}\}$.

Corollary 3.8.

We have that $\lim_{k\to\infty} W_{2k}$ and $\lim_{k\to\infty} W_{2k+1}$ are always well-defined.

Proof.

By Corollary 3.7 and Lemma 3.1, we know that $\lim_{k\to\infty} W_{2k} = W(\xi^0, \tilde{\mathbf{0}})$ and $\lim_{k\to\infty} W_{2k+1} = W(\xi^1, \tilde{\mathbf{0}})$. ∎

We soon discuss many cases where $\lim_{k\to\infty} W_k$ exists, so we designate a way to refer to this limit. For any Guvab $\mathcal{G}$ where $\lim_{k\to\infty} W_k$ exists, we denote this limit by $W$.

We can now state and prove our main theorems about whether the Wasserstein distance converges and the values it converges to. For any possible Guvab, Theorem 3.10 allows us to determine whether the Wasserstein distance converges. Furthermore, Theorem 3.9 allows us to, in most cases, quickly and easily determine what value the Wasserstein distance will converge to. Finally, these theorems provide a framework for us to classify the Guvabs into four categories so we can use casework to understand the rate of convergence.

Theorem 3.9.

Unless $G$ is bipartite, $\alpha = 0$, and $\beta = 1$, we have that $W = \lim_{k\to\infty} W(\mu_k, \nu_k)$ is always well-defined, and furthermore:

  • $W = 0$ under the conditions specified in Theorem 3.4,

  • $W = 1$ if $\alpha = \beta = 0$ and $W \neq 0$,

  • $W = \frac{1}{2}$ if $0 = \alpha < \beta < 1$ and $G$ is bipartite.

Proof.

The first condition is clear by Theorem 3.4. Next, we look at the case where $\alpha = \beta = 0$ and $W \neq 0$. By Theorem 3.4, this corresponds to the case where $G$ is bipartite and $u, v$ are on opposite sides of $G$. Without loss of generality, let $u \in S_1$ and $v \in S_2$. Then, as $k\to\infty$, we have that $\mu_{2k}$ converges to $\tau_1$ and $\nu_{2k}$ converges to $\tau_2$ by Corollary 3.6. Analogously, $\mu_{2k+1}$ converges to $\tau_2$ and $\nu_{2k+1}$ converges to $\tau_1$. Thus, $\lim_{k\to\infty} W(\mu_k, \nu_k) = W(\tau_1, \tau_2)$. We have that $W(\tau_1, \tau_2) \geq 1$ because to get from $\tau_1$ to $\tau_2$, we must move all the mass from $S_1$ across at least one edge to $S_2$. Also, $W(\tau_1, \tau_2) \leq 1$ because we can achieve a cost of 1 by, for each edge $ab$ with $a \in S_1$ and $b \in S_2$, moving a mass of $\frac{2}{\sum_{j\in G}\deg(j)}$ from $a$ to $b$.

We now consider the case where $0 = \alpha < \beta < 1$ and $G$ is bipartite. Without loss of generality, let $u \in S_1$. Since $\alpha = 0$, we have that $\lim_{k\to\infty}\mu_{2k} = \tau_1$ and $\lim_{k\to\infty}\mu_{2k+1} = \tau_2$. Since $\beta > 0$, we have that $\lim_{k\to\infty}\nu_k = \pi$. Thus, we have that $\lim_{k\to\infty} W(\mu_{2k}, \nu_{2k}) = W(\tau_1, \pi)$ and $\lim_{k\to\infty} W(\mu_{2k+1}, \nu_{2k+1}) = W(\tau_2, \pi)$. If we show that $W(\tau_1, \pi) = W(\tau_2, \pi) = \frac{1}{2}$, we will have shown the third condition. We know that $\pi$ has half its mass on $S_1$ and half its mass on $S_2$ because

$$\sum_{v\in S_1}\pi_v = \sum_{v\in S_1}\frac{\deg(v)}{\sum_{j\in G}\deg(j)} = \frac{|E(G)|}{\sum_{j\in G}\deg(j)} = \frac{1}{2}.$$

Thus, half the mass must move from $S_2$ to $S_1$, so $W(\pi, \tau_1) \geq \frac{1}{2}$. We can also achieve a cost of exactly $\frac{1}{2}$ from $\pi$ to $\tau_1$ by, for each edge $ab$ with $a \in S_1$ and $b \in S_2$, moving a mass of $\frac{1}{\sum_{j\in G}\deg(j)}$ from $b$ to $a$. Thus, $W(\pi, \tau_1) = \frac{1}{2}$ and, by an analogous argument, $W(\pi, \tau_2) = \frac{1}{2}$.

We have now considered all cases where $\alpha, \beta < 1$, and the case where $\alpha = \beta = 1$. The only case left is where $0 < \alpha < 1$ and $\beta = 1$. Here, $\lim_{k\to\infty}\mu_k = \pi$ and $\nu_k = \mathbbm{1}_v$, where $\mathbbm{1}_v$ is the distribution with value 1 at $v$ and 0 elsewhere. Thus, $\lim_{k\to\infty} W(\mu_k, \nu_k) = W(\pi, \mathbbm{1}_v)$, which is a constant. ∎

Theorem 3.10.

The distance $W(\mu_k, \nu_k)$ does not converge as $k\to\infty$ if and only if $G$ is bipartite, $\alpha = 0$, $\beta = 1$, and

$$\sum_{w\in V(G)}(-1)^{\mathrm{d}(v,w)}\,\mathrm{d}(v,w)\deg(w) \neq 0.$$
Proof.

By Theorem 3.9, we know that the only case in which it is possible for $W(\mu_k, \nu_k)$ not to converge is when $G$ is bipartite, $\alpha = 0$, and $\beta = 1$. In this case, $\nu_k = \mathbbm{1}_v$. Additionally, assuming without loss of generality that $u \in S_1$, we have that

$$\lim_{k\to\infty}\mu_{2k} = \tau_1 \text{ and } \lim_{k\to\infty}\mu_{2k+1} = \tau_2.$$

Thus, $W(\mu_k, \nu_k)$ converges as $k\to\infty$ if and only if $W(\mathbbm{1}_v, \tau_1) = W(\mathbbm{1}_v, \tau_2)$.

To calculate $W(\mathbbm{1}_v, \tau_1)$, we note that we must move all the mass of $\tau_1$ to vertex $v$. To move all the mass at some vertex $w$ to $v$, we necessarily move a mass of $(\tau_1)_w$ over a distance of $\mathrm{d}(w,v)$. Thus the total transportation cost, and thus the total Wasserstein distance $W(\mathbbm{1}_v, \tau_1)$, is given by

$$\sum_{w\in G}\mathrm{d}(v,w)(\tau_1)_w = \sum_{w\in S_1}\mathrm{d}(v,w)\frac{2\deg(w)}{\sum_{j\in G}\deg(j)} + \sum_{w\in S_2}\mathrm{d}(v,w)\cdot 0 = \frac{2}{\sum_{j\in G}\deg(j)}\sum_{w\in S_1}\mathrm{d}(v,w)\deg(w).$$

By the same reasoning, we have that

$$W(\mathbbm{1}_v, \tau_2) = \frac{2}{\sum_{j\in G}\deg(j)}\sum_{w\in S_2}\mathrm{d}(v,w)\deg(w).$$

Given that $G$ is bipartite, $\alpha = 0$, and $\beta = 1$, we know that the Wasserstein distance converges if and only if $W(\mathbbm{1}_v, \tau_1) - W(\mathbbm{1}_v, \tau_2) = 0$, which is true if and only if

$$\sum_{w\in V(G)}(-1)^{\mathrm{d}(v,w)}\,\mathrm{d}(v,w)\deg(w) = 0,$$

since the parity of $\mathrm{d}(v,w)$ depends only on the side of $G$ that $w$ is on. Thus, the theorem statement follows. ∎
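The criterion is straightforward to evaluate on examples; the following sketch is ours (assuming networkx).

```python
# Sketch (ours): with G bipartite, alpha = 0, beta = 1, the distance W_k
# converges iff sum_w (-1)^d(v,w) * d(v,w) * deg(w) = 0, per Theorem 3.10.
import networkx as nx

def wasserstein_converges(G, v):
    dist = nx.single_source_shortest_path_length(G, v)
    return sum((-1) ** dist[w] * dist[w] * G.degree(w) for w in G.nodes) == 0

print(wasserstein_converges(nx.cycle_graph(4), 0))  # True: the sum vanishes by symmetry
print(wasserstein_converges(nx.path_graph(4), 0))   # False: W_k alternates between two limits
```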

We now present a table summarizing much of the information about convergence discussed in this section.

Conditions on $\mathcal{G}$ | $W=0$ | $W=\frac{1}{2}$ | $W=1$ | $W=C\neq 0,\frac{1}{2},1$ | $W_k$ does not converge
$G$ bipartite, $\beta=1$ | ✓ | ✓ | ✓ | ✓ | ✓
$G$ bipartite, $\beta<1$ | ✓ | ✓ | ✓ | × | ×
$G$ non-bipartite, $\beta=1$ | ✓ | × | ✓ | ✓ | ×
$G$ non-bipartite, $\beta<1$ | ✓ | × | × | × | ×

Table 1: Is it possible for the Wasserstein distance to converge to particular limits in different cases of conditions on $\mathcal{G}$?
Remark 3.11.

We know that the case of $G$ bipartite, $\beta = 1$, and $W = \frac{1}{2}$ is possible by considering a star with $v$ at the center and $0 < \alpha < 1$. We know that the case of $G$ non-bipartite, $\beta = 1$, and $W = \frac{1}{2}$ is impossible because in order for it to be possible, $\lim_{k\to\infty}\mu_k$ would need half of its mass to be at $v$. Since the mass of $\lim_{k\to\infty}\mu_k$ at a vertex is proportional to its degree, every edge would have to be incident to $v$, making the graph bipartite.

The following corollary provides a categorization of the Guvabs into four types. In the next four sections of this paper, we examine each of these categories in turn.

Corollary 3.12.

Each Guvab satisfies exactly one of the following four conditions:

  • $W = 1$ and $\beta < 1$,

  • $W = \frac{1}{2}$ and $\beta < 1$,

  • $W = 0$ and $\beta < 1$,

  • $\beta = 1$.

Proof.

If $\beta < 1$, we have $W = 0$ under the conditions in Theorem 3.4, and $W = 1$ or $W = \frac{1}{2}$ otherwise, since the conditions in Theorem 3.9 cover all possible cases where $\beta < 1$ and $W \neq 0$. ∎

If we understand the convergence of the Wasserstein distance in all four of these cases, then we understand the convergence for all Guvabs. The subsequent four sections each discuss the convergence of the Wasserstein distance in one of these cases. Our two main convergence theorems, presented in Section 8, put together the general results obtained by examining these four cases individually.

4 Convergence when $W=1$

In this section we consider Guvabs with $W = 1$ and $\beta < 1$. Recall that these are exactly the Guvabs for which $G$ is bipartite, $u$ and $v$ are on different sides of the bipartite graph, and $\alpha = \beta = 0$. We show that all such Guvabs have a Wasserstein distance that is eventually constant. We also begin to understand how long it takes for the Wasserstein distance to reach constancy.

We first recall that the Wasserstein distance between two distributions $\mu$ and $\nu$ with potentially negative entries is the cost of an optimal transportation plan for moving mass (as discussed in Section 2, the mass of a distribution $\mu$ at a vertex $w$ is $\mu(w)$, the value of the distribution at that vertex) from $\mu$ to $\nu$. Thus, to prove the eventual constancy of the Wasserstein distance, we construct an algorithm that produces a transportation plan between any two distributions. Then, we show that when certain inequalities are satisfied, this transportation plan has a cost of exactly 1 and is optimal. Finally, we prove that when $\xi_k$ is sufficiently close to either of the limiting distributions $\xi^0$ or $\xi^1$, as it eventually is, these inequalities are satisfied.

We start by constructing the algorithm. Pick a spanning tree $T$ of $G$ and let $L$ be the set of leaves of $T$. Define a function $r: V(G) \to \mathbb{Z}$ by $r(w) = \min_{\ell\in L}\mathrm{d}(w, \ell)$.

For any finite set $S$, let $\mathrm{Perm}(S)$ denote the set of all permutations of $S$. We say that an $r$-monotone ordering $\mathcal{O} = (w_1, \ldots, w_n) \in \mathrm{Perm}(V(G))$ is a permutation of $V(G)$ such that $r(w_1), \ldots, r(w_n)$ is a non-decreasing sequence.

Definition 4.1.

Given a graph $G$, a spanning tree $T$ of $G$, an $r$-monotone ordering $\mathcal{O}$, and a zero-sum distribution $\xi$, we define the tree-based transport algorithm, which transports mass from $\xi$ to $\tilde{\mathbf{0}}$, to be an $(n-1)$-step algorithm in which at the $i$th step,

  • if the current mass at $w_i$ is nonnegative, we distribute it evenly among all $v \sim w_i$ with indices greater than $i$,

  • if the current mass at $w_i$ is negative, we take an equal amount of mass to vertex $w_i$ from all $v \in N(w_i)$ with indices greater than $i$, so that the mass at $w_i$ is now 0.

In Lemma 4.2, we see that this algorithm produces a valid transportation plan from $\xi$ to $\tilde{\mathbf{0}}$. We refer to this tree-based transportation plan as $A(G, T, \mathcal{O}, \xi)$. Given $G, T$, and $\mathcal{O}$, we let $A_i(\xi)$ denote the distribution of mass on the vertices of $G$ after $i$ steps of the algorithm.
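A direct transcription of Definition 4.1 into code may help make the algorithm concrete. The sketch below is ours (assuming networkx); it returns the snapshots $A_0(\xi), \ldots, A_{n-1}(\xi)$ together with the cost of the resulting plan, and it takes for granted, as the definition implicitly does, that each vertex processed before the final step has a neighbor of larger index.

```python
# Sketch (ours): the tree-based transport algorithm of Definition 4.1.
# `order` is an r-monotone ordering of V(G); `xi` maps vertices to masses.
import networkx as nx

def tree_transport(G, order, xi):
    mass = dict(xi)                       # current distribution, updated in place
    snapshots = [dict(mass)]              # snapshots[i] plays the role of A_i(xi)
    cost = 0.0
    index = {w: i for i, w in enumerate(order)}
    for i, w in enumerate(order[:-1]):
        later = [v for v in G[w] if index[v] > i]  # neighbors with larger index
        m = mass[w]
        if m != 0.0:
            for v in later:
                mass[v] += m / len(later)  # positive m pushes mass out; negative m pulls it in
        mass[w] = 0.0
        cost += abs(m)                     # |m| units of mass each cross exactly one edge
        snapshots.append(dict(mass))
    return snapshots, cost

# On the path 0-1-2-3 (its own spanning tree, leaves 0 and 3), the ordering
# [0, 3, 1, 2] is r-monotone; this plan moves each unit of positive mass once.
snaps, cost = tree_transport(nx.path_graph(4), [0, 3, 1, 2],
                             {0: 0.5, 1: -0.5, 2: 0.5, 3: -0.5})
print(cost)  # 1.0
```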

Lemma 4.2.

The tree-based transport algorithm on $G, T, \mathcal{O}, \xi$ always produces a valid transportation plan from $\xi$ to $\tilde{\mathbf{0}}$.

Proof.

After the $i$th step of the tree-based transport algorithm, the mass at each of the vertices $w_1, \ldots, w_i$ is 0, since the mass at $w_j$ becomes zero at the $j$th step, and thereafter no mass is moved to or from $w_j$. Thus, after the $(n-2)$th step, the only vertices of $G$ with nonzero mass will be $w_{n-1}$ and $w_n$. Since the total mass sums to 0 and $w_{n-1}$ is adjacent to $w_n$, the $(n-1)$th step of the algorithm simply moves the positive mass to the negative mass so that all vertices have mass 0. ∎

We now prove a useful property of this algorithm.

Lemma 4.3.

Given a graph $G$, tree $T$, and $r$-monotone ordering $\mathcal{O}$, for all $i$, we have that $A_i$ is a linear function on the space of zero-sum distributions.

Proof.

It suffices to show that for any two zero-sum distributions $\xi$ and $\xi'$, we have $A_i(\xi + \xi') = A_i(\xi) + A_i(\xi')$. We prove this by induction on $i$.

Base case: When $i = 0$, we have that $A_0(\xi + \xi') = \xi + \xi' = A_0(\xi) + A_0(\xi')$.

Inductive step: For the inductive hypothesis, we assume that $A_i(\xi + \xi') = A_i(\xi) + A_i(\xi')$. We want to show that $A_{i+1}(\xi + \xi') = A_{i+1}(\xi) + A_{i+1}(\xi')$. For any distribution $\xi$, if $n'$ denotes the number of neighbors of $w_{i+1}$ with indices greater than $i+1$, then all of the following are true:

  • $A_{i+1}(\xi)(w_{i+1}) = 0$,

  • for $w_j \in N(w_{i+1})$ with $j > i+1$, we have $A_{i+1}(\xi)(w_j) = A_i(\xi)(w_j) + \frac{1}{n'}A_i(\xi)(w_{i+1})$,

  • for all other vertices $w$, we have $A_{i+1}(\xi)(w) = A_i(\xi)(w)$.

Thus, $A_{i+1}(\xi+\xi')(w_{i+1}) = 0 = 0 + 0 = A_{i+1}(\xi)(w_{i+1}) + A_{i+1}(\xi')(w_{i+1})$. For $w_j \in N(w_{i+1})$ with $j > i+1$, we have that

$$A_{i+1}(\xi+\xi')(w_j) = A_i(\xi+\xi')(w_j) + \frac{1}{n'}A_i(\xi+\xi')(w_{i+1}) = A_i(\xi)(w_j) + \frac{1}{n'}A_i(\xi)(w_{i+1}) + A_i(\xi')(w_j) + \frac{1}{n'}A_i(\xi')(w_{i+1}) = A_{i+1}(\xi)(w_j) + A_{i+1}(\xi')(w_j).$$

Finally, for all other vertices $w$, we have that

$$A_{i+1}(\xi+\xi')(w) = A_i(\xi+\xi')(w) = A_i(\xi)(w) + A_i(\xi')(w) = A_{i+1}(\xi)(w) + A_{i+1}(\xi')(w).$$

We have shown $A_{i+1}(\xi+\xi') = A_{i+1}(\xi) + A_{i+1}(\xi')$ at all vertices, so we have proven the inductive step and thus the lemma. ∎

In Definition 4.4 and the subsequent results, we define the inequalities used in conjunction with the tree-based transport algorithm and show that when these inequalities are satisfied, the Wasserstein distance between $\xi$ and $\tilde{\mathbf{0}}$ is exactly the total positive mass of $\xi$, which is 1 in our setting.

Definition 4.4.

For any graph $G$, zero-sum distribution $\xi$, spanning tree $T$, and $r$-monotone ordering $\mathcal{O} = (w_1, \ldots, w_n)$ on $V(G)$, define the tree-based transport inequalities $\mathcal{I}(G, T, \mathcal{O}, \xi)$ to be the union of the following two sets of inequalities (a sketch for checking them numerically follows this definition):

  • $I_1$: the set of inequalities of the form $\xi(w_j)A_i(\xi)(w_j) > 0$ for all $0 \leq i \leq |V(G)| - 2$ and $i < j \leq |V(G)|$,

  • $I_2$: the set of inequalities of the form $\xi(t)\xi(w) < 0$ for all $t \sim w$.
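These inequalities can be checked mechanically from the snapshots produced by the tree_transport sketch above (again our code, with 0-based indices in place of the definition's 1-based ones):

```python
# Sketch (ours): check the tree-based transport inequalities I(G, T, O, xi)
# using the snapshots A_i(xi) returned by tree_transport above.
import networkx as nx

def satisfies_inequalities(G, order, xi):
    snaps, _ = tree_transport(G, order, xi)
    n = len(order)
    # I_1: xi and A_i(xi) have the same strict sign at every not-yet-zeroed vertex.
    I1 = all(xi[order[j]] * snaps[i][order[j]] > 0
             for i in range(n - 1) for j in range(i, n))
    # I_2: adjacent vertices carry strictly opposite signs in xi.
    I2 = all(xi[t] * xi[w] < 0 for t, w in G.edges)
    return I1 and I2

# xi^0 for the path 0-1-2-3 (deg(w)/|E| with alternating signs) satisfies
# the inequalities, as Lemma 4.8 and Corollary 4.9 below imply.
xi0 = {0: 1/3, 1: -2/3, 2: 2/3, 3: -1/3}
print(satisfies_inequalities(nx.path_graph(4), [0, 3, 1, 2], xi0))  # True
```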

Lemma 4.5.

If the tree-based transport inequalities $\mathcal{I}(G, T, \mathcal{O}, \xi)$ are satisfied, then the cost of the transportation plan is at most the sum of the positive mass in $\xi$; i.e., $C(A(G, T, \mathcal{O}, \xi)) \leq \frac{1}{2}\sum_{w\in G}|\xi(w)|$.

Proof.

We note that the inequalities in $I_1$ mean that for any vertex $w_j$, the sign of $A_i(\xi)(w_j)$ stays the same until the mass there becomes 0 at the $j$th step, at which point it remains 0 for the rest of the algorithm.

Since only positive mass moves, it suffices to show that all the positive mass of $\xi$ moves a distance of at most 1. At each step of the tree-based transport algorithm, any mass that moves must move a distance of exactly 1. Thus, it suffices to show that all mass moves at most one time in $A(G, T, \mathcal{O}, \xi)$.

To show this, we demonstrate that the mass that moves at each step of the algorithm has not moved before, since this means all mass moves at most once overall. We begin by demonstrating that every time positive mass moves, it moves from a vertex $w$ for which $\xi(w) > 0$.

The only way for mass to move is via the $i$th step of the tree-based transport algorithm, which starts from the distribution $A_{i-1}(\xi)$. Suppose the vertices are $w_1, \ldots, w_n$. If $A_{i-1}(\xi)(w_i)$ is zero, then no mass moves at the $i$th step. If $A_{i-1}(\xi)(w_i)$ is positive, then at the $i$th step, mass moves away from $w_i$. Moreover, by the inequalities in $I_1$, if $A_{i-1}(\xi)(w_i)$ is positive then $\xi(w_i)$ is positive, and if $A_{i-1}(\xi)(w_i)$ is negative then $\xi(w_i)$ is negative. In the negative case, the inequalities in $I_2$ give $\xi(t) > 0$ for all $t \sim w_i$, and the mass that moves comes from these neighbors $t \sim w_i$; so these movements, too, are from vertices $t$ with $\xi(t) > 0$. Thus, in all three of these cases, every time positive mass moves, it moves from a vertex $w$ for which $\xi(w) > 0$.

Now consider the $i$th vertex, call it $w$, and suppose that $\xi(w) > 0$. Then, by the inequalities in $I_2$, we know $\xi$ has negative mass at the neighbors of $w$, so throughout all steps of the algorithm, the mass at the neighbors of $w$ is nonpositive. This means that any time executing a step for one of the neighbors of $w$ changed the mass at $w$, mass moved from $w$ to its neighbors. Because no mass ever moved from another vertex to $w$, any remaining positive mass at $w$ has not yet moved. We also know that the remaining mass at $w$ is always nonnegative by the inequalities in $I_1$. Thus, whenever we execute a step of the algorithm for one of the neighbors of $w$, the nonnegative mass that moves from $w$ has not yet moved.

Furthermore, during the $i$th step of the algorithm, all the remaining nonnegative mass at $w$ moves away from it, and this mass has not yet moved. Mass movements due to steps of the algorithm for the neighbors of $w$, and due to the $i$th step itself, make up all possible movements of the mass initially at $w$. This argument holds for all vertices $w$ with $\xi(w) > 0$, so every movement of positive mass moves mass that has not been moved before. Thus, we are done. ∎

Corollary 4.6.

For any graph $G$ and zero-sum distribution $\xi$, if for some $T$ and $\mathcal{O}$ the tree-based transport inequalities $\mathcal{I}(G, T, \mathcal{O}, \xi)$ are satisfied, then

$$W(\xi, \tilde{\mathbf{0}}) = \frac{1}{2}\sum_{w\in G}|\xi(w)|.$$
Proof.

By Lemma 4.5, we have that $\frac{1}{2}\sum_{w\in G}|\xi(w)|$, the total positive mass, is an upper bound. For the lower bound, we note that all of the positive mass must be transported to vertices where the mass is negative, and hence must move a distance of at least 1, so $W(\xi, \tilde{\mathbf{0}})$ will be at least the total positive mass. ∎

Corollary 4.7.

For a Guvab $\mathcal{G}$ with $W = 1$ and $\beta < 1$, suppose that there exist a spanning tree $T$ of $G$ and an $r$-monotone ordering $\mathcal{O}$ such that $\xi_k$ satisfies the tree-based transport inequalities $\mathcal{I}(G, T, \mathcal{O}, \xi_k)$. Then $W_k(\mathcal{G}) = 1$.

Proof.

Recall that when $W = 1$ and $\beta < 1$, we must have that $G$ is bipartite, $u$ and $v$ are on different sides of the bipartite graph, and $\alpha = \beta = 0$. Thus, for all $k$ we have that $\mu_k$ and $\nu_k$ are nonzero on disjoint sets of vertices, since at all times $\mu_k$ is nonzero only on one side and $\nu_k$ is nonzero only on the other side. Thus $\sum_{w\in G}|\xi_k(w)| = 2$, so by Corollary 4.6 we have that

$$W_k(\mathcal{G}) = W(\xi_k, \tilde{\mathbf{0}}) = \frac{1}{2}\sum_{w\in G}|\xi_k(w)| = 1.$$

∎

Now all that remains to be shown is that once $\xi_k$ is sufficiently close to either of the limiting distributions $\xi^0$ or $\xi^1$, the tree-based transport inequalities $\mathcal{I}(G, T, \mathcal{O}, \xi_k)$ will be satisfied. To prove this, we will first show that $\xi^0$ and $\xi^1$ lie in the interior of the region of distributions that satisfy the inequalities. The next lemma helps show that $\xi^0$ and $\xi^1$ satisfy the inequalities.

Lemma 4.8.

Suppose we have a bipartite graph $G$ with sides $S_0$ and $S_1$, and a distribution $\xi$ such that $\xi(w) = \frac{\deg(w)}{|E(G)|}$ for $w \in S_0$ and $\xi(w) = -\frac{\deg(w)}{|E(G)|}$ for $w \in S_1$. Pick an arbitrary spanning tree $T$ and $r$-monotone ordering $\mathcal{O}$ on $V(G)$, and consider the tree-based transport plan $A(G, T, \mathcal{O}, \xi)$. Then after each step $i$ with $i \leq n-2$, we have that $A_i(\xi)(w_j) \geq \frac{1}{|E(G)|}$ for $w_j \in S_0$ with $j > i$, and that $A_i(\xi)(w_j) \leq -\frac{1}{|E(G)|}$ for $w_j \in S_1$ with $j > i$.

Proof.

We know by Lemma 4.3 that for all $w \in G$ and for all $0 \leq i \leq n-2$, we have $A_i(|E(G)|\xi)(w) = |E(G)|\cdot A_i(\xi)(w)$. Thus, it suffices to show that after each step $i$ with $i \leq n-2$, we have that $A_i(|E(G)|\xi)(w_j) \geq 1$ for $w_j \in S_0$ with $j > i$ and that $A_i(|E(G)|\xi)(w_j) \leq -1$ for $w_j \in S_1$ with $j > i$.

To prove this, for all $0 \leq i \leq n-2$, we define the graph $G_i$ to consist of the vertex set $V(G_i) = \{w_{i+1}, \ldots, w_n\}$ and all the edges of $E(G)$ that have both endpoints in $V(G_i)$. It suffices to show by induction on $i$ that for $i \leq n-2$, we have $A_i(|E(G)|\xi)(w_j) = \deg_{G_i} w_j$ for $w_j \in S_0$ with $j > i$, and $A_i(|E(G)|\xi)(w_j) = -\deg_{G_i} w_j$ for $w_j \in S_1$ with $j > i$.

Base case: When $i = 0$, we note that $G_0 = G$. By the definition of $\xi$, we have that $A_0(|E(G)|\xi)(w_j) = \deg_G w_j$ for $w_j \in S_0$ and that $A_0(|E(G)|\xi)(w_j) = -\deg_G w_j$ for $w_j \in S_1$.

Inductive step: The inductive hypothesis is that $A_{i-1}(|E(G)|\xi)(w_j) = \deg_{G_{i-1}} w_j$ for $w_j \in S_0$ with $j > i-1$, and that $A_{i-1}(|E(G)|\xi)(w_j) = -\deg_{G_{i-1}} w_j$ for $w_j \in S_1$ with $j > i-1$. Given this, we want to show the corresponding statement for $i$.

We suppose that $w_i \in S_0$; the case where $w_i \in S_1$ proceeds analogously. After $i-1$ steps, $w_i$ has a mass of $\deg_{G_{i-1}} w_i$. During the $i$th step, this mass is distributed evenly among the $w_j \sim w_i$ with $j > i$; we note that there are exactly $\deg_{G_{i-1}} w_i$ of these neighbors. Thus, each such $w_j$ receives $+1$ mass. By the inductive hypothesis, before step $i$ each of these neighbors $w_j$ had mass $-\deg_{G_{i-1}} w_j$, since each neighbor of $w_i$ is in $S_1$, the opposite side of the bipartite graph. Then, after step $i$, each such $w_j$ has mass $-(\deg_{G_{i-1}} w_j - 1)$, and the remaining vertices with indices greater than $i$ have the same mass as before. We note that for all $\ell > i$, if $w_\ell \sim w_i$ then $\deg_{G_i} w_\ell = \deg_{G_{i-1}} w_\ell - 1$, because the edge $\{w_\ell, w_i\}$ is removed, and otherwise $\deg_{G_i} w_\ell = \deg_{G_{i-1}} w_\ell$. We have just shown that this is exactly the mass at all vertices with indices greater than $i$ after the $i$th step of the algorithm. Thus, at each vertex $w_\ell$ with $\ell > i$, for $w_\ell \in S_0$ the mass after $i$ steps is $\deg_{G_i} w_\ell$, and for $w_\ell \in S_1$ the mass after $i$ steps is $-\deg_{G_i} w_\ell$. This proves the inductive step, and therefore proves the lemma. ∎

We are now ready to show that $\xi^0$ and $\xi^1$ lie in the interior of the region of distributions that satisfy the inequalities.

Corollary 4.9.

For any Guvab $\mathcal{G}$ with $W = 1$ and $\beta < 1$, we have that $\xi^0$ and $\xi^1$ lie strictly in the interior of the region $R \subset \mathbb{R}^{|V(G)|}$ of distributions $\xi$ that satisfy the tree-based transport inequalities $\mathcal{I}(G, T, \mathcal{O}, \xi)$.

Proof.

We prove this for $\xi^0$; by symmetry it will hold for $\xi^1$ as well, since $\xi^1 = -\xi^0$. If the sides of $G$ are $S_0$ and $S_1$, with $u \in S_0$ and $v \in S_1$, then for $w \in S_0$ we have that

$$\xi^0(w) = \lim_{k\to\infty}\mu_{2k}(w) - \lim_{k\to\infty}\nu_{2k}(w) = \frac{\deg(w)}{|E(G)|} - 0 = \frac{\deg(w)}{|E(G)|},$$

and for $w \in S_1$ we have that

$$\xi^0(w) = \lim_{k\to\infty}\mu_{2k}(w) - \lim_{k\to\infty}\nu_{2k}(w) = 0 - \frac{\deg(w)}{|E(G)|} = -\frac{\deg(w)}{|E(G)|}.$$

Then for all $t, w \in G$ such that $t \sim w$, we have that $\xi^0(t)\xi^0(w) < 0$. We also have, by Lemma 4.8, that $\xi^0(w_j)A_i(\xi^0)(w_j) \geq \frac{1}{|E(G)|^2} > 0$ for all $0 \leq i \leq |V(G)| - 2$ and $i < j \leq |V(G)|$. Since these inequalities hold with a positive margin, $\xi^0$ and $\xi^1$ lie strictly in the interior of the region $R \subset \mathbb{R}^{|V(G)|}$ of distributions $\xi$ that satisfy the tree-based transport inequalities $\mathcal{I}(G, T, \mathcal{O}, \xi)$. ∎

Using these results, we are now ready to prove the main claim that the Wasserstein distance is eventually constant when $W = 1$ and $\beta < 1$.

We first define a quantity that corresponds to how long $\{W_k\}$ takes to reach constancy. Note that this quantity can be infinite if $\{W_k\}$ is not eventually constant.

Definition 4.10.

For any Guvab $\mathcal{G}$ where $W_k \to 1$, define $\rho(\mathcal{G})$ to be

$$\inf\{N \in \mathbb{Z} : \{W(\mu_k, \nu_k)\}_{k\geq N} = (1, 1, 1, \ldots)\}.$$
Theorem 4.11.

For any Guvab 𝒢\mathcal{G} with W=1W=1 and β<1\beta<1, we have ρ(𝒢)<\rho(\mathcal{G})<\infty.

Proof.

Pick an arbitrary spanning tree T of G and r-monotone ordering 𝒪. By Corollary 4.9, ξ^0 and ξ^1 lie in the interior of the region R ⊂ ℝ^{|V(G)|} of distributions ξ that satisfy the tree-based transport inequalities ℐ(G,T,𝒪,ξ). We note that all the inequalities in ℐ(G,T,𝒪,ξ) can be written in the form f(ξ) > 0, where f: ℝ^{|V(G)|} → ℝ is a continuous function. Since each such f is continuous and strictly positive at ξ^0 and ξ^1, there exists some ε > 0 such that every ξ ∈ ℝ^{|V(G)|} with |ξ(w) − ξ^0(w)| < ε for all w ∈ G, or with |ξ(w) − ξ^1(w)| < ε for all w ∈ G, lies in R. By the formal definition of a limit, there exists some N such that for all k ≥ N and all w ∈ G, we have |ξ_{2k}(w) − ξ^0(w)| < ε and |ξ_{2k+1}(w) − ξ^1(w)| < ε. Thus, for all k ≥ 2N, we have ξ_k ∈ R. By Corollary 4.7, for all k ≥ 2N, we have W_k = 1. Hence ρ(𝒢) ≤ 2N < ∞. ∎

We next hope to characterize how long it takes the Wasserstein distance of these Guvabs with W=1W=1 and β<1\beta<1 to become constant. In particular, we prove upper and lower bounds for ρ(𝒢)\rho(\mathcal{G}). We start with the upper bound. To prove this upper bound, we first prove a lemma quantifying exactly how close to ξ0\xi^{0} or ξ1\xi^{1} a distribution must be in order for the tree-based transport inequalities (G,T,𝒪,ξ)\mathcal{I}(G,T,\mathcal{O},\xi) to be satisfied.

Lemma 4.12.

Consider a Guvab with W = 1 and β < 1. Pick an arbitrary spanning tree T and r-monotone ordering 𝒪, and let ε(G) = 1/(|V||E|). If a distribution ξ satisfies |ξ(w) − ξ^0(w)| < ε(G) for all vertices w, or satisfies |ξ(w) − ξ^1(w)| < ε(G) for all vertices w, then ξ satisfies the tree-based transport inequalities ℐ(G,T,𝒪,ξ).

Proof.

We prove this for ξ0\xi^{0}, and an analogous argument will hold for ξ1\xi^{1}.

We note that by Lemma 4.8, if we start with ξ^0, then at every point in the tree-based transport algorithm through step |V|−2, the absolute value of the mass at any remaining vertex is at least 1/|E|. Thus, if at every point in the algorithm through step |V|−2 the mass at each vertex differs from A_i(ξ^0) by less than 1/|E|, then the tree-based transport inequalities ℐ are satisfied, because no mass ever has the wrong sign.

It thus suffices to show that for all 0 ≤ i ≤ |V|−2 and for all w ∈ G, we have |A_i(ξ)(w) − A_i(ξ^0)(w)| < 1/|E|. To prove this, we note that ξ − ξ^0 is a zero-sum distribution, and by Lemma 4.3, for all i and all w we have A_i(ξ)(w) = A_i(ξ^0)(w) + A_i(ξ − ξ^0)(w). We consider the quantity Σ_{w∈G} |A_i(ξ)(w) − A_i(ξ^0)(w)| = Σ_{w∈G} |A_i(ξ − ξ^0)(w)|. This quantity is nonincreasing in i, since at step i of the algorithm the absolute value of the mass at w_i decreases by exactly |A_{i−1}(ξ − ξ^0)(w_i)|, while the sum of absolute values at the neighbors of w_i cannot increase by more than |A_{i−1}(ξ − ξ^0)(w_i)|. At the beginning this sum is strictly less than |V|ε(G), so the bound holds for all i. Since max_{w∈G} |A_i(ξ)(w) − A_i(ξ^0)(w)| ≤ Σ_{w∈G} |A_i(ξ)(w) − A_i(ξ^0)(w)|, we conclude that max_{w∈G} |A_i(ξ)(w) − A_i(ξ^0)(w)| < |V|ε(G) = 1/|E|, which is exactly what we wanted to show, so we are done. ∎

With this lemma established, we can now prove our upper bound for ρ(𝒢)\rho(\mathcal{G}).

Lemma 4.13.

Let λ_max = max{|λ| : λ ∈ L, |λ| < 1}, where L is the set of all eigenvalues of the transition matrices of X and Y. Then for a Guvab 𝒢 where W = 1 and β < 1, we have ρ(𝒢) ≤ 10 ln|V| / (1 − λ_max²).

Proof.

We use [DS91, Prop. 3]. The Markov chains X_{2k} and Y_{2k} both converge to their even limiting distributions lim_{k→∞} μ_{2k} and lim_{k→∞} ν_{2k}; for convenience, denote these by γ_u and γ_v, respectively. Once μ_{2k} and ν_{2k} are each within 1/(2|V||E|) of their respective limiting distributions at every vertex, ξ_{2k} satisfies the tree-based transport inequalities ℐ(G,T,𝒪,ξ_{2k}) by Lemma 4.12. Since X_{2k} and Y_{2k} are both Markov chains with limiting distributions, we use notation analogous to that of [Sin92] and set Δ_even u(k) = (1/2) Σ_{w∈G} |μ_{2k}(w) − γ_u(w)| and Δ_even v(k) = (1/2) Σ_{w∈G} |ν_{2k}(w) − γ_v(w)|. Then for ε > 0 and x ∈ {u,v}, we let τ_even x(ε) be the minimum nonnegative integer k such that Δ_even x(k′) ≤ ε for all k′ ≥ k. Thus, by [DS91, Prop. 3], the time ρ_even after which W_{2k} permanently equals 1 satisfies

ρ_even ≤ max_{x∈{u,v}} 2τ_even x(1/(2|V||E|)) ≤ max_{x∈{u,v}} (2/(1 − λ_max²)) (ln(1/γ_x(x)) + ln(2|V||E|)).

It remains to bound the right-hand side. Since γ_x(x) = deg(x)/|E| ≥ 1/|E|, this gives

ρeven\displaystyle\displaystyle\rho_{\textrm{even}} 21λmax2(ln|E|+ln2|V||E|)=21λmax2(ln2|V||E|2)\displaystyle\leq\frac{2}{1-\lambda_{\max}^{2}}(\ln{|E|}+\ln{2|V||E|})=\frac{2}{1-\lambda_{\max}^{2}}(\ln{2|V||E|^{2}})
21λmax2(ln|V|3(|V|1)2)\displaystyle\leq\frac{2}{1-\lambda_{\max}^{2}}(\ln{|V|^{3}(|V|-1)^{2}})
<21λmax25ln(|V|)=10ln|V|1λmax2.\displaystyle<\frac{2}{1-\lambda_{\max}^{2}}\cdot 5\ln(|V|)=\frac{10\ln|V|}{1-\lambda_{\max}^{2}}.

By similar reasoning, the same bound works for ρodd\rho_{\textrm{odd}}, the time it takes for W2k+1W_{2k+1} to eventually have distance 11. Thus, 10ln|V|1λmax2\displaystyle\frac{10\ln|V|}{1-\lambda_{\max}^{2}} is an upper bound for ρ(𝒢)\rho(\mathcal{G}). ∎
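For concreteness, the quantities in Lemma 4.13 are easy to evaluate numerically. The following Python sketch (our illustration, not part of the formal development; the 6-cycle and the choice α = β = 0 are arbitrary) computes λ_max from the spectrum of the transition matrix and prints the resulting upper bound on ρ(𝒢). With α = β = 0, the transition matrices of X and Y coincide with the simple random walk matrix P, so it suffices to examine the eigenvalues of P.

```python
import numpy as np

# Evaluate the bound of Lemma 4.13 on the 6-cycle with alpha = beta = 0,
# so both walks use the simple random walk matrix P (illustrative setup).
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[i, (i - 1) % n] = 1  # adjacency matrix of C_6
P = A / A.sum(axis=1, keepdims=True)           # simple random walk

eigvals = np.linalg.eigvals(P)
# lambda_max: the largest |lambda| among eigenvalues with |lambda| < 1
lam_max = max(abs(l) for l in eigvals if abs(l) < 1 - 1e-9)
bound = 10 * np.log(n) / (1 - lam_max**2)
print(f"lambda_max = {lam_max:.4f}, upper bound on rho = {bound:.1f}")
```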

We now establish a lower bound for ρ(𝒢)\rho(\mathcal{G}).

Lemma 4.14.

For a Guvab 𝒢 where W = 1 and β < 1, we have ρ(𝒢) ≥ d(u,v)/2 − 1.

Proof.

We note that μ_k(t) = 0 for t ∈ V if d(t,u) > k. Similarly, ν_k(w) = 0 for w ∈ V if d(w,v) > k. Suppose k < d(u,v)/2 − 1 and consider any pair of vertices t, w such that μ_k(t) > 0 and ν_k(w) > 0. Then d(t,u) ≤ k < d(u,v)/2 − 1 and d(w,v) ≤ k < d(u,v)/2 − 1, so d(t,w) ≥ d(u,v) − (d(t,u) + d(w,v)) > 2. Therefore all mass must move a distance of at least 2 to get from μ_k to ν_k, so W_k ≥ 2 > 1. Hence W_k ≠ 1 for every k < d(u,v)/2 − 1, which gives ρ(𝒢) ≥ d(u,v)/2 − 1. ∎
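The support argument above can also be watched numerically. In the sketch below (ours; the path P_8 with u = 0, v = 7 and α = β = 0 is an arbitrary Guvab with W = 1, since P_8 is bipartite and d(u,v) = 7 is odd), the supports of μ_k and ν_k stay more than distance 2 apart while k < d(u,v)/2 − 1 = 2.5:

```python
import numpy as np

# Supports of mu_k and nu_k on the path P_8 with u = 0, v = 7, alpha = beta = 0.
# Lemma 4.14 needs the supports to be more than 2 apart for k < d(u,v)/2 - 1.
n = 8
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1
P = A / A.sum(axis=1, keepdims=True)
mu, nu = np.eye(n)[0], np.eye(n)[n - 1]
for k in range(1, 4):
    mu, nu = mu @ P, nu @ P
    gap = min(abs(s - t) for s in np.nonzero(mu)[0] for t in np.nonzero(nu)[0])
    print(k, gap)  # prints gaps 5, 3, 1: the gap exceeds 2 exactly for k < 2.5
```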

5 Convergence when W=12W=\frac{1}{2}

In this section, we consider Guvabs where W = 1/2 and β < 1. Recall that these are exactly the Guvabs for which G is bipartite and 0 = α < β < 1. As in the previous section, and for similar reasons, the Wasserstein distance will eventually be the sum of the positive mass of ξ_k. In this case, however, the Wasserstein distance is in general not eventually constant; instead it converges at an exponential rate that we can express explicitly. To prove this, we proceed by a strategy similar to the W = 1 case. In particular, we show that the tree-based transport inequalities will eventually be satisfied, and compute the Wasserstein distance when these inequalities are satisfied.

In the next three results, we show that the tree-based transport inequalities will eventually be satisfied, and provide an initial expression for what the Wasserstein distance will be when the tree-based transport inequalities are satisfied. Later, we will calculate exactly what this expression for the Wasserstein distance evaluates to.

We begin by showing in the next two results that, analogously to before, ξ0\xi^{0} and ξ1\xi^{1} lie on the interior of the region of distributions that satisfy the inequalities.

Lemma 5.1.

Suppose we have a bipartite graph G with sides S_0 and S_1 and a distribution ξ such that ξ(w) = deg(w)/(2|E(G)|) for w ∈ S_0 and ξ(w) = −deg(w)/(2|E(G)|) for w ∈ S_1. Pick an arbitrary spanning tree T and r-monotone ordering 𝒪 on V(G), and consider the tree-based transport plan A(G,T,𝒪,ξ). Then after each step i with i ≤ |V(G)|−2, for w_j ∈ S_0 with j > i we have A_i(ξ)(w_j) ≥ 1/(2|E(G)|), and for w_j ∈ S_1 with j > i we have A_i(ξ)(w_j) ≤ −1/(2|E(G)|).

Proof.

We note that this is nearly the same as Lemma 4.8, differing only by a constant factor of 1/2. Given the distribution ξ, we know by Lemma 4.8 that after each step i with i ≤ |V(G)|−2, for w_j ∈ S_0 with j > i we have A_i(2ξ)(w_j) ≥ 1/|E(G)|, and for w_j ∈ S_1 with j > i we have A_i(2ξ)(w_j) ≤ −1/|E(G)|. Since A_i(2ξ) = 2A_i(ξ) by Lemma 4.3, dividing by 2 gives exactly the claimed bounds: for w_j ∈ S_0 with j > i we have A_i(ξ)(w_j) ≥ 1/(2|E(G)|), and for w_j ∈ S_1 with j > i we have A_i(ξ)(w_j) ≤ −1/(2|E(G)|). ∎

Corollary 5.2.

For any Guvab 𝒢\mathcal{G} where W=12W=\frac{1}{2} and β<1\beta<1, we have that ξ0\xi^{0} and ξ1\xi^{1} lie strictly on the interior of the region R|V(G)|R\subset\mathbb{R}^{|V(G)|} of distributions ξ\xi that satisfy the tree-based transport inequalities (G,T,𝒪,ξ)\mathcal{I}(G,T,\mathcal{O},\xi).

Proof.

We note that when W=12W=\frac{1}{2} and β<1\beta<1, we have that β>0\beta>0 so for all wGw\in G, we have that limkνk(w)=deg(w)2|E(G)|\lim_{k\to\infty}\nu_{k}(w)=\frac{\deg(w)}{2|E(G)|}. We also know that if GG has sides S0S_{0} and S1S_{1} with uS0u\in S_{0}, for all wS0w\in S_{0} we have that limkμ2k(w)=deg(w)|E(G)|\displaystyle\lim_{k\to\infty}\mu_{2k}(w)=\frac{\deg(w)}{|E(G)|} and for all wS1w\in S_{1} we have that limkμ2k(w)=0\displaystyle\lim_{k\to\infty}\mu_{2k}(w)=0. Similarly, for all wS1w\in S_{1} we have that limkμ2k+1(w)=deg(w)|E(G)|\displaystyle\lim_{k\to\infty}\mu_{2k+1}(w)=\frac{\deg(w)}{|E(G)|} and for all wS0w\in S_{0} we have that limkμ2k+1(w)=0\displaystyle\lim_{k\to\infty}\mu_{2k+1}(w)=0. Thus for all wS0w\in S_{0} we have that ξ0(w)=deg(w)2|E(G)|\xi^{0}(w)=\frac{\deg(w)}{2|E(G)|} and for all wS1w\in S_{1} we have that ξ0(w)=deg(w)2|E(G)|\xi^{0}(w)=\frac{-\deg(w)}{2|E(G)|}. Also ξ1=ξ0\xi^{1}=-\xi^{0}.

We prove the claim for ξ^0; by symmetry it will hold for ξ^1 as well since ξ^1 = −ξ^0. If the sides of G are S_0 and S_1, then ξ^0(w) = deg(w)/(2|E(G)|) for w ∈ S_0 and ξ^0(w) = −deg(w)/(2|E(G)|) for w ∈ S_1. Then for all t, w ∈ G such that t ∼ w, the product ξ^0(t)ξ^0(w) < 0. By Lemma 5.1, we also have ξ^0(w_j)A_i(ξ^0)(w_j) ≥ 1/(4|E(G)|²) > 0 for all 0 ≤ i ≤ |V(G)|−2 and i < j ≤ |V(G)|. ∎

We now know that ξ^0 and ξ^1 are in the interior of the region satisfying the inequalities. We can hence proceed as in Section 4 to show that ξ_k will eventually satisfy the inequalities, and thus that W_k will be the sum of positive mass.

Corollary 5.3.

For any Guvab where W=12W=\frac{1}{2} and β<1\beta<1, there exists NN such that for all kNk\geq N,

Wk=12wG|ξk(w)|.W_{k}=\frac{1}{2}\sum_{w\in G}|\xi_{k}(w)|.
Proof.

We know by Corollary 5.2 that ξ^0 and ξ^1 lie in the interior of R. Therefore, as in the proof of Theorem 4.11, by the formal definition of a limit there exists some N such that for all k ≥ N we have ξ_k ∈ R, and thus the inequalities ℐ(G,T,𝒪,ξ_k) are satisfied. We note that Corollary 4.6 holds for any Guvab, including the ones we are currently inspecting, so if the tree-based transport inequalities ℐ(G,T,𝒪,ξ_k) are satisfied, then W_k = (1/2) Σ_{w∈G} |ξ_k(w)|. Thus for all k ≥ N, we have W_k = (1/2) Σ_{w∈G} |ξ_k(w)|. ∎

We now know that eventually, the Wasserstein distance will be the sum of positive mass, so it remains to calculate the sum of positive mass. To do this, we will first need to define an auxiliary Markov chain and prove some properties of this Markov chain.

Definition 5.4.

Let s(α) be the two-state Markov chain with states s_0 and s_1 that starts at s_0 and, at each step, stays at its current state with probability α and switches to the other state with probability 1 − α. Define (σ_α)_k to be the probability distribution after k steps of this Markov chain.

Lemma 5.5.

For the Markov chain defined above, (σα)k(s0)=0.5+0.5(2α1)k(\sigma_{\alpha})_{k}(s_{0})=0.5+0.5(2\alpha-1)^{k} and (σα)k(s1)=0.50.5(2α1)k(\sigma_{\alpha})_{k}(s_{1})=0.5-0.5(2\alpha-1)^{k}.

Proof.

We will proceed by induction on kk, using the transition probabilities to go from (σα)k(\sigma_{\alpha})_{k} to (σα)k+1(\sigma_{\alpha})_{k+1}.

Base case: When k=0k=0, we know that, since the Markov chain starts at s0s_{0}, we have (σα)0(s0)=1=0.5+0.5(2α1)0(\sigma_{\alpha})_{0}(s_{0})=1=0.5+0.5(2\alpha-1)^{0} and (σα)0(s1)=0=0.50.5(2α1)0(\sigma_{\alpha})_{0}(s_{1})=0=0.5-0.5(2\alpha-1)^{0}.

Inductive step: Suppose (σα)k(s0)=0.5+0.5(2α1)k(\sigma_{\alpha})_{k}(s_{0})=0.5+0.5(2\alpha-1)^{k} and (σα)k(s1)=0.50.5(2α1)k(\sigma_{\alpha})_{k}(s_{1})=0.5-0.5(2\alpha-1)^{k}. We know that

(σα)k+1(s0)\displaystyle(\sigma_{\alpha})_{k+1}(s_{0}) =α(σα)k(s0)+(1α)(σα)k(s1)\displaystyle=\alpha(\sigma_{\alpha})_{k}(s_{0})+(1-\alpha)(\sigma_{\alpha})_{k}(s_{1})
=α(0.5+0.5(2α1)k)+(1α)(0.50.5(2α1)k)\displaystyle=\alpha(0.5+0.5(2\alpha-1)^{k})+(1-\alpha)(0.5-0.5(2\alpha-1)^{k})
=0.5+(2α1)0.5(2α1)k\displaystyle=0.5+(2\alpha-1)\cdot 0.5(2\alpha-1)^{k}
=0.5+0.5(2α1)k+1.\displaystyle=0.5+0.5(2\alpha-1)^{k+1}.

Similarly,

(σα)k+1(s1)\displaystyle(\sigma_{\alpha})_{k+1}(s_{1}) =(1α)(σα)k(s0)+α(σα)k(s1)\displaystyle=(1-\alpha)(\sigma_{\alpha})_{k}(s_{0})+\alpha(\sigma_{\alpha})_{k}(s_{1})
=(1α)(0.5+0.5(2α1)k)+α(0.50.5(2α1)k)\displaystyle=(1-\alpha)(0.5+0.5(2\alpha-1)^{k})+\alpha(0.5-0.5(2\alpha-1)^{k})
=0.5+(2α1)(0.5)(2α1)k\displaystyle=0.5+(2\alpha-1)\cdot(-0.5)(2\alpha-1)^{k}
= 0.5 − 0.5(2α−1)^{k+1}. ∎
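As a quick sanity check of Lemma 5.5 (our own; the value α = 0.3 is arbitrary), one can iterate the two-state chain numerically and compare against the closed form:

```python
import numpy as np

# Iterate the two-state chain s(alpha) and compare with the closed form
# (sigma_alpha)_k(s_0) = 0.5 + 0.5 * (2*alpha - 1)**k from Lemma 5.5.
alpha = 0.3
T = np.array([[alpha, 1 - alpha],   # rows index the current state (s_0, s_1)
              [1 - alpha, alpha]])
sigma = np.array([1.0, 0.0])        # the chain starts at s_0
for k in range(1, 9):
    sigma = sigma @ T
    closed = 0.5 + 0.5 * (2 * alpha - 1) ** k
    assert abs(sigma[0] - closed) < 1e-12
print("closed form verified for k = 1, ..., 8")
```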

We now have all the tools we need to explicitly calculate the sum of positive mass. The next lemma tells us what the sum of positive mass will be.

Lemma 5.6.

When β<1\beta<1 and W=12W=\frac{1}{2}, there exists some NN such that for all kNk\geq N, we either have that

12wG|ξk(w)|=0.5+0.5(12β)k\frac{1}{2}\sum_{w\in G}|\xi_{k}(w)|=0.5+0.5(1-2\beta)^{k}

or that

12wG|ξk(w)|=0.50.5(12β)k.\frac{1}{2}\sum_{w\in G}|\xi_{k}(w)|=0.5-0.5(1-2\beta)^{k}.
Proof.

We know GG is bipartite; say it has sides S0S_{0} and S1S_{1}. We know 0=α<β<10=\alpha<\beta<1. Assume without loss of generality that vS0v\in S_{0}. If uu is on side S0S_{0}, then eventually for wS0w\in S_{0}, we have that ξ2k(w)\xi_{2k}(w) gets arbitrarily close to deg(w)2|E(G)|\displaystyle\frac{\deg(w)}{2|E(G)|} and for wS1w\in S_{1}, we have that ξ2k(w)\xi_{2k}(w) gets arbitrarily close to deg(w)2|E(G)|\displaystyle-\frac{\deg(w)}{2|E(G)|}. In particular, for some NN, for all kNk\geq N we have ξ2k(w)>0\xi_{2k}(w)>0 if and only if wS0w\in S_{0}. Then for some NN, for all kNk\geq N, when uS0u\in S_{0}, the total positive mass of ξ2k\xi_{2k} is wS0ξ2k(w)\displaystyle\sum_{w\in S_{0}}\xi_{2k}(w). Similarly, for some NN, for all kNk\geq N we have ξ2k+1(w)>0\xi_{2k+1}(w)>0 if and only if wS1w\in S_{1}. Thus, for some NN, for all kNk\geq N, when uS0u\in S_{0}, the total positive mass of ξ2k+1\xi_{2k+1} is wS1ξ2k+1(w)\displaystyle\sum_{w\in S_{1}}\xi_{2k+1}(w).

By an analogous argument, when uS1u\in S_{1}, for some NN, for all kNk\geq N, the total positive mass of ξ2k\xi_{2k} is wS0ξ2k(w)\displaystyle\sum_{w\in S_{0}}\xi_{2k}(w). Similarly, for some NN, for all kNk\geq N, the total positive mass of ξ2k+1\xi_{2k+1} is wS1ξ2k+1(w)\displaystyle\sum_{w\in S_{1}}\xi_{2k+1}(w).

Thus, to calculate what the sum of positive mass eventually equals, we simply consider how much mass of μ\mu and ν\nu is on each side of the bipartite graph so that we know how much mass of ξ\xi is on each side. We note that for any random walk with laziness β\beta, at all steps the mass on a given side has a probability β\beta of staying on that side and a probability 1β1-\beta of moving to the other side, since any mass that moves along an edge moves to the other side. Thus the mass of νk\nu_{k} on S0S_{0} and S1S_{1} behaves identically to the mass of s(β)s(\beta) on s0s_{0} and s1s_{1}. In other words, the amount of mass of νk\nu_{k} on S0S_{0} is (σβ)k(s0)=0.5+0.5(2β1)k(\sigma_{\beta})_{k}(s_{0})=0.5+0.5(2\beta-1)^{k} and the amount of mass of νk\nu_{k} on S1S_{1} is (σβ)k(s1)=0.50.5(2β1)k(\sigma_{\beta})_{k}(s_{1})=0.5-0.5(2\beta-1)^{k} by Lemma 5.5. Similarly, if uS0u\in S_{0}, then the amount of mass of μk\mu_{k} on S0S_{0} is (σ0)k(s0)=0.5+0.5(1)k(\sigma_{0})_{k}(s_{0})=0.5+0.5(-1)^{k} and the amount of mass of μk\mu_{k} on S1S_{1} is (σ0)k(s1)=0.50.5(1)k(\sigma_{0})_{k}(s_{1})=0.5-0.5(-1)^{k}. By symmetry, if uS1u\in S_{1}, then the amount of mass of μk\mu_{k} on S0S_{0} is (σ0)k(s1)=0.50.5(1)k(\sigma_{0})_{k}(s_{1})=0.5-0.5(-1)^{k} and the amount of mass of μk\mu_{k} on S1S_{1} is (σ0)k(s0)=0.5+0.5(1)k(\sigma_{0})_{k}(s_{0})=0.5+0.5(-1)^{k}.

This means that if uS0u\in S_{0},

wS0ξk(w)\displaystyle\sum_{w\in S_{0}}\xi_{k}(w) =wS0μk(w)wS0νk(w)\displaystyle=\sum_{w\in S_{0}}\mu_{k}(w)-\sum_{w\in S_{0}}\nu_{k}(w)
=(σ0)k(s0)(σβ)k(s0)\displaystyle=(\sigma_{0})_{k}(s_{0})-(\sigma_{\beta})_{k}(s_{0})
=0.5+0.5(1)k(0.5+0.5(2β1)k)\displaystyle=0.5+0.5(-1)^{k}-(0.5+0.5(2\beta-1)^{k})

and

wS1ξk(w)\displaystyle\sum_{w\in S_{1}}\xi_{k}(w) =wS1μk(w)wS1νk(w)\displaystyle=\sum_{w\in S_{1}}\mu_{k}(w)-\sum_{w\in S_{1}}\nu_{k}(w)
=(σ0)k(s1)(σβ)k(s1)\displaystyle=(\sigma_{0})_{k}(s_{1})-(\sigma_{\beta})_{k}(s_{1})
=0.50.5(1)k(0.50.5(2β1)k).\displaystyle=0.5-0.5(-1)^{k}-(0.5-0.5(2\beta-1)^{k}).

Then the total positive mass of ξ2k\xi_{2k} is

wS0ξ2k(w)=0.5+0.5(1)2k(0.5+0.5(2β1)2k)=0.50.5(12β)2k\sum_{w\in S_{0}}\xi_{2k}(w)=0.5+0.5(-1)^{2k}-(0.5+0.5(2\beta-1)^{2k})=0.5-0.5(1-2\beta)^{2k}

and the total positive mass of ξ2k+1\xi_{2k+1} is

wS1ξ2k+1(w)=0.50.5(1)2k+1(0.50.5(2β1)2k+1)=0.50.5(12β)2k+1.\sum_{w\in S_{1}}\xi_{2k+1}(w)=0.5-0.5(-1)^{2k+1}-(0.5-0.5(2\beta-1)^{2k+1})=0.5-0.5(1-2\beta)^{2k+1}.

Thus, for some NN, the sum of the positive mass of ξk\xi_{k} is 0.50.5(12β)k0.5-0.5(1-2\beta)^{k} for all kNk\geq N.

If uS1u\in S_{1}, then we have that wS0ξk(w)=(σ0)k(s1)(σβ)k(s0)\displaystyle\sum_{w\in S_{0}}\xi_{k}(w)=(\sigma_{0})_{k}(s_{1})-(\sigma_{\beta})_{k}(s_{0}) and we have that wS1ξk(w)=(σ0)k(s0)(σβ)k(s1)\displaystyle\sum_{w\in S_{1}}\xi_{k}(w)=(\sigma_{0})_{k}(s_{0})-(\sigma_{\beta})_{k}(s_{1}). By calculating this out analogously to above, we see that if uS1u\in S_{1} there exists some NN such that the sum of positive mass of ξk\xi_{k} is 0.5+0.5(12β)k0.5+0.5(1-2\beta)^{k} for all kNk\geq N. ∎

We now know that the Wasserstein distance will be the sum of positive mass, and we know exactly what the sum of positive mass will eventually be. Thus, we know exactly what the Wasserstein distance will eventually be. The next theorem therefore states explicitly the rate of convergence of the Wasserstein distance when β<1\beta<1 and W=12W=\frac{1}{2}.

Theorem 5.7.

For any Guvab where W=12W=\frac{1}{2} and β<1\beta<1, for some NN it will be true that for all kNk\geq N, we have that |Wk12|=0.5|12β|k|W_{k}-\frac{1}{2}|=0.5|1-2\beta|^{k}.

Proof.

Corollary 5.3 tells us that for some N1N_{1}, we will have Wk=12wG|ξk(w)|W_{k}=\frac{1}{2}\sum_{w\in G}|\xi_{k}(w)| for all kN1k\geq N_{1}. Lemma 5.6 tells us that for some N2N_{2}, we will have 12wG|ξk(w)|=0.5+0.5(12β)k\frac{1}{2}\sum_{w\in G}|\xi_{k}(w)|=0.5+0.5(1-2\beta)^{k} for all kN2k\geq N_{2} or we will have 12wG|ξk(w)|=0.50.5(12β)k\frac{1}{2}\sum_{w\in G}|\xi_{k}(w)|=0.5-0.5(1-2\beta)^{k} for all kN2k\geq N_{2}. This means that for all kN2k\geq N_{2}, we have |(12wG|ξk(w)|)12|=0.5|12β|k|(\frac{1}{2}\sum_{w\in G}|\xi_{k}(w)|)-\frac{1}{2}|=0.5|1-2\beta|^{k}. Thus, for all kmax(N1,N2)k\geq\max(N_{1},N_{2}), we have

|W_k − 1/2| = |(1/2) Σ_{w∈G} |ξ_k(w)| − 1/2| = 0.5|1−2β|^k. ∎
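To see Theorem 5.7 numerically, the following sketch of ours compares the positive mass of ξ_k, which by Corollary 5.3 eventually equals W_k, against the closed form of Lemma 5.6. The choice of the 4-cycle with u = 0, v = 1, α = 0, and β = 1/4 is arbitrary; since u and v lie on opposite sides of the bipartition, Lemma 5.6 predicts the value 0.5 + 0.5(1 − 2β)^k.

```python
import numpy as np

# On the 4-cycle with u = 0, v = 1, alpha = 0, beta = 0.25, compare the
# positive mass of xi_k = mu_k - nu_k with 0.5 + 0.5 * (1 - 2*beta)**k.
n, beta = 4, 0.25
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[i, (i - 1) % n] = 1
P = A / A.sum(axis=1, keepdims=True)
P_beta = beta * np.eye(n) + (1 - beta) * P

mu = np.eye(n)[0]   # walk X starts at u = 0 with laziness alpha = 0
nu = np.eye(n)[1]   # walk Y starts at v = 1 with laziness beta
for k in range(1, 11):
    mu, nu = mu @ P, nu @ P_beta
    pos_mass = (mu - nu).clip(min=0).sum()
    closed = 0.5 + 0.5 * (1 - 2 * beta) ** k
    print(k, pos_mass, closed)  # columns agree once signs settle (here immediately)
```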

Finally, we want to characterize when the Wasserstein distance is eventually constant when W=12W=\frac{1}{2} and β<1\beta<1. This will fit into our larger characterization of eventual constancy for all Guvabs with β<1\beta<1.

Corollary 5.8.

When W = 1/2 and β < 1, we have that ρ(𝒢) < ∞ if and only if β = 1/2.

Proof.

This follows directly from Theorem 5.7: the quantity 0.5|1−2β|^k is identically zero when β = 1/2 and is strictly positive for every k otherwise, so {W_k} is eventually constant exactly when β = 1/2. ∎

6 Convergence when W=0W=0

In this section we consider the case of Guvabs where W=0W=0 and β<1\beta<1. Recall that these are exactly the Guvabs enumerated in Theorem 3.4 for which β<1\beta<1. We start by showing that the rate of convergence of {W2k}\{W_{2k}\} is exponential when it is not eventually constant. By an analogous argument, the rate of convergence of {W2k+1}\{W_{2k+1}\} is exponential when it is not eventually constant. We will then investigate exactly when {Wk}\{W_{k}\} is eventually constant.

Theorem 6.3 states that unless it is eventually constant, the rate of convergence of {W2k}\{W_{2k}\} is exponential, and in particular W2kcλeven2kW_{2k}\sim c\cdot\lambda_{\textrm{even}}^{2k}. We go about proving this by showing in the next two lemmas that W2kW_{2k} must be one of finitely many expressions, all of which are approximately some exponential.

The next lemma shows that W2kW_{2k} must be one of finitely many expressions.

Lemma 6.1.

For any Guvab 𝒢, there exists a finite set F = {f_1, f_2, …, f_m} of 1-Lipschitz functions f_i: V(G) → ℝ such that for all k there exists f ∈ F with W_k = Σ_{w∈G} f(w)ξ_k(w).

Proof.

We consider the set L of 1-Lipschitz functions ℓ on G such that Σ_{w∈V(G)} ℓ(w) = 0 (any other 1-Lipschitz function can be translated into such a function by adding a constant to all entries, which does not change Σ_w ℓ(w)ξ_k(w) since ξ_k is zero-sum). The conditions for ℓ to lie in L are that ℓ(w_1) − ℓ(w_2) ≤ d(w_1,w_2) and ℓ(w_2) − ℓ(w_1) ≤ d(w_1,w_2) for each pair of vertices w_1, w_2, together with Σ_{w∈G} ℓ(w) = 0; the inequalities define half-spaces of ℝ^{|V(G)|}, and the equality defines a hyperplane. Additionally, no entry of ℓ can exceed |V(G)|: the distance between any two vertices is at most |V(G)|, so if some entry exceeded |V(G)| then every entry would be positive, contradicting the zero-sum condition. Thus L is a bounded intersection of finitely many half-spaces and a hyperplane, that is, a polytope in ℝ^{|V(G)|}. For any cost function C on G, the map ℓ ↦ Σ_{w∈V(G)} C(w)ℓ(w) is linear on L, so argmax_{ℓ∈L} Σ_{w∈V(G)} C(w)ℓ(w) can be taken to be one of the corners of the polytope. There are finitely many such corners, corresponding to finitely many 1-Lipschitz functions {f_1, f_2, …, f_m}. Since W_k = max_{ℓ∈L} Σ_{w∈V(G)} ξ_k(w)ℓ(w), taking C = ξ_k shows that for all k there exists f ∈ F such that W_k = Σ_{w∈G} f(w)ξ_k(w). ∎
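Lemma 6.1 is, in effect, the dual linear-programming formulation of the Wasserstein distance. The sketch below (ours, using SciPy; the helper names and the P_3 demonstration are our own choices) computes W(μ,ν) by maximizing Σ_w ξ(w)ℓ(w) subject to the Lipschitz constraints; we pin ℓ at one vertex to 0, which is harmless because ξ is zero-sum.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def graph_distances(A):
    """All-pairs graph distances by breadth-first search on adjacency matrix A."""
    n = len(A)
    d = np.full((n, n), np.inf)
    for s in range(n):
        d[s, s], frontier = 0, [s]
        while frontier:
            nxt = []
            for x in frontier:
                for y in range(n):
                    if A[x, y] and d[s, y] == np.inf:
                        d[s, y] = d[s, x] + 1
                        nxt.append(y)
            frontier = nxt
    return d

def wasserstein(A, xi):
    """W(mu, nu) for xi = mu - nu, via the dual LP over 1-Lipschitz functions."""
    n, d = len(A), graph_distances(A)
    rows, rhs = [], []
    for s, t in itertools.permutations(range(n), 2):
        row = np.zeros(n)
        row[s], row[t] = 1, -1          # constraint f(s) - f(t) <= d(s, t)
        rows.append(row)
        rhs.append(d[s, t])
    # maximize xi . f  <=>  minimize (-xi) . f; fixing f(0) = 0 removes the
    # translation direction without changing the objective (xi sums to 0)
    bounds = [(0, 0)] + [(None, None)] * (n - 1)
    res = linprog(-np.asarray(xi), A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=bounds)
    return -res.fun

# demo: P_3 with u, v the two endpoints; W(1_u, 1_v) = d(u, v) = 2, and after
# one step with alpha = beta = 0 both walks sit at the middle vertex, so W_1 = 0
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)
P = A / A.sum(axis=1, keepdims=True)
mu1, nu1 = np.eye(3)[0] @ P, np.eye(3)[2] @ P
print(wasserstein(A, np.eye(3)[0] - np.eye(3)[2]), wasserstein(A, mu1 - nu1))
```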

We now know that W2kW_{2k} will be one of finitely many expressions. The next lemma shows that each of these expressions is approximately exponential.

Lemma 6.2.

For any Guvab 𝒢\mathcal{G} for which W=0W=0 and any 1-Lipschitz function ff, there exists some 0<λf<10<\lambda_{f}<1 and some constant cfc_{f} such that

wGf(w)ξ2k(w)cfλf2k\sum_{w\in G}f(w)\xi_{2k}(w)\sim c_{f}\cdot\lambda_{f}^{2k}

unless there exists some NN such that for all k>Nk>N we have that wGf(w)ξ2k(w)=0\sum_{w\in G}f(w)\xi_{2k}(w)=0.

Proof.

Assume that there does not exist any NN such that for all k>Nk>N we have that wGf(w)ξ2k(w)=0\sum_{w\in G}f(w)\xi_{2k}(w)=0.

We know by Lemma 2.10 that for all vertices ww, there exist some constants ciwc^{w}_{i} such that for all k1k\geq 1,

ξ2k(w)=i=1mciwλi2k=i=1mciw(λi2)k=i=1nciw(λi2)k\xi_{2k}(w)=\sum_{i=1}^{m}c^{w}_{i}\lambda_{i}^{2k}=\sum_{i=1}^{m}c^{w}_{i}(\lambda_{i}^{2})^{k}=\sum_{i=1}^{n}c^{w}_{i}(\lambda_{i}^{2})^{k}

where in the last sum the λi2\lambda_{i}^{2} are all distinct positive constants (by combining like terms in the sum with mm terms to get a sum with nn terms). Then

wGf(w)ξ2k(w)=wGf(w)i=1nciw(λi2)k.\sum_{w\in G}f(w)\xi_{2k}(w)=\sum_{w\in G}f(w)\sum_{i=1}^{n}c^{w}_{i}(\lambda_{i}^{2})^{k}.

Thus, there exist constants c_f^1, …, c_f^n such that Σ_{w∈G} f(w)ξ_{2k}(w) = Σ_{i=1}^n c_f^i (λ_i²)^k. Let λ_f² = max_{i: c_f^i ≠ 0} λ_i²; this is well-defined, since if every c_f^i were zero we would have Σ_{w∈G} f(w)ξ_{2k}(w) = 0 for all k ≥ 1, contradicting our assumption. Let c_f be the constant corresponding to this λ_f². Then

wGf(w)ξ2k(w)cfλf2k=i=1ncfi(λi2)kcfλf2k=1+O(c2k),\frac{\sum_{w\in G}f(w)\xi_{2k}(w)}{c_{f}\cdot\lambda_{f}^{2k}}=\frac{\sum_{i=1}^{n}c_{f}^{i}(\lambda_{i}^{2})^{k}}{c_{f}\cdot\lambda_{f}^{2k}}=1+O(c^{2k}),

where 0<c<10<c<1. Thus we have that

Σ_{w∈G} f(w)ξ_{2k}(w) ∼ c_f · λ_f^{2k}. ∎
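The eigenvalue expansion from Lemma 2.10 that drives this proof can be observed directly. In the sketch below (ours; for simplicity we take α = β = 1/4 on the path P_3 with u, v its endpoints, so that a single matrix P_α drives ξ_k, and this Guvab has W = 0), diagonalizing P_α writes ξ_k(w) as Σ_i c_i^w λ_i^k:

```python
import numpy as np

# Expand xi_k = (1_u - 1_v) P_alpha^k in the eigenbasis of P_alpha on P_3
# with u, v the two endpoints and alpha = beta = 0.25.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)
P = A / A.sum(axis=1, keepdims=True)
alpha = 0.25
Pa = alpha * np.eye(3) + (1 - alpha) * P

lam, V = np.linalg.eig(Pa)               # Pa = V diag(lam) V^{-1}
Vinv = np.linalg.inv(V)
xi0 = np.eye(3)[0] - np.eye(3)[2]
a = xi0 @ V                              # coordinates of xi_0 in the eigenbasis
for k in range(1, 7):
    xi_k = xi0 @ np.linalg.matrix_power(Pa, k)
    expansion = (a * lam**k) @ Vinv      # sum_i c_i^w lambda_i^k, vertex by vertex
    assert np.allclose(xi_k, expansion)
print("decay governed by |lambda| =",
      max(abs(l) for l, c in zip(lam, a) if abs(c) > 1e-12 and abs(l) < 1))
```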

We now have all the pieces we need to show that W2kW_{2k} is approximately some exponential. The following theorem finishes off the proof.

Theorem 6.3.

For any Guvab 𝒢\mathcal{G} for which W=0W=0 and {W2k}\{W_{2k}\} is not eventually constant, we have that there exists some 0<λeven<10<\lambda_{\emph{{even}}}<1 and some c>0c>0, such that W2kcλeven2kW_{2k}\sim c\cdot\lambda_{\emph{{even}}}^{2k}.

Proof.

By Lemma 6.1, there exists some set F = {f_1, …, f_n} of 1-Lipschitz functions f_i: V(G) → ℝ such that for all k there exists f ∈ F with W_{2k} = Σ_{w∈G} f(w)ξ_{2k}(w). Furthermore, by Lemma 6.2, for each f ∈ F there exist some λ_f and some positive constant c_f such that Σ_{w∈G} f(w)ξ_{2k}(w) ∼ c_f · λ_f^{2k}, unless there exists some N such that Σ_{w∈G} f(w)ξ_{2k}(w) = 0 for all k > N. If the latter holds for every f ∈ F, then {W_{2k}} is eventually constant at 0, contrary to assumption. Otherwise, let F̃ be the set of functions f for which λ_f is well-defined, and let λ_even = max_{f∈F̃} λ_f. Let F′ ⊂ F̃ be the set of f such that λ_f = λ_even, and let c = max_{f∈F′} c_f. Finally, let ℱ ⊂ F̃ be the set of f ∈ F̃ such that λ_f = λ_even and c_f = c. Then for all f ∈ F with f ∉ ℱ, there exists some N such that for all k ≥ N we have

wGf(w)ξ2k(w)<maxfwGf(w)ξ2k(w)W2k.\sum_{w\in G}f(w)\xi_{2k}(w)<\max_{f\in\mathcal{F}}\sum_{w\in G}f(w)\xi_{2k}(w)\leq W_{2k}.

Thus, since W_{2k} must equal Σ_{w∈G} f(w)ξ_{2k}(w) for some 1-Lipschitz f ∈ F, there exists some N such that for all k ≥ N we have W_{2k} = Σ_{w∈G} f(w)ξ_{2k}(w) for some f ∈ ℱ, since W_{2k} eventually exceeds the value attained by every f ∉ ℱ. Moreover, every f ∈ ℱ satisfies Σ_{w∈G} f(w)ξ_{2k}(w) ∼ c · λ_even^{2k}. We conclude that W_{2k} ∼ c · λ_even^{2k}. ∎

Remark 6.4.

Analogously, for any Guvab 𝒢\mathcal{G} for which W=0W=0 and {W2k+1}\{W_{2k+1}\} is not eventually constant, we have that there exists some 0<λodd<10<\lambda_{\emph{{odd}}}<1 and some c>0c>0, such that W2k+1cλodd2k+1W_{2k+1}\sim c\cdot\lambda_{\emph{{odd}}}^{2k+1}.

We now seek to explicitly characterize all the cases where W=0W=0 and WkW_{k} is eventually constant. We start by understanding why we only need to consider the first few terms of {Wk}\{W_{k}\} to characterize all of these cases.

Lemma 6.5.

When limkWk=0\displaystyle\lim_{k\to\infty}W_{k}=0, if there exists some N0N\geq 0 such that {W(μk,νk)}kN\{W(\mu_{k},\nu_{k})\}_{k\geq N} is a constant sequence, then {W(μk,νk)}k1\displaystyle\{W(\mu_{k},\nu_{k})\}_{k\geq 1} is also a constant sequence.

Proof.

By Lemma 2.10, if we let the distinct eigenvalues of the transition matrices be λ1,,λn\lambda_{1},\ldots,\lambda_{n}, then for any vertex ww and for any k1k\geq 1 we can write (μkνk)w=i=1nciwλik(\mu_{k}-\nu_{k})_{w}=\sum_{i=1}^{n}c^{w}_{i}\lambda_{i}^{k} for some constants c1w,,cnwc^{w}_{1},\ldots,c^{w}_{n}. Note that if {W(μk,νk)}kN\{W(\mu_{k},\nu_{k})\}_{k\geq N} is a constant sequence, 0=limkWk=W(μk,νk)0=\lim_{k\to\infty}W_{k}=W(\mu_{k},\nu_{k}) for all kNk\geq N. Thus for any vertex ww, we will have i=1nciwλik=0\sum_{i=1}^{n}c^{w}_{i}\lambda_{i}^{k}=0 for all kNk\geq N.

Suppose that for some ii, we have that ciwc^{w}_{i} and λi\lambda_{i} are nonzero. Then let Λ\Lambda be the set of all λi\lambda_{i} for which ciwc^{w}_{i} and λi\lambda_{i} are nonzero. Then let λm=maxλΛ|λ|\lambda_{m}=\max_{\lambda\in\Lambda}|\lambda|. If there is only one λiΛ\lambda_{i}\in\Lambda such that |λi|=λm|\lambda_{i}|=\lambda_{m}, then for some NN, for all k>Nk>N we will have that |ciwλik|>ji|cjwλjk||c^{w}_{i}\lambda_{i}^{k}|>\sum_{j\neq i}|c^{w}_{j}\lambda_{j}^{k}| so the left-hand-side term will dominate and i=1nciwλik\sum_{i=1}^{n}c^{w}_{i}\lambda_{i}^{k} will be nonzero. Then 0limkWk0\neq\lim_{k\to\infty}W_{k}. If there is more than one λΛ\lambda\in\Lambda such that |λ|=λm|\lambda|=\lambda_{m}, then those two λ\lambdas will be λm\lambda_{m} and λm=λm\lambda_{m^{\prime}}=-\lambda_{m}, since those are the only two numbers with absolute value λm\lambda_{m}. We know that cmwλmkc^{w}_{m}\lambda_{m}^{k} will stay the same sign regardless of kk, while cmwλmkc^{w}_{m^{\prime}}\lambda_{m^{\prime}}^{k} will switch sign with parity. Thus, for one of the parities, cmwλmkc^{w}_{m}\lambda_{m}^{k} and cmwλmkc^{w}_{m^{\prime}}\lambda_{m^{\prime}}^{k} will have the same sign. Thus, for some NN, either for all even k>Nk>N or for all odd k>Nk>N, we will have that |cmwλmk+cmwλmk|>jm,m|cjwλjk||c^{w}_{m}\lambda_{m}^{k}+c^{w}_{m^{\prime}}\lambda_{m^{\prime}}^{k}|>\sum_{j\neq m,m^{\prime}}|c^{w}_{j}\lambda_{j}^{k}|, so the left-hand-side term will dominate and i=1nciwλik\sum_{i=1}^{n}c^{w}_{i}\lambda_{i}^{k} will be nonzero. Then 0limkWk0\neq\lim_{k\to\infty}W_{k}. Thus, we must have for all 1in1\leq i\leq n that either ciwc^{w}_{i} or λi\lambda_{i} is 0.

Thus, for all k ≥ 1 and all 1 ≤ i ≤ n, we have c_i^w λ_i^k = 0, and hence (μ_k − ν_k)_w = Σ_{i=1}^n c_i^w λ_i^k = 0. Since w was arbitrary, μ_k − ν_k is 0 at all vertices, so W(μ_k, ν_k) = 0 for all k ≥ 1. ∎

With this lemma established, we proceed to characterize all the cases when the Wasserstein distance is eventually constant in the case where W=0W=0.

Theorem 6.6.

When limkWk=0\lim_{k\to\infty}W_{k}=0, we have that WkW_{k} is eventually constant if and only if one of the following holds:

  • α=β=0\alpha=\beta=0 and N(u)=N(v)N(u)=N(v),

  • α=β=1degu+1\displaystyle\alpha=\beta=\frac{1}{\deg u+1}, the edge {u,v}E(G)\{u,v\}\in E(G), and if the edge {u,v}\{u,v\} were removed from E(G)E(G) then u,vu,v would have N(u)=N(v)N(u)=N(v),

  • α=β\alpha=\beta and u=vu=v.

Proof.

We know by Lemma 6.5 that if limkWk=0\lim_{k\to\infty}W_{k}=0 and WkW_{k} is eventually always 0, then μ1=ν1\mu_{1}=\nu_{1} and μ2=ν2\mu_{2}=\nu_{2}. Let μ1=ν1\mu_{1}=\nu_{1} be ϕ\phi. Recall that PαP_{\alpha} is the transition matrix for XX and PβP_{\beta} is the transition matrix for YY. Further recall that Pα=αI+(1α)PP_{\alpha}=\alpha I+(1-\alpha)P and Pβ=βI+(1β)PP_{\beta}=\beta I+(1-\beta)P. Then we have ϕPα=ϕPβ\phi P_{\alpha}=\phi P_{\beta}, so ϕ(αI+(1α)P)=ϕ(βI+(1β)P)\phi(\alpha I+(1-\alpha)P)=\phi(\beta I+(1-\beta)P). Then ϕ((βα)I+(αβ)P)=0\phi((\beta-\alpha)I+(\alpha-\beta)P)=0.

If α ≠ β, then dividing out by (α − β) gives φP = φ. Since G is connected, the only probability distribution fixed by P is the stationary distribution π, so φ = π, which is strictly positive at every vertex. We may assume u ≠ v, since if u = v then μ_1(u) = α and ν_1(u) = β force α = β immediately (note μ_1(u) = α because P_{u,u} = 0 by definition of P). Now φ = μ_1 = 𝟙_u P_α vanishes outside N(u) ∪ {u}, so u must be adjacent to every other vertex; likewise v, and in particular u ∼ v. Evaluating φ at u and v in two ways gives α = μ_1(u) = ν_1(u) = (1−β)/deg(v) and β = ν_1(v) = μ_1(v) = (1−α)/deg(u), while φ = π gives α = deg(u)/(2|E(G)|) and β = deg(v)/(2|E(G)|). Substituting the latter equations into the former yields deg(v)(deg(u)+1) = 2|E(G)| = deg(u)(deg(v)+1), so deg(u) = deg(v) and hence α = β, a contradiction.

This means we have that α=β\alpha=\beta.

We also know that, given that limkWk=0\lim_{k\to\infty}W_{k}=0, if μ1=ν1\mu_{1}=\nu_{1} and α=β\alpha=\beta, then μk=μ1(Pα)k1=ν1(Pα)k1=νk\mu_{k}=\mu_{1}(P_{\alpha})^{k-1}=\nu_{1}(P_{\alpha})^{k-1}=\nu_{k} for all k1k\geq 1 so WkW_{k} is eventually always 0.

It therefore suffices to characterize the cases where limkWk=0\lim_{k\to\infty}W_{k}=0 and μ1=ν1\mu_{1}=\nu_{1} and α=β\alpha=\beta. We first note that if u=vu=v and α=β\alpha=\beta, we are done. Otherwise, we assume that α=β\alpha=\beta and casework on the values of α\alpha to determine which cases yield limkWk=0\lim_{k\to\infty}W_{k}=0 and μ1=ν1\mu_{1}=\nu_{1}.

If α=0\alpha=0, we need that uu and vv have the same neighbor set, since if uu had some neighbor nn that was not adjacent to vv then μ1\mu_{1} would have nonzero mass at nn and ν1\nu_{1} would not. We will also show that this is a sufficient condition. If uu and vv have the same neighbor set then deg(u)=deg(v)\deg(u)=\deg(v). For each neighbor nn of uu and vv, we have that μ1(n)=1αdeg(u)=1βdeg(v)=ν1(n)\mu_{1}(n)=\frac{1-\alpha}{\deg(u)}=\frac{1-\beta}{\deg(v)}=\nu_{1}(n) and for all other vertices ww, we have that μ1(w)=ν1(w)=0\mu_{1}(w)=\nu_{1}(w)=0. Thus μ1=ν1\mu_{1}=\nu_{1}. We also know that limkWk=0\lim_{k\to\infty}W_{k}=0 by Theorem 3.4 since α=β=0\alpha=\beta=0 and for any neighbor nn of uu, the path unvu\to n\to v has an even number of steps.

If 0 < α < 1, we first note that we need u and v to be adjacent, since μ_1(u) = α > 0 while ν_1(u) = 0 if u and v are not adjacent. When u and v are adjacent, we have ν_1(u) = (1−α)/deg(v), so μ_1(u) = ν_1(u) gives α = (1−α)/deg(v), i.e., α = 1/(deg(v)+1); symmetrically, μ_1(v) = ν_1(v) gives α = 1/(deg(u)+1), so deg(u) = deg(v) and α = 1/(deg u + 1). We also note that, similarly to before, aside from the edge {u,v}, u and v need to have the same set of neighbors, because if there were some vertex n ≠ u, v such that n ∼ u and n ≁ v, then μ_1 would have nonzero mass at n and ν_1 would not. We will finish by showing that if α, β, u, v satisfy these conditions, then μ_1 = ν_1 and lim_{k→∞} W_k = 0.

Suppose that the conditions are satisfied. We know that deg(u)=deg(v)\deg(u)=\deg(v), so μ1(u)=α=1αdeg(u)=ν1(u)\mu_{1}(u)=\alpha=\frac{1-\alpha}{\deg(u)}=\nu_{1}(u) and similarly μ1(v)=ν1(v)\mu_{1}(v)=\nu_{1}(v). We also know that for all nu,vn\neq u,v such that nun\sim u and nvn\sim v, we have that μ1(n)=1αdeg(u)=1βdeg(v)=ν1(n)\mu_{1}(n)=\frac{1-\alpha}{\deg(u)}=\frac{1-\beta}{\deg(v)}=\nu_{1}(n) and for all other vertices ww, we have that μ1(w)=ν1(w)=0\mu_{1}(w)=\nu_{1}(w)=0. Thus μ1=ν1\mu_{1}=\nu_{1}. Also, limkWk=0\lim_{k\to\infty}W_{k}=0 by Theorem 3.4 since 0<αβ<10<\alpha\leq\beta<1. ∎
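As a quick numerical check of the second case of Theorem 6.6 (ours; K_4 with u = 0 and v = 1 is an arbitrary instance, since removing the edge {u,v} from K_4 leaves N(u) = N(v)):

```python
import numpy as np

# On K_4 with u = 0, v = 1 and alpha = beta = 1/(deg(u) + 1) = 1/4, the
# one-step distributions mu_1 and nu_1 coincide, so W_k = 0 for all k >= 1.
n = 4
A = np.ones((n, n)) - np.eye(n)          # complete graph K_4
P = A / A.sum(axis=1, keepdims=True)
alpha = 1 / (A[0].sum() + 1)             # 1/(deg(u) + 1) = 1/4
Pa = alpha * np.eye(n) + (1 - alpha) * P
mu1 = np.eye(n)[0] @ Pa
nu1 = np.eye(n)[1] @ Pa
print(mu1, nu1, np.allclose(mu1, nu1))   # both are uniform; prints True
```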

7 Convergence when β=1\beta=1

We next consider the case of Guvabs where β = 1. Similarly to the W = 0 case, we show that the rate of convergence is exponential unless the distance is eventually constant. Furthermore, when the Wasserstein distance is eventually constant, it is constant from the first step onward.

We first show that the rate of convergence is exponential unless the distance is eventually constant.

Lemma 7.1.

Consider a Guvab where β=1\beta=1. Either {W2k}\{W_{2k}\} is eventually constant, or for some cec_{e} and some λe\lambda_{e}, we have that |W2klimkW2k|ceλe2k|W_{2k}-\lim_{k\to\infty}W_{2k}|\sim c_{e}\cdot\lambda_{e}^{2k}. Also, either {W2k+1}\{W_{2k+1}\} is eventually constant, or for some coc_{o} and some λo\lambda_{o}, we have that |W2k+1limkW2k+1|coλo2k+1|W_{2k+1}-\lim_{k\to\infty}W_{2k+1}|\sim c_{o}\cdot\lambda_{o}^{2k+1}.

Proof.

When β=1\beta=1, we know that Wk=wGμk(w)d(w,v)=ciλikW_{k}=\sum_{w\in G}\mu_{k}(w)\textrm{d}(w,v)=\sum c_{i}\cdot\lambda_{i}^{k} for some constants cic_{i} and λi\lambda_{i}. Using the same reasoning as in the proof of Lemma 6.2, we know that (unless ciλi2k\sum c_{i}\cdot\lambda_{i}^{2k} is eventually constant) ciλi2kceλe2k\sum c_{i}\cdot\lambda_{i}^{2k}\sim c_{e}\cdot\lambda_{e}^{2k} for some ce,λec_{e},\lambda_{e}. We also know that (unless ciλi2k+1\sum c_{i}\cdot\lambda_{i}^{2k+1} is eventually constant) ciλi2k+1coλo2k+1\sum c_{i}\cdot\lambda_{i}^{2k+1}\sim c_{o}\cdot\lambda_{o}^{2k+1} for some co,λoc_{o},\lambda_{o}. Thus, we attain the desired result. ∎

We now show that if the distance is eventually constant, it is constant after 1 step.

Lemma 7.2.

When β=1\beta=1, if there exists some N0N\geq 0 such that {W(μn,νn)}nN\{W(\mu_{n},\nu_{n})\}_{n\geq N} is a constant sequence, then {W(μn,νn)}n1\{W(\mu_{n},\nu_{n})\}_{n\geq 1} is also a constant sequence.

Proof.

When β=1\beta=1, we have that WkW_{k} is wGμk(w)d(w,v)=ciλik\sum_{w\in G}\mu_{k}(w)\textrm{d}(w,v)=\sum c_{i}\cdot\lambda_{i}^{k} for some constants cic_{i} and λi\lambda_{i}. Thus, for similar reasons as in the proof of Lemma 6.5, all the cic_{i} for λi0,1\lambda_{i}\neq 0,1 are 0 so {W(μn,νn)}n1\{W(\mu_{n},\nu_{n})\}_{n\geq 1} is constant. ∎

When W=0W=0, using a lemma similar to Lemma 7.2 we were able to explicitly characterize exactly when WkW_{k} was eventually constant. Lemma 7.2 provides an important step towards making a similar characterization when β=1\beta=1. To exemplify how a characterization could be made when β=1\beta=1, we provide a family of examples of Guvabs where WkW_{k} is eventually constant.

Definition 7.3.

We define a Gluvab 𝒥\mathcal{J} to be a Guvab that satisfies all of the following conditions:

  • β=1\beta=1,

  • 2d(u,v)=maxwGd(w,v)2\textrm{d}(u,v)=\max_{w\in G}\textrm{d}(w,v),

  • if d(x,v)=maxwGd(w,v)\textrm{d}(x,v)=\max_{w\in G}\textrm{d}(w,v), then for all nxn\sim x we have that d(n,v)<d(x,v)\textrm{d}(n,v)<\textrm{d}(x,v),

  • if 0<d(x,v)<maxwGd(w,v)0<\textrm{d}(x,v)<\max_{w\in G}\textrm{d}(w,v), then for exactly half of the neighbors nxn\sim x we have that d(n,v)<d(x,v)\textrm{d}(n,v)<\textrm{d}(x,v), and for exactly the other half we have that d(n,v)>d(x,v)\textrm{d}(n,v)>\textrm{d}(x,v).

Example 7.4.

Consider a Guvab with G = P_3 (where P_3 is the path graph with 3 vertices), v either of the two vertices of P_3 with degree 1, u the vertex of P_3 with degree 2, α = 1/3, and β = 1. One can check that this Guvab is a Gluvab: here d(u,v) = 1 and max_w d(w,v) = 2 = 2d(u,v).
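The following sketch of ours verifies this example numerically. Since β = 1, the walk Y never moves, so W_k = Σ_w μ_k(w) d(w,v), and the printout stays at d(u,v) = 1:

```python
import numpy as np

# G = P_3 with v an endpoint (vertex 0), u the middle vertex (vertex 1),
# alpha = 1/3, beta = 1; W_k = sum_w mu_k(w) d(w, v) stays at d(u, v) = 1.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)
P = A / A.sum(axis=1, keepdims=True)
alpha = 1 / 3
Pa = alpha * np.eye(3) + (1 - alpha) * P
d_to_v = np.array([0, 1, 2])   # graph distances to v = vertex 0
mu = np.eye(3)[1]              # X starts at u = vertex 1
for k in range(8):
    print(k, round(mu @ d_to_v, 12))  # prints 1.0 at every step
    mu = mu @ Pa
```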

Lemma 7.5.

Any Gluvab 𝒥\mathcal{J} satisfies W0=W1=W_{0}=W_{1}=\cdots.

Proof.

We aim to prove this lemma by essentially reducing each Gluvab to a random walk on a path graph. In particular, each vertex mim_{i} in the path corresponds to the set of vertices {wG:d(w,v)=i}\{w\in G:d(w,v)=i\} at a given distance ii from vv. After this, the desired result follows without much difficulty.

Construct the Markov chain M that is simply a random walk with laziness α on a path of length 2d(u,v) with vertices m_0, m_1, …, m_{d(u,v)}, …, m_{2d(u,v)}, started at m_{d(u,v)}. It suffices to show that for all i we have Σ_{w∈G, d(w,v)=i} μ_k(w) = M_k(m_i): since β = 1, this would give W_k = Σ_{w∈G} μ_k(w) d(w,v) = Σ_i i · M_k(m_i), and the distribution of M is always symmetric about the center m_{d(u,v)}, so this average distance is always d(u,v).

We will show by induction on kk that for all ii,

wG,d(w,v)=iμk(w)=Mk(mi).\sum_{w\in G,\,\textrm{d}(w,v)=i}\mu_{k}(w)=M_{k}(m_{i}).

Base case: At k=0k=0, we have that μk\mu_{k} is only nonzero at uu and that MkM_{k} is only nonzero at md(u,v)m_{\textrm{d}(u,v)}, so the claim holds.

Inductive step: We suppose that this claim holds for kk. We will show that it holds for k+1k+1. We know the following facts about MM:

  • Mk+1(m0)=1α2Mk(m1)+αMk(m0)M_{k+1}(m_{0})=\frac{1-\alpha}{2}M_{k}(m_{1})+\alpha M_{k}(m_{0}),

  • Mk+1(m2d(u,v))=1α2Mk(m2d(u,v)1)+αMk(m2d(u,v))M_{k+1}(m_{2\textrm{d}(u,v)})=\frac{1-\alpha}{2}M_{k}(m_{2\textrm{d}(u,v)-1})+\alpha M_{k}(m_{2\textrm{d}(u,v)}),

  • Mk+1(m1)=1α2Mk(m2)+αMk(m1)+(1α)Mk(m0)M_{k+1}(m_{1})=\frac{1-\alpha}{2}M_{k}(m_{2})+\alpha M_{k}(m_{1})+(1-\alpha)M_{k}(m_{0}),

  • Mk+1(m2d(u,v)1)=1α2Mk(m2d(u,v)2)+αMk(m2d(u,v)1)+(1α)Mk(m2d(u,v))M_{k+1}(m_{2\textrm{d}(u,v)-1})=\frac{1-\alpha}{2}M_{k}(m_{2\textrm{d}(u,v)-2})+\alpha M_{k}(m_{2\textrm{d}(u,v)-1})+(1-\alpha)M_{k}(m_{2\textrm{d}(u,v)}),

  • for 1<i<2d(u,v)11<i<2\textrm{d}(u,v)-1, we have that Mk+1(mi)=αMk(mi)+1α2(Mk(mi1)+Mk(mi+1)).M_{k+1}(m_{i})=\alpha M_{k}(m_{i})+\frac{1-\alpha}{2}(M_{k}(m_{i-1})+M_{k}(m_{i+1})).

We now examine μk+1\mu_{k+1}, and in particular the amount of mass of μk+1\mu_{k+1} at each level. We let Sk(i)S_{k}(i) denote the mass of μk\mu_{k} at the iith level; in other words,

Sk(i)=wG,d(w,v)=iμk(w).S_{k}(i)=\displaystyle\sum_{w\in G,\,d(w,v)=i}\mu_{k}(w).

For all ii, we can calculate Sk+1(i)S_{k+1}(i) by considering the iith level and considering how much mass from each level from SkS_{k} goes to the iith level. This is possible because all vertices at the same level will have indistinguishable behavior with respect to their contribution to the iith level. By calculating the contribution of each different level to the iith level, we can check that

  • Sk+1(0)=1α2Sk(1)+αSk(0)S_{k+1}(0)=\frac{1-\alpha}{2}S_{k}(1)+\alpha S_{k}(0),

  • Sk+1(2d(u,v))=1α2Sk(2d(u,v)1)+αSk(2d(u,v))S_{k+1}(2\textrm{d}(u,v))=\frac{1-\alpha}{2}S_{k}(2\textrm{d}(u,v)-1)+\alpha S_{k}(2\textrm{d}(u,v)),

  • Sk+1(1)=1α2Sk(2)+αSk(1)+(1α)Sk(0)S_{k+1}(1)=\frac{1-\alpha}{2}S_{k}(2)+\alpha S_{k}(1)+(1-\alpha)S_{k}(0),

  • Sk+1(2d(u,v)1)=1α2Sk(2d(u,v)2)+αSk(2d(u,v)1)+(1α)Sk(2d(u,v))S_{k+1}(2\textrm{d}(u,v)-1)=\frac{1-\alpha}{2}S_{k}(2\textrm{d}(u,v)-2)+\alpha S_{k}(2\textrm{d}(u,v)-1)+(1-\alpha)S_{k}(2\textrm{d}(u,v)),

  • for 1<i<2d(u,v)11<i<2\textrm{d}(u,v)-1, we have that Sk+1(i)=αSk(i)+1α2(Sk(i1)+Sk(i+1)).S_{k+1}(i)=\alpha S_{k}(i)+\frac{1-\alpha}{2}(S_{k}(i-1)+S_{k}(i+1)).

This lines up exactly with our characterization of Mk+1M_{k+1}, so for all ii we have

Σ_{w∈G, d(w,v)=i} μ_{k+1}(w) = M_{k+1}(m_i). ∎
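The reduction in this proof is easy to check numerically on a small Gluvab. In the sketch below (ours; the 4-cycle with v = vertex 0 and u = vertex 1 satisfies the Gluvab conditions, and α = 0.3 is arbitrary), the mass of μ_k on each distance level matches the path chain M at every step:

```python
import numpy as np

# Gluvab check on the 4-cycle with v = 0, u = 1: the mass of mu_k on each
# level {w : d(w, v) = i} matches the lazy walk M on the path m_0 - m_1 - m_2.
alpha = 0.3
A = np.zeros((4, 4))
for i in range(4):
    A[i, (i + 1) % 4] = A[i, (i - 1) % 4] = 1
P = A / A.sum(axis=1, keepdims=True)
Pa = alpha * np.eye(4) + (1 - alpha) * P

Ap = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)  # path on m_0, m_1, m_2
Pm = alpha * np.eye(3) + (1 - alpha) * Ap / Ap.sum(axis=1, keepdims=True)

level = np.array([0, 1, 2, 1])           # d(w, v) for w = 0, 1, 2, 3
mu, M = np.eye(4)[1], np.eye(3)[1]       # start at u and at the center m_1
for k in range(8):
    sums = [mu[level == i].sum() for i in range(3)]
    assert np.allclose(sums, M)
    mu, M = mu @ Pa, M @ Pm
print("level masses match the path chain for k = 0, ..., 7")
```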

8 Main Convergence Theorems

Since we have shown that every Guvab has W = 1, W = 1/2, or W = 0, or else β = 1, and we have some understanding of the rate of convergence of the Wasserstein distance in each of these cases, we can make some general statements about convergence that apply to all Guvabs. The following theorems sum up the general convergence results obtained from considering each of the cases W = 1, W = 1/2, W = 0, and β = 1 in the previous sections.

The first theorem states that the rate of convergence of {W2k}\{W_{2k}\} and {W2k+1}\{W_{2k+1}\} is exponential unless it is eventually constant.

Theorem 8.1.

For any Guvab, we have that

  • either {W2k}\{W_{2k}\} is eventually constant, or there exists a constant λeven(1,1)\lambda_{\emph{{even}}}\in(-1,1) and a positive constant ceven>0c_{\emph{{even}}}>0 such that |W2klimkW2k|ceven|λeven|2k|W_{2k}-\lim_{k\to\infty}W_{2k}|\sim c_{\emph{{even}}}\cdot|\lambda_{\emph{{even}}}|^{2k},

  • either {W_{2k+1}} is eventually constant, or there exists a constant λ_odd ∈ (−1,1) and a positive constant c_odd > 0 such that |W_{2k+1} − lim_{k→∞} W_{2k+1}| ∼ c_odd · |λ_odd|^{2k+1}.

Proof.

To begin, note that when β<1\beta<1, we have that WkW_{k} converges and W{1,12,0}W\in\{1,\frac{1}{2},0\} by Corollary 3.12. Further, when β=1\beta=1, Lemma 7.1 implies exactly that the desired result holds. Thus, it suffices to consider each of these cases W=1W=1, W=12W=\frac{1}{2}, and W=0W=0 separately.

First, when W=1,W=1, Theorem 4.11 implies {Wk}\{W_{k}\} is eventually constant (and hence, we have the same for {W2k}\{W_{2k}\} and {W2k+1}\{W_{2k+1}\}). This gives the desired result in the case W=1W=1.

When W = 1/2, Theorem 5.7 implies that either β = 1/2 and {W_k} is eventually constant, or else |W_k − lim_{k→∞} W_k| = 0.5 · |1−2β|^k for all sufficiently large k, and hence the corresponding statements hold for {W_{2k}} and {W_{2k+1}} with λ_even = λ_odd = |1−2β|. This gives the desired result for W = 1/2.

Finally, we note that when W=0W=0, Theorem 6.3 (and Remark 6.4) gives exactly the desired result. Thus, having checked each case, we conclude the proof. ∎

The second theorem provides a characterization of when {Wk}\{W_{k}\} is eventually constant when β<1\beta<1.

Theorem 8.2.

When β<1\beta<1, we have that {Wk}\{W_{k}\} is eventually constant if and only if one of the following holds:

  • α=β=0\alpha=\beta=0, the graph GG is bipartite, and d(u,v)\textrm{d}(u,v) is odd,

  • α=0\alpha=0 and β=12\beta=\frac{1}{2}, and GG is bipartite,

  • α=β=0\alpha=\beta=0 and N(u)=N(v)N(u)=N(v),

  • α=β=1degu+1\displaystyle\alpha=\beta=\frac{1}{\deg u+1}, the edge {u,v}E(G)\{u,v\}\in E(G), and if the edge {u,v}\{u,v\} were removed from E(G)E(G) then u,vu,v would have N(u)=N(v)N(u)=N(v),

  • α=β\alpha=\beta and u=vu=v.

Proof.

To begin, note that when β<1\beta<1, we have that WkW_{k} converges and W{1,12,0}W\in\{1,\frac{1}{2},0\} by Corollary 3.12. Thus, it suffices to consider each of these cases where W=1W=1, W=12W=\frac{1}{2} and W=0W=0 separately.

First, we look at the case where W=1W=1. Note that, in this case, Theorem 4.11 implies that {Wk}\{W_{k}\} is always eventually constant. Further, by Theorem 3.9, we see this case is equivalent to α=β=0\alpha=\beta=0 and W0W\neq 0. Further, by Theorem 3.4, this case occurs exactly when α=β=0\alpha=\beta=0, the graph GG is bipartite, and d(u,v)\textrm{d}(u,v) is odd (i.e., the first item of the theorem statement).

Next, when W=12W=\frac{1}{2}, Corollary 5.8 implies {Wk}\{W_{k}\} is eventually constant exactly when β=12\beta=\frac{1}{2}. By Theorem 3.9, this case occurs exactly when α=0\alpha=0 and β=12\beta=\frac{1}{2}, and GG is bipartite (i.e., the second item of the theorem statement).

Finally, when W=0W=0, we see that Theorem 6.6 implies {Wk}\{W_{k}\} is eventually constant exactly when one of the following holds:

  • α=β=0\alpha=\beta=0 and N(u)=N(v)N(u)=N(v),

  • α=β=1degu+1\displaystyle\alpha=\beta=\frac{1}{\deg u+1}, the edge {u,v}E(G)\{u,v\}\in E(G), and if the edge {u,v}\{u,v\} were removed from E(G)E(G) then u,vu,v would have N(u)=N(v)N(u)=N(v),

  • α=β\alpha=\beta and u=vu=v.

Note that, each of these cases is indeed a case where W=0W=0 by Theorem 3.4, so this case is equivalent to the final three items of the theorem statement.

Thus, considering each of these cases together, we obtain the desired result. ∎

9 Open Problems

The theorems presented in this paper open up several new questions and directions for further research, which the reader is invited to consider. Specifically, given Theorem 8.1, the remaining questions regarding the behavior of Guvabs can be broken into three main categories: 1) determining when {W_{2k}} and {W_{2k+1}} are eventually constant, 2) in the cases where {W_{2k}} and {W_{2k+1}} are eventually constant, determining how long they take to become constant, and 3) determining c and λ when {W_{2k}} and {W_{2k+1}} are not eventually constant. In this section, we break down what we have shown and what is left to be done regarding each of these questions.

By Theorem 8.2, we have characterized when {W_k} is eventually constant in all cases where β < 1. Furthermore, in the cases of W = 1 and W = 1/2, we know that {W_{2k}} is eventually constant if and only if {W_k} is eventually constant, and similarly {W_{2k+1}} is eventually constant if and only if {W_k} is eventually constant. In the case of W = 0, it remains to characterize the cases where either {W_{2k}} or {W_{2k+1}} individually is eventually constant but {W_k} is not. Further, in the β = 1 case we lack a complete characterization of when {W_k} is eventually constant.

Question 2) remains largely unanswered and is a promising direction for future work. The progress so far in this paper is restricted to fairly weak upper and lower bounds when W = 1, and to characterizations of when {W_k} is eventually constant when W = 0 and β = 1. One interesting problem is that of tighter bounds for the case where W = 1, and of similar bounds for the case where W = 1/2 and {W_k} is eventually constant. Also, depending on the answers to Question 1), there may be Guvabs where only one of {W_{2k}} and {W_{2k+1}} is eventually constant. If a specific Guvab satisfying these criteria is found, it will be interesting to determine how long it takes for {W_{2k}} or {W_{2k+1}} to become constant.

Answering Question 3) will require specific knowledge of eigenvectors and eigenvalues. In full generality this is difficult, so a potential direction for future work would be addressing it in specific examples.

10 Acknowledgements

We would like to thank our mentor, Pakawut Jiradilok, for providing us with important knowledge, guidance, and assistance throughout our project. We would also like to thank Supanat Kamtue for the problem idea and helpful thoughts and guidance. Finally, we would like to thank the PRIMES-USA program for making this project possible.

References

  • [BCL+18] David P. Bourne, David Cushing, Shiping Liu, F. Münch, and Norbert Peyerimhoff. Ollivier–Ricci idleness functions of graphs. SIAM J. Discrete Math., 32(2):1408–1424, 2018.
  • [CK19] David Cushing and Supanat Kamtue. Long-scale Ollivier Ricci curvature of graphs. Anal. Geom. Metr. Spaces, 7(1):22–44, 2019.
  • [CKK+20] David Cushing, Supanat Kamtue, Jack Koolen, Shiping Liu, Florentin Münch, and Norbert Peyerimhoff. Rigidity of the Bonnet–Myers inequality for graphs with respect to Ollivier Ricci curvature. Adv. Math., 369:107188, 2020.
  • [DS91] Persi Diaconis and Daniel Stroock. Geometric bounds for eigenvalues of Markov chains. Ann. Appl. Probab., pages 36–61, 1991.
  • [FZM+15] Charlie Frogner, Chiyuan Zhang, Hossein Mobahi, Mauricio Araya-Polo, and Tomaso Poggio. Learning with a Wasserstein loss. arXiv preprint arXiv:1506.05439, 2015.
  • [JK21] Pakawut Jiradilok and Supanat Kamtue. Transportation distance between probability measures on the infinite regular tree. arXiv preprint arXiv:2107.09876, 2021.
  • [Kan06] Leonid V. Kantorovich. On the translocation of masses. J. Math. Sci., 133(4):1381–1382, 2006.
  • [LLY11] Yong Lin, Linyuan Lu, and Shing-Tung Yau. Ricci curvature of graphs. Tohoku Math. J., 63(4):605–627, 2011.
  • [LP17] David A. Levin and Yuval Peres. Markov Chains and Mixing Times, volume 107. American Mathematical Soc., 2017.
  • [Oll09] Yann Ollivier. Ricci curvature of Markov chains on metric spaces. J. Funct. Anal., 256(3):810–864, 2009.
  • [Oll11] Yann Ollivier. A visual introduction to Riemannian curvatures and some discrete generalizations. Anal. Geom. Metr. Spaces, 56:197–219, 2011.
  • [PC19] Gabriel Peyré and Marco Cuturi. Computational optimal transport. Found. Trends Mach. Learn., 11(5-6):355–607, 2019.
  • [RTG00] Yossi Rubner, Carlo Tomasi, and Leonidas J. Guibas. The earth mover's distance as a metric for image retrieval. Int. J. Comput. Vis., 40(2):99–121, 2000.
  • [SGR+15] Romeil Sandhu, Tryphon Georgiou, Ed Reznik, Liangjia Zhu, Ivan Kolesov, Yasin Senbabaoglu, and Allen Tannenbaum. Graph curvature for differentiating cancer networks. Sci. Rep., 5(1):1–13, 2015.
  • [SGT16] Romeil S. Sandhu, Tryphon T. Georgiou, and Allen R. Tannenbaum. Ricci curvature: An economic indicator for market fragility and systemic risk. Sci. Adv., 2(5):e1501495, 2016.
  • [Sin92] Alistair Sinclair. Improved bounds for mixing rates of Markov chains and multicommodity flow. Comb. Probab. Comput., 1:351–370, 1992.
  • [SJB19] Jayson Sia, Edmond Jonckheere, and Paul Bogdan. Ollivier–Ricci curvature-based method to community detection in complex networks. Sci. Rep., 9(1):1–12, 2019.
  • [vdHCL+21] Pim van der Hoorn, William J. Cunningham, Gabor Lippner, Carlo Trugenberger, and Dmitri Krioukov. Ollivier–Ricci curvature convergence in random geometric graphs. Phys. Rev. Res., 3(1):013211, 2021.
  • [WJB16] Chi Wang, Edmond Jonckheere, and Reza Banirazi. Interference constrained network control based on curvature. In 2016 American Control Conference (ACC), pages 6036–6041. IEEE, 2016.
  • [WX21] JunJie Wee and Kelin Xia. Ollivier persistent Ricci curvature-based machine learning for the protein–ligand binding affinity prediction. J. Chem. Inf. Model., 61(4):1617–1626, 2021.