Convergence of a particle method for gradient flows on the $L^p$-Wasserstein space
Abstract
We study a particle method for approximating gradient flows on the $L^p$-Wasserstein space. This method relies on the discretization of the energy introduced by CPSW via nonoverlapping balls centered at the particles and preserves the gradient flow structure at the particle level. We prove the convergence of the discrete gradient flow to the continuum gradient flow on the $L^p$-Wasserstein space over , specifically to the doubly nonlinear diffusion equation in one dimension.
1 Introduction
In 1998, Jordan, Kinderlehrer and Otto JKO found that entropy is the natural free energy functional in the study of Fokker-Planck equations in non-equilibrium statistical mechanics and proved that the Fokker-Planck equation can be viewed as a steepest descent for the associated free energy with respect to the $L^2$-Wasserstein metric, using the minimizing movements method introduced by De Giorgi DeG . In 2001, Otto Otto2001 established the infinite-dimensional Riemannian geometric structure on the $L^2$-Wasserstein space and proved that the heat equation and the porous medium equation (or the fast diffusion equation) on Euclidean space are the gradient flows of the Boltzmann entropy and the Rényi entropy, respectively, on the $L^2$-Wasserstein space with the infinite-dimensional Riemannian metric introduced in Otto2001 . More generally, the gradient flow of an energy on the $L^2$-Wasserstein space has the form of a continuity equation with the vector field given by , where is the $L^2$-derivative of . See e.g. AGS ; V1 ; V2 .
With De Giorgi’s minimizing movements method, no differentiable structure on the underlying space is required to define the gradient flow, so it can be defined on general metric spaces (see AGS ). Consequently, gradient flows can be defined on the $L^p$-Wasserstein space, that is, the space of all probability measures with finite $p$-th moment equipped with the $W_p$-distance. A notable example of a gradient flow on the $L^p$-Wasserstein space is Leibenson’s equation (see Leib ; IMJ ; Vazquez )
| (1) |
which describes the filtration of a turbulent compressible fluid through a porous medium, where and the parameter characterizes the turbulence of the flow while is the index of polytropy of the fluid, which determines the relation between volume and pressure . Equation (1) is a so-called doubly nonlinear diffusion equation. In particular, when , it becomes the -heat equation
| (2) |
where . In general, the gradient flow of on the $L^p$-Wasserstein space is formally given by
| (3) |
For the rigorous definition, see Section 2. In view of this, the energy corresponding to (1) is given by
| (4) |
Discrete approximations of doubly nonlinear degenerate parabolic equations such as (1) have been studied intensively; see, for example, EH ; ABK and the references therein. See also DEGH for the gradient discretization method based on some discrete spaces and mappings for nonlinear and nonlocal parabolic equations. In RMS , Rossi-Mielke-Savaré studied a time-discretization scheme for approximating abstract doubly nonlinear equations in reflexive Banach spaces. On the other hand, from the perspective of gradient flows, Serfaty Serfaty proved the $\Gamma$-convergence of gradient flows on metric spaces. Using this result, Carrillo-Patacchini-Sternberg-Wolansky CPSW proved the convergence of a particle method for approximating solutions to the continuum gradient flow on the $L^2$-Wasserstein space. The method restricts the continuum gradient flow to the discrete setting of atomic measures while keeping the gradient flow structure at the discrete level via a suitable approximation of the energy on finitely many Dirac masses. The numerical study of this method is also discussed by Carrillo-Huang-Patacchini-Wolansky in CHPW . The aim of this paper is to extend this particle method to the approximation of gradient flows on the $L^p$-Wasserstein space.
Denote either the closure of a bounded connected domain of or itself (we simply write when ). Let us consider the energy functional given by
| (5) |
where denotes the space of all probability measures on with density functions with respect to the Lebesgue measure and with finite -th moment where . Moreover, we assume that satisfies the following hypothesis:
Hypothesis 1.1
Assume is a proper, convex, non-negative function in with superlinear growth at infinity and . Moreover,
-
•
satisfies the doubling condition: there exists a constant such that
-
•
The function is strictly convex and non-increasing on .
Note that superlinear growth at infinity implies that is lower semi-continuous with respect to narrow convergence; see (AGS, Remark 9.3.8). The assumption that and is convex and non-increasing implies that the energy is displacement convex, a notion introduced by McCann McCann1997 ; see also AGS ; V1 ; V2 .
Hypothesis 1.2
for all and there exists a continuous function such that and
The rest of this paper is organized as follows. In Section 2, we introduce some notation and give the necessary background to introduce the continuum gradient flow and the discrete gradient flow. In Section 3, we state the main result and then prove it. In Section 4, we prove the $\Gamma$-convergence of the discrete energy.
2 Notation and Preliminaries
We first recall some basic facts about the Wasserstein space and about curves of maximal slope, which are used to define the gradient flow.
Let be a Polish space. For , the space of all probability measures on with finite -th moment is denoted by . The (-th) Wasserstein distance between two probability measures is given by
where
We call the -Wasserstein space (or briefly space). Moreover, when the probability measures have smooth densities with respect to some reference measure , we denote the space by . Note that is a complete metric space if is complete. For more properties of the Wasserstein space, one can refer to AGS ; V1 ; V2 .
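In one dimension, the optimal coupling between two empirical measures with the same number of equally weighted atoms is the monotone one, so the distance reduces to a comparison of sorted positions. The following minimal Python sketch illustrates this special case only. The function name `wasserstein_p_1d` and the use of NumPy are our own choices for illustration and are not taken from the references above.

```python
import numpy as np


def wasserstein_p_1d(x, y, p):
    """W_p distance between (1/N) sum_i delta_{x_i} and (1/N) sum_i delta_{y_i} on the real line.

    Assumes both measures have the same number N of atoms, each of mass 1/N,
    and p >= 1, so that the monotone (sorted) matching is optimal.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("both measures need the same number of particles")
    if p < 1:
        raise ValueError("p must be at least 1")
    # Average of the p-th powers of the displacements, then the p-th root.
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))
```

For instance, translating every atom by one unit gives distance one for every exponent, since all displacements are equal.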
For the case $p = 2$, Otto Otto2001 introduced the tangent space and the Riemannian metric on the $L^2$-Wasserstein space, which make the $L^2$-Wasserstein space an infinite-dimensional Riemannian manifold. In general, for , the tangent space of can be defined as
where $q$ is the conjugate exponent of $p$, i.e., $1/p + 1/q = 1$. That is, the tangent space of can be identified as (see AGS )
where is defined by
Let be the energy given by (5). If is regular enough, in view of the definition of the tangent space, the gradient flow on can be defined by a solution to
That is, the gradient flow is given by (3). In general, the gradient flow can be defined with the notion of subdifferential; see Definition 7 below. On the other hand, as mentioned in CPSW , the gradient flow formulation given in (3) is not one that allows the use of (Serfaty, Theorem 2), which is the key tool of our proof. Thus we employ a weaker notion of gradient flow which only uses a metric structure. The notion replacing gradient flows is then that of “curves of maximal slope”, which was introduced in DeGMT . We follow here the self-contained presentation in AGS . In the following, let be a complete metric space. denotes a proper functional from to , i.e.
and denotes a bounded subinterval of .
Definition 1 (Absolute continuity)
We say that is a -absolutely continuous curve if there exists such that
In this case we denote and if .
Definition 2 (Metric derivative)
Let be a -absolutely continuous curve, then
exists for almost every and is called the metric derivative of . Moreover, is the smallest admissible function in Definition 1.
Note that is equivalent to .
Definition 3 (Strong upper gradient)
We call a strong upper gradient for if for every , we have that is a Borel function and
A candidate to be an upper gradient of is its slope:
Definition 4 (Local slope)
We define the local slope of at by
where the subscript denotes the positive part.
If is -geodesically convex for some and lower semi-continuous, then the local slope is a strong upper gradient for ; see (AGS, Corollary 2.4.10). The energy functional (5) satisfying Hypothesis 1.1 is -geodesically convex on the -Wasserstein space. Thus is a strong upper gradient for .
Definition 5 (Curve of maximal slope)
Let be a strong upper gradient for . We say that is a -curve of maximal slope for with respect to , if is almost everywhere equal to a non-increasing function and
where is the conjugate exponent of .
Remark 1
When is a -curve of maximal slope for a strong upper gradient , we have , , for all , and for almost every (see (AGS, Remark 1.3.3)).
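As a finite-dimensional illustration, which is not part of the source argument, consider the metric space $\mathbb{R}$ with the usual distance and a $C^1$ convex function $\phi$, whose local slope is $|\phi'|$. Assuming the energy inequality in its standard form from AGS, with exponent $p$ on the metric derivative and the conjugate exponent $q$ on the slope, the ordinary differential equation below produces a curve for which the inequality holds with equality. The symbols $u$, $\phi$, $p$ and $q$ are our notation for this sketch.

```latex
\begin{aligned}
u'(t) &= -\,|\phi'(u(t))|^{q-2}\,\phi'(u(t)), \qquad \tfrac{1}{p} + \tfrac{1}{q} = 1, \\
|u'(t)|^{p} &= |\phi'(u(t))|^{(q-1)p} = |\phi'(u(t))|^{q}
  \qquad \text{since } (q-1)\,p = q, \\
\frac{d}{dt}\,\phi(u(t)) &= \phi'(u(t))\,u'(t) = -\,|\phi'(u(t))|^{q}
  = -\frac{1}{p}\,|u'(t)|^{p} - \frac{1}{q}\,|\phi'(u(t))|^{q}.
\end{aligned}
```

For $p = q = 2$ this reduces to the usual gradient flow $u' = -\phi'(u)$.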
2.1 Continuum gradient flow
Let the energy functional satisfy Hypothesis 1.1. We now define the continuum gradient flow on .
Definition 6 (Continuum gradient flow)
We say that is a continuum gradient flow solution with initial condition if it is a -curve of maximal slope for with respect to and .
We recall another common way of defining a continuum gradient flow, which involves the notion of subdifferential. For the subdifferential calculus on the -Wasserstein space, see (AGS, Section 10.3).
Definition 7
We say that is a solution to the gradient flow if there exists a Borel vector field such that for -a.e. , , the continuity equation
holds in the sense of distributions, and
where denotes the subdifferential of at .
The energy functional satisfying Hypothesis 1.1 is regular in the sense of (AGS, Definition 10.3.9). By (AGS, Theorem 11.1.3), the definition of curves of maximal slope (Definition 5) coincides with the gradient flow defined by Definition 7 on the -Wasserstein space. The displacement convexity and lower semi-continuity of imply the existence of such gradient flows (see (AGS, Theorem 11.3.2)). Moreover, the tangent vector to satisfies the minimal selection principle, i.e.,
where denotes the subset of elements of minimal norm in , which reduces to a single point since the -norm is strictly convex if .
2.2 Discrete gradient flow
Following the particle method used in CPSW to approximate the continuum gradient flow, we take any particles in , , and we denote . Define the set of empirical measures by
Let be the open balls of centre with radius , where is the standard Euclidean distance between and on . Define
where is the volume of with respect to the Lebesgue measure on .
Definition 8 (Discrete energy)
We define the discrete energy for all with particles by
| (6) |
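To make the construction concrete, here is a minimal Python sketch of a discrete energy of this type on the real line. It rests on assumptions that are ours: each particle carries mass $1/N$, the ball radius is half the distance to the nearest neighbour, the mass is spread uniformly over each ball, and the fictitious boundary particles are ignored (whole-line case). The names `discrete_energy` and `quadratic_density` are hypothetical.

```python
import numpy as np


def discrete_energy(x, H):
    """Discrete energy sum_i |B_i| * H(1 / (N * |B_i|)) for N particles on the real line.

    Assumption: B_i is the interval centred at x_i whose radius is half the
    distance to the nearest neighbour, and each particle has mass 1/N.
    H must accept a NumPy array of densities.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if n < 2:
        raise ValueError("need at least two particles")
    gaps = np.diff(x)
    if np.any(gaps <= 0):
        raise ValueError("particles must be distinct")
    # The end particles have only one neighbour, hence the infinite padding.
    left = np.concatenate(([np.inf], gaps))
    right = np.concatenate((gaps, [np.inf]))
    volume = np.minimum(left, right)  # |B_i| = 2 * r_i
    density = 1.0 / (n * volume)      # height of the piecewise constant density on B_i
    return float(np.sum(volume * H(density)))


def quadratic_density(rho):
    """Illustrative convex, non-negative energy density H(rho) = rho**2."""
    return rho ** 2
```

With four particles at the points 1/8, 3/8, 5/8, 7/8, the balls tile the unit interval, the piecewise constant density equals one, and `discrete_energy` returns 1, which is the integral of the squared density of the uniform measure on the unit interval.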
We define the discrete energy equivalently as a function of by
Now we introduce the weighted -norm on . That is, for any , for ,
Then becomes a Banach space with the dual space , where . Moreover, the duality pairing between and is given by
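The weighted norm and the pairing can be sketched in a few lines of Python. We assume here, following the quadratic case in CPSW, that every particle has weight $1/N$; this weighting and the function names are our assumptions, since the displayed formulas are not reproduced above.

```python
import numpy as np


def conjugate_exponent(p):
    """Return q with 1/p + 1/q = 1, for p > 1."""
    if p <= 1:
        raise ValueError("p must be greater than 1")
    return p / (p - 1.0)


def _as_particles(x):
    """View the input as an (N, d) array of N particles in R^d."""
    x = np.asarray(x, dtype=float)
    return x.reshape(x.shape[0], -1)


def weighted_norm(x, p):
    """Weighted l^p norm ((1/N) * sum_i |x_i|^p)^(1/p), with |x_i| the Euclidean norm."""
    x = _as_particles(x)
    return float(np.mean(np.linalg.norm(x, axis=1) ** p) ** (1.0 / p))


def weighted_pairing(x, y):
    """Weighted pairing (1/N) * sum_i <x_i, y_i>."""
    x, y = _as_particles(x), _as_particles(y)
    return float(np.mean(np.sum(x * y, axis=1)))
```

With these weights, Hölder's inequality bounds the absolute value of `weighted_pairing(x, y)` by `weighted_norm(x, p) * weighted_norm(y, q)` for `q = conjugate_exponent(p)`, which is the duality used in the definition of the subdifferential.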
Now we define the discrete gradient flow. As mentioned in CPSW , to obtain the well-posedness of the discrete gradient flow, we restrict the framework to the case $d = 1$. In this case, the discrete energy is convex on , which ensures that the local slope is a strong upper gradient.
By convention, in the rest of the paper, whenever particles are considered, they are assumed to be distinct and sorted increasingly, i.e., for all . We also assume the same boundary condition as in CPSW , which introduces two fictitious particles and to ensure that the real particles stay in . We denote for .
Definition 9 (Discrete gradient flow)
We say that is a discrete gradient flow solution with initial condition , if it is a -curve of maximal slope for with respect to , and if .
By (CPSW, Proposition 2.13), it is equivalent to define the discrete gradient flow as a solution to
| (7) |
where and stands for the subdifferential of :
Moreover, by (AGS, Proposition 1.4.4, Corollary 2.4.12), we have the following
Proposition 1
There exists a solution to the discrete gradient flow inclusion (7). Furthermore, any solution satisfies for almost every .
3 Main results
Before stating the main result, we recall some definitions introduced in CPSW .
Definition 10 (Smooth set)
We define the subset of as follows. We write if there exists such that all the items below hold:
-
•
; ; ;
-
•
if , then .
Definition 11 (Recovery sequence and well-preparedness)
Let . Any with for all such that narrowly as and is said to be a recovery sequence for . Let be the particles of . We say that is well-prepared for if it is a recovery sequence for and there exist such that for all and all ; if , we moreover require .
Now we state our main result.
Theorem 3.1 (Main theorem)
Assume satisfies Hypothesis 1.1. Suppose , with particles , is a discrete gradient flow solution with initial condition , with particles . Let , and assume that is well-prepared for according to Definition 11. Then is tight and there exists a subsequence and a probability measure such that
as for all . Moreover, if and satisfies Hypothesis 1.2, then is a continuum gradient flow and the following holds (when we write the metric derivative with respect to the -distance, we omit the subscript):
| (8) |
In particular, assume . If or is given by (4) with , then narrowly converges to for all and (8) holds for the whole sequence and .
Remark 2
The first part of this theorem is the tightness of , which is proved in Proposition 2. Then, by Prohorov’s theorem, is narrowly sequentially compact. Using Theorem 3.2, we prove that the sequential limit is a solution to the continuum gradient flow. Moreover, if the solution to the continuum gradient flow is known to be unique, which is established for the case of (see (AGS, Theorem 11.1.4)), for the -heat equation (2) (see (Vazquez, Chapter 11) and Kell ) and for Leibenson’s equation (1) (see IMJ ), then the whole sequence narrowly converges to and (8) holds for the whole sequence and .
Remark 3
To prove this theorem, we use the following theorem proved by Serfaty in Serfaty , which concerns the $\Gamma$-convergence of gradient flows on metric spaces.
Theorem 3.2 (Serfaty )
Let be a discrete gradient flow. Assume that narrowly as for all for some . Furthermore, suppose that is a recovery sequence for , and that the following conditions hold for all .
-
(C1)
.
-
(C2)
.
-
(C3)
.
Then is a continuum gradient flow and (8) holds for and .
3.1 Tightness and condition on the metric derivatives
Now we prove the first part of the main theorem.
Proposition 2
Proof
For any , let be an absolutely continuous curve on with velocity . Then the differentiability of gives
where is an optimal transport plan from to . Applying Hölder’s inequality, we obtain
Let be a -curve of maximal slope for with respect to . Then
By (AGS, Remark 2.4.17), we have the following estimate
Since is a strong upper gradient of , it holds
Thus we have
Then we can derive
In particular, we choose and , then
Let be a gradient flow of with initial value . Then
Assume there exists a constant such that ; then
Note that
where denotes the -th moment. Assume there exists a constant such that , and we have
| (9) |
That is, has bounded -th moment () uniformly in and . Chebyshev’s inequality then gives the uniform integrability of , which implies the tightness of . By Prohorov’s theorem, there exist a subsequence and such that narrowly converges to as for all .
Now we show that is actually in . By (AGS, Theorem 11.3.2),
which means the metric derivative is bounded in ; thus it is -weakly convergent to some up to a subsequence (still denoted by ). In particular, we can choose the characteristic function as the test function, and then we have
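For the reader's convenience, the Chebyshev step can be written out as follows. This is a sketch in our own notation: $\mu$ stands for any measure in the family, $C$ for the uniform bound on the $p$-th moments obtained in (9), and $R > 0$ for a radius.

```latex
\mu\bigl(\{x : |x| > R\}\bigr)
  \;\le\; \frac{1}{R^{p}} \int_{\{|x| > R\}} |x|^{p}\, d\mu(x)
  \;\le\; \frac{C}{R^{p}} .
```

Given $\varepsilon > 0$, any $R \ge (C/\varepsilon)^{1/p}$ makes the right-hand side at most $\varepsilon$ for every measure in the family, and closed bounded sets are compact in the Euclidean setting, which is exactly the tightness condition.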
| (10) |
Note that is -absolutely continuous. By the definition of the metric derivative,
Then, by (10) and the narrow lower semi-continuity of (see (AGS, Proposition 7.1.3)),
Therefore . Moreover, for almost every . By the weak lower semi-continuity of the -norm, this gives
which is .
3.2 Condition on the energy
Now we verify the “lower semi-continuity” conditions on the energies and the slopes of the energies.
Proof
From now on, we denote the subsequence in Theorem 3.1 by . By definition, . Since is narrowly lower semi-continuous, we need to prove that narrowly converges to as . By tightness, narrowly, thus we only need to prove that narrowly as . By the density of Lipschitz functions in , we only test against with Lipschitz constant . Compute
By (9), we have
The last step is to verify the condition on the local slopes. In this section, we denote and , and we take . In this case, the local slope of is given in the lemma below.
Lemma 1
Now we verify the condition .
Proposition 4
First, we compute explicitly the local slope of . Denote for . By the definition of local slope, we have
We use the notation and strategy in CPSW to describe whether the closest neighbour to that particle is to the right and to characterize . Given , we write if
for all , with the convention that and .
Lemma 2
Take . We have
where for all .
Proof
Denote for . We have
where the function is a smooth, convex and non-increasing function on . Moreover, one can check that is Lipschitz continuous around . Thus we can use the chain rule for subdifferentials (see (Mor, Theorem 1.110)) and obtain
which gives the conclusion.
By the same method as in CPSW and going through each case of the triplets , we have the following Lemma 3, Lemma 4 and Lemma 5. We assume the same boundary condition as in CPSW . Since the proofs of these lemmas follow the same strategy as in CPSW with minor modifications, we omit the details here.
Lemma 3
Let . Then .
Lemma 4
Let be as assumed in Theorem 3.1. Then for all .
Lemma 5
The following lemma is the key to proving the convergence in .
Lemma 6
Suppose that is finite for all . Then
| (11) |
Proof
Proof (Proof of Proposition 4)
We omit the time dependence. First we define the interpolation to approximate . Using the same method and notations as in CPSW , we introduce the function by
Clearly . For , we introduce the monotone function by
Obviously , . Since is strictly increasing, is strictly decreasing and therefore invertible (since the function in CPSW is also required to be invertible, the strict convexity of the function in CPSW is also necessary). Define
| (12) |
where is the normalization constant that makes belong to . One can check that converges narrowly to as for all . By the monotonicity of and , we have
| (13) |
By Lemma 5, this yields
Now the proof reduces to showing that gives a good estimate of and , that is
| (14) |
where is the sequence associated to defined as in (12). The second inequality above follows from narrowly and the narrow lower semi-continuity of ; see (AGS, Corollary 2.4.10). Now we check the first inequality. Let us denote . Noticing that , we have
By , it holds
By Lemma 6, we have as uniformly for . Thus for any , there exists large enough such that for all and . For such we obtain
By taking the limits and in this order, we get
| (15) |
In order to prove (14), we only need to show that . Compute
where is as in Hypothesis 1.2. Since narrowly as for all , we have as . This completes the proof.
4 $\Gamma$-convergence of the discrete energy
We show that the discrete energy is $\Gamma$-convergent to the continuum energy with respect to the -distance.
Definition 12 ($\Gamma$-convergence)
We say that the discrete energy is $\Gamma$-convergent to the continuum energy with respect to the -distance if the following two conditions hold for all :
-
(i)
(“liminf” condition) All sequences with such that as satisfy .
-
(ii)
(“limsup” condition) There exists a recovery sequence with respect to for .
To obtain the $\Gamma$-convergence, we require that satisfies the following additional condition: there exist continuous functions such that and , and
| (16) |
This is still satisfied by typical energies such as (4).
Proof
We follow the same strategy as in CPSW . The “liminf” condition can be obtained from proved in Proposition 3. To prove the “limsup” condition, we need to find a recovery sequence for any with respect to the -distance. This is done in two steps. First, the recovery sequence is constructed for any ; then we relax this assumption on and prove the general result for any by a density argument. Here we construct the recovery sequence for as in (CPSW, Lemma 5.5), which relies on the pseudo-inverse of the distribution function of . Then, by the same argument as in (CPSW, Lemma 6.6), we can extend this result to any . We omit the details here.
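To illustrate how a pseudo-inverse of the distribution function can generate particles, the following Python sketch places $N$ equally weighted particles at midpoint quantile levels. This is a hypothetical construction for illustration: the precise choice of levels in (CPSW, Lemma 5.5) is not reproduced in the text above, and the names `quantile_particles` and `exponential_inverse_cdf` are ours.

```python
import numpy as np


def quantile_particles(inverse_cdf, n):
    """Place n particles at the quantile levels (2i - 1) / (2n), i = 1, ..., n.

    inverse_cdf is the pseudo-inverse of the distribution function of the
    target measure and must accept a NumPy array of levels in (0, 1).
    The midpoint levels are an assumption made for this sketch.
    """
    if n < 1:
        raise ValueError("need at least one particle")
    levels = (2.0 * np.arange(1, n + 1) - 1.0) / (2.0 * n)
    return np.asarray(inverse_cdf(levels), dtype=float)


def exponential_inverse_cdf(u):
    """Pseudo-inverse of the distribution function of the unit exponential law."""
    return -np.log1p(-np.asarray(u, dtype=float))
```

For a continuous, strictly increasing distribution function, the empirical distribution function of these particles differs from the target by exactly $1/(2N)$ at each jump, so the empirical measures converge narrowly as $N$ grows. For the uniform measure on the unit interval the particles are equally spaced with gap $1/N$.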
Acknowledgements.
The author is supported by JSPS Grant-in-Aid for Transformative Research Areas (B) No. 23H03798. The author would also like to thank Professor Jun Masamune for support and encouragement.
References
- (1) Ambrosio, L., Gigli, N., Savaré, G.: Gradient Flows in Metric Spaces and in the Space of Probability Measures. Birkhäuser Basel, 2005.
- (2) Andreianov, B., Bendahmane, M., Karlsen, K. H.: Discrete duality finite volume schemes for doubly nonlinear degenerate hyperbolic-parabolic equations. J. Hyperbolic Differ. Equ. 7(01), 1-67 (2010)
- (3) Carrillo, J. A., Patacchini, F. S., Sternberg, P., Wolansky, G.: Convergence of a particle method for diffusive gradient flows in one dimension. SIAM J. Math. Anal. 48(6), 3708-3741 (2016)
- (4) Carrillo, J. A., Huang, Y., Patacchini, F. S., Wolansky, G.: Numerical study of a particle method for gradient flows. arXiv preprint arXiv:1512.03029 (2015)
- (5) Droniou, J., Eymard, R., Gallouet, T., Herbin, R.: Gradient schemes: a generic framework for the discretisation of linear, nonlinear and nonlocal elliptic and parabolic equations. Math. Models Methods Appl. Sci. 23(13), 2395-2432 (2013)
- (6) De Giorgi, E.: New problems on minimizing movements. Boundary value problems for partial differential equations and applications, 81–98, RMA Res. Notes Appl. Math., 29, Masson, Paris, 1993.
- (7) De Giorgi, E., Marino, A., Tosques, M.: Problems of evolution in metric spaces and maximal decreasing curve. Atti Accad. Naz. Lincei Rend. Cl. Sci. Fis. Mat. Natur. (8)68, 180-187 (1980)
- (8) Evje, S., Hvistendahl Karlsen, K.: Discrete approximations of BV solutions to doubly nonlinear degenerate parabolic equations. Numer. Math. 86(3), 377-417 (2000)
- (9) Ivanov, A.V., Mkrtychyan, P.Z., Jäger, W.: Existence and uniqueness of a regular solution of the Cauchy–Dirichlet problem for a class of doubly nonlinear parabolic equations. J. Math. Sci. 1(84), 845–855 (1997)
- (10) Jordan, R., Kinderlehrer, D., Otto, F.: The variational formulation of the Fokker–Planck equation. SIAM J. Math. Anal. 29(1), 1-17 (1998)
- (11) Kell, M.: $q$-heat flow and the gradient flow of the Rényi entropy in the $p$-Wasserstein space. J. Funct. Anal. 271(8), 2045-2089 (2016)
- (12) Leibenson, L.: General problem of the movement of a compressible fluid in a porous medium. Izv. Akad. Nauk SSSR. Geography and Geophysics, 9, 7–10 (1945)
- (13) McCann, R. J.: A convexity principle for interacting gases, Adv. Math. 128(1), 153–179 (1997)
- (14) Mordukhovich, B. S.: Variational Analysis and Generalized Differentiation I: Basic Theory, Grundlehren Math. Wiss. 330, Springer, Berlin, Heidelberg, 2006.
- (15) Otto, F.: The geometry of dissipative evolution equations: The porous medium equation, Commun. Partial Differ. Equ. 26(1-2), 101–174 (2001)
- (16) Rossi, R., Mielke, A., Savaré, G.: A metric approach to a class of doubly nonlinear evolution equations and applications. Annali della Scuola Normale Superiore di Pisa-Classe di Scienze. 7(1), 97-169 (2008)
- (17) Serfaty, S.: Gamma-convergence of gradient flows on Hilbert and metric spaces and applications. Discrete Contin. Dyn. Syst. 31(4), 1427–1451 (2011)
- (18) Vázquez, J.L.: Smoothing and Decay Estimates for Nonlinear Diffusion Equations, in: Equations of Porous Medium Type, in: Oxford Lecture Series in Mathematics and Its Applications, vol. 33, Oxford University Press, Oxford (2006)
- (19) Villani, C.: Topics in Mass Transportation, Graduate Studies in Mathematics, American Mathematical Society, Providence, 2003.
- (20) Villani, C.: Optimal Transport, Old and New, Springer, Berlin, 2008.