Propagation of chaos in path spaces via information theory
Abstract
Propagation of chaos for interacting particle systems has been an active research topic for decades. We propose an alternative approach to studying the mean-field limit of stochastic interacting particle systems via tools from information theory. In our framework, the propagation of chaos is reduced to the space of the driving processes, which may have lower dimension. Indeed, after applying the data processing inequality, one only needs to estimate the difference between the drifts of the particle system and of the mean-field McKean stochastic differential equation. This is particularly useful in situations where the discrepancy between the driving processes is more apparent than that between the investigated processes. We take the second-order system, as well as other examples, to illustrate how our framework can be used. This approach allows us to focus on probability measures in path spaces for the driving processes, avoiding the usual hypocoercivity technique or taking the pseudo-inverse of the diffusion matrix, which might be more stable for numerical computation. Our framework differs from current approaches in the literature and could provide new insight into the study of interacting particle systems.
keywords:
mean-field limit, interacting particle systems, relative entropy, data processing inequality, Girsanov theorem.
MSC codes: 35Q70; 60J60; 82C22
1 Introduction
The interacting particle system, mostly built upon basic physical laws such as Newton’s second law, has received growing attention in recent years in the study of both natural and social sciences. Practical applications of such large-scale interacting particle systems include groups of birds [11], consensus clusters in opinion dynamics [41], chemotaxis of bacteria [23], etc. Despite its strong applicability, the theoretical analysis and practical computation for interacting particle systems are rather complicated, mainly because the particle number is very large in many practical settings. One classical strategy to reduce this complexity is to study instead the “mean-field” regime. The limiting partial differential equation (mean-field equation) is used to describe the behavior of the particle system as the particle number tends to infinity. This approximation allows one to obtain a one-body model instead of the original many-body one. For instance, Jeans proposed a mean-field equation to study galactic dynamics in 1915 [28]. Much work has been done in the past decades to study the mean-field behaviors of various kinds of interacting particle systems [15, 33, 39, 18, 43].
Here, let us take the second-order system as an example to explain the concepts of mean-field limit and propagation of chaos. The second-order system is described by Newton’s second law for point particles driven by two-body interaction forces and Brownian motions, satisfying the following system of stochastic differential equations (SDEs):
(1.1)
$$
\begin{cases}
\mathrm{d}X_i = V_i\,\mathrm{d}t,\\[2pt]
m\,\mathrm{d}V_i = -\gamma V_i\,\mathrm{d}t + \dfrac{1}{N}\sum_{j\neq i} K(X_i - X_j)\,\mathrm{d}t + \sigma\,\mathrm{d}B_i,
\end{cases}
\qquad i = 1,\dots,N,
$$
where $m$ and $\gamma$ represent the mass and friction coefficient respectively, and $\sigma$ is a constant diffusion matrix. The processes $B_i$ ($i = 1,\dots,N$) are independent Brownian motions in $\mathbb{R}^d$, and $K:\mathbb{R}^d \to \mathbb{R}^d$ is the interaction kernel. We assume that the initial data are drawn from some initial law independent of the Brownian motions. Denote $Z_i := (X_i, V_i)$, and denote the corresponding joint law by
(1.2) |
where $\mathcal{P}$ denotes the corresponding probability measure space. Then, the evolution of the joint density satisfies the Liouville equation [16, 17]:
(1.3) |
with the given initial data. Note that the matrix is defined via the diffusion coefficients, and “” means the Hilbert–Schmidt inner product. As the particle number tends to infinity, the correlation between any two given particles through the weak interaction is expected to vanish. Hence, if two particles are initially independent, then they are expected to remain asymptotically independent at any fixed time point as the particle number tends to infinity. This is the so-called propagation of chaos. Due to this asymptotic independence, a fixed particle with position and velocity is then expected to satisfy the following mean-field McKean SDE system:
(1.4)
$$
\begin{cases}
\mathrm{d}\bar{X} = \bar{V}\,\mathrm{d}t,\\[2pt]
m\,\mathrm{d}\bar{V} = -\gamma \bar{V}\,\mathrm{d}t + (K * \rho_t)(\bar{X})\,\mathrm{d}t + \sigma\,\mathrm{d}B,
\end{cases}
$$
where $\bar{\rho}_t$ is the law of the solution at time $t$, and $\rho_t$ is its position marginal. The law is then expected to satisfy the following mean-field kinetic Fokker–Planck equation [24, 25]:
(1.5) |
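As a concrete illustration, the second-order dynamics described above (positions driven by velocities; velocities subject to friction, pairwise forces, and noise) can be simulated directly. The following sketch uses the Euler–Maruyama discretization in one dimension with a hypothetical bounded, Lipschitz kernel $K(r) = -\tanh(r)$ and Gaussian initial data; all choices here are illustrative and not taken from the analysis below.

```python
import numpy as np

def simulate_second_order(N=50, T=1.0, dt=1e-3, m=1.0, gamma=1.0, sigma=1.0, seed=0):
    """Euler-Maruyama sketch of a second-order interacting particle system in 1D.
    The kernel K and all parameters are illustrative, not from the paper."""
    rng = np.random.default_rng(seed)
    K = lambda r: -np.tanh(r)            # hypothetical bounded, Lipschitz kernel
    X = rng.normal(size=N)               # i.i.d. (chaotic) initial positions
    V = rng.normal(size=N)               # i.i.d. initial velocities
    for _ in range(int(T / dt)):
        # mean-field force on particle i: (1/N) * sum_j K(X_i - X_j)
        F = K(X[:, None] - X[None, :]).mean(axis=1)
        X = X + V * dt
        V = V + (dt / m) * (-gamma * V + F) + (sigma / m) * np.sqrt(dt) * rng.normal(size=N)
    return X, V
```

As the number of particles grows, the empirical distribution of such a simulation is expected to approach the solution of the mean-field equation above.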
Rigorous justification of this mean-field limit, or of the propagation of chaos, has thus become an active research topic.
The prevalent method for analyzing mean-field limits is based on Dobrushin’s estimate, proposed in 1979 by Dobrushin [13], which studies the stability of the mean-field characteristic flow in terms of Wasserstein distances. Dobrushin-type analysis has become a classical tool in mean-field limits for Vlasov-type equations over the past decades. Based on it, one can prove the mean-field limit for deterministic systems on a finite time interval in terms of Wasserstein distances [3, 44, 18]. Another way is to compare the stochastic trajectories through certain coupling techniques. By considering trajectory controls, the mean-field limit for stochastic systems with Lipschitz kernels has been established [50, 19, 21].
Another class of methods is to compare the laws directly. A recently popular way to quantify chaos is to analyze the relative entropy (also called Kullback–Leibler divergence, or KL-divergence) between the joint law and the tensorized product of the limiting law. The analysis can also be performed on the laws in path space, with the above laws as their time marginals. Some early results in path space using the relative entropy were achieved in the last century (e.g. [2, 1]). For time-marginal distributions, Jabin et al. proved the propagation of chaos for Vlasov-type systems with a quantitative bound, assuming the interaction kernel is bounded, as well as the propagation of chaos for first-order systems with singular kernels [26]. For results in path space, Lacker obtained the propagation of chaos relying on Girsanov’s and Sanov’s theorems [30] and on the BBGKY hierarchy [31, 32]. The approach in [31, 32] yields a bound on the relative entropy between the marginal law of particles and its limiting product measure. For certain singular interactions, Tomašević et al. used the partial Girsanov transform to derive the propagation of chaos in [27, 51]. Recently, Hao et al. further showed the strong convergence of the propagation of chaos with such singular interactions in [22]. Also, based on Lacker’s approach, Cattiaux gave an estimate on the path space in [6], using the invariance of relative entropy under time reversal [5]. The results in [12] and [20] are uniform in time for the Coulomb and the Biot–Savart kernel, respectively. There is a vast literature on this topic, and we refer the reader to the recent review articles [7, 8].
In this work, we propose to use information theory to study the propagation of chaos by comparing the discrepancy between the joint law of the particle system and that of the corresponding mean-field equation in terms of the KL-divergence, defined by
(1.6)
$$
D_{\mathrm{KL}}(\mu \,\|\, \nu) :=
\begin{cases}
\displaystyle\int_{\Omega} \log\frac{\mathrm{d}\mu}{\mathrm{d}\nu}\,\mathrm{d}\mu, & \text{if } \mu \ll \nu,\\
+\infty, & \text{otherwise},
\end{cases}
$$
where $\mu$ and $\nu$ are two probability measures over some appropriate space $\Omega$. In our framework, the propagation of chaos is reduced to the space of the driving processes, which may have lower dimension. We will mainly take the second-order system as the example, which avoids using the usual hypocoercivity technique or taking the pseudo-inverse of the diffusion matrix. We remark that bounds in relative entropy for the second-order system can be obtained by a direct Girsanov transform if one takes the pseudo-inverse of the degenerate diffusion matrix, as mentioned in [31, Remark 4.5]. Nevertheless, we believe our approach is still of significance, as there is no degeneracy in the diffusion if we look at the measures in the space of the driving processes, which could be more stable for numerical computation. We will also apply our framework to other illustrative examples.
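For intuition, the KL-divergence is easy to evaluate in simple settings. The sketch below computes it for discrete distributions directly from the definition, and for one-dimensional Gaussians via the standard closed form; both functions are generic utilities of our own, not part of the framework above.

```python
import numpy as np

def kl_discrete(p, q):
    """KL(p || q) = sum_x p(x) log(p(x) / q(x)) for discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                      # 0 * log 0 is taken as 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def kl_gaussian(mu1, s1, mu2, s2):
    """Closed form for KL(N(mu1, s1^2) || N(mu2, s2^2)) in one dimension."""
    return float(np.log(s2 / s1) + (s1 ** 2 + (mu1 - mu2) ** 2) / (2 * s2 ** 2) - 0.5)
```

Both quantities are nonnegative and vanish exactly when the two measures coincide, which is the basic property exploited throughout the paper.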
We focus on an estimate of the KL-divergence between the laws in path space. Here the two path measures are probability distributions in the path space (for a fixed time interval) corresponding to the SDE system (1.1) and to independent copies of (1.4), respectively; the laws of the particle system and of the tensorized mean-field limit are their time marginals. We then regard the processes of the mean-field McKean SDEs and of the interacting particle system as the same dynamical system with different driving processes (input signals). Then, applying the data processing inequality, we can work with probability measures in the space of the input signals instead of the space of the particles. The former space is sometimes easier to deal with than the latter, as one may avoid the degeneracy of the diffusion; moreover, its dimension could be lower. This is similar to the so-called latent space in machine learning [38]. We will also present applications of the framework to neural networks and numerical analysis to illustrate this point.
The rest of the paper is organized as follows. In Section 2, we present our main ideas. The result (Theorem 3.6) on the propagation of chaos for the second-order system in path space is shown in Section 3, for both bounded kernels (not necessarily smooth) and Lipschitz kernels (not necessarily bounded), together with the necessary assumptions and auxiliary lemmas. In Section 4, we provide two applications of our approach, to numerical analysis and to neural networks. Lastly, in Section 5, we discuss the reversed relative entropy and mass independence.
2 The main idea of the new framework
In this section, taking the second-order system as the example, we present the main ideas without rigorous proof. The rigorous mathematical setup, assumptions and proofs will be given in the next section.
For fixed , let be the law of the trajectories of the McKean SDE system (1.4). Then the tensorized distribution is the law of the trajectories of the following system:
(2.7) |
and the particles , are independent.
The key idea of this work is rewriting (1.1) above into:
(2.8) |
where the process is defined by
(2.9) |
Here,
(2.10) |
We also denote
(2.11) |
Based on (2.8) and (2.7), formally, we write the generalized dynamics
(2.12) |
Here, is a driving process. In (2.8), the driving process is taken as the noise process , while in (2.7) it is taken as . For fixed initial data, as shown in (2.13), the driving process can be viewed as an input; then, through equation (2.12), the particle trajectory is obtained as the output.
(2.13) |
From this perspective, a natural guess is that if there is only a slight difference between two driving processes, then the difference between the outputs might not be large. Luckily, if the mean-field McKean SDE (1.4) has pathwise uniqueness, the following well-known data processing inequality [9] helps establish this intuition.
Lemma 2.1 (data processing inequality).
Consider a given conditional probability (channel) $P_{Y|X}$ and an output $Y$ produced from the input $X$ through $P_{Y|X}$. If $\mu_Y$ is the distribution of $Y$ when $X$ is generated by $\mu_X$, and $\nu_Y$ is the distribution of $Y$ when $X$ is generated by $\nu_X$, then for any convex function $f$ satisfying $f(1) = 0$ and being strictly convex at $1$, it holds that
(2.14)
$$
D_f(\mu_Y \,\|\, \nu_Y) \le D_f(\mu_X \,\|\, \nu_X),
$$
where the $f$-divergence is defined by
(2.15)
$$
D_f(\mu \,\|\, \nu) := \int f\Big(\frac{\mathrm{d}\mu}{\mathrm{d}\nu}\Big)\,\mathrm{d}\nu.
$$
Remark 2.2.
Taking $f(x) = x \log x$, the $f$-divergence becomes the famous KL-divergence. In this paper, we focus on this special case.
Remark 2.3.
The data processing inequality is also well known in probability and statistics (e.g. [31]); there it states that $D_{\mathrm{KL}}(\mu \circ F^{-1} \,\|\, \nu \circ F^{-1}) \le D_{\mathrm{KL}}(\mu \,\|\, \nu)$ for any probability measures $\mu$, $\nu$ on a common measurable space and any measurable function $F$ into another measurable space.
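This pushforward form of the data processing inequality can be checked numerically in a toy discrete setting: pushing two distributions forward through a (non-injective) map can only decrease their KL-divergence. The distributions and the coarse-graining map below are arbitrary illustrative choices.

```python
import numpy as np

def kl(p, q):
    """KL divergence between discrete distributions (arrays summing to 1)."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def pushforward(p, T, m):
    """Law of T(X) on {0,...,m-1} when X ~ p on {0,...,len(p)-1}."""
    q = np.zeros(m)
    for x, px in enumerate(p):
        q[T[x]] += px
    return q

rng = np.random.default_rng(1)
p, q = rng.dirichlet(np.ones(6)), rng.dirichlet(np.ones(6))
T = np.array([0, 0, 1, 1, 2, 2])          # a lossy coarse-graining map
lhs = kl(pushforward(p, T, 3), pushforward(q, T, 3))
rhs = kl(p, q)
assert lhs <= rhs + 1e-12                 # data processing inequality
```

The inequality would hold for any measurable map; the deterministic coarse-graining here is just the simplest channel to test.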
Now, by the data processing inequality, we can control the KL-divergence between the outputs by that between the inputs. In this respect, we move our problem from the trajectory space to the space of the driving process . Precisely, we find that
where we recall and are path measures introduced in Section 1 and we denote to be the path measures for
To compute the latter relative entropy, we rewrite the equation for by
(2.16) |
Then, satisfies an SDE in the space of the driving process, with a dimension smaller than that of . Then, by Girsanov’s transform, it holds
(2.17) |
Note that this reduction avoids the degeneracy of the diffusion coefficient. Though the degeneracy can be treated by using the pseudo-inverse as remarked in [31], such a reduction could be helpful for practical estimates using numerical computations. We will give more details in the next sections.
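To see the Girsanov-type computation in the simplest possible case, consider two scalar SDEs with constant drifts $b_1$, $b_2$ and the same diffusion $\sigma$. Girsanov's formula gives the path-space KL-divergence in closed form, and it can be compared with the KL-divergence of the time-$T$ Gaussian marginals. The snippet below is a self-contained sketch, not the system studied in this paper; in this constant-drift case the terminal value is a sufficient statistic, so the data processing inequality holds with equality.

```python
import numpy as np

def path_kl_constant_drift(b1, b2, sigma, T):
    """Girsanov: KL between path laws of dX = b_i dt + sigma dB on [0, T]
    with constant drifts b1, b2 equals T (b1 - b2)^2 / (2 sigma^2)."""
    return T * (b1 - b2) ** 2 / (2 * sigma ** 2)

def marginal_kl_at_T(b1, b2, sigma, T, x0=0.0):
    """Time-T marginals are N(x0 + b_i T, sigma^2 T); Gaussian KL in closed form."""
    var = sigma ** 2 * T
    return ((b1 - b2) * T) ** 2 / (2 * var)
```

For state-dependent drifts, the marginal KL-divergence is in general strictly smaller than the path-space one, which is exactly the slack in the data processing inequality.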
Let us discuss the choice of the noise and dynamical system. One may be tempted to rewrite the mean-field McKean SDE into
with
Then, the -body interacting particle system is given by
with .
The two systems are again the same dynamical system with different driving noises:
At first glance, this formulation seems good since the drift in involves only the solution to the mean-field McKean SDE. Then, one may apply the law of large numbers. However, this is not the case. In fact, applying the data processing inequality, one has
where is the law for . We consider
Here, the mapping is the solution map for the -body interacting dynamical system and is the time marginal. This is again an SDE in the space for the driving process. Then,
The point is that the Radon–Nikodym derivative is integrated against . Girsanov’s transform then gives that
(2.18) |
where the measure inside the expectation is changed from to ! The eventual result is the same as (2.17).
3 The application to the second order systems
In this section, we establish the propagation of chaos in path space for the second order systems using the framework of information theory, in particular the data processing inequality.
We first present our assumptions on the kernels and coefficients. The first set of assumptions requires that is bounded. {assumption}
-
(a)
The kernel has finite essential bound, namely, .
-
(b)
The matrix is non-degenerate with minimum eigenvalue
Remark 3.1.
In the main text, the matrix is a constant matrix for notational convenience. However, a time- and state-dependent diffusion is allowed, as long as the spectrum of the diffusion matrix is uniformly bounded above and away from zero and the well-posedness results in the following subsection are preserved. This is similar to [31, Remark 4.5].
The boundedness condition on the interaction kernel (condition (a) in Assumption 3 above) is sometimes too strong in practice. If we assume instead that the initial distribution has a fast-decaying tail, we can allow a Lipschitz kernel. In fact, we will alternatively assume the following: {assumption}
-
(a)
The initial space-marginal distribution of the Mckean SDE (1.4) is sub-Gaussian, namely, there exists such that for any , .
-
(b)
The interaction kernel is -Lipschitz, namely, , .
-
(c)
The matrix is non-degenerate with minimum eigenvalue
3.1 The well-posedness of the mean field McKean SDE.
Under either Assumption 3 or 3, we are able to establish the propagation of chaos using nearly the same method. As a first step, we consider the solution map of (1.4). For fixed initial data, we rewrite it as
(3.19) |
We first have the following observation.
Lemma 3.2.
The result under Assumption 3 is standard, since the corresponding SDE system even has strong solutions. For the first case, the well-posedness under some more general singular kernels has been established as well; one may refer to [29, 56, 22] for related discussion.
Once we have the well-posedness for the nonlinear Fokker–Planck equation, is smooth for any , and thus locally Lipschitz. Now, we take as given and conclude the following.
Lemma 3.3.
The uniqueness is relatively straightforward. In fact, any two continuous solutions with the same given data stay in a compact set, on which the drift is Lipschitz for any fixed time. The relevant integral can be made arbitrarily small on short time intervals, and the uniqueness then follows by direct comparison. For the existence, one may consider a regularized equation with the coefficient suitably truncated. The obtained solutions can be shown to be uniformly bounded; then it is not hard to show that they are relatively compact by the Arzelà–Ascoli criterion, with any limit point being a solution of the integral equation.
With the above fact, the mean field McKean SDE (1.4) actually has a unique strong solution. For a fixed time , we may introduce the mapping
(3.21) |
where is a generic driving process, and is the solution of the dynamical system (3.19).
For fixed , only depends on for . If we change , the solution process will clearly agree on the common subinterval. Below, we will consider varying , but we will not change the notation for convenience. Moreover, the dependence on the initial data is also not written out explicitly for clarity. Consequently, recalling the definitions , and , , then one has
(3.22) |
With the conditions above, next we establish the propagation of chaos result for distributions starting from a chaotic configuration (i.e., ).
3.2 Propagation of chaos in path space and the corollaries.
We again note a fact from standard SDE theory.
Lemma 3.4.
The existence of weak solutions for bounded kernels follows from a standard Girsanov transform (see e.g. [45, Theorem 8.6.5], [49, Theorem 27.1], [34, Theorem 2.1]). The uniqueness in law for bounded kernels is also standard; one may refer to the discussion in [49, page 155, Chapter 4, Section 18].
The weak well-posedness of the SDE implies that the Liouville equation (1.3) has weak solutions. The uniqueness for the Liouville equation (1.3) can also be established under the bounded or Lipschitz assumption on the kernel (see e.g. [48]). It is straightforward to see that if the initial law is symmetric, then the joint law remains symmetric, due to the fact that the permuted law satisfies the same Liouville equation for an arbitrary permutation (see, for instance, a similar argument in [42]). A similar argument also applies to the law in the path space. In fact, for any weak solution, the permuted process is also a weak solution; then the uniqueness in law implies that the law in the path space is symmetric. This in fact arises from the exchangeability of the particle system.
Next, we have the following result under Assumption 3.
Lemma 3.5.
The first claim can be verified by computing via Itô’s formula. The second one follows from the first-order moment bound, which clearly holds under Assumption 3. Below, we present and prove the main result of this section.
Theorem 3.6.
Proof 3.7.
Recall equations (2.7)–(2.11). Note that we consider the weak solution to (1.1), so the Brownian motions are not necessarily defined on the same space. However, since the McKean SDE has a strong solution, we may, without loss of generality, take the Brownian motions in (2.7) to be the ones used for the weak solutions of (1.1), without altering the laws.
The corresponding driving processes in the path space are
Let denote the law of (recall that ) with given initial data, and let be similarly defined. Then, for initial data obeying the distribution , one has
(3.25) |
By the data processing inequality (Lemma 2.1), one has that
(3.26) |
where , are path measures generated by and , respectively, corresponding to the time interval . Namely, , and . By definition of the process , , and the Radon-Nikodym derivative exists. One can find the expression of this Radon-Nikodym derivative explicitly by Girsanov’s transform. In fact, denote the -dimensional vector with
Note that
where is defined in (3.21), and maps in path space to its time marginal, namely, . Then the Girsanov’s transform asserts that the Radon-Nikodym derivative in the path space satisfies
(3.27) |
In Appendix A, we present a formal derivation of the details for (3.7). The rigorous proof can be found in many textbooks, e.g. [45, Theorem 8.6.5], [49, Theorem 27.1], [34, Theorem 2.1]. Since
one has by combining (3.26) and (3.7) that
(3.28) |
Moreover, due to the fact (3.25) and the convexity of the KL-divergence, one has by Jensen’s inequality that
(3.29) |
where the expectation on the right-hand side is now the full expectation.
Next, we estimate (3.29). We separately estimate this under Assumption 3 (bounded ) or Assumption 3 (unbounded ).
Case 1: Under Assumption 3.
We first split the right-hand side of (3.29) into
where is defined by
Since by Assumption 3, it is easy to see that for , the first term above is bounded by . For the second term, for any fixed , choosing (the time marginal distribution for particle position at time ) and in Lemma 3.13 (as we shall present in Section 3.3), for any we have
where is defined by
Considering the map : , by the data processing inequality (Lemma 2.1) we know that
Also, Lemma 3.14 in Section 3.3 states that for ,
Hence, considering the averaged summation for and combining all the above, one obtains
(3.30) |
where . The result (3.23) is obtained after applying Grönwall’s inequality:
where is a positive constant independent of the particle number and the particle mass . For instance, if we choose , then we can choose with and .
Case 2: Under Assumption 3.
Now we consider the case for the unbounded interaction kernel. First, for fixed , still by Lemma 3.13, for any , we have (recalling the notations and above)
(3.31) |
Now note that
(3.32) |
Moreover, under Assumption 3, is a sub-Gaussian random variable, and
(3.33) |
Therefore, the conditions required in Lemma 3.15 are satisfied. Consequently, we have a similar estimate under Assumption 3:
(3.34) |
where , are positive constants independent of and . Therefore the -upper bound for is obtained by Grönwall’s inequality.
The results above are all about path measures. In fact, we can extend this to the time marginal case, which is commonly studied in related literature.
Corollary 3.8 (time marginal).
Proof 3.9.
For any , consider the path measures , corresponding to the time interval . Then by Theorem 3.6,
Remark 3.10.
The fact that the KL-divergence between path measures controls that between time marginals can actually be proved without the data processing inequality. In fact, for , the Radon–Nikodym derivative in terms of time-marginal distributions has the following formula (see, for instance, Appendix A in [36]):
(3.39) |
Then by Jensen’s inequality, we directly conclude that
In fact, these two approaches are essentially the same, since both boil down to Jensen’s inequality.
Based on Theorem 3.6 and Pinsker’s inequality [46], we are able to extend the propagation of chaos to the total variation (TV) distance, defined by
(3.40)
$$
d_{\mathrm{TV}}(\mu, \nu) := \sup_{A} |\mu(A) - \nu(A)|,
$$
for two probability measures $\mu$, $\nu$ defined on a common measurable space, where the supremum runs over all measurable sets $A$.
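Pinsker's inequality can be sanity-checked numerically on random discrete distributions. With the TV distance normalized as above (equal to half the $\ell^1$ distance in the discrete case), the bound reads $d_{\mathrm{TV}} \le \sqrt{D_{\mathrm{KL}}/2}$; the sketch below is a generic check, not tied to the particle systems in this paper.

```python
import numpy as np

def tv(p, q):
    """TV distance: sup_A |mu(A) - nu(A)| = 0.5 * l1 distance for discrete laws."""
    return 0.5 * float(np.abs(np.asarray(p, float) - np.asarray(q, float)).sum())

def kl(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

rng = np.random.default_rng(0)
for _ in range(100):
    p, q = rng.dirichlet(np.ones(5)), rng.dirichlet(np.ones(5))
    assert tv(p, q) <= np.sqrt(0.5 * kl(p, q)) + 1e-12   # Pinsker's inequality
```
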
Corollary 3.11.
Remark 3.12.
Our approach can be applied to the following first-order system without difficulty
(3.43) |
where is the non-interaction drift, and the setting of , , is the same as in the second-order case. We skip the proof for this case.
3.3 Some auxiliary lemmas.
In this subsection we present some auxiliary lemmas used in our proof. The detailed proof of Lemma 3.14 is moved to the Appendix.
Near the end of the proof of Theorem 3.6, in order to estimate the difference between the two drifts
we need the following two lemmas: a Fenchel–Young-type inequality and an exponential concentration estimate. The Fenchel–Young-type inequality ([26, Lemma 1]) states:
Lemma 3.13.
For any two probability measures and on a Polish space and some test function , one has that ,
We also need the following exponential concentration estimate. Similar results can be found in related literature like [37, 26]. For the convenience of the readers, we also attach a proof in Appendix B.
Lemma 3.14.
When the interaction kernel is bounded, Lemma 3.13 and Lemma 3.14, together with the previous analysis, enable one to obtain an -upper bound for , and it is easy to see that the bound is independent of the particle mass . When is not bounded, we make use of Lemma 3.13 and Lemma 3.15 below instead:
Lemma 3.15.
[14, Lemma 3.3] Consider and satisfying and , for the universal constant in Hoeffding’s inequality, so that the following holds:
(3.44) |
Then,
(3.45) |
For readers’ convenience, here we briefly introduce the Hoeffding bound used in the statement (as well as the proof) of Lemma 3.15 above. The Hoeffding inequality [52] claims that for independent centered real random variables $X_1, \dots, X_N$, there exists a universal constant $c > 0$ such that
(3.46)
$$
\mathbb{P}\Big(\Big|\sum_{i=1}^N X_i\Big| \ge t\Big) \le 2\exp\Big(-\frac{c\,t^2}{\sum_{i=1}^N \|X_i\|_{\psi_2}^2}\Big), \qquad t \ge 0,
$$
where the $\psi_2$ norm (the Orlicz norm with $\psi_2(x) = e^{x^2} - 1$) of a sub-Gaussian random variable $X$ is given by
(3.47)
$$
\|X\|_{\psi_2} := \inf\big\{s > 0 : \mathbb{E}\,e^{X^2/s^2} \le 2\big\}.
$$
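In the classical bounded case (variables in $[-b, b]$), the Hoeffding bound specializes to $\mathbb{P}(|\sum_i X_i| \ge t) \le 2\exp(-t^2/(2Nb^2))$, which is easy to check by simulation. The snippet below is a generic illustration with uniform variables, not part of the proof of Lemma 3.14.

```python
import numpy as np

# Empirical check of Hoeffding's bound in the bounded case: for i.i.d. centered
# variables bounded in [-1, 1], P(|X_1 + ... + X_n| >= t) <= 2 exp(-t^2 / (2 n)).
rng = np.random.default_rng(0)
n, t, trials = 100, 30.0, 20000
S = rng.uniform(-1.0, 1.0, size=(trials, n)).sum(axis=1)
empirical = float(np.mean(np.abs(S) >= t))
bound = 2.0 * np.exp(-t ** 2 / (2 * n))
assert empirical <= bound
```
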
The following well-known linear scaling property of the relative entropy is useful for controlling the marginal distribution. (See e.g. [40, Lemma 3.9], [10, Equation (2.10), page 772].)
Lemma 3.16 (linear scaling for KL-divergence).
Let be a symmetric distribution over some tensorized space and . For , define its -th marginal by
(3.48) |
Assume that for any Then it holds that
(3.49) |
4 Other applications
In this section, we show two applications of our approach, to neural networks and to numerical analysis respectively.
4.1 Application in neural networks.
An interesting application is on neural networks. To show the characteristics of our approach, we use an artificial single-layer neural network as an example:
(4.50) |
where denotes the input features and denotes a certain activation function, while denotes the output. This model can be viewed as a single-layer variant, with noise, of the model mentioned in [54]. Our approach can be directly applied to (4.50) to transform the original problem in the space of into the space of the driving process
similarly to the discussion in Section 2. The presence of the activation function makes it impossible to use Girsanov’s theorem directly, while our approach still works in this case. Also, if one uses second-order dynamics to update the features, that is,
the uniformity in mass is not a direct byproduct of Girsanov’s theorem.
4.2 Application in numerical analysis.
Our approach can also be applied directly in numerical analysis. For example, consider the following scheme for the SDE (1.1) with time step . Without loss of generality, we set and . Assume that is globally Lipschitz continuous with constant , and that the second moment of the initial data is finite:
(4.51) |
Define
We use to denote the numerical solution. For (), is defined by
For and similar to the proof of Theorem 3.6, one has
(4.52) | ||||
Considering equation (4.52), by Itô’s calculus and the assumption on , one has
By exchangeability, one has
By the Grönwall inequality and the assumption (4.51), it holds that
(4.53) |
Hence,
(4.54) |
Then, combining (4.52) and (4.54), one obtains
(4.55) | ||||
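The use of relative entropy in numerical analysis can also be illustrated on a scalar test equation. For the Ornstein–Uhlenbeck process $\mathrm{d}X = -X\,\mathrm{d}t + \sigma\,\mathrm{d}B$, both the exact one-step transition and the Euler–Maruyama step are Gaussian, so their KL-divergence has a closed form and shrinks as the step size decreases. This toy computation is our own illustration, not the scheme analyzed above.

```python
import numpy as np

def kl_gauss(mu1, v1, mu2, v2):
    """KL(N(mu1, v1) || N(mu2, v2)) with variances v1, v2."""
    return 0.5 * (np.log(v2 / v1) + (v1 + (mu1 - mu2) ** 2) / v2 - 1.0)

def one_step_kl(x=1.0, sigma=1.0, dt=0.1):
    """KL between the exact OU transition N(x e^{-dt}, sigma^2 (1 - e^{-2 dt}) / 2)
    and the Euler-Maruyama step N(x (1 - dt), sigma^2 dt) for dX = -X dt + sigma dB."""
    mu_ex, v_ex = x * np.exp(-dt), sigma ** 2 * (1.0 - np.exp(-2.0 * dt)) / 2.0
    mu_em, v_em = x * (1.0 - dt), sigma ** 2 * dt
    return float(kl_gauss(mu_ex, v_ex, mu_em, v_em))
```

Halving the step size visibly reduces the one-step divergence, in line with the kind of entropy estimate derived above.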
5 More discussions
Here we present brief discussions on the reversed relative entropy and the mass independence phenomenon.
5.1 Discussion on the reversed relative entropy.
In Section 3, we estimated the relative entropy . If we consider the reversed relative entropy, then by the data processing inequality, one obtains
(5.56) |
Since
one thus finds that
Here, is the position process for the mean-field McKean SDE, whose components are i.i.d. Hence, the right-hand side can be estimated by
(5.57) |
where is independent of and . The linear dependence of the result on is similar to [31, Lemma 4.11]. This is an interesting observation, though the consequence of such a relative entropy estimate is unclear.
5.2 Discussion on the mass-independence.
Denote the marginal distributions in the -direction:
(5.58) |
It is not difficult to see from the proof of Theorem 3.6 that the KL-divergence in the -direction has an upper bound independent of the particle mass . This mass-independence result is particularly interesting from a physical perspective. Additionally, when conducting numerical simulations in the large-friction regime, such as in viscous fluids, this phenomenon must be taken into account. Some researchers [55, 4, 53] focus on the zero-mass limit under various conditions. If the propagation of chaos can be shown to be uniform in mass, then the result is asymptotically preserving in the overdamped limit.
At first sight, however, the mass-independence result may seem unnatural. For fixed mass and fixed initial data, considering the mapping , the limiting behavior as the mass tends to zero is poor, and the norm of (or ) usually diverges. On the other hand, under our framework, the dependence on the mass in the mapping is not important when applying the data processing inequality. This may indicate that the KL-divergence is a suitable tool for obtaining a rate independent of the mass. To illustrate this, we provide a simple example. Consider the channel , where . Then, if we simply consider the Gaussian data , , the inequality for the KL-divergence between their distributions , still holds for any : . In fact, a direct calculation gives , and , since , . However, it is easy to check that the norm of a single data point may blow up as tends to zero, since the variance of is just .
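The invariance behind this kind of example can be verified directly: applying a common invertible scaling (here by $1/m$) to two Gaussians leaves their KL-divergence unchanged, even though the scaled variables themselves blow up as $m \to 0$. The parameters below are arbitrary illustrative choices.

```python
import numpy as np

def kl_gauss(mu1, v1, mu2, v2):
    """KL(N(mu1, v1) || N(mu2, v2)) with variances v1, v2."""
    return 0.5 * (np.log(v2 / v1) + (v1 + (mu1 - mu2) ** 2) / v2 - 1.0)

# Scaling both Gaussians by the common invertible map x -> x / m leaves the
# KL-divergence unchanged, although the scaled variables blow up as m -> 0.
a, b, s = 1.0, 2.0, 0.5
base = kl_gauss(a, s, b, s)                     # equals (a - b)^2 / (2 s)
for m in [1.0, 0.1, 0.01]:
    assert np.isclose(kl_gauss(a / m, s / m ** 2, b / m, s / m ** 2), base)
```
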
Acknowledgement
This work is financially supported by the National Key R&D Program of China, Project Number 2021YFA1002800 and Project Number 2020YFA0712000. The work of L. Li was partially supported by NSFC 12371400 and 12031013, Shanghai Science and Technology Commission (Grant No. 21JC1403700, 20JC144100, 21JC1402900), the Strategic Priority Research Program of Chinese Academy of Sciences, Grant No. XDA25010403, Shanghai Municipal Science and Technology Major Project 2021SHZDZX0102. We thank Zhenfu Wang and the anonymous referees for some helpful comments.
Appendix A Basics on path measure and Girsanov’s transform
Here we present a formal derivation of Girsanov’s transform. The derivation is not meant to be a proof; we present it for the convenience of readers, for intuitive understanding. Consider the following two SDEs in with different predictable drifts but the same diffusion , which we assume are weakly well-posed.
(1.59) |
Here is a standard Brownian motion under the probability measure (the same for the two systems), and is a common, but random, initial position. Here, the drift depends on the path for .
For a fixed time interval the two processes and naturally induce two probability measures in the path space , denoted by and respectively.
Define the process
(1.60) |
where . By Girsanov theorem, under the probability measure satisfying
(1.61) |
the law of is the same as the law of under . In other words, for any Borel measurable set
Since and are the laws of and respectively, then one has
It follows that the Radon-Nikodym derivative satisfies
(1.62) |
which is a martingale under and its natural filtration
Below, for the reader’s convenience, we give a simple derivation of the formula (1.61) (or (1.62)) from a discrete perspective. This is not a rigorous proof, but it is illustrative of Girsanov’s transform. For simplicity, let and be scalar. The general derivation can be performed similarly.
Consider
where , is some interpolation using the data , and under the probability measure .
Clearly the posterior distribution is Gaussian, so one can calculate the joint distribution of :
Suppose there is another probability measure such that the law of is the same as the law of under , where one can similarly introduce the discrete version
and the joint distribution
Then by change of measure, for any measurable , it holds
namely,
So clearly , where
Letting , we are expected to have
Taking into account (recall , ), we derive that
Also, since the two measures , are equivalent, is well defined and can be derived in the exactly same way. Here we directly present its expression
Appendix B Proof of Lemma 3.14
Here we prove Lemma 3.14 from Section 3.3. The critical point of the proof is the use of the Marcinkiewicz–Zygmund-type inequality (see, for instance, Theorem 2.1 in [47], Lemma 5.2 in [37], or Lemma 3.3 in [35]).
Proof B.1.
(Proof of Lemma 3.14.) Fix and fix . For define
Then
Clearly, since (, , ) by independence, and since is uniformly bounded above by by Assumption 3, we know that is a sequence of -martingale differences () with respect to the filtration . That is, for each , is -measurable, and . This then enables one to apply the Marcinkiewicz–Zygmund-type inequality and to obtain
Moreover, for each , define the sequence
Clearly, , is a sequence of -martingale differences () with respect to the filtration , and . Using the Marcinkiewicz-Zygmund type inequality again, one obtains
Now Taylor’s expansion gives
Note that all norms above are associated with the conditional expectation . For , . Moreover, by Stirling’s formula, there exists such that
Hence, if we choose ,
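The Marcinkiewicz-Zygmund type bound used twice in the proof, $\mathbb{E}|\sum_k d_k|^p \le C_p\, n^{p/2-1} \sum_k \mathbb{E}|d_k|^p$ for martingale differences $d_k$, can be illustrated numerically. The sketch below uses i.i.d. centered Rademacher signs, a special case of bounded martingale differences; the sample sizes and the exponent `p = 4` are illustrative choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Check that E|S_n|^p / (n^{p/2-1} * sum_k E|d_k|^p) stays bounded in n,
# consistent with a uniform constant C_p in the Marcinkiewicz-Zygmund bound.
p, M = 4, 20_000
ratios = []
for n in (10, 50, 200):
    # i.i.d. Rademacher signs: E d_k = 0, |d_k| <= 1, so E|d_k|^p = 1.
    d = rng.choice([-1.0, 1.0], size=(M, n))
    lhs = np.mean(np.abs(d.sum(axis=1)) ** p)   # empirical E|S_n|^p
    rhs = n ** (p / 2 - 1) * n                  # n^{p/2-1} * sum_k E|d_k|^p
    ratios.append(lhs / rhs)

print(ratios)  # roughly constant in n
```

For this special case the ratio can be computed exactly ($\mathbb{E}S_n^4 = 3n^2 - 2n$ for Rademacher sums), so the empirical ratios hover near 3 for all three values of $n$, while a naive triangle-inequality bound would grow like $n^{p-1}$.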
References
- [1] G Ben Arous and Ofer Zeitouni. Increasing propagation of chaos for mean field models. In Annales de l’Institut Henri Poincaré (B) Probability and Statistics, volume 35, pages 85–102. Elsevier, 1999.
- [2] Gérard Ben Arous and Marc Brunaud. Méthode de Laplace: étude variationnelle des fluctuations de diffusions de type. Communications in Statistics-Simulation and Computation, 31(1-4):79–144, 1990.
- [3] Werner Braun and Klaus Hepp. The Vlasov dynamics and its fluctuations in the 1/N limit of interacting classical particles. Communications in mathematical physics, 56(2):101–113, 1977.
- [4] José A Carrillo and Young-Pil Choi. Mean-field limits: from particle descriptions to macroscopic equations. Archive for Rational Mechanics and Analysis, 241:1529–1573, 2021.
- [5] Patrick Cattiaux. Singular diffusion processes and applications. 2013.
- [6] Patrick Cattiaux. Entropy on the path space and application to singular diffusions and mean-field models. arXiv preprint arXiv:2404.09552, 2024.
- [7] Louis-Pierre Chaintron and Antoine Diez. Propagation of chaos: a review of models, methods and applications. I. Models and methods. arXiv preprint arXiv:2203.00446, 2022.
- [8] Louis-Pierre Chaintron and Antoine Diez. Propagation of chaos: a review of models, methods and applications. II. Applications. arXiv preprint arXiv:2106.14812, 2022.
- [9] Thomas M. Cover and Joy A. Thomas. Entropy, Relative Entropy, and Mutual Information, chapter 2, pages 13–55. John Wiley & Sons, Ltd, 2005.
- [10] Imre Csiszár. Sanov property, generalized I-projection and a conditional limit theorem. The Annals of Probability, pages 768–793, 1984.
- [11] Felipe Cucker and Steve Smale. Emergent behavior in flocks. IEEE Transactions on Automatic Control, 52(5):852–862, 2007.
- [12] François Delarue and Alvin Tse. Uniform in time weak propagation of chaos on the torus. arXiv preprint arXiv:2104.14973, 2021.
- [13] Roland L’vovich Dobrushin. Vlasov equations. Funktsional’nyi Analiz i ego Prilozheniya, 13(2):48–58, 1979.
- [14] Kai Du and Lei Li. A collision-oriented interacting particle system for Landau-type equations and the molecular chaos. arXiv preprint arXiv:2408.16252, 2024.
- [15] Antoine Georges, Gabriel Kotliar, Werner Krauth, and Marcelo J Rozenberg. Dynamical mean-field theory of strongly correlated fermion systems and the limit of infinite dimensions. Reviews of Modern Physics, 68(1):13, 1996.
- [16] J Willard Gibbs. On the fundamental formulae of dynamics. American Journal of Mathematics, 2(1):49–64, 1879.
- [17] Josiah Willard Gibbs. Elementary principles in statistical mechanics: developed with especial reference to the rational foundations of thermodynamics. C. Scribner’s sons, 1902.
- [18] François Golse, Clément Mouhot, and Thierry Paul. On the mean field and classical limits of quantum mechanics. Communications in Mathematical Physics, 343:165–205, 2016.
- [19] Carl Graham, Thomas G Kurtz, Sylvie Méléard, Philip E Protter, Mario Pulvirenti, and Denis Talay. Asymptotic behaviour of some interacting particle systems; McKean-Vlasov and Boltzmann models. Probabilistic Models for Nonlinear Partial Differential Equations: Lectures given at the 1st Session of the Centro Internazionale Matematico Estivo (CIME) held in Montecatini Terme, Italy, May 22–30, 1995, pages 42–95, 1996.
- [20] Arnaud Guillin, Pierre Le Bris, and Pierre Monmarché. Uniform in time propagation of chaos for the 2D vortex model and other singular stochastic systems. Journal of the European Mathematical Society, 2024.
- [21] Arnaud Guillin, Wei Liu, Liming Wu, and Chaoen Zhang. The kinetic Fokker-Planck equation with mean field interaction. Journal de Mathématiques Pures et Appliquées, 150:1–23, 2021.
- [22] Zimo Hao, Michael Röckner, and Xicheng Zhang. Strong convergence of propagation of chaos for McKean–Vlasov SDEs with singular interactions. SIAM Journal on Mathematical Analysis, 56(2):2661–2713, 2024.
- [23] Dirk Horstmann. From 1970 until present: the Keller-Segel model in chemotaxis and its consequences I. Jahresbericht der Deutschen Mathematiker-Vereinigung, 105(3):103–165, 2003.
- [24] Pierre-Emmanuel Jabin. A review of the mean field limits for Vlasov equations. Kinetic and Related models, 7(4):661–711, 2014.
- [25] Pierre-Emmanuel Jabin and Zhenfu Wang. Mean field limit for stochastic particle systems. Active Particles, Volume 1: Advances in Theory, Models, and Applications, pages 379–402, 2017.
- [26] Pierre-Emmanuel Jabin and Zhenfu Wang. Quantitative estimates of propagation of chaos for stochastic systems with kernels. Inventiones mathematicae, 214:523–591, 2018.
- [27] Jean-Francois Jabir, Denis Talay, and Milica Tomašević. Mean-field limit of a particle approximation of the one-dimensional parabolic–parabolic Keller-Segel model without smoothing. Electronic Communications in Probability, 23(84):14, 2018.
- [28] James H Jeans. On the theory of star-streaming and the structure of the universe. Monthly Notices of the Royal Astronomical Society, 76:70–84, 1915.
- [29] Nicolai V Krylov and Michael Röckner. Strong solutions of stochastic equations with singular time dependent drift. Probability theory and related fields, 131:154–196, 2005.
- [30] Daniel Lacker. On a strong form of propagation of chaos for McKean-Vlasov equations. Electronic Communications in Probability, 23(none):1 – 11, 2018.
- [31] Daniel Lacker. Hierarchies, entropy, and quantitative propagation of chaos for mean field diffusions. Probability and Mathematical Physics, 4(2):377–432, 2023.
- [32] Daniel Lacker and Luc Le Flem. Sharp uniform-in-time propagation of chaos. Probability Theory and Related Fields, 187(1-2):443–480, 2023.
- [33] Jean-Michel Lasry and Pierre-Louis Lions. Mean field games. Japanese journal of mathematics, 2(1):229–260, 2007.
- [34] Christian Léonard. Girsanov theory under a finite entropy condition. In Séminaire de Probabilités XLIV, pages 429–465. Springer, 2012.
- [35] Lei Li, Yijia Tang, and Jingtong Zhang. Solving stationary nonlinear Fokker-Planck equations via sampling. arXiv preprint arXiv:2310.00544, 2023.
- [36] Lei Li and Yuliang Wang. A sharp uniform-in-time error estimate for Stochastic Gradient Langevin Dynamics. arXiv preprint arXiv:2207.09304, 2022.
- [37] Tau Shean Lim, Yulong Lu, and James H Nolen. Quantitative propagation of chaos in a bimolecular chemical reaction-diffusion model. SIAM Journal on Mathematical Analysis, 52(2):2098–2133, 2020.
- [38] Yang Liu, Eunice Jun, Qisheng Li, and Jeffrey Heer. Latent space cartography: Visual analysis of vector space embeddings. In Computer graphics forum, volume 38, pages 67–78. Wiley Online Library, 2019.
- [39] Yulong Lu. Two-scale gradient descent ascent dynamics finds mixed nash equilibria of continuous games: A mean-field perspective. In International Conference on Machine Learning, pages 22790–22811. PMLR, 2023.
- [40] Laurent Miclo and Pierre Del Moral. Genealogies and increasing propagation of chaos for Feynman-Kac and genetic models. The Annals of Applied Probability, 11(4):1166–1198, 2001.
- [41] Sebastien Motsch and Eitan Tadmor. Heterophilious dynamics enhances consensus. SIAM review, 56(4):577–621, 2014.
- [42] Adrian Muntean, Jens Rademacher, and Antonios Zagaris. Macroscopic and large scale phenomena: coarse graining, mean field limits and ergodicity. Springer, 2016.
- [43] Roberto Natalini and Thierry Paul. On the mean field limit for Cucker-Smale models. arXiv preprint arXiv:2011.12584, 2020.
- [44] Helmut Neunzert and Joachim Wick. Die approximation der lösung von integro-differentialgleichungen durch endliche punktmengen. In Numerische Behandlung nichtlinearer Integrodifferential-und Differentialgleichungen: Vorträge einer Tagung im Mathematischen Forschungsinstitut Oberwolfach, 2. 12.–7. 12. 1973, pages 275–290. Springer, 2006.
- [45] Bernt Oksendal. Stochastic differential equations: an introduction with applications. Springer Science & Business Media, 2013.
- [46] Mark S Pinsker. Information and information stability of random variables and processes. Holden-Day, 1964.
- [47] Emmanuel Rio. Moment inequalities for sums of dependent random variables under projective conditions. Journal of Theoretical Probability, 22(1):146–163, 2009.
- [48] Michael Röckner and Xicheng Zhang. Weak uniqueness of Fokker–Planck equations with degenerate and bounded coefficients. Comptes Rendus. Mathématique, 348(7-8):435–438, 2010.
- [49] L Chris G Rogers and David Williams. Diffusions, Markov processes, and martingales: Itô calculus, volume 2. Cambridge university press, 2000.
- [50] Alain-Sol Sznitman. Topics in propagation of chaos. Lecture notes in mathematics, pages 165–251, 1991.
- [51] Milica Tomašević. Propagation of chaos for stochastic particle systems with singular mean-field interaction of L^q–L^p type. Electronic Communications in Probability, 28:1–13, 2023.
- [52] Roman Vershynin. High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge university press, 2018.
- [53] Wei Wang, Guangying Lv, and Jinglong Wei. Small mass limit in mean field theory for stochastic n particle system. Journal of Mathematical Physics, 63(8), 2022.
- [54] Yuelin Wang, Kai Yi, Xinliang Liu, Yu Guang Wang, and Shi Jin. ACMP: Allen-cahn message passing with attractive and repulsive forces for graph neural networks. In ICLR, 2023.
- [55] Zibo Wang, Li Lv, Yanjie Zhang, Jinqiao Duan, and Wei Wang. Small mass limit for stochastic interacting particle systems with Lévy noise and linear alignment force. Chaos: An Interdisciplinary Journal of Nonlinear Science, 34(2), 2024.
- [56] Xicheng Zhang. Stochastic Volterra equations in Banach spaces and stochastic partial differential equation. Journal of Functional Analysis, 258(4):1361–1425, 2010.