Concentration bounds for stochastic systems with singular kernels

Joe Jackson, Department of Mathematics, University of Chicago, 5734 S. University Avenue, Chicago, Illinois 60637 USA, jsjackson@uchicago.edu
Antonios Zitridis, Department of Mathematics, University of Chicago, 5734 S. University Avenue, Chicago, Illinois 60637 USA, zitridisa@uchicago.edu

Abstract.

This note is concerned with weakly interacting stochastic particle systems with possibly singular pairwise interactions. In this setting, we observe a connection between entropic propagation of chaos and exponential concentration bounds for the empirical measure of the system. In particular, we establish a variational upper bound for the probability of a certain rare event, and then use this upper bound to show that “controlled” entropic propagation of chaos implies an exponential concentration bound for the empirical measure. This connection allows us to infer concentration bounds for a class of singular stochastic systems through a simple adaptation of the arguments developed in [JW18].

1. Introduction

1.1. Stochastic particle systems

Let $N,d\in\mathbb{N}$ and $K:\mathbb{T}^{d}\rightarrow\mathbb{R}^{d}$ be a vector field defined on the $d$-dimensional flat torus $\mathbb{T}^{d}$. We consider the system of $N$ particles described by the dynamics

dX_{t}^{i}=\Big{(}F(X_{t}^{i})+\frac{1}{N}\sum_{j\neq i}K(X_{t}^{i}-X_{t}^{j})\Big{)}dt+\sqrt{2}dW_{t}^{i},\;\;t\in[0,T],\,\,i=1,2,...,N,

(1.1)

where the $W^{i}$ are $N$ independent standard Wiener processes defined on a standard filtered probability space $\big{(}\Omega,\mathbb{F}=(\mathcal{F}_{t})_{0\leq t\leq T},\mathbb{P}\big{)}$. We work on a finite time horizon $[0,T]$, and we assume that the initial positions of the particles are i.i.d., i.e.

\displaystyle\bm{X}_{0}=(X_{0}^{1},...,X_{0}^{N})\sim\rho_{0}^{\otimes N}\text{ for some $\rho_{0}\in\mathcal{P}(\mathbb{T}^{d})$.}
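As a purely illustrative aid (not part of the analysis of this note), the dynamics (1.1) can be simulated with an Euler-Maruyama scheme. The sketch below assumes $d=1$, identifies $\mathbb{T}^{1}$ with $[0,1)$, and takes smooth 1-periodic $F$ and $K$; the function `simulate_particles` and its arguments are hypothetical names introduced here. Singular kernels, which are the focus of this note, are not handled by such a naive scheme.

```python
import numpy as np


def simulate_particles(n_particles, n_steps, T, F, K, rho0_sampler, seed=0):
    """Euler-Maruyama sketch of (1.1) on the one-dimensional torus, identified with [0, 1).

    Assumptions (not from the paper): d = 1, F and K are smooth 1-periodic functions that
    act elementwise on numpy arrays, and rho0_sampler(rng, n) draws n i.i.d. samples of rho_0.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.asarray(rho0_sampler(rng, n_particles), dtype=float) % 1.0
    for _ in range(n_steps):
        # Matrix of pairwise differences x^i - x^j; the terms with j = i are removed.
        interaction = K(x[:, None] - x[None, :])
        np.fill_diagonal(interaction, 0.0)
        drift = F(x) + interaction.sum(axis=1) / n_particles
        # sqrt(2) dW is approximated by sqrt(2 dt) times a standard normal increment.
        noise = np.sqrt(2.0 * dt) * rng.standard_normal(n_particles)
        x = (x + drift * dt + noise) % 1.0  # wrap back onto the torus
    return x
```

For example, `simulate_particles(200, 100, 1.0, F=lambda x: np.zeros_like(x), K=lambda z: -np.sin(2 * np.pi * z), rho0_sampler=lambda rng, n: rng.uniform(size=n))` returns a sample of $\bm{X}_{T}$ for a smooth attractive interaction.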

Thus the data used to define our particle system consists of the time horizon $T>0$, the maps $F,K:\mathbb{T}^{d}\to\mathbb{R}^{d}$, and the initial distribution $\rho_{0}\in\mathcal{P}(\mathbb{T}^{d})$. Under these conditions, the law $\rho^{N}_{t}=\mathcal{L}(\bm{X}_{t})$ of the particles (here and throughout the note we use the notation $\rho^{N}_{t}$ to indicate a curve $[0,T]\;\reflectbox{$\in$}\;t\mapsto\rho^{N}_{t}\in\mathcal{P}(\mathbb{T}^{dN})$, and we abuse notation by writing $\rho^{N}(t,\cdot)$ for the density of $\rho^{N}_{t}$ when it exists) is formally described by the Liouville equation

\displaystyle\partial_{t}\rho^{N}+\sum_{i=1}^{N}\text{div}_{x^{i}}\bigg{(}\rho^{N}\Big{(}F(x^{i})+\frac{1}{N}\sum_{j\neq i}K(x^{i}-x^{j})\Big{)}\bigg{)}-\sum_{i=1}^{N}\Delta_{x^{i}}\rho^{N}=0,\qquad\rho^{N}_{0}=\rho_{0}^{\otimes N},

(1.2)

where $\bm{x}=(x^{1},...,x^{N})\in\mathbb{T}^{dN}$ . When $F$ and $K$ are smooth (or at least Lipschitz), it is well known that the asymptotic behaviour of the particle system (1.1) is described by the non-local Fokker-Planck equation

\partial_{t}\overline{\rho}+\text{div}\big{(}\overline{\rho}\,(F+K*\overline{\rho})\big{)}-\Delta\overline{\rho}=0,\qquad\overline{\rho}_{0}=\rho_{0}.

(1.3)

More precisely, it is known that in an appropriate sense we have

\displaystyle m_{\bm{X}_{t}}^{N}=\frac{1}{N}\sum_{i=1}^{N}\delta_{X_{t}^{i}}\xrightarrow{N\to\infty}\overline{\rho}_{t},\quad\mathcal{L}(X_{t}^{1},...,X_{t}^{k})\xrightarrow{N\to\infty}\overline{\rho}_{t}^{\otimes k},\text{ for each fixed $k$}.

The first statement above confirms that $\overline{\rho}_{t}$ is the mean field limit of the empirical measures $m_{\bm{X}_{t}}^{N}$, while the second statement is referred to as propagation of chaos. Quantitative versions of these statements, as well as finer results like central limit theorems, large deviation principles, and concentration bounds for the empirical measures $m_{\bm{X}_{t}}^{N}$, have also been obtained for regular $K$. We do not make any attempt to summarize this literature, but refer to survey articles like [JW17, CD22] for an introduction. Extending such results to more singular kernels $K$, which are often met in applications, is a very active area of research. We mention in particular the recent flurry of activity around singular kernels which derive from Riesz potentials (see e.g. [Ser20, BJW20, RS23, CdCRS23] and the references therein) and some recent efforts aimed at singular attractive kernels (see e.g. [BJW23, CdCRS23]). More relevant to the present note are the slightly less recent contributions of [JW16] and [JW18], where quantitative propagation of chaos was established by estimating the relative entropy between a solution of the Liouville equation (1.2) and the tensor product $\overline{\rho}^{N}=\overline{\rho}^{\otimes N}$ of the solution of (1.3). More precisely, [JW18] established the following quantitative version of “entropic propagation of chaos”:

H_{N}(\rho^{N}_{T}|\overline{\rho}^{N}_{T})\leq H_{N}(\rho_{0}^{N}|\overline{\rho}_{0}^{N})+C/N,

(1.4)

for some constant $C$ independent of $N$, where $H_{N}$ denotes the rescaled relative entropy

H_{N}(\rho^{N}_{t}|\overline{\rho}^{N}_{t}):=\frac{1}{N}H(\rho^{N}_{t}|\overline{\rho}^{N}_{t})=\frac{1}{N}\int_{\mathbb{T}^{dN}}\rho^{N}(t,\bm{x})\log\frac{\rho^{N}(t,\bm{x})}{\overline{\rho}^{N}(t,\bm{x})}d\bm{x}.

We note that (1.4) is established under a general initial condition for (1.2). If $\rho^{N}_{0}=\overline{\rho}_{0}^{\otimes N}$, then the first term on the right-hand side of (1.4) vanishes, leading to $H_{N}(\rho^{N}_{T}|\overline{\rho}^{N}_{T})=O(1/N)$. For exchangeable $\rho^{N}_{T}$, a classical subadditivity property of relative entropy, namely $H(\rho^{N,k}_{T}|\overline{\rho}^{\otimes k}_{T})\leq\lfloor N/k\rfloor^{-1}H(\rho^{N}_{T}|\overline{\rho}^{N}_{T})$, then gives $H(\rho^{N,k}_{T}|\overline{\rho}^{\otimes k}_{T})=O(k/N)$, where $\rho^{N,k}$ denotes the $k$-particle marginal of $\rho^{N}$. In particular, $H(\rho^{N,k}_{T}|\overline{\rho}^{\otimes k}_{T})\to 0$ as $N\to\infty$ for each fixed $k$, which is what we mean by entropic propagation of chaos. We note also that establishing (1.4) is not the only way to obtain quantitative entropic propagation of chaos; we refer to [Lac23] for a more detailed discussion and for an approach which leads to optimal bounds on $H(\rho_{T}^{N,k}|\overline{\rho}_{T}^{\otimes k})$.
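The subadditivity step can be sanity-checked on a finite state space, where all relative entropies are finite sums. For an exchangeable law $\mu$ on $S^{N}$ and $\nu\in\mathcal{P}(S)$, the classical inequality reads $H(\mu^{(k)}|\nu^{\otimes k})\leq\lfloor N/k\rfloor^{-1}H(\mu|\nu^{\otimes N})$, where $\mu^{(k)}$ is the $k$-particle marginal. The sketch below checks it numerically under illustrative assumptions (a three-point state space $S$, $N=6$, and a random law symmetrized over coordinate permutations); all function names are hypothetical, and this is an illustration rather than a proof.

```python
import itertools
import math

import numpy as np


def kl(p, q):
    """Relative entropy H(p|q) of two probability arrays (convention 0 log 0 = 0)."""
    p, q = np.ravel(p), np.ravel(q)
    mask = p > 0
    if np.any(q[mask] == 0):
        return math.inf  # p is not absolutely continuous with respect to q
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def product(nu, n):
    """Law of n i.i.d. copies of nu, stored as an array of shape (s,) * n."""
    out = np.array(1.0)
    for _ in range(n):
        out = np.multiply.outer(out, nu)
    return out


def exchangeable_law(s, n, rng):
    """Random law on {0, ..., s-1}^n, symmetrized over all permutations of the coordinates."""
    mu = rng.random((s,) * n)
    mu = sum(np.transpose(mu, perm) for perm in itertools.permutations(range(n)))
    return mu / mu.sum()


def marginal(mu, k):
    """k-particle marginal: sum out all coordinates after the first k."""
    return mu.sum(axis=tuple(range(k, mu.ndim)))


rng = np.random.default_rng(1)
s, n = 3, 6
mu = exchangeable_law(s, n, rng)
nu = rng.random(s)
nu /= nu.sum()
total = kl(mu, product(nu, n))
for k in range(1, n + 1):
    # Columns: k, H(mu^(k) | nu^k), and the bound H(mu | nu^N) / floor(N / k).
    print(k, kl(marginal(mu, k), product(nu, k)), total / (n // k))
```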

1.2. Our results

Our goal in this note is to point out a connection between entropic propagation of chaos and concentration inequalities for the empirical measure of the particle system (1.1). In particular, for $1\leq p<\infty$ , we are interested in bounds of the form

\displaystyle\mathbb{P}\Big{[}{\bf d}_{p}(m^{N}_{\bm{X}_{T}},\overline{\rho}_{T})>\epsilon\Big{]}\leq C_{\text{con,p}}\exp(-C_{\text{con,p}}^{-1}a_{p}(\epsilon)N),\quad a_{p}(\epsilon)\coloneqq\begin{cases}\epsilon^{2p}&p>d/2,\\ \epsilon^{2p}/\big{(}\log(2+1/\epsilon^{p})\big{)}^{2}&p=d/2,\\ \epsilon^{d}&p<d/2,\end{cases}

(1.5)

where ${\bf d}_{p}$ denotes the $p$ -Wasserstein distance. That is, we want to show that the empirical measure for the particle system admits concentration bounds of the same type available for i.i.d. samples, as established in [FG15]. When written in terms of the Liouville equations (which is often necessary for technical reasons when $K$ is singular), (1.5) becomes

\displaystyle\rho^{N}_{T}(A^{p}_{N,\epsilon})\leq C_{\text{con,p}}\exp(-C_{\text{con,p}}^{-1}a_{p}(\epsilon)N),\text{ where }A^{p}_{N,\epsilon}=\bigg{\{}\bm{x}\in\mathbb{T}^{dN}\bigg{|}{\bf d}_{p}(m^{N}_{\bm{x}},\overline{\rho}_{T})>\epsilon\bigg{\}},\quad m^{N}_{\bm{x}}=\frac{1}{N}\sum_{i=1}^{N}\delta_{x^{i}}.

(1.6)
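To make the three regimes in (1.5) concrete, the following sketch evaluates the rate $a_{p}(\epsilon)$ and the resulting right-hand side of (1.5). The constant $C_{\text{con,p}}$ is not explicit in this note, so it is passed in as a placeholder argument; the function names are hypothetical.

```python
import math


def rate_a_p(eps, p, d):
    """Rate a_p(eps) from (1.5) for the p-Wasserstein concentration bound in dimension d."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    if p > d / 2:
        return eps ** (2 * p)
    if p == d / 2:
        # Critical case: logarithmic correction log(2 + 1/eps^p)^2 in the denominator.
        return eps ** (2 * p) / math.log(2 + eps ** (-p)) ** 2
    return eps ** d


def concentration_rhs(eps, p, d, n, c_con):
    """Right-hand side C exp(-C^{-1} a_p(eps) N) of (1.5), with c_con a placeholder for C."""
    return c_con * math.exp(-rate_a_p(eps, p, d) * n / c_con)
```

For instance, in $d=3$ with $p=1$ one is in the regime $p<d/2$, so the rate is $\epsilon^{3}$ rather than $\epsilon^{2}$, in line with the i.i.d. rates of [FG15].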

Such concentration inequalities were obtained for regular interactions by several authors. We highlight in particular [BGV05], where concentration inequalities for the system (1.1) were inferred from concentration inequalities for i.i.d. random variables via the so-called “synchronous coupling” method originally due to McKean [McK69] and popularized by Sznitman [Szn91], and [DLR18], where a more general concentration of measure result was obtained by exploiting certain (uniform in $N$ ) functional inequalities for the law of $\bm{X}=(X^{1},...,X^{N})$ . When $K$ is singular, the approaches in [BGV05] and [DLR18] break down, and the only estimate similar to (1.5) we are aware of is Proposition 5.3 of [HC23], where the structure of the repulsive Riesz interactions is leveraged to get bounds somewhat similar to (1.5).

To explain our main idea, we must introduce the “controlled Liouville equation”:

\partial_{t}f^{N}+\sum_{i=1}^{N}\text{div}_{x^{i}}\bigg{(}f^{N}\Big{(}\alpha^{i}(t,\bm{x})+F(x^{i})+\frac{1}{N}\sum_{j\neq i}K(x^{i}-x^{j})\Big{)}\bigg{)}-\sum_{i=1}^{N}\Delta_{x^{i}}f^{N}=0,

(1.7)

where $\bm{\alpha}=(\alpha^{1},...,\alpha^{N}):[0,T]\times(\mathbb{T}^{d})^{N}\to(\mathbb{R}^{d})^{N}$ is a measurable map which we view as a control. For the so-called $W^{-1,\infty}$ kernels of [JW18], one can (as is explained in more detail in the proof of Proposition 1.6) easily generalize the original entropy estimate (1.4) of Jabin and Wang to get an estimate of the form (where the factor of $1/4$ is purely aesthetic, and is included to make (1.8) more consistent with Proposition 3.1 below)

H_{N}(f^{N}_{T}|\overline{\rho}^{N}_{T})\leq H_{N}(f^{N}_{0}|\overline{\rho}^{N}_{0})+C_{\text{ent}}\bigg{(}\frac{1}{N}+\frac{1}{4N}\sum_{i=1}^{N}\int_{0}^{T}\int_{\mathbb{T}^{dN}}|\alpha^{i}(t,\bm{x})|^{2}df^{N}_{t}(\bm{x})dt\bigg{)},

(1.8)

for any solution $f^{N}$ to (1.7). We emphasize that here $f^{N}_{0}$ may be different from $\rho_{0}^{\otimes N}$. We think of (1.8) as a “controlled” or “perturbed” version of the entropic propagation of chaos estimate (1.4) of Jabin and Wang. Our main observation is that an estimate of the form (1.8) in fact implies a concentration bound of the form (1.5). For technical reasons, we first make this observation precise through the following a priori estimate, which requires $K$, $F$ and the density of $\rho_{0}$ to be bounded.

Theorem 1.1.

There are constants $C_{d,p}$ depending only on $d$ and $p$ such that the following holds: Suppose that $K$ and $F$ are bounded, the initial condition $\rho_{0}$ has a bounded density, and that there exists a solution $\overline{\rho}$ to (1.3) and a constant $C_{\text{ent}}\geq 1$ such that (1.8) holds for all $\bm{\alpha}$ and $f^{N}$ such that $f^{N}$ is an entropy solution of (1.7) as in Definition 2.2 below. Then the concentration bound (1.5) holds for each $p$ , with $C_{\text{con,p}}=C_{d,p}C_{\text{ent}}$ .

Remark 1.2.

We note that because $K$ and $F$ are bounded and there is non-degenerate idiosyncratic noise, there is no problem making sense of the SDE (1.1) or the Liouville equation (1.2).

Remark 1.3.

While $K$ is required to be bounded for technical reasons, the key point of the a priori estimate is that $C_{\text{con,p}}$ depends on $K$ only through the constant $C_{\text{ent}}$ appearing in (1.8), and not on $\|K\|_{\infty}$. Similarly, there is no explicit dependence of our a priori estimate on $T$, so if one establishes (1.8) with a constant $C_{\text{ent}}$ independent of $T$ (which can be done for the 2-D stochastic vortex model with the techniques of [GBM24]), then one gets a uniform-in-time concentration bound.

Remark 1.4.

Notice that in Theorem 1.1, we only require (1.8) to hold for entropy solutions of (1.7), rather than for all weak solutions. This creates some technical challenges in the proof, but it is essential for the application to $W^{-1,\infty}$ kernels below.

As mentioned above, we can verify that (1.8) holds under roughly the same conditions as in [JW18]. Thus we make use of the following assumption (and refer to the notation section below for the definition of $W^{-1,\infty}$ ):

Assumption 1.5.

For some $\beta\in(0,1)$ , $C_{0}>0$ , we have $F\in C^{1,\beta}$ , $\rho_{0}\in C^{2,\beta}$ , $K$ , $\text{div}\,K\in W^{-1,\infty}$ , and

\displaystyle\big{(}\inf_{x}\rho_{0}(x)\big{)}^{-1}+\|\rho_{0}\|_{C^{2,\beta}}+\|F\|_{C^{1,\beta}}+\|K\|_{-1,\infty}+\|K\|_{L^{1}}+\|\text{div}\,K\|_{-1,\infty}\leq C_{0}.

Proposition 1.6.

Let Assumption 1.5 hold. Then there is a unique classical solution $\overline{\rho}$ to (1.3), and for each $N\in\mathbb{N}$ there exists an admissible entropy solution $\rho^{N}$ to (1.2) in the sense of Definition 2.4. Moreover, for each $p\in[1,\infty)$ there is a constant $C_{\text{con,p}}$ which depends only on $d$ , $p$ , $T$ and $C_{0}$ , such that any admissible entropy solution $\rho^{N}$ of (1.2) satisfies (1.6).

Remark 1.7.

Regarding possible extensions: one can easily extend Theorem 1.1 by replacing $\mathbb{T}^{d}$ by $\mathbb{R}^{d}$, but the concentration bound will then depend on the (exponential) moments of the limit $\overline{\rho}$. We focus on the periodic case because in this case we can adapt the arguments of [JW18] to obtain (1.4). However, the recent preprint [FW23] shows that the program of [JW18] can be carried out in the whole space for the 2-D viscous vortex model, so the (non-periodic analogue of) our Theorem 1.1 can be combined with the argument in [FW23] to obtain an analogue of Proposition 1.6 in this setting. Likewise, one can easily adapt Theorem 1.1 to the kinetic setting, in which case the “controlled Liouville equation” will involve a control only in the velocity (as is easily seen from an inspection of the proof of Proposition 3.1, in particular the application of Girsanov’s Theorem). By combining such a result with the techniques of [JW16], one can easily derive an analogue of Proposition 1.6 for kinetic particle systems with bounded forces, i.e. in the same setting as [JW16].

2. Notation and Preliminaries

2.1. Notation and some definitions

Throughout the paper $T$ is a positive real number and $\mathbb{T}^{d}$ is the $d$-dimensional flat torus. $f*g$ is the convolution between the functions $f,g$. If a function $u:[0,T]\times\mathbb{T}^{d}\rightarrow\mathbb{R}$ is sufficiently regular, then we denote by $\partial_{t}u,\;D_{x_{i}}u$ the partial derivative with respect to $t$ and the partial derivative with respect to $x_{i}$ (the $i$-th coordinate) for $i=1,2,...,d$, respectively. For $p\in[1,+\infty]$, we use the symbol $L^{p}_{t,x}$ (resp. $L^{p}_{x}$) for the space of functions $f$ such that $\int|f(t,x)|^{p}dtdx<+\infty$ (resp. $\int|f(x)|^{p}dx<+\infty$). For $k\geq 1$, $p\in[1,+\infty]$ and $\beta\in(0,1)$, we denote by $W^{k,p}$, $C^{k,\beta}$ the Sobolev ($k$ weak derivatives in $L^{p}$) and Hölder spaces ($k$ derivatives which are $\beta$-Hölder continuous) on $\mathbb{T}^{d}$, respectively. Also, we write $C^{k,\beta}_{t,x}$ for the standard parabolic Hölder spaces on $[0,T]\times\mathbb{T}^{d}$, e.g. $C^{2,\beta}_{t,x}$ will be the space of functions $f(t,x)$ such that $\partial_{t}f$, $Df$, $D^{2}f$ exist, and are $\beta/2$-Hölder in $t$ and $\beta$-Hölder in $x$. Similarly, we will use $W^{k,p}_{t,x}$ for the standard parabolic Sobolev spaces, e.g. $W^{2,p}_{t,x}$ will indicate the space of functions $f(t,x)$ such that $\partial_{t}f$, $Df$, $D^{2}f$ are in $L^{p}$.

$\mathcal{P}(\mathbb{T}^{d})$ (resp. $\mathcal{P}(X)$) is the space of probability measures over $\mathbb{T}^{d}$ (resp. a Polish space $X$), and $\mathcal{P}_{\text{ac}}(\mathbb{T}^{dN})\subset\mathcal{P}(\mathbb{T}^{dN})$ is the set of probability measures which admit a density with respect to the Lebesgue measure. For $p\geq 1$, the $p$-Wasserstein space is denoted by $\mathcal{P}_{p}(\mathbb{T}^{d})$ and its metric is ${\bf d}_{p}$. Given a curve $[0,T]\;\reflectbox{$\in$}\;t\mapsto m_{t}\in\mathcal{P}(\mathbb{T}^{d})$, we write $L^{2}_{dt\otimes m_{t}}([0,T]\times\mathbb{T}^{d},\mathbb{R}^{d})$ for the space of $\mathbb{R}^{d}$-valued $m_{t}\otimes dt$-square integrable functions over $[0,T]\times\mathbb{T}^{d}$. The space of $\mathbb{R}^{d}$-valued Borel measures over $[0,T]\times\mathbb{T}^{d}$ with finite total variation is denoted by $\mathcal{M}([0,T]\times\mathbb{T}^{d},\mathbb{R}^{d})$. For any $N\in\mathbb{N}$, $\mathbb{U}_{N}$ is the uniform distribution on $\mathbb{T}^{dN}$; we will write $\mathbb{U}$ when there is no confusion. The relative entropy $H(\mu|\nu)$ of two probability measures $\mu,\nu$ is defined as follows

H(\mu|\nu)=\begin{cases}\int\frac{d\mu}{d\nu}\log(\frac{d\mu}{d\nu})d\nu,&\text{ if }\mu\ll\nu,\\ +\infty,&\text{ otherwise.}\end{cases}

For simplicity, we write $H(\mu)=H(\mu|\mathbb{U})$. For solutions of the Liouville equation (1.2) we will be using the symbol $\rho^{N}_{t}$ or $\rho^{N}(t)$ to denote an element of $\mathcal{P}_{\text{ac}}(\mathbb{T}^{dN})$ and the symbol $\rho^{N}(t,\bm{x})$ for the corresponding density, which we can view as an element of $L^{\infty}_{t}L^{1}_{x}$. We use the analogous notation for $f^{N}$, a solution of the “perturbed” Liouville equation (1.7). We will be using the notation $I(\rho)=\int_{\mathbb{T}^{dN}}\frac{|D_{\bm{x}}\rho|^{2}(\bm{x})}{\rho(\bm{x})}d\bm{x}$ for the Fisher information and, for $j=1,...,N$, $I_{j}(\rho)=\int_{\mathbb{T}^{dN}}\frac{|D_{x^{j}}\rho|^{2}(\bm{x})}{\rho(\bm{x})}d\bm{x}$.
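As an illustration of the Fisher information $I(\rho)$ in the simplest case $N=d=1$, the sketch below approximates $I(\rho)$ on a periodic grid of $[0,1)$ with centered finite differences. For the test density $\rho(x)=1+a\cos(2\pi x)$ with $|a|<1$, a direct computation gives $I(\rho)=4\pi^{2}\big(1-\sqrt{1-a^{2}}\big)$, which the numerical value should reproduce. The grid size and the function name `fisher_information_1d` are assumptions made for the example.

```python
import numpy as np


def fisher_information_1d(rho, length=1.0):
    """Approximate I(rho) = int |rho'|^2 / rho for a smooth positive periodic density.

    rho holds samples on a uniform grid of [0, length); derivatives use centered
    differences with periodic wrap-around via np.roll.
    """
    h = length / rho.size
    drho = (np.roll(rho, -1) - np.roll(rho, 1)) / (2.0 * h)
    return float(np.sum(drho ** 2 / rho) * h)


n, a = 4096, 0.5
x = np.arange(n) / n
rho = 1.0 + a * np.cos(2 * np.pi * x)  # a probability density on the unit torus
exact = 4 * np.pi ** 2 * (1 - np.sqrt(1 - a ** 2))
print(fisher_information_1d(rho), exact)
```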

We now recall the precise definition of the space $W^{-1,\infty}$ .

Definition 2.1.

(i) A function $f$ with $\int_{\mathbb{T}^{d}}f=0$ belongs to $W^{-1,\infty}(\mathbb{T}^{d})$ if and only if there exists a vector field $V\in L^{\infty}(\mathbb{T}^{d})$ such that $f=\text{div}\,V$ . We denote

\|f\|_{-1,\infty}=\inf_{V}\|V\|_{L^{\infty}},\;\text{ where }f=\text{div}\,V.

(ii) A vector field $K$ with $\int_{\mathbb{T}^{d}}K=0$ belongs to $W^{-1,\infty}(\mathbb{T}^{d})$ if and only if there exists a matrix field $V\in L^{\infty}(\mathbb{T}^{d})$ such that $K=\text{div}\,V$ in the sense that $K_{i}=\sum_{j}\partial_{j}V_{ij}$ . We denote

\|K\|_{-1,\infty}=\inf_{V}\|V\|_{L^{\infty}},\;\text{ where }K=\text{div}\,V.
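In dimension $d=1$ the norm of Definition 2.1 (i) can be computed explicitly: the only $V$ with $V'=f$ are $F+c$, where $F$ is an antiderivative of $f$, so $\|f\|_{-1,\infty}=\min_{c}\|F+c\|_{L^{\infty}}=\frac{1}{2}(\max F-\min F)$. The sketch below implements this on a uniform grid of $[0,1)$; the function name and the trapezoidal discretization are assumptions. It does not extend to $d\geq 2$, where $V$ may be modified by any bounded divergence-free field. For $f(x)=\cos(2\pi x)$ it should return approximately $1/(2\pi)$.

```python
import numpy as np


def w_minus1_inf_norm_1d(f_vals):
    """||f||_{-1,infty} on the one-dimensional torus [0, 1), from samples on a uniform grid.

    In d = 1 every V with V' = f has the form F + c, F an antiderivative of f, so
    ||f||_{-1,infty} = min_c max |F + c| = (max F - min F) / 2.
    """
    h = 1.0 / f_vals.size
    f_vals = f_vals - f_vals.mean()  # zero-mean condition of Definition 2.1
    # Cumulative trapezoidal rule gives F at the grid points, with F(0) = 0.
    F = np.concatenate(([0.0], np.cumsum(0.5 * (f_vals[1:] + f_vals[:-1]) * h)))
    return 0.5 * float(F.max() - F.min())


x = np.arange(1000) / 1000
print(w_minus1_inf_norm_1d(np.cos(2 * np.pi * x)))  # close to 1 / (2 pi)
```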

As in [JW18], since $K$ (and possibly $\bm{\alpha}$ ) are not smooth, in order to get controlled entropy bounds between $\rho^{N}$ and $\overline{\rho}^{N}=\overline{\rho}^{\otimes N}$ we must work with entropy solutions.

Definition 2.2.

Suppose that $K\in W^{-1,\infty}$ and $\text{div}\,K\in W^{-1,\infty}$, so that there exists a vector field $V\in L^{\infty}(\mathbb{T}^{d})$ such that $\text{div}\,K=\text{div}\,V$. A continuous map $[0,T]\;\reflectbox{$\in$}\;t\mapsto f_{t}^{N}\in\mathcal{P}_{\text{ac}}(\mathbb{T}^{dN})$ is an entropy solution to (1.7) on the time interval $[0,T]$ if $f^{N}$ solves (1.7) in the sense of distributions, $D_{\bm{x}}f^{N}(t,\cdot)$ exists in the weak sense for a.e. $t\leq T$, and for each $0\leq t\leq T$,

	$\displaystyle H(f_{t}^{N})+\sum_{i=1}^{N}\int_{0}^{t}\int_{\mathbb{T}^{dN}}\frac{\|D_{x^{i}}f^{N}(s,\bm{x})\|^{2}}{f^{N}(s,\bm{x})}d\bm{x}ds$
	$\displaystyle\qquad\leq H(f_{0}^{N})+\frac{1}{N}\sum_{i,j=1}^{N}\int_{0}^{t}\int_{\mathbb{T}^{dN}}(\alpha^{i}(s,\bm{x})+F(x^{i})+V(x^{i}-x^{j}))\cdot D_{x^{i}}f^{N}_{s}d\bm{x}ds.$		(2.1)

To indicate the dependence on $\bm{\alpha},\;F$ and $K$ we call such a solution an $(\bm{\alpha},F,K)$ -entropy solution. When $\bm{\alpha}=0$ and $f_{0}^{N}=\rho_{0}^{\otimes N}$ , we simply say that $f^{N}$ is an entropy solution of (1.2).

Remark 2.3.

(i) If $\bm{\alpha}=0$, then we have the definition of entropy solution introduced in [JW18].
(ii) If $f^{N}$ satisfies (1.7) in the classical sense, then it is also an entropy solution.
(iii) One checks from the definition that any $(\bm{\alpha},F,K)$-entropy solution is also an $(\bm{\alpha}+\tilde{\beta},F,K-\beta)$-entropy solution, for any bounded vector field $\beta$, where $\tilde{\beta}$ is the vector field with components $\tilde{\beta}^{i}(\bm{x})=\frac{1}{N}\sum_{j\neq i}\beta(x^{i}-x^{j})$.
(iv) By virtue of Proposition 1 from [JW18] (regularity in time is not addressed in Proposition 1 of [JW18], but it is straightforward to check using the assumptions on $K$ that any entropy solution in the sense of Jabin and Wang admits a version in $C([0,T];\mathcal{P}_{\text{ac}}(\mathbb{T}^{dN}))$), if $F$ and $\text{div}F$ are bounded and $K,\text{div}K\in W^{-1,\infty}$, then there exists a $(0,F,K)$-entropy solution.

For technical reasons, we at times need to work specifically with entropy solutions of (1.2) which arise via a suitable mollification procedure. In particular, we fix throughout the paper a standard mollifier $(\rho_{\delta})_{\delta>0}$ on $\mathbb{T}^{d}$ , and we define $K_{\delta}=K*\rho_{\delta}$ , $F_{\delta}=F*\rho_{\delta}$ . Then we make the following definition.

Definition 2.4.

We say that $\rho^{N}$ is an admissible entropy solution of (1.2) if it is an entropy solution, and for some $\delta_{k}\downarrow 0$ , we have $\rho^{N,\delta_{k}}_{t}\xrightarrow{k\to\infty}\rho^{N}_{t}$ weakly for each fixed $t$ , where $\rho^{N,\delta}$ denotes the unique classical solution of (1.2) with $F$ replaced by $F_{\delta}$ and $K$ replaced by $K_{\delta}$ .
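The mollification $K_{\delta}=K*\rho_{\delta}$ used in Definition 2.4 can be illustrated numerically on the one-dimensional torus. The sketch below assumes a grid discretization of $[0,1)$, takes $\rho_{\delta}$ to be a rescaled bump $\exp(-1/(1-x^{2}))$ normalized so that its grid weights sum to one, and computes the circular convolution with the FFT; the function names are hypothetical. Because the weights are nonnegative and sum to one, the discrete $K_{\delta}$ is a local average of $K$, so it never exceeds $\|K\|_{L^{\infty}}$ and agrees with $K$ wherever $K$ is constant on a $\delta$-neighborhood.

```python
import numpy as np


def bump(x):
    """Unnormalized standard bump exp(-1 / (1 - x^2)), supported in (-1, 1)."""
    out = np.zeros_like(x, dtype=float)
    inside = np.abs(x) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out


def mollify_periodic(k_vals, delta):
    """Discrete version of K * rho_delta on a uniform grid of the unit torus (d = 1).

    rho_delta(x) is proportional to bump(x / delta), with delta < 1/2 so that its support
    fits inside one period; the circular convolution is computed with the FFT.
    """
    n = k_vals.size
    x = np.arange(n) / n
    dist = np.minimum(x, 1.0 - x)  # periodic distance from 0, so the kernel is centered at 0
    weights = bump(dist / delta)
    weights /= weights.sum()  # nonnegative grid weights summing to 1
    return np.real(np.fft.ifft(np.fft.fft(k_vals) * np.fft.fft(weights)))


n = 512
x = np.arange(n) / n
k_vals = np.where(x < 0.5, 1.0, -1.0)  # a discontinuous periodic field
k_smooth = mollify_periodic(k_vals, 0.05)
```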

2.2. Preliminary Results

In this subsection we state and prove some results that will be useful in the paper. The first is a refinement of a compactness result borrowed from [Dau23, Proposition 1.2] for solutions $(m,\alpha)$ to the Fokker-Planck equation:

\partial_{t}m+\text{div}(\alpha m)-\Delta m=0,

(2.2)

where $m\in\mathcal{C}([0,T],\mathcal{P}_{2}(\mathbb{T}^{d}))$ and $\alpha\in L^{2}_{dt\otimes m(t)}\left([0,T]\times\mathbb{T}^{d},\mathbb{R}^{d}\right)$ .

Proposition 2.5.

Assume that, for all $k\geq 1$ , $(m_{k},\alpha_{k})$ solves the Fokker-Planck equation (2.2) starting from $m_{0}$ and satisfies the uniform energy estimate

\int_{0}^{T}\int_{\mathbb{T}^{d}}|\alpha_{k}(t,x)|^{2}dm_{k}(t)(x)dt\leq C,

(2.3)

for some constant $C>0$ independent of $k$. Then, for any $\delta\in(0,1)$, up to taking a subsequence, $(m_{k},\alpha_{k}m_{k})$ converges in $\mathcal{C}^{\frac{1-\delta}{2}}\left([0,T];\mathcal{P}_{2-\delta}(\mathbb{T}^{d})\right)\times\mathcal{M}([0,T]\times\mathbb{T}^{d},\mathbb{R}^{d})$ to some $(m,w)$. The curve $m$ is in $\mathcal{C}^{1/2}\left([0,T],\mathcal{P}_{2}(\mathbb{T}^{d})\right)$, $w$ is absolutely continuous with respect to $m(t)\otimes dt$, and for any $t_{1},t_{2}\in[0,T]$ such that $t_{1}<t_{2}$ it holds that

\int_{t_{1}}^{t_{2}}\int_{\mathbb{T}^{d}}\bigg{|}\frac{dw}{dm(t)\otimes dt}(t,x)\bigg{|}^{2}dm(t)(x)dt\leq\liminf_{k\rightarrow+\infty}\int_{t_{1}}^{t_{2}}\int_{\mathbb{T}^{d}}|\alpha_{k}(t,x)|^{2}dm_{k}(t)(x)dt

(2.4)

and $(m,\frac{dw}{dm(t)\otimes dt})$ solves (2.2) starting from $m_{0}$ .

Proof.

Let $w_{k}=\alpha_{k}m_{k}$. By the Cauchy-Schwarz inequality, the total variation $|w_{k}|$ satisfies

|w_{k}|\leq\int_{0}^{T}\int_{\mathbb{T}^{d}}\bigg{|}\frac{dw_{k}}{dm_{k}(t)\otimes dt}(t,x)\bigg{|}dm_{k}(t)(x)dt\leq\sqrt{T}\bigg{(}\int_{0}^{T}\int_{\mathbb{T}^{d}}\bigg{|}\frac{dw_{k}}{dm_{k}(t)\otimes dt}(t,x)\bigg{|}^{2}dm_{k}(t)(x)dt\bigg{)}^{1/2},

therefore $|w_{k}|\leq\sqrt{CT}$ by (2.3). We can therefore use the Banach-Alaoglu theorem and, due to standard estimates for the Fokker-Planck equation, deduce that, for any $r\in(1,2)$ and up to a subsequence, $(m_{k},w_{k})$ converges in $\mathcal{C}([0,T],\mathcal{P}_{r}(\mathbb{T}^{d}))\times\mathcal{M}([0,T]\times\mathbb{T}^{d},\mathbb{R}^{d})$ to some element $(m,w)$. In fact, elementary arguments yield that the convergence also holds in $\mathcal{C}([t_{1},t_{2}],\mathcal{P}_{r}(\mathbb{T}^{d}))\times\mathcal{M}([t_{1},t_{2}]\times\mathbb{T}^{d},\mathbb{R}^{d})$.

It is straightforward, because of the convergence, that $(m,w)$ satisfies the Fokker-Planck equation starting from $m_{0}$. By Theorem 2.34 of [AFP00] we discover that $w$ is absolutely continuous with respect to $m(t)\otimes dt$ and that (2.4) holds. The bound that (2.4) provides (when $t_{1}=0$ and $t_{2}=T$) yields $m\in\mathcal{C}^{1/2}\left([0,T],\mathcal{P}_{2}(\mathbb{T}^{d})\right)$, due to standard estimates for the Fokker-Planck equation. ∎

Remark 2.6.

As an artifact of the proof of the above proposition, we get that if $(m_{k},\alpha_{k})$ satisfies (2.3) and the sequence $(m_{k})_{k\in\mathbb{N}}$ is known to converge in $\mathcal{C}^{\frac{1-\delta}{2}}([0,T];\mathcal{P}_{2-\delta}(\mathbb{T}^{d}))$ , then $w_{k}=\alpha_{k}m_{k}$ converges in $\mathcal{M}([0,T]\times\mathbb{T}^{d};\mathbb{R}^{d})$ and (2.4) holds as well.

We are also going to need the following two elementary lemmas, the first of which can be inferred from [DE11, Proposition 2.4.2] (notice that the boundedness assumption on $f$ there is not necessary) and the second of which is standard.

Lemma 2.7.

Let $X$ be a Polish space, $\mu\in\mathcal{P}(X)$ and a function $f:X\rightarrow\mathbb{R}\cup\{-\infty,+\infty\}$ such that $e^{f}\in L^{1}(\mu)$ . Then, the following equality holds

\inf_{\nu\in\mathcal{P}(X)}\bigg{\{}H(\nu|\mu)-\int_{X}f(x)d\nu(x)\bigg{\}}=-\log\bigg{(}\int_{X}e^{f(x)}d\mu(x)\bigg{)}.

(2.5)

In particular, if $A\subset X$ is any Borel set, then

\displaystyle-\log\mu(A)=\inf_{\nu\in\mathcal{P}(X),\,\nu(A)=1}H(\nu|\mu).

(2.6)
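Both identities in Lemma 2.7 can be checked by hand on a finite set. In (2.5) the infimum is attained at the Gibbs measure $\nu^{*}\propto e^{f}\mu$, since then $\log\frac{d\nu^{*}}{d\mu}=f-\log\int e^{f}d\mu$; in (2.6) it is attained at the conditional law $\mu(\cdot|A)$. The sketch below verifies both numerically on a random six-point example; the function names are hypothetical.

```python
import numpy as np


def kl(nu, mu):
    """H(nu|mu) on a finite set, assuming mu > 0 everywhere (convention 0 log 0 = 0)."""
    mask = nu > 0
    return float(np.sum(nu[mask] * np.log(nu[mask] / mu[mask])))


def gibbs_minimizer(f, mu):
    """The measure proportional to exp(f) mu, which attains the infimum in (2.5)."""
    w = mu * np.exp(f - f.max())  # subtracting max(f) avoids overflow and cancels on normalizing
    return w / w.sum()


rng = np.random.default_rng(0)
mu = rng.random(6)
mu /= mu.sum()
f = rng.normal(size=6)

nu_star = gibbs_minimizer(f, mu)
value_star = kl(nu_star, mu) - nu_star @ f  # left-hand side of (2.5) evaluated at nu_star
log_partition = np.log(mu @ np.exp(f))      # log of the integral of e^f with respect to mu

# (2.6): conditioning mu on A gives relative entropy -log mu(A).
A = np.array([0, 2, 3])
mu_given_A = np.zeros_like(mu)
mu_given_A[A] = mu[A] / mu[A].sum()
```

The same mechanism is at work in (3.4): applying (2.5) with $f=-V^{\delta}$ identifies the optimal initial law as the one proportional to $e^{-V^{\delta}}\rho_{0}^{N}$.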

Lemma 2.8.

Let $(\mu_{n})_{n\in\mathbb{N}}$ be a sequence of probability measures over $\mathbb{T}^{dN}$ with densities converging weakly in $L^{1}$ to the probability measure $\mu$ . We also suppose that $(A_{n})_{n\in\mathbb{N}}$ is an increasing sequence of subsets of $\mathbb{T}^{dN}$ converging to a set $A$ . Then, $\lim_{n}\mu_{n}(A_{n})=\mu(A)$ .

3. Proofs of the main results

3.1. Proof of Theorem 1.1

To prove Theorem 1.1, we are going to need the following variational lower bound for $(0,F,K)$-entropy solutions.

Proposition 3.1.

Let the hypotheses of Theorem 1.1 hold. Let $\rho^{N}$ be an entropy solution of (1.2) in the sense of Definition 2.2 and $A$ an open subset of $\mathbb{T}^{dN}$ . Then, the following inequality holds

-\frac{1}{N}\log\rho^{N}_{T}(A)\geq\inf_{\bm{\alpha},f:f_{T}(A)=1}\bigg{\{}\frac{1}{N}H(f_{0}|\rho_{0}^{N})+\frac{1}{4N}\sum_{i=1}^{N}\int_{0}^{T}\int_{\mathbb{T}^{dN}}|\alpha^{i}(t,\bm{x})|^{2}df_{t}(\bm{x})dt\bigg{\}},

(3.1)

where the infimum is taken over all $f\in C([0,T];\mathcal{P}(\mathbb{T}^{dN}))$ and $\bm{\alpha}=(\alpha^{1},...,\alpha^{N}):[0,T]\times\mathbb{T}^{dN}\rightarrow\mathbb{R}^{dN}$ such that $f$ is an $(\bm{\alpha},F,K)$-entropy solution in the sense of Definition 2.2.

Proof.

We give the proof only when $F=0$ for simplicity, because the proof with $F\neq 0$ is the same but more notationally cumbersome.

Step 1 ($K$ is smooth and the terminal condition is mollified). Assume that $K$ is smooth. We consider the function $G$ with $G(x)=0$ if $x\in A$ and $G(x)=+\infty$ if $x\in A^{c}$, and, for $\delta\in(0,1)$, let $G_{\delta}$ be a smooth function such that $G_{\delta}(x)=0$ on $A$ and $\frac{2}{\delta}\geq G_{\delta}(x)>1/\delta$ if $\text{dist}(x,A)>\delta$.

Since $K$ is assumed to be smooth, (1.1) is strongly uniquely solvable, so we can consider its solution ${\bf X}_{\cdot}^{N}$ and its law (under $\mathbb{P}$) $\mathcal{L}({\bf X}_{\cdot}^{N})$ in the path space $\mathcal{C}([0,T];\mathbb{T}^{dN})$. By using Lemma 2.7 and Girsanov’s Theorem as in the proof of Theorem 4.1 of [BD98] (the only difference is that we have a random initial condition, which leads to the additional term in the right-hand side of (3.1), and the fact that we work on the torus, which is no problem because a version of (3.1) on the torus easily follows from the corresponding version on $\mathbb{R}^{d}$), we have

	$\displaystyle-\frac{1}{N}\log\mathbb{E}\bigg{[}e^{-G_{\delta}(X_{T})}\bigg{]}$	$\displaystyle=\frac{1}{N}\inf_{Q\ll\mathcal{L}({\bf X}_{\cdot})}\bigg{\{}H(Q\|\mathcal{L}({\bf X}_{\cdot}))+\int_{\mathcal{C}([0,T];\mathbb{T}^{dN})}G_{\delta}(\omega_{T})dQ(\omega)\bigg{\}}$
		$\displaystyle=\frac{1}{N}\inf_{\bm{\alpha},\bm{Y}}\bigg{\{}H\big{(}\mathcal{L}(\bm{Y}_{0})\|\mathcal{L}(\bm{X}_{0})\big{)}+\mathbb{E}\Big{[}\frac{1}{4}\sum_{i=1}^{N}\int_{0}^{T}\|\alpha_{t}^{i}\|^{2}dt+G_{\delta}(\bm{Y}_{T})\Big{]}\bigg{\}},$		(3.2)

where the second infimum is taken over all pairs consisting of a square-integrable adapted $\mathbb{R}^{dN}$ -valued process $\bm{\alpha}=(\alpha^{1},...,\alpha^{N})$ and a continuous process $\bm{Y}$ satisfying

\displaystyle dY_{t}^{i}=\bigg{(}\alpha_{t}^{i}+\frac{1}{N}\sum_{j\neq i}K(Y_{t}^{i}-Y_{t}^{j})\bigg{)}dt+\sqrt{2}dW_{t}^{i},\quad i=1,...,N.

Furthermore, by the mimicking theorem [BS13, Corollary 3.7], for any such $(\bm{\alpha},\bm{Y})$ , there exists a measurable function $\bm{\alpha}=(\alpha^{1},...,\alpha^{N}):[0,T]\times\mathbb{T}^{dN}\rightarrow\mathbb{R}^{dN}$ and a process $\tilde{\bm{Y}}$ such that $\tilde{\bm{Y}}$ is a weak solution of

\displaystyle d\tilde{Y}_{t}^{i}=\bigg{(}\alpha^{i}(t,\tilde{\bm{Y}}_{t})+\frac{1}{N}\sum_{j\neq i}K(\tilde{Y}_{t}^{i}-\tilde{Y}_{t}^{j})\bigg{)}dt+\sqrt{2}dW_{t}^{i},\quad i=1,...,N,

$\mathbb{E}\Big{[}\int_{0}^{T}|\alpha^{i}(t,\tilde{\bm{Y}}_{t})|^{2}dt\Big{]}\leq\mathbb{E}\Big{[}\int_{0}^{T}|\alpha_{t}^{i}|^{2}dt\Big{]}$, and for any $t\in[0,T]$, $\mathcal{L}(\tilde{\bm{Y}}_{t})=\mathcal{L}(\bm{Y}_{t})$. Moreover, it is clear that for any such $\tilde{\bm{Y}}$, its law $f_{t}=\mathcal{L}(\tilde{\bm{Y}}_{t})$ must be a weak solution of (1.7). Combining the last observations, we deduce that

\displaystyle-\frac{1}{N}\log\mathbb{E}\bigg{[}e^{-G_{\delta}({\bf X}_{T})}\bigg{]}\geq\frac{1}{N}\inf_{\bm{\alpha},f}\bigg{\{}H(f_{0}|\rho^{N}_{0})+\frac{1}{4}\sum_{i=1}^{N}\int_{0}^{T}\int_{\mathbb{T}^{dN}}|\alpha^{i}(t,\bm{x})|^{2}df_{t}(\bm{x})dt+\int_{\mathbb{T}^{dN}}G_{\delta}(\bm{x})df_{T}(\bm{x})\bigg{\}}

(3.3)

where the infimum is over pairs $(\bm{\alpha},f)$ such that $f$ is a weak solution of (1.7).
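The factor $\frac{1}{4}$ in (3.2) comes from the noise intensity $\sqrt{2}$: for a single particle with $K=0$ and a deterministic control $\alpha(t)$, Girsanov's theorem gives $H(\mathcal{L}(Y)|\mathcal{L}(X))=\frac{1}{4}\int_{0}^{T}|\alpha_{t}|^{2}dt$. The Monte Carlo sketch below checks this on an Euler grid, where the log-likelihood ratio of each Gaussian increment is exactly $\alpha_{n}\Delta Y_{n}/2-\alpha_{n}^{2}\Delta t/4$. The one-dimensional setting, the choice $\alpha(t)=1+t$ and the function name are assumptions made for illustration only.

```python
import numpy as np


def girsanov_cost_mc(alpha, T=1.0, n_steps=100, n_paths=20000, seed=0):
    """Monte Carlo estimate of H(Q|P) for P = law of sqrt(2) W and Q = law of Y,
    where dY = alpha(t) dt + sqrt(2) dW (one particle, K = 0, deterministic alpha).

    Returns the Monte Carlo estimate and the prediction (1/4) * int |alpha|^2 dt,
    both computed on the same Euler grid with left-endpoint values of alpha.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    t = np.arange(n_steps) * dt
    a = np.broadcast_to(np.asarray(alpha(t), dtype=float), t.shape)
    dW = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
    dY = a * dt + np.sqrt(2.0) * dW  # increments of Y, simulated under Q
    # Exact log-density ratio of N(a dt, 2 dt) against N(0, 2 dt), summed over steps.
    log_ratio = (a * dY / 2.0 - a ** 2 * dt / 4.0).sum(axis=1)
    predicted = 0.25 * float(np.sum(a ** 2) * dt)
    return float(log_ratio.mean()), predicted


estimate, predicted = girsanov_cost_mc(lambda t: 1.0 + t)
print(estimate, predicted)  # the two numbers should agree up to Monte Carlo error
```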

Our next goal is to argue that we can restrict the infimum in (3.3) to smooth solutions of (1.7). For this, we note that by considering the infimum in (3.3) first with respect to $f_{0}$ and then with respect to $\bm{\alpha}$ , we see that we have

\displaystyle-\frac{1}{N}\log\mathbb{E}\bigg{[}e^{-G_{\delta}({\bf X}_{T})}\bigg{]}=\frac{1}{N}\inf_{f_{0}\ll\rho_{0}^{N}}\bigg{\{}H(f_{0}|\rho^{N}_{0})+\int_{\mathbb{T}^{dN}}V^{\delta}(\bm{x})df_{0}(\bm{x})\bigg{\}},

(3.4)

with $V^{\delta}(\bm{x})$ being the value function of a standard stochastic control problem, and in particular $V^{\delta}(\bm{x})=u^{\delta}(0,\bm{x})$ , where $u^{\delta}$ solves

-\partial_{t}u^{\delta}-\Delta u^{\delta}+|D_{\bm{x}}u^{\delta}|^{2}-\frac{1}{N}\sum_{i=1}^{N}\sum_{j\neq i}K(x^{i}-x^{j})\cdot D_{x^{i}}u^{\delta}=0,\quad(t,\bm{x})\in[0,T)\times\mathbb{T}^{dN}

(3.5)

with terminal condition $u^{\delta}(T,\bm{x})=G_{\delta}(\bm{x})$. Moreover, the theory of stochastic control also tells us that the optimal feedback $\bm{\alpha}$ in (3.3) is independent of $f$, and takes the form $\alpha^{\delta,i}(t,\bm{x})=-2D_{x^{i}}u^{\delta}(t,\bm{x})$. By parabolic regularity and the smoothness of $K$ and $G_{\delta}$, $V^{\delta}$ and $\alpha^{\delta,i}$ are smooth. In particular, $f_{0}\mapsto\int V^{\delta}(\bm{x})df_{0}(\bm{x})$ is continuous, so for any $\epsilon>0$, the minimization problem (3.4) admits a smooth $\epsilon$-minimizer $f_{0}^{\epsilon}$. If $f^{\epsilon}$ is the weak solution of (1.7) driven by $K$ and $\bm{\alpha}^{\delta}$ and starting from $f_{0}^{\epsilon}$, we deduce that $(\bm{\alpha}^{\delta},f^{\epsilon})$ is a smooth $\epsilon$-minimizer for the infimum in (3.3). In particular, this shows that (3.3) remains true when the infimum in the right-hand side is restricted to pairs $(\bm{\alpha},f)$ such that $\bm{\alpha}$ is smooth and $f$ is a classical (hence entropy) solution of (1.7).
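For the reader's convenience, the feedback form $\alpha^{\delta,i}=-2D_{x^{i}}u^{\delta}$ comes from the pointwise minimization in the Hamiltonian. For any $q\in\mathbb{R}^{d}$ (standing for $D_{x^{i}}u^{\delta}(t,\bm{x})$), completing the square gives

\inf_{a\in\mathbb{R}^{d}}\Big{\{}a\cdot q+\frac{1}{4}|a|^{2}\Big{\}}=\inf_{a\in\mathbb{R}^{d}}\Big{\{}\frac{1}{4}|a+2q|^{2}-|q|^{2}\Big{\}}=-|q|^{2},\qquad\text{attained at }a=-2q,

and summing over $i=1,...,N$ produces the term $|D_{\bm{x}}u^{\delta}|^{2}$ in (3.5).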

Step 2 ( $K$ is smooth): The goal in this step is to take $\delta\to 0$ in (3.3). By the previous step, we can find for each $\delta>0$ a pair $(\bm{\alpha}^{\delta},f^{\delta})$ such that $\bm{\alpha}^{\delta}$ is smooth, $f^{\delta}$ is a classical solution of (1.7) and

\displaystyle-\frac{1}{N}\log\mathbb{E}\bigg{[}e^{-G_{\delta}({\bf X}_{T})}\bigg{]}\geq\frac{1}{N}H(f_{0}^{\delta}|\rho^{N}_{0})+\frac{1}{4N}\sum_{i=1}^{N}\int_{0}^{T}\int_{\mathbb{T}^{dN}}|\alpha^{i,\delta}(t,\bm{x})|^{2}df^{\delta}_{t}(\bm{x})dt+\frac{1}{N}\int_{\mathbb{T}^{dN}}G_{\delta}({\bf x})df_{T}^{\delta}(\bm{x})-\delta.

(3.6)

Using (2.1) and applying Cauchy-Schwarz, it follows that

\begin{aligned}
H(f^{\delta}_{t})+\int_{0}^{t}I(f^{\delta}_{s})ds\leq{}&\sum_{i,j=1}^{N}\frac{1}{N}\bigg{(}\int_{0}^{t}\int_{\mathbb{T}^{dN}}\frac{\|D_{x^{i}}f^{\delta}\|^{2}}{f^{\delta}}d\bm{x}ds\bigg{)}^{1/2}\bigg{(}\int_{0}^{t}\int_{\mathbb{T}^{dN}}\|V(x^{i}-x^{j})\|^{2}df_{s}^{\delta}(\bm{x})ds\bigg{)}^{1/2}\\
&+H(f_{0}^{\delta})+\sum_{i=1}^{N}\bigg{(}\int_{0}^{t}\int_{\mathbb{T}^{dN}}\frac{\|D_{x^{i}}f^{\delta}\|^{2}}{f^{\delta}}d\bm{x}ds\bigg{)}^{1/2}\bigg{(}\int_{0}^{t}\int_{\mathbb{T}^{dN}}\|\alpha^{i,\delta}(s,\bm{x})\|^{2}df_{s}^{\delta}(\bm{x})ds\bigg{)}^{1/2}.
\end{aligned}

(3.7)

Since $G_{\delta}$ converges to $G$, the bounded convergence theorem (applied to $e^{-G_{\delta}}\in[0,1]$) shows that the left-hand side of (3.6) converges; in particular, it is bounded. Since $H(f_{0}^{\delta}|\rho_{0}^{N})$, $G_{\delta}$ and $\frac{1}{N}\sum_{i=1}^{N}\int_{0}^{T}\int_{\mathbb{T}^{dN}}|\alpha^{i,\delta}(t,\bm{x})|^{2}df_{t}^{\delta}(\bm{x})dt$ are nonnegative, each term on the right-hand side of (3.6) is also bounded uniformly in $\delta$. Since $H(f_{0}^{\delta}|\rho^{N}_{0})$ is uniformly bounded and $\rho^{N}_{0}$ is bounded above, the inequality $H(f_{0}^{\delta})\leq H(f_{0}^{\delta}|\rho^{N}_{0})+\log\|\rho^{N}_{0}\|_{L^{\infty}}$ shows that $H(f_{0}^{\delta})$ is uniformly bounded as well. Combining these facts with $V\in L^{\infty}$, we find constants $C_{1},C_{2}$ with $C_{2}>0$, independent of $\delta$, such that (3.7) becomes

C_{1}+\int_{0}^{t}\int_{\mathbb{T}^{dN}}\frac{|D_{x^{i}}f^{\delta}|^{2}}{f^{\delta}}d\bm{x}ds\leq C_{2}\bigg{(}\int_{0}^{t}\int_{\mathbb{T}^{dN}}\frac{|D_{x^{i}}f^{\delta}|^{2}}{f^{\delta}}d\bm{x}ds\bigg{)}^{1/2}.

This clearly implies that $\int_{0}^{t}\int_{\mathbb{T}^{dN}}\frac{|D_{x^{i}}f^{\delta}|^{2}}{f^{\delta}}d\bm{x}ds$ , $i=1,...,N$ , are also uniformly bounded.
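For completeness, here is one way to make the last implication explicit. Writing $X_{i}^{\delta}$ for the integral above and using Young's inequality $C_{2}(X_{i}^{\delta})^{1/2}\leq\frac{1}{2}X_{i}^{\delta}+\frac{1}{2}C_{2}^{2}$, the previous inequality gives $C_{1}+\frac{1}{2}X_{i}^{\delta}\leq\frac{1}{2}C_{2}^{2}$, that is,

X_{i}^{\delta}:=\int_{0}^{t}\int_{\mathbb{T}^{dN}}\frac{|D_{x^{i}}f^{\delta}|^{2}}{f^{\delta}}d\bm{x}ds\leq C_{2}^{2}-2C_{1},

a bound which does not depend on $\delta$.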

We can now apply Proposition 2.5 in various ways. By virtue of the uniform boundedness of $\int_{0}^{T}\int_{\mathbb{T}^{dN}}\frac{|D_{x^{i}}f^{\delta}|^{2}}{f^{\delta}}d\bm{x}ds$ and $\int_{0}^{T}\int_{\mathbb{T}^{dN}}|\alpha^{i,\delta}(t,\bm{x})|^{2}df_{t}^{\delta}(\bm{x})dt$, we get that, up to subsequences, as $\delta\rightarrow 0$

f^{\delta}\rightarrow f\quad\text{ in }\mathcal{C}([0,T];\mathcal{P}(\mathbb{T}^{dN})),

(3.8)

\alpha^{i,\delta}f^{\delta}\rightarrow\alpha^{i}f,\quad D_{x^{i}}f^{\delta}\rightarrow D_{x^{i}}f\quad\text{ in }\mathcal{M}([0,t]\times\mathbb{T}^{dN};\mathbb{R}^{dN}),\;i=1,...,N,\;\;t\in(0,T].

(3.9)

In addition, by (3.7), $H(f^{\delta}_{t})$ is bounded uniformly in $\delta$ and $t$, so $\int_{0}^{T}H(f^{\delta}_{t})dt$ is uniformly bounded. The de la Vallée-Poussin theorem [BR07, Theorem 4.5.9] implies that $f^{\delta}$ is uniformly integrable over $[0,T]\times\mathbb{T}^{dN}$, hence by the Dunford–Pettis theorem [BR07, Theorem 4.7.18] the convergence in (3.8) also holds weakly in $L^{1}$ (again up to subsequences):

f^{\delta}\rightarrow f\quad\text{ in }\mathcal{C}([0,T];\mathcal{P}(\mathbb{T}^{dN}))\text{ and weakly in }L^{1}([0,T]\times\mathbb{T}^{dN}),

(3.10)

f_{0}^{\delta}\rightarrow f_{0}\text{ and }f_{T}^{\delta}\rightarrow f_{T}\text{ weakly in }L^{1}(\mathbb{T}^{dN}).

(3.11)

We now pass to the limit $\delta\rightarrow 0$ in (3.6). By the weak convergence of $f_{0}^{\delta}$ , the weak lower semicontinuity of the relative entropy, Proposition 2.5 and the nonnegativity of $G_{\delta}$ we deduce

-\frac{1}{N}\log\rho^{N}_{T}(A)=-\frac{1}{N}\log\mathbb{E}\bigg{[}e^{-G(\bm{X}_{T})}\bigg{]}\geq\frac{1}{N}H(f_{0}|\rho^{N}_{0})+\frac{1}{4N}\sum_{i=1}^{N}\int_{0}^{T}\int_{\mathbb{T}^{dN}}|\alpha^{i}(t,\bm{x})|^{2}df_{t}(\bm{x})dt,

(3.12)

hence, in order to prove (3.1), it suffices to show that $(f,\bm{\alpha})$ is an admissible candidate for the infimum on its right-hand side.

We start by showing that $f_{T}(A)=1$, or equivalently $f_{T}(A^{c})=0$. Indeed, the family of sets $A_{\delta}=\{\bm{x}\in\mathbb{T}^{dN}:G_{\delta}(\bm{x})\geq\frac{2}{\delta}\}$ is increasing as $\delta\downarrow 0$ and converges to $A^{c}$. By Markov's inequality, $f_{T}^{\delta}(A_{\delta})\leq\frac{\delta}{2}\int_{\mathbb{T}^{dN}}G_{\delta}(\bm{x})df_{T}^{\delta}(\bm{x})$, and since $\int_{\mathbb{T}^{dN}}G_{\delta}(\bm{x})df_{T}^{\delta}(\bm{x})$ is uniformly bounded, $\lim_{\delta\rightarrow 0}f_{T}^{\delta}(A_{\delta})=0$. Since $A_{\delta}$ is increasing and $f_{T}^{\delta}$ converges weakly in $L^{1}$ to $f_{T}$, Lemma 2.8 and the last identity imply $f_{T}(A^{c})=0$, which is what we wanted.

We now prove that $f$ is an $(\bm{\alpha},0,K)-$ entropy solution. It is straightforward to check that the limit $(f,\bm{\alpha})$ satisfies (1.7) in the weak sense. Thus, it is an admissible candidate for the first infimum in (3.3), hence

\begin{aligned}
&H(f_{0}\|\rho^{N}_{0})+\frac{1}{4}\sum_{i=1}^{N}\int_{0}^{T}\int_{\mathbb{T}^{dN}}\|\alpha^{i}(t,\bm{x})\|^{2}df_{t}(\bm{x})dt+\int_{\mathbb{T}^{dN}}G_{\delta}(\bm{x})df_{T}(\bm{x})+\delta\\
&\quad\geq H(f_{0}^{\delta}\|\rho^{N}_{0})+\frac{1}{4}\sum_{i=1}^{N}\int_{0}^{T}\int_{\mathbb{T}^{dN}}\|\alpha^{i,\delta}(t,\bm{x})\|^{2}df^{\delta}_{t}(\bm{x})dt+\int_{\mathbb{T}^{dN}}G_{\delta}(\bm{x})df^{\delta}_{T}(\bm{x}).
\end{aligned}

We pass to the limit as $\delta\rightarrow 0$ and by the weak lower semi-continuity of the entropy, Proposition 2.5 and the fact that $f_{T}$ is supported on $A$ , we get

\begin{aligned}
H(f_{0}\|\rho^{N}_{0})+\frac{1}{4}\sum_{i=1}^{N}\int_{0}^{T}\int_{\mathbb{T}^{dN}}\|\alpha^{i}(t,\bm{x})\|^{2}df_{t}(\bm{x})dt&\geq\liminf_{\delta\rightarrow 0}\bigg{(}H(f_{0}^{\delta}\|\rho^{N}_{0})+\frac{1}{4}\sum_{i=1}^{N}\int_{0}^{T}\int_{\mathbb{T}^{dN}}\|\alpha^{i,\delta}(t,\bm{x})\|^{2}df^{\delta}_{t}(\bm{x})dt\bigg{)}\\
&\geq\liminf_{\delta\rightarrow 0}H(f_{0}^{\delta}\|\rho^{N}_{0})+\liminf_{\delta\rightarrow 0}\frac{1}{4}\sum_{i=1}^{N}\int_{0}^{T}\int_{\mathbb{T}^{dN}}\|\alpha^{i,\delta}(t,\bm{x})\|^{2}df^{\delta}_{t}(\bm{x})dt\\
&\geq H(f_{0}\|\rho^{N}_{0})+\frac{1}{4}\sum_{i=1}^{N}\int_{0}^{T}\int_{\mathbb{T}^{dN}}\|\alpha^{i}(t,\bm{x})\|^{2}df_{t}(\bm{x})dt.
\end{aligned}

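Since the first and last lines coincide, all the inequalities above are equalities. As each $\liminf$ in the second line is bounded below by the corresponding term in the third line, we deduce that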

\liminf_{\delta\rightarrow 0}H(f_{0}^{\delta}\|\rho^{N}_{0})=H(f_{0}\|\rho^{N}_{0}),

(3.13)

\liminf_{\delta\rightarrow 0}\frac{1}{4}\sum_{i=1}^{N}\int_{0}^{T}\int_{\mathbb{T}^{dN}}\|\alpha^{i,\delta}(t,\bm{x})\|^{2}df^{\delta}_{t}(\bm{x})dt=\frac{1}{4}\sum_{i=1}^{N}\int_{0}^{T}\int_{\mathbb{T}^{dN}}\|\alpha^{i}(t,\bm{x})\|^{2}df_{t}(\bm{x})dt.

(3.14)

Since $f^{\delta}$ is a smooth $(\bm{\alpha}^{\delta},K)$ -entropy solution, (2.1) holds for every $t\in[0,T]$ and can be rewritten as

\begin{aligned}
&H(f_{0}^{\delta})+\frac{1}{2}\sum_{i=1}^{N}\int_{0}^{T}\int_{\mathbb{T}^{dN}}\|\alpha^{i,\delta}(s,\bm{x})\|^{2}df_{s}^{\delta}(\bm{x})ds+\frac{1}{N}\sum_{i,j=1}^{N}\int_{0}^{t}\int_{\mathbb{T}^{dN}}V(x^{i}-x^{j})\cdot D_{x^{i}}f^{\delta}(s,\bm{x})d\bm{x}ds\\
&\quad\geq\frac{1}{2}\sum_{i=1}^{N}\int_{t}^{T}\int_{\mathbb{T}^{dN}}\|\alpha^{i,\delta}(s,\bm{x})\|^{2}df_{s}^{\delta}(\bm{x})ds+\frac{1}{2}\sum_{i=1}^{N}\int_{0}^{t}\int_{\mathbb{T}^{dN}}\bigg{\|}\alpha^{i,\delta}(s,\bm{x})-\frac{D_{x^{i}}f^{\delta}}{f^{\delta}}\bigg{\|}^{2}df^{\delta}_{s}(\bm{x})ds\\
&\qquad+H(f^{\delta}_{t})+\frac{1}{2}\sum_{i=1}^{N}\int_{0}^{t}\int_{\mathbb{T}^{dN}}\frac{\|D_{x^{i}}f^{\delta}\|^{2}}{f^{\delta}}d\bm{x}ds.
\end{aligned}

(3.15)

We observe that since $H(f_{t}^{\delta})$ is uniformly bounded, by passing to a further subsequence if necessary, we get $f_{t}^{\delta}\rightarrow f_{t}$ weakly in $L^{1}(\mathbb{T}^{dN})$ . Due to the lower semi-continuity of the entropy, Proposition 2.5 and the remark after Proposition 2.5, we get that the $\liminf_{\delta\rightarrow 0}$ of the right hand side of (3.15) is at least

\begin{aligned}
&H(f_{t})+\frac{1}{2}\sum_{i=1}^{N}\int_{0}^{t}\int_{\mathbb{T}^{dN}}\bigg{\|}\alpha^{i}(s,\bm{x})-\frac{D_{x^{i}}f}{f}\bigg{\|}^{2}df_{s}(\bm{x})ds\\
&\quad+\frac{1}{2}\sum_{i=1}^{N}\int_{t}^{T}\int_{\mathbb{T}^{dN}}\|\alpha^{i}(s,\bm{x})\|^{2}df_{s}(\bm{x})ds+\frac{1}{2}\sum_{i=1}^{N}\int_{0}^{t}\int_{\mathbb{T}^{dN}}\frac{\|D_{x^{i}}f\|^{2}}{f}d\bm{x}ds.
\end{aligned}

(3.16)

On the other hand, because of the convergences (3.9), (3.13) and (3.14), the left hand side of (3.15) converges to

H(f_{0})+\frac{1}{2}\sum_{i=1}^{N}\int_{0}^{T}\int_{\mathbb{T}^{dN}}|\alpha^{i}(s,\bm{x})|^{2}df_{s}(\bm{x})ds+\frac{1}{N}\sum_{i,j=1}^{N}\int_{0}^{t}\int_{\mathbb{T}^{dN}}V(x^{i}-x^{j})\cdot D_{x^{i}}f(s,\bm{x})d\bm{x}ds,

(3.17)

Comparing (3.16) with (3.17), we conclude that $(f,\bm{\alpha})$ satisfies the rearranged form (3.15) of (2.1), and hence (2.1) itself, for each $t\in[0,T]$.

Step 3 ($K\in L^{\infty}$). Assume now that $K\in L^{\infty}$, and let $(K^{r})_{r>0}$ be a family of mollifications of $K$. For any $r>0$, Step 2 shows that (3.1) holds when $K=K^{r}$. We denote by $V_{r}$ the right-hand side of (3.1) with $K$ replaced by $K^{r}$, and by $\rho^{N,r}$ the law of the particle system driven by $K^{r}$, so that $-\frac{1}{N}\log\rho_{T}^{N,r}(A)\geq V_{r}$. We wish to show that $-\frac{1}{N}\log\rho_{T}^{N}(A)\geq V_{0}$. If $\rho_{T}^{N}(A)=0$ this is trivial, so we may assume that $\rho_{T}^{N}(A)>0$.

Note that since $\liminf_{r\rightarrow 0}\rho_{T}^{N,r}(A)\geq\rho_{T}^{N}(A)>0$ , the set $\{V_{r}|r\in(0,1)\}$ is bounded. Set

\tilde{K}^{i,r}(\bm{x})=\frac{1}{N}\sum_{j\neq i}^{N}K^{r}(x^{i}-x^{j})\text{ and }\tilde{K}^{i}(\bm{x})=\frac{1}{N}\sum_{j\neq i}^{N}K(x^{i}-x^{j}).

Then, for $(\bm{\alpha}^{r},f^{r})$ an $r$ -minimizer for $V_{r}$ we have

\begin{aligned}
V_{r}&\geq H_{N}(f_{0}^{r}\|\rho_{0}^{N})+\frac{1}{4N}\sum_{i=1}^{N}\int_{0}^{T}\int_{\mathbb{T}^{dN}}\|\alpha^{i,r}(t,\bm{x})\|^{2}df_{t}^{r}(\bm{x})dt-r\\
&=H_{N}(f_{0}^{r}\|\rho_{0}^{N})+\frac{1}{4N}\sum_{i=1}^{N}\int_{0}^{T}\int_{\mathbb{T}^{dN}}\|\alpha^{i,r}(t,\bm{x})+\tilde{K}^{i,r}(\bm{x})-\tilde{K}^{i}(\bm{x})\|^{2}df_{t}^{r}(\bm{x})dt\\
&\quad-\frac{1}{4N}\sum_{i=1}^{N}\int_{0}^{T}\int_{\mathbb{T}^{dN}}\|\tilde{K}^{i,r}(\bm{x})-\tilde{K}^{i}(\bm{x})\|^{2}df_{t}^{r}(\bm{x})dt-\frac{1}{2N}\sum_{i=1}^{N}\int_{0}^{T}\int_{\mathbb{T}^{dN}}\alpha^{i,r}(t,\bm{x})\cdot(\tilde{K}^{i,r}(\bm{x})-\tilde{K}^{i}(\bm{x}))df_{t}^{r}(\bm{x})dt-r.
\end{aligned}

On the one hand, since $V_{r}$ is uniformly bounded, this implies

\sup_{r\in(0,1)}H_{N}(f_{0}^{r}|\rho_{0}^{N})+\frac{1}{4N}\sum_{i=1}^{N}\sup_{r\in(0,1)}\int_{0}^{T}\int_{\mathbb{T}^{dN}}|\alpha^{i,r}(t,\bm{x})|^{2}df_{t}^{r}(\bm{x})dt<\infty.

(3.18)

Since $f^{r}$ is an $(\bm{\alpha}^{r},K^{r})$-entropy solution, $\|K^{r}\|_{L^{\infty}}\leq\|K\|_{L^{\infty}}$, and (3.18) holds, an argument as in Step 1 yields

\sup_{t\in[0,T]}\sup_{r\in(0,1)}H(f^{r}_{t})<\infty.

(3.19)

On the other hand, by Remark 2.3, $f^{r}$ is an $(\bm{\alpha}^{r}+\tilde{K}^{r}-\tilde{K},K)$ -entropy solution, therefore

\displaystyle V_{r}\geq V_{0}-\frac{1}{4N}\sum_{i=1}^{N}\bigg{(}\int_{0}^{T}\int_{\mathbb{T}^{dN}}|\tilde{K}^{i,r}(\bm{x})-\tilde{K}^{i}(\bm{x})|^{2}df_{t}^{r}(\bm{x})dt+2\int_{0}^{T}\int_{\mathbb{T}^{dN}}\alpha^{i,r}(t,\bm{x})\cdot(\tilde{K}^{i,r}(\bm{x})-\tilde{K}^{i}(\bm{x}))df_{t}^{r}(\bm{x})dt\bigg{)}-r.

(3.20)

To finish the proof, we show that the integral terms in (3.20) converge to $0$ as $r\rightarrow 0$. Indeed, for any $M>1$ and $i\in\{1,\dots,N\}$ we have

\begin{aligned}
\int_{0}^{T}\int_{\mathbb{T}^{dN}}\|\tilde{K}^{i,r}(\bm{x})-\tilde{K}^{i}(\bm{x})\|^{2}df_{t}^{r}(\bm{x})dt&=\int_{0}^{T}\int_{\{f_{t}^{r}>M\}}\|\tilde{K}^{i,r}(\bm{x})-\tilde{K}^{i}(\bm{x})\|^{2}df_{t}^{r}(\bm{x})dt+\int_{0}^{T}\int_{\{f_{t}^{r}\leq M\}}\|\tilde{K}^{i,r}(\bm{x})-\tilde{K}^{i}(\bm{x})\|^{2}df_{t}^{r}(\bm{x})dt\\
&\leq\bigg{(}\int_{0}^{T}\int_{\{f_{t}^{r}>M\}}\frac{\|\tilde{K}^{i,r}(\bm{x})-\tilde{K}^{i}(\bm{x})\|^{4}}{\log f_{t}^{r}}df_{t}^{r}(\bm{x})dt\bigg{)}^{1/2}\left(\int_{0}^{T}H(f^{r}_{t})dt\right)^{1/2}+M\int_{0}^{T}\int_{\mathbb{T}^{dN}}\|\tilde{K}^{i,r}(\bm{x})-\tilde{K}^{i}(\bm{x})\|^{2}d\bm{x}dt.
\end{aligned}

Therefore, there exists a constant $C>0$ depending on the bound provided by (3.19) and $\|K\|_{L^{\infty}}$ such that

\displaystyle\int_{0}^{T}\int_{\mathbb{T}^{dN}}|\tilde{K}^{i,r}(\bm{x})-\tilde{K}^{i}(\bm{x})|^{2}df_{t}^{r}(\bm{x})dt\leq\frac{C}{\sqrt{\log M}}+MT\int_{\mathbb{T}^{dN}}|\tilde{K}^{i,r}(\bm{x})-\tilde{K}^{i}(\bm{x})|^{2}d\bm{x}.

Since $K\in L^{\infty}\subset L^{2}$, its mollifications converge in $L^{2}$, so $\tilde{K}^{i,r}\rightarrow\tilde{K}^{i}$ in $L^{2}(\mathbb{T}^{dN})$ as $r\rightarrow 0$; passing to the limit, we obtain

0\leq\limsup_{r\rightarrow 0}\int_{0}^{T}\int_{\mathbb{T}^{dN}}|\tilde{K}^{i,r}(\bm{x})-\tilde{K}^{i}(\bm{x})|^{2}df_{t}^{r}(\bm{x})dt\leq\frac{C}{\sqrt{\log M}}

and since $M$ was arbitrary, we get

\lim_{r\rightarrow 0}\int_{0}^{T}\int_{\mathbb{T}^{dN}}|\tilde{K}^{i,r}(\bm{x})-\tilde{K}^{i}(\bm{x})|^{2}df_{t}^{r}(\bm{x})dt=0.

(3.21)
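The step "$K\in L^{\infty}$, so $\tilde{K}^{i,r}\rightarrow\tilde{K}^{i}$ in $L^{2}$" only provides $L^{2}$ convergence, not uniform convergence, which is why the argument above splits according to the size of $f_{t}^{r}$. A one-dimensional illustration, assuming a hypothetical step kernel on the unit circle ($+1$ on $[0,1/2)$, $-1$ on $[1/2,1)$) and a box mollifier of width $r$: a direct computation gives $\|K^{r}-K\|_{L^{2}}^{2}=2r/3$, while $\|K^{r}-K\|_{L^{\infty}}=1$ for every $r$. The Python sketch below checks both facts with a midpoint rule; the kernel and mollifier are illustrative choices, not taken from the paper.

```python
import math


def step_kernel(x):
    """Hypothetical bounded, discontinuous kernel on the unit circle: +1 on [0, 1/2), -1 on [1/2, 1)."""
    return 1.0 if (x % 1.0) < 0.5 else -1.0


def mollified(x, r):
    """Box mollification: average of step_kernel over [x - r/2, x + r/2], for 0 < r < 1/2."""
    a, b = x - r / 2.0, x + r / 2.0
    pos = 0.0  # length of the window lying in some [k, k + 1/2), where step_kernel = +1
    for k in range(math.floor(a) - 1, math.floor(b) + 2):
        pos += max(0.0, min(b, k + 0.5) - max(a, float(k)))
    return (2.0 * pos - r) / r  # ((+1) * pos + (-1) * (r - pos)) / r


def errors(r, n=20000):
    """Midpoint-rule approximations of ||K^r - K||_{L^2}^2 and ||K^r - K||_{L^infty}."""
    xs = [(k + 0.5) / n for k in range(n)]
    diffs = [mollified(x, r) - step_kernel(x) for x in xs]
    l2_sq = sum(d * d for d in diffs) / n
    sup = max(abs(d) for d in diffs)
    return l2_sq, sup


for r in (0.1, 0.05, 0.01):
    l2_sq, sup = errors(r)
    print(f"r={r}: squared L2 error {l2_sq:.5f} (exact 2r/3 = {2 * r / 3:.5f}), sup error {sup:.3f}")
```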

We also have, by the Cauchy–Schwarz inequality,

\begin{aligned}
&\bigg{|}\int_{0}^{T}\int_{\mathbb{T}^{dN}}\alpha^{i,r}(t,\bm{x})\cdot(\tilde{K}^{i,r}(\bm{x})-\tilde{K}^{i}(\bm{x}))df_{t}^{r}(\bm{x})dt\bigg{|}\\
&\quad\leq\bigg{(}\int_{0}^{T}\int_{\mathbb{T}^{dN}}\|\alpha^{i,r}(t,\bm{x})\|^{2}df_{t}^{r}(\bm{x})dt\bigg{)}^{1/2}\bigg{(}\int_{0}^{T}\int_{\mathbb{T}^{dN}}\|\tilde{K}^{i,r}(\bm{x})-\tilde{K}^{i}(\bm{x})\|^{2}df_{t}^{r}(\bm{x})dt\bigg{)}^{1/2}.
\end{aligned}

By (3.18) and (3.21) we get

\lim_{r\rightarrow 0}\int_{0}^{T}\int_{\mathbb{T}^{dN}}\alpha^{i,r}(t,\bm{x})\cdot(\tilde{K}^{i,r}(\bm{x})-\tilde{K}^{i}(\bm{x}))df_{t}^{r}(\bm{x})dt=0.

(3.22)

Now we send $r$ to $0$ in (3.20) and use (3.21) and (3.22) to conclude that

\displaystyle-\frac{1}{N}\log\rho^{N}_{T}(A)\geq\limsup_{r\rightarrow 0}\Big{(}-\frac{1}{N}\log\rho_{T}^{N,r}(A)\Big{)}\geq\limsup_{r\rightarrow 0}V_{r}\geq V_{0}.

∎

Proof of Theorem 1.1.

The first step will be to reinterpret [FG15, Theorem 2] in a convenient way. If we specialize this result to the torus, we find that for each $p$ there is a constant $C\geq 1$ depending only on $p$ and $d$ such that whenever $X^{1},X^{2},...$ are i.i.d. $\mathbb{T}^{d}$ -valued random variables with common law $m$ , we have $\mathbb{P}\Big{[}{\bf d}_{p}(m_{\bm{X}}^{N},m)>\epsilon\Big{]}\leq C\exp(-\frac{a_{p}(\epsilon)N}{C}).$ In other words,

\displaystyle-\frac{1}{N}\log\mathbb{P}\Big{[}{\bf d}_{p}(m_{\bm{X}}^{N},m)>\epsilon\Big{]}\geq\frac{a_{p}(\epsilon)}{C}-\log(C)/N.

Recalling the variational formula (2.6) from Lemma 2.7, we find that the implication

\displaystyle\frac{1}{N}H(Q|m^{\otimes N})<\frac{a_{p}(\epsilon)}{C}-\log(C)/N\implies Q(A^{m,p}_{N,\epsilon})<1

(3.23)

holds for every $Q\in\mathcal{P}(\mathbb{T}^{dN})$, where $A^{m,p}_{N,\epsilon}=\Big{\{}\bm{x}\in(\mathbb{T}^{d})^{N}\Big{|}{\bf d}_{p}(m_{\bm{x}}^{N},m)>\epsilon\Big{\}}$.
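The mechanism behind (3.23) can be seen on a finite state space. Assuming the variational formula of Lemma 2.7 takes the standard form $-\log P(A)=\inf\{H(Q|P):Q(A)=1\}$, attained at $Q=P(\cdot|A)$, any $Q$ with $H(Q|P)<-\log P(A)$ must put mass outside $A$; (3.23) combines this with the lower bound on $-\log P(A)$ coming from [FG15]. The Python sketch below checks these facts for a random law on 50 points; all names are illustrative.

```python
import math
import random


def relative_entropy(q, p):
    """H(q|p) = sum_i q_i log(q_i / p_i) on a finite set, with 0 log 0 = 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def conditioned(p, A):
    """The conditional law p(. | A) for an index set A with p(A) > 0."""
    pA = sum(p[i] for i in A)
    return [p[i] / pA if i in A else 0.0 for i in range(len(p))]


def random_law(n, seed=0):
    """A random probability vector with strictly positive entries (illustrative)."""
    rng = random.Random(seed)
    w = [rng.random() + 0.01 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]


def must_leave(q, p, A):
    """True when the entropy budget alone forces q(A) < 1, i.e. H(q|p) < -log p(A)."""
    pA = sum(p[i] for i in A)
    return relative_entropy(q, p) < -math.log(pA)


p = random_law(50)
A = set(range(5))
cost = -math.log(sum(p[i] for i in A))
# The conditional law is the cheapest way to put all the mass on A ...
print(relative_entropy(conditioned(p, A), p), cost)
# ... and other laws supported on A (reweightings of p on A) cost at least as much.
print(min(relative_entropy(conditioned(random_law(50, seed=k), A), p) for k in range(1, 100)) >= cost - 1e-12)
```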

We are now going to combine (3.23) with Proposition 3.1 to complete the proof. Suppose that we have a pair $f^{N}$ and $\bm{\alpha}$ such that $f^{N}$ is an entropy solution of (1.7). Suppose further that

\displaystyle H_{N}(f^{N}_{0}|\overline{\rho}^{N}_{0})+\frac{1}{4N}\sum_{i=1}^{N}\int_{0}^{T}\int_{\mathbb{T}^{dN}}|\alpha^{i}(t,\bm{x})|^{2}df^{N}_{t}(\bm{x})dt<\frac{a_{p}(\epsilon)}{C_{\text{con,p}}}-\frac{\log{C_{\text{con,p}}}}{N}

holds for some $C_{\text{con,p}}>0$ . By assumption, it follows that

\displaystyle H_{N}(f^{N}_{T}|\overline{\rho}^{N}_{T})\leq C_{\text{ent}}\bigg{(}\frac{1}{N}+a_{p}(\epsilon)/C_{\text{con,p}}-\log C_{\text{con,p}}/N\bigg{)}=\frac{C_{\text{ent}}}{C_{\text{con,p}}}a_{p}(\epsilon)-\frac{C_{\text{ent}}\log C_{\text{con,p}}-C_{\text{ent}}}{N}.

A simple computation shows that if $C_{\text{con,p}}\geq\max\{CC_{\text{ent}},eC\}$, then (here we use $C_{\text{ent}}\geq 1$ and $C\geq 1$)

\displaystyle\frac{C_{\text{ent}}}{C_{\text{con,p}}}a_{p}(\epsilon)-\frac{C_{\text{ent}}\log C_{\text{con,p}}-C_{\text{ent}}}{N}\leq\frac{a_{p}(\epsilon)}{C}-\frac{\log(C)}{N}.
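The algebra in this step is easy to double-check numerically. The Python sketch below (function names are illustrative) evaluates the left-hand side above and the threshold $a_{p}(\epsilon)/C-\log(C)/N$ from (3.23) over a small grid of parameters with $C,C_{\text{ent}}\geq 1$ and $a_{p}(\epsilon)\geq 0$, using the choice $C_{\text{con,p}}=eCC_{\text{ent}}$. It also shows that the factor $e$ cannot simply be dropped: with $C=C_{\text{ent}}=C_{\text{con,p}}=1$ the left-hand side exceeds the threshold by $1/N$.

```python
import math


def cost_side(a, N, C, C_ent, C_con):
    """(C_ent / C_con) a - (C_ent log C_con - C_ent) / N."""
    return C_ent / C_con * a - (C_ent * math.log(C_con) - C_ent) / N


def threshold(a, N, C):
    """The entropy threshold a / C - log(C) / N appearing in (3.23)."""
    return a / C - math.log(C) / N


def choice_works(C_values, C_ent_values, a_values, N_values, factor=math.e):
    """Check cost_side <= threshold for C_con = factor * C * C_ent over a parameter grid."""
    for C in C_values:
        for C_ent in C_ent_values:
            C_con = factor * C * C_ent
            for a in a_values:
                for N in N_values:
                    if cost_side(a, N, C, C_ent, C_con) > threshold(a, N, C) + 1e-12:
                        return False
    return True


grid = dict(C_values=(1.0, 1.5, 3.0, 10.0), C_ent_values=(1.0, 2.0, 7.5),
            a_values=(0.0, 1e-3, 0.5, 4.0), N_values=(1, 10, 1000))
print(choice_works(**grid))              # the choice C_con = e C C_ent
print(choice_works(**grid, factor=1.0))  # the same grid without the factor e
```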

In particular, setting $C_{\text{con,p}}=eCC_{\text{ent}}$ (i.e. $C_{\text{con,p}}=C_{d}C_{\text{ent}}$ with $C_{d}=eC$) and applying (3.23), we obtain the implication

\begin{aligned}
&H_{N}(f^{N}_{0}\|\overline{\rho}^{N}_{0})+\frac{1}{4N}\sum_{i=1}^{N}\int_{0}^{T}\int_{\mathbb{T}^{dN}}\|\alpha^{i}(t,\bm{x})\|^{2}df^{N}_{t}(\bm{x})dt<\frac{a_{p}(\epsilon)}{C_{\text{con,p}}}-\frac{\log{C_{\text{con,p}}}}{N}\\
&\qquad\implies H_{N}(f^{N}_{T}\|\overline{\rho}^{N}_{T})\leq\frac{a_{p}(\epsilon)}{C}-\frac{\log(C)}{N}\implies f^{N}_{T}(A_{N,\epsilon})<1.
\end{aligned}

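Consequently, every admissible pair with $f^{N}_{T}(A_{N,\epsilon})=1$ has cost at least $a_{p}(\epsilon)/C_{\text{con,p}}-\log(C_{\text{con,p}})/N$, and Proposition 3.1 implies $-\frac{1}{N}\log\rho_{T}^{N}(A_{N,\epsilon})\geq a_{p}(\epsilon)/C_{\text{con,p}}-\log(C_{\text{con,p}})/N$, or in other words $\rho_{T}^{N}(A_{N,\epsilon})\leq C_{\text{con,p}}\exp(-\frac{a_{p}(\epsilon)N}{C_{\text{con,p}}}).$ ∎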

3.2. Proof of Proposition 1.6

We begin by establishing the well-posedness of (1.3).

Proposition 3.2.

Suppose Assumption 1.5 holds. Then (1.3) admits a unique classical solution, which satisfies

\displaystyle\|\overline{\rho}\|_{C^{2,\beta}_{t,x}}\leq C,\quad\inf_{t,x}\overline{\rho}(t,x)\geq C^{-1},

with $C$ depending only on $T$, $d$, and the constants $C_{0}$ and $\beta$ appearing in Assumption 1.5.

Proof.

The main challenge is to obtain appropriate a priori estimates, so we explain this point in detail and then quickly sketch the existence and uniqueness part. Thus we assume for the moment that, in addition to Assumption 1.5, $K$ is smooth, so that we have a unique classical solution $\overline{\rho}$. In what follows, $C$ may increase from line to line and may depend freely on the constants indicated in the statement of the proposition; dependence on other parameters will be indicated explicitly, e.g. $C(p)$ denotes a constant which may depend on $p$ as well as on the constants appearing in the statement of the proposition. Moreover, we let $\phi$ be a vector field with $\text{div}\phi=\text{div}K$ and $\|\phi\|_{\infty}=\|\text{div}K\|_{-1,\infty}$. First, integration by parts and Young's inequality give

\begin{aligned}
\frac{d}{dt}H(\overline{\rho}_{t})&=-I(\overline{\rho}_{t})+\int_{\mathbb{T}^{d}}D\overline{\rho}\cdot\big{(}F+\phi*\overline{\rho}\big{)}dx\\
&\leq-\frac{1}{2}I(\overline{\rho}_{t})+\frac{1}{2}\int_{\mathbb{T}^{d}}\|F+\phi*\overline{\rho}\|^{2}\overline{\rho}\,dx\leq C(\|F\|^{2}_{\infty}+\|\text{div}K\|^{2}_{-1,\infty}),
\end{aligned}

and so in particular $\int_{0}^{T}I(\overline{\rho}_{t})dt\leq C$, from which, by the Cauchy–Schwarz inequality, it follows that $\|D\overline{\rho}\|_{L^{1}_{t,x}}\leq C$. Rewriting (1.3) in non-divergence form as

\displaystyle\partial_{t}\overline{\rho}=\Delta\overline{\rho}-D\overline{\rho}\cdot\big{(}K*\overline{\rho}\big{)}-\overline{\rho}\,\phi*D\overline{\rho},

(3.24)

a standard argument using the maximum principle shows that for each $(t,x)$ , we have

\displaystyle\overline{\rho}(t,x)\leq\|\rho_{0}\|_{\infty}\exp\Big{(}\int_{0}^{t}\|\phi*D\overline{\rho}\|_{L_{x}^{\infty}}ds\Big{)}\leq\|\rho_{0}\|_{\infty}\exp\Big{(}\int_{0}^{t}\|\phi\|_{L^{\infty}}\|D\overline{\rho}_{s}\|_{L_{x}^{1}}ds\Big{)}\leq C,

and likewise

\displaystyle\overline{\rho}(t,x)\geq\inf_{x}\rho_{0}(x)\exp\Big{(}-\int_{0}^{t}\|\phi*D\overline{\rho}\|_{L_{x}^{\infty}}ds\Big{)}\geq\inf_{x}\rho_{0}(x)\exp\Big{(}-\int_{0}^{t}\|\phi\|_{L^{\infty}}\|D\overline{\rho}_{s}\|_{L_{x}^{1}}ds\Big{)}\geq C^{-1}.
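Two elementary inequalities drive these bounds: Young's convolution inequality $\|\phi*D\overline{\rho}\|_{L^{\infty}}\leq\|\phi\|_{L^{\infty}}\|D\overline{\rho}\|_{L^{1}}$, and the Cauchy–Schwarz bound $\|D\overline{\rho}\|_{L^{1}}\leq I(\overline{\rho})^{1/2}(\int\overline{\rho})^{1/2}$ used right after the entropy estimate. Both have exact discrete analogues. The Python sketch below checks them on a uniform grid of the one-dimensional torus, with a hypothetical smooth density, a centered difference quotient standing in for $D\overline{\rho}$, and a bounded sign-type $\phi$; none of these choices comes from the paper.

```python
import math


def periodic_convolution(phi, g):
    """(phi * g)_k = (1/n) sum_j phi_{k-j} g_j on the uniform grid of the unit circle."""
    n = len(g)
    return [sum(phi[(k - j) % n] * g[j] for j in range(n)) / n for k in range(n)]


def chain(n=400):
    """Return (sup|phi * Drho|, sup|phi| * ||Drho||_1, sup|phi| * sqrt(I(rho) * mass))."""
    # Smooth positive density with mean 1 (sin^3 averages to 0 over a full period).
    rho = [1.0 + 0.8 * math.sin(2 * math.pi * k / n) ** 3 for k in range(n)]
    # Centered difference quotient as a stand-in for D rho.
    drho = [(rho[(k + 1) % n] - rho[(k - 1) % n]) * n / 2.0 for k in range(n)]
    # A bounded, discontinuous phi.
    phi = [1.0 if k < n // 2 else -1.0 for k in range(n)]
    phi_sup = max(abs(v) for v in phi)
    conv_sup = max(abs(v) for v in periodic_convolution(phi, drho))
    l1 = sum(abs(v) for v in drho) / n
    fisher = sum(v * v / r for v, r in zip(drho, rho)) / n  # discrete I(rho)
    mass = sum(rho) / n
    return conv_sup, phi_sup * l1, phi_sup * math.sqrt(fisher * mass)


young, l1_bound, fisher_bound = chain()
print(young <= l1_bound <= fisher_bound)
```

The point of the chain is that only the time-integrated Fisher information, which the entropy estimate controls, is needed to bound the exponent in the maximum-principle argument.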

Thus $\|\overline{\rho}\|_{L^{\infty}}\leq C$ and $\inf_{t,x}\overline{\rho}\geq C^{-1}$. From here, we view (3.24) as a perturbation of the heat equation and apply the standard Calderón–Zygmund estimates and then the Gagliardo–Nirenberg interpolation inequality to get, for any $p<\infty$,

\begin{aligned}
\|\overline{\rho}\|_{W_{t,x}^{2,p}}&\leq C(p)\Big{(}\|\overline{\rho}_{0}\|_{W_{x}^{2,p}}+\|\phi*D\overline{\rho}\|_{L^{p}_{t,x}}\Big{)}\leq C(p)\Big{(}1+\|D\overline{\rho}\|_{L^{p}_{t,x}}\Big{)}\\
&\leq C(p)\Big{(}1+\|\overline{\rho}\|_{L_{t,x}^{p}}^{1/2}\|\overline{\rho}\|_{W_{t,x}^{2,p}}^{1/2}\Big{)}\leq C(p)+\frac{1}{2}\|\overline{\rho}\|_{W_{t,x}^{2,p}}.
\end{aligned}

Thus $\left\|\overline{\rho}\right\|_{W_{t,x}^{2,p}}\leq C(p)$ for any $p<\infty$. Choosing $p$ large enough, the Sobolev embedding gives $\left\|\overline{\rho}\right\|_{C_{t,x}^{1,\beta}}\leq C$, and then, again viewing (3.24) as a perturbation of the heat equation, the Schauder estimates give

\displaystyle\left\|\overline{\rho}\right\|_{C_{t,x}^{2,\beta}}\leq C\Big{(}\left\|\rho_{0}\right\|_{C^{2,\beta}}+\left\|D\overline{\rho}\right\|_{C_{t,x}^{\beta}}\left\|K*\overline{\rho}\right\|_{C_{t,x}^{\beta}}+\|\overline{\rho}\|_{C^{\beta}_{t,x}}\|\phi*D\overline{\rho}\|_{C_{t,x}^{\beta}}\Big{)}\leq C\Big{(}1+\|\overline{\rho}\|_{C_{t,x}^{1,\beta}}^{2}\Big{)}\leq C.

Thus we have established that when $K$ is smooth, the unique classical solution of (1.3) satisfies the estimates stated in the proposition. From this a priori estimate, a standard mollification and compactness argument yields the existence of a solution satisfying the desired bounds when $K$ is not smooth but Assumption 1.5 is in force. Uniqueness of classical solutions can be proved in a number of ways; for example, given two smooth solutions $\overline{\rho}_{t}^{1}$ and $\overline{\rho}_{t}^{2}$, one can compute $\frac{d}{dt}H(\overline{\rho}_{t}^{1}|\overline{\rho}_{t}^{2})$ and conclude via Gronwall's inequality. We omit the details. ∎

The proof of Proposition 1.6 also requires the following lemma, which is an easy extension of the main quantitative estimate of [JW18] to the “controlled” Liouville equation (1.7).

Lemma 3.3.

Let Assumption 1.5 hold, let $\overline{\rho}$ be the unique classical solution of (1.3) provided by Proposition 3.2, and let $f$ be an entropy solution of (1.7) in the sense of Definition 2.2. There exists a constant $C_{\text{ent}}$ depending only on $T$ , $d$ , and the constants $\beta$ and $C_{0}$ in Assumption 1.5 such that

H_{N}(f_{T}|\overline{\rho}^{N}_{T})\leq C_{\text{ent}}\bigg{(}H_{N}(f_{0}|\overline{\rho}^{N}_{0})+\frac{1}{N}+\frac{1}{4N}\sum_{i=1}^{N}\int_{0}^{T}\int_{\mathbb{T}^{dN}}|\alpha^{i}(t,\bm{x})|^{2}df_{t}(\bm{x})dt\bigg{)}.

(3.25)

Proof.

The proof closely follows that of [JW18, Theorem 1], so we report only the main differences. The first step is to mimic [JW18, Lemma 2] (here it is crucial that $f$ is an entropy solution) to get

\begin{aligned}
H_{N}(f_{t}\|\overline{\rho}^{N}_{t})\leq{}&H_{N}(f_{0}\|\overline{\rho}^{N}_{0})-\frac{1}{N^{2}}\sum_{i,j=1}^{N}\int_{0}^{t}\int_{\mathbb{T}^{dN}}\big{(}K(x^{i}-x^{j})-K*\overline{\rho}(x^{i})\big{)}\cdot D_{x^{i}}\log\overline{\rho}^{N}df_{s}(\bm{x})ds\\
&-\frac{1}{N^{2}}\sum_{i,j=1}^{N}\int_{0}^{t}\int_{\mathbb{T}^{dN}}\bigg{(}\text{div}_{x^{i}}K(x^{i}-x^{j})-\text{div}_{x^{i}}K*\overline{\rho}(x^{i})\bigg{)}df_{s}(\bm{x})ds\\
&+\frac{1}{N}\sum_{i=1}^{N}\int_{0}^{t}\int_{\mathbb{T}^{dN}}\alpha^{i}\cdot D_{x^{i}}\log\frac{f}{\overline{\rho}^{N}}df_{s}(\bm{x})ds-\frac{1}{N}\sum_{i=1}^{N}\int_{0}^{t}\int_{\mathbb{T}^{dN}}\Big{\|}D_{x^{i}}\log\frac{f}{\overline{\rho}^{N}}\Big{\|}^{2}df_{s}(\bm{x})ds.
\end{aligned}

Young’s inequality immediately gives

\begin{aligned}
H_{N}(f_{t}\|\overline{\rho}^{N}_{t})\leq{}&H_{N}(f_{0}\|\overline{\rho}^{N}_{0})-\frac{1}{N^{2}}\sum_{i,j=1}^{N}\int_{0}^{t}\int_{\mathbb{T}^{dN}}\bigg{(}K(x^{i}-x^{j})-K*\overline{\rho}(x^{i})\bigg{)}\cdot D_{x^{i}}\log\overline{\rho}^{N}df_{s}(\bm{x})ds\\
&-\frac{1}{N^{2}}\sum_{i,j=1}^{N}\int_{0}^{t}\int_{\mathbb{T}^{dN}}\bigg{(}\text{div}_{x^{i}}K(x^{i}-x^{j})-\text{div}_{x^{i}}K*\overline{\rho}(x^{i})\bigg{)}df_{s}(\bm{x})ds\\
&-\frac{1}{2N}\sum_{i=1}^{N}\int_{0}^{t}\int_{\mathbb{T}^{dN}}\Big{\|}D_{x^{i}}\log\frac{f}{\overline{\rho}^{N}}\Big{\|}^{2}df_{s}(\bm{x})ds+\frac{1}{2N}\sum_{i=1}^{N}\int_{0}^{t}\int_{\mathbb{T}^{dN}}\big{\|}\alpha^{i}(s,\bm{x})\big{\|}^{2}df_{s}(\bm{x})ds.
\end{aligned}

(3.26)

Notice that, up to the last term and the factor $1/2$ in the penultimate term, this is the same inequality as in [JW18, Lemma 2]. We now follow the proof of [JW18, Theorem 1] exactly, applying Lemmas 3 and 4 of [JW18] (which are easily seen to apply here even though $f$ satisfies the “perturbed” Liouville equation (1.7) rather than the original Liouville equation (1.2)) to bound the second and third terms on the right-hand side of (3.26). This results in the bound

\displaystyle H_{N}(f_{t}|\overline{\rho}^{N}_{t})\leq H_{N}(f_{0}|\overline{\rho}^{N}_{0})+C\int_{0}^{t}\bigg{(}H_{N}(f_{s}|\overline{\rho}^{N}_{s})+\frac{1}{N}\bigg{)}ds+\frac{1}{N}\sum_{i=1}^{N}\int_{0}^{t}\int_{\mathbb{T}^{dN}}\big{|}\alpha^{i}(s,\bm{x})\big{|}^{2}df_{s}(\bm{x})ds,

with $C$ depending only on the constants indicated in the lemma. An application of Gronwall’s inequality completes the proof. ∎
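For the reader's convenience, here is the Gronwall step in a little more detail; this is a sketch, and the constant below is one admissible choice rather than necessarily the intended one. Set $u(t)=H_{N}(f_{t}|\overline{\rho}^{N}_{t})$ and $A(t)=\frac{1}{N}\sum_{i=1}^{N}\int_{0}^{t}\int_{\mathbb{T}^{dN}}|\alpha^{i}(s,\bm{x})|^{2}df_{s}(\bm{x})ds$. Since $t\mapsto u(0)+Ct/N+A(t)$ is nondecreasing, the integral form of Gronwall's inequality applied to the previous bound gives

H_{N}(f_{T}|\overline{\rho}^{N}_{T})\leq e^{CT}\Big{(}H_{N}(f_{0}|\overline{\rho}^{N}_{0})+\frac{CT}{N}+A(T)\Big{)}\leq C_{\text{ent}}\bigg{(}H_{N}(f_{0}|\overline{\rho}^{N}_{0})+\frac{1}{N}+\frac{1}{4N}\sum_{i=1}^{N}\int_{0}^{T}\int_{\mathbb{T}^{dN}}|\alpha^{i}(s,\bm{x})|^{2}df_{s}(\bm{x})ds\bigg{)},

with, for instance, $C_{\text{ent}}=e^{CT}\max\{1,CT,4\}$.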

Proof of Proposition 1.6.

The existence of an admissible entropy solution is already proved in [JW18, Proposition 1]. Let $\rho^{N,\delta}$ be the unique classical solution of (1.2) with $K_{\delta}$ in place of $K$, and let $\overline{\rho}^{\delta}$ be the unique classical solution of (1.3) with $K_{\delta}$ in place of $K$. By Theorem 1.1 and Lemma 3.3, we have

\displaystyle\rho^{N,\delta}_{t}(A^{p,\delta}_{N,\epsilon})\leq C_{\text{con,p}}\exp(-C_{\text{con,p}}^{-1}a_{p}(\epsilon)N),\text{ where }A^{p,\delta}_{N,\epsilon}=\bigg{\{}\bm{x}\in\mathbb{T}^{dN}\bigg{|}{\bf d}_{p}(m^{N}_{\bm{x}},\overline{\rho}^{\delta}_{t})>\epsilon\bigg{\}},

(3.27)

with $C_{\text{con,p}}$ depending only on the constants stated in the proposition (here we use the fact that $\|K_{\delta}\|_{W^{-1,\infty}}\leq\|K\|_{W^{-1,\infty}}$ and $\|\text{div}\,K_{\delta}\|_{W^{-1,\infty}}\leq\|\text{div}\,K\|_{W^{-1,\infty}}$). Let $\delta_{k}$ be the sequence appearing in the definition of admissible entropy solution. By the uniqueness part of Proposition 3.2, we must have $\overline{\rho}^{\delta_{k}}_{t}\to\overline{\rho}_{t}$ for each fixed $t$. Consequently, for $k$ large enough we have ${\bf d}_{p}(\overline{\rho}^{\delta_{k}}_{T},\overline{\rho}_{T})<\epsilon$, and the triangle inequality gives $A_{N,\epsilon}^{p,\delta_{k}}\supset A^{p}_{N,2\epsilon}$, so that

\displaystyle\rho^{N}_{T}(A^{p}_{N,2\epsilon})\leq\liminf_{k\to\infty}\rho^{N,\delta_{k}}_{T}(A^{p}_{N,2\epsilon})\leq\liminf_{k\to\infty}\rho^{N,\delta_{k}}_{T}(A_{N,\epsilon}^{p,\delta_{k}})\leq C_{\text{con,p}}\exp(-C_{\text{con,p}}^{-1}a_{p}(\epsilon)N),

which, after replacing $C_{\text{con,p}}$ by $2C_{\text{con,p}}$ , completes the proof. ∎

References

[AFP00] Luigi Ambrosio, Nicola Fusco, and Diego Pallara. Functions of bounded variation and free discontinuity problems. Oxford University Press, 2000.
[BD98] Michelle Boué and Paul Dupuis. A variational representation for certain functionals of Brownian motion. The Annals of Probability, 26(4):1641–1659, 1998.
[BGV05] François Bolley, Arnaud Guillin, and Cédric Villani. Quantitative concentration inequalities for empirical measures on non-compact spaces. Probability Theory and Related Fields, 137:541–593, 2005.
[BJW20] Didier Bresch, Pierre-Emmanuel Jabin, and Zhenfu Wang. Modulated free energy and mean field limit. Séminaire Laurent Schwartz — EDP et applications, Talk no. 2, pages 1–22, 2019–2020.
[BJW23] Didier Bresch, Pierre-Emmanuel Jabin, and Zhenfu Wang. Mean field limit and quantitative estimates with singular attractive kernels. Duke Mathematical Journal, 172(13):2591–2641, 2023.
[BR07] Vladimir Igorevich Bogachev and Maria Aparecida Soares Ruas. Measure theory, volume 1. Springer, 2007.
[BS13] Gerard Brunick and Steven Shreve. Mimicking an Itô process by a solution of a stochastic differential equation. 2013.
[CD22] Louis-Pierre Chaintron and Antoine Diez. Propagation of chaos: A review of models, methods and applications. I. Models and methods, 2022.
[CdCRS23] Antonin Chodron de Courcel, Matthew Rosenzweig, and Sylvia Serfaty. The attractive log gas: stability, uniqueness, and propagation of chaos, 2023.
[CdCRS23] Antonin Chodron de Courcel, Matthew Rosenzweig, and Sylvia Serfaty. Sharp uniform-in-time mean-field convergence for singular periodic Riesz flows. Ann. Inst. H. Poincaré C Anal. Non Linéaire, 223.
[Dau23] Samuel Daudin. Optimal control of the Fokker–Planck equation under state constraints in the Wasserstein space. Journal de Mathématiques Pures et Appliquées, 175:37–75, 2023.
[DE11] Paul Dupuis and Richard S. Ellis. A weak convergence approach to the theory of large deviations. John Wiley & Sons, 2011.
[DLR18] François Delarue, Daniel Lacker, and Kavita Ramanan. From the master equation to mean field game limit theory: Large deviations and concentration of measure. The Annals of Probability, 2018.
[FG15] Nicolas Fournier and Arnaud Guillin. On the rate of convergence in Wasserstein distance of the empirical measure. Probab. Theory Related Fields, 162(3-4):707–738, 2015.
[FW23] Xuanrui Feng and Zhenfu Wang. Quantitative propagation of chaos for 2D viscous vortex model on the whole space, 2023.
[GBM24] Arnaud Guillin, Pierre Le Bris, and Pierre Monmarché. Uniform in time propagation of chaos for the 2D vortex model and other singular stochastic systems. Journal of the European Mathematical Society, pages 1–28, 01 2024.
[HC23] Elias Hess-Childs. Large deviation principles for singular Riesz-type diffusive flows, 2023.
[JW16] Pierre-Emmanuel Jabin and Zhenfu Wang. Mean field limit and propagation of chaos for Vlasov systems with bounded forces. Journal of Functional Analysis, 271(12):3588–3627, 2016.
[JW17] Pierre-Emmanuel Jabin and Zhenfu Wang. Mean Field Limit for Stochastic Particle Systems, volume 1. 04 2017.
[JW18] Pierre-Emmanuel Jabin and Zhenfu Wang. Quantitative estimates of propagation of chaos for stochastic systems with $W^{-1,\infty}$ kernels. Inventiones mathematicae, 214:523–591, 2018.
[Lac23] Daniel Lacker. Hierarchies, entropy, and quantitative propagation of chaos for mean field diffusions. Probability and Mathematical Physics, 4(2):377–432, 2023.
[McK69] Henry P. McKean. Propagation of chaos for a class of non-linear parabolic equations. In A. K. Aziz, editor, Lecture Series in Differential Equations, Volume 2, pages 177–194, Berlin, Heidelberg, 1969. Van Nostrand Reinhold Company.
[RS23] Matthew Rosenzweig and Sylvia Serfaty. Global-in-time mean-field convergence for singular Riesz-type diffusive flows. The Annals of Applied Probability, 33(2):954–998, 2023.
[Ser20] Sylvia Serfaty. Mean field limit for Coulomb-type flows. Duke Mathematical Journal, 169(15):2887–2935, 2020.
[Szn91] Alain-Sol Sznitman. Topics in propagation of chaos. In Paul-Louis Hennequin, editor, Ecole d’Eté de Probabilités de Saint-Flour XIX — 1989, pages 165–251, Berlin, Heidelberg, 1991. Springer Berlin Heidelberg.
