Weak semiconvexity estimates for Schrödinger potentials and logarithmic Sobolev inequality for Schrödinger bridges

Giovanni Conforti CMAP, CNRS, Ecole polytechnique, Institut Polytechnique de Paris, 91120 Palaiseau, France E-mail address: giovanni.conforti@polytechnique.edu. Research supported by the ANR project ANR-20-CE40-0014.

Abstract

We investigate the quadratic Schrödinger bridge problem, a.k.a. Entropic Optimal Transport problem, and obtain weak semiconvexity and semiconcavity bounds on Schrödinger potentials under mild assumptions on the marginals that are substantially weaker than log-concavity. We deduce from these estimates that Schrödinger bridges satisfy a logarithmic Sobolev inequality on the product space. Our proof strategy is based on a second order analysis of coupling by reflection on the characteristics of the Hamilton-Jacobi-Bellman equation that reveals the existence of new classes of invariant functions for the corresponding flow.

Mathematics Subject Classification (2020)

49Q22,49L12,35G50,60J60,39B62

1 Introduction and statement of the main results

The Schrödinger problem [38] (SP) is a statistical mechanics problem that consists in finding the most likely evolution of a cloud of independent Brownian particles conditionally on observations. Also known as the Entropic Optimal Transport (EOT) problem and formulated with the help of large deviations theory as a constrained entropy minimization problem, it stands nowadays at the crossroads of several research lines, including functional inequalities [13, 25], statistical machine learning [15, 37], control engineering [9, 10], and numerics for PDEs [5, 4]. Given two probability distributions $\mu,\nu$ on $\mathbb{R}^{d}$ , the corresponding (quadratic) Schrödinger problem is

\inf_{\pi\in\Pi(\mu,\nu)}\mathcal{H}(\pi|R_{0T}),

(1)

where $\Pi(\mu,\nu)$ represents the set of couplings of $\mu$ and $\nu$ and $\mathcal{H}(\pi|R_{0T})$ is the relative entropy of a coupling $\pi$ computed against the joint law $R_{0T}$ at times $0$ and $T$ of a Brownian motion with initial law $\mu$ . It is well known that under mild conditions on the marginals, the optimal coupling $\hat{\pi}$ , called (static) Schrödinger bridge, is unique and admits the representation

\hat{\pi}(\mathrm{d}x\,\mathrm{d}y)=\exp(-\varphi(x)-\psi(y))\exp\Big{(}-\frac{|x-y|^{2}}{2T}\Big{)}\mathrm{d}x\mathrm{d}y

(2)

where $\varphi,\psi$ are two functions, known as Schrödinger potentials [31], that can be regarded as proxies for the Brenier potentials of optimal transport, which are recovered in the short-time ( $T\rightarrow 0$ ) limit [35, 12]. In this article we seek convexity and concavity estimates for Schrödinger potentials. Such estimates have been recently established in [11] and [24] working under a set of assumptions that implies in particular log-concavity of at least one of the two marginals. Such an assumption is crucial therein as it allows one to profit from classical functional inequalities such as the Prékopa-Leindler inequality and the Brascamp-Lieb inequality. In particular, the estimates obtained in the above-mentioned works yield alternative proofs of Caffarelli’s contraction Theorem [8] in the short-time limit. The purpose of this work is twofold: in the first place we show at Theorem 1.2 that, for any fixed $T>0$ , it is possible to leverage the probabilistic interpretation of (1) to establish lower and upper bounds on the functions

\langle\nabla\varphi(x)-\nabla\varphi(y),x-y\rangle\quad\text{and}\quad\langle\nabla\psi(x)-\nabla\psi(y),x-y\rangle

that are valid for all $x,y\in\mathbb{R}^{d}$ and do not require strict log-concavity of the marginals to hold, but still allow one to recover the results of [11] as a special case. The second main contribution is to apply these bounds to prove that static Schrödinger bridges satisfy the logarithmic Sobolev inequality (LSI for short) at Theorem 1.3. In our main results we shall quantify the weak semiconvexity of a potential $U:\mathbb{R}^{d}\longrightarrow\mathbb{R}$ by means of the function $\kappa_{U}$ , defined as follows:

\kappa_{U}:(0,+\infty)\longrightarrow\mathbb{R},\quad\kappa_{U}(r)=\inf\{|x-y|^{-2}\langle\nabla U(x)-\nabla U(y),x-y\rangle:|x-y|=r\}.

(3)

$\kappa_{U}(r)$ may be regarded as an averaged or integrated convexity lower bound for $U$ for points that are at distance $r$ . This function is often encountered in applications of the coupling method to the study of the long time behavior of Fokker-Planck equations [22, 32]. Obviously $\kappa_{U}\geq 0$ is equivalent to the convexity of $U$ , but working with non-uniform lower bounds on $\kappa_{U}$ allows one to design efficient generalizations of the classical notion of convexity. A commonly encountered sufficient condition on $\kappa_{U}$ ensuring the exponential trend to equilibrium of the Fokker-Planck equation

\partial_{t}\mu_{t}-\frac{1}{2}\Delta\mu_{t}-\nabla\cdot\big{(}\nabla U\,\mu_{t}\big{)}=0

is the following

\kappa_{U}(r)\geq\begin{cases}\alpha,\quad&\mbox{if $r>R$,}\\ \alpha-L^{\prime},\quad&\mbox{if $r\leq R$,}\end{cases}

(4)

for some $\alpha>0,L^{\prime},R\geq 0.$ In this work, we refer to assumptions of the form (4) and variants thereof as weak convexity assumptions, and our main results require an assumption of this kind, namely (6) below, that is shown to be no more demanding than (4) (see Proposition 5.1), and is expressed through a rescaled version of the hyperbolic tangent function. These functions play a special role in this work since, as we show at Theorem 2.1, they define a weak convexity property that propagates backward along the flow of the Hamilton-Jacobi-Bellman (HJB) equation

\partial_{t}\varphi_{t}+\frac{1}{2}\Delta\varphi_{t}-\frac{1}{2}|\nabla\varphi_{t}|^{2}=0.

Such invariance property represents the main innovation in our proof strategy: the propagation of the classical notion of convexity along the HJB equation used in [24] as well as the Brascamp-Lieb inequality employed in [11] are both consequences of the Prékopa-Leindler inequality, see [7]. In the framework considered here, such a powerful tool becomes ineffective due to the possible lack of log-concavity in both marginals. To overcome this obstacle we develop a probabilistic approach based on a second order analysis of coupling by reflection on the solutions of the SDE

\mathrm{d}X_{t}=-\nabla\varphi_{t}(X_{t})\mathrm{d}t+\mathrm{d}B_{t},

also known as characteristics of the HJB equation, revealing the existence of novel classes of weakly convex functions that are invariant for the HJB flow. This property, besides being a key ingredient in the proof of Theorem 1.2, is interesting on its own, and can be generalized in several directions. Remarkably, Theorem 1.2 can be applied to show that Schrödinger bridges satisfy LSI: this is not a trivial task since static Schrödinger bridges are not known to be log-concave probability measures in general, not even in the case when both marginals are strongly log-concave. For this reason, one cannot infer LSI directly from Theorem 1.2 and the Bakry-Émery criterion. However, reintroducing a dynamical viewpoint and representing Schrödinger bridges as Doob $h$ -transforms of Brownian motion [21] reveals all the effectiveness of Theorem 1.2, which gives at once local (or conditional, or heat kernel) logarithmic Sobolev inequalities and gradient estimates for the $h$ -transform semigroup. By carefully mixing the local inequalities with the help of gradient estimates, we finally establish at Theorem 1.3 LSI for $\hat{\pi}$ , which is our second main contribution. It is worth noticing that in the $T\rightarrow+\infty$ asymptotic regime, our approach to LSI can be related to the techniques recently developed in [34] to construct Lipschitz transports between the Gaussian distribution and probability measures that are approximately log-concave in a suitable sense. Because of the intrinsic probabilistic nature of our proof strategy, our ability to compensate for the lack of log-concavity in the marginals depends on the size of the regularization parameter $T$ , and indeed vanishes as $T\rightarrow 0$ . Thus, our main results do not yield any sensible convexity/concavity estimate on Brenier potentials that improves on Caffarelli’s Theorem.
On the other hand, the semiconvexity bounds of Theorem 1.2 find applications beyond LSI, which we shall address in future works. For example, following classical arguments put forward in [20], they can be shown to imply transport-entropy (a.k.a. Talagrand) inequalities on path space for dynamic Schrödinger bridges. Moreover, building on the results of [13], they shall imply new semiconvexity estimates for the Fisher information along entropic interpolations. It is also natural to conjecture that these bounds will provide new stability estimates for Schrödinger bridges under marginal perturbations, thus addressing a question that has recently drawn quite some attention, see [19, 12, 23, 26, 3] for example. Finally, we point out that Hessian bounds for potentials can play a relevant role in providing theoretical guarantees for learning algorithms that make use of dynamic Schrödinger bridges and conditional processes. In this framework, leveraging Doob’s $h$ -transform theory and time reversal arguments, they directly translate into various kinds of quantitative stability estimates for the diffusion processes used for sampling, see e.g. [18, 17, 39].

Organization

The document is organized as follows. In the remainder of the first section we state and discuss our main hypotheses and results. In Section 2 we study invariant sets for the HJB flow. Sections 3 and 4 are devoted to the proofs of our two main results, Theorem 1.2 and Theorem 1.3. Technical results and background material are collected in the Appendix.

Assumption 1.1.

We assume that $\mu,\nu$ admit a positive density against the Lebesgue measure which can be written in the form $\exp(-U^{\mu})$ and $\exp(-U^{\nu})$ respectively. $U^{\mu},U^{\nu}$ are of class $C^{2}(\mathbb{R}^{d})$ .

$(\mathbf{H1})$

$\mu$ has finite second moment and finite relative entropy against the Lebesgue measure. Moreover, there exists $\beta_{\mu}>0$ such that

$\langle v,\nabla^{2}U^{\mu}(x)v\rangle\leq\beta_{\mu}|v|^{2}\quad\forall x,v\in\mathbb{R}^{d}.$ (5)

One of the following holds

$(\mathbf{H2})$

There exist $\alpha_{\nu},L>0$ such that

$\kappa_{U^{\nu}}(r)\geq\alpha_{\nu}-r^{-1}f_{L}(r)\quad\forall r>0,$ (6)

where the function $f_{L}$ is defined for any $L>0$ by:

$f_{L}:[0,+\infty]\longrightarrow[0,+\infty],\quad{f_{L}(r)=2\,L^{1/2}\tanh\Big{(}(rL^{1/2})/2\Big{)}}.$
$(\mathbf{H2^{\prime}})$

There exist $\alpha_{\nu}>0,R,L^{\prime}\geq 0$ such that

$\kappa_{U^{\nu}}(r)\geq\begin{cases}\alpha_{\nu},\quad&\mbox{if $r>R$,}\\ \alpha_{\nu}-L^{\prime},\quad&\mbox{if $r\leq R$.}\end{cases}$

In this case, we set

$L=\inf\{\bar{L}:R^{-1}f_{\bar{L}}(R)\geq L^{\prime}\}.$ (7)

Clearly, imposing (6) is less restrictive than asking that $\nu$ is strongly log-concave.
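To make condition (6) concrete, the following is a minimal numerical sketch in dimension one. It assumes a hypothetical potential $U(x)=x^{2}/2+2\cos(x)$ (a strongly convex part plus a Lipschitz perturbation, not log-concave since $U^{\prime\prime}(0)<0$), replaces the infimum in (3) by a minimum over a finite grid, and tests (6) on a finite set of radii. The function names, the grid, and the example potential are illustrative choices and do not come from the paper.

```python
import math


def f_L(r, L):
    """f_L(r) = 2 sqrt(L) tanh(r sqrt(L) / 2), as in Assumption 1.1."""
    root = math.sqrt(L)
    return 2.0 * root * math.tanh(r * root / 2.0)


def kappa_1d(grad_U, r, x_min=-10.0, x_max=10.0, n=2001):
    """Grid approximation of kappa_U(r) in dimension one.

    For d = 1 the quantity in (3) reduces to (U'(x + r) - U'(x)) / r.
    The infimum over x is replaced by a minimum over a finite grid,
    so the returned value can only overestimate the true infimum.
    """
    step = (x_max - x_min) / (n - 1)
    grid = (x_min + i * step for i in range(n))
    return min((grad_U(x + r) - grad_U(x)) / r for x in grid)


def satisfies_H2(grad_U, alpha, L, radii):
    """Check kappa_U(r) >= alpha - f_L(r) / r on a finite set of radii."""
    return all(
        kappa_1d(grad_U, r) >= alpha - f_L(r, L) / r - 1e-12 for r in radii
    )


def grad_U_example(x):
    """Derivative of the hypothetical potential U(x) = x^2 / 2 + 2 cos(x)."""
    return x - 2.0 * math.sin(x)


radii = [0.05 * k for k in range(1, 201)]
print(satisfies_H2(grad_U_example, alpha=1.0, L=6.0, radii=radii))
print(satisfies_H2(grad_U_example, alpha=1.0, L=2.0, radii=radii))
```

For this example one has $\kappa_{U}(r)=1-4|\sin(r/2)|/r$ in closed form, so the check succeeds with $\alpha_{\nu}=1$ and $L=6$ but fails with $L=2$, since $f_{2}$ stays below $2\sqrt{2}<4$.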

Remark 1.1.

We show that $(\mathbf{H2^{\prime}})$ implies $(\mathbf{H2})$ at Proposition 5.1. However, since $(\mathbf{H2^{\prime}})$ is more familiar to most readers, we prefer to keep a statement of Theorem 1.2 that makes use of this assumption.
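As an illustration of how the constant $L$ in (7) can be obtained in practice, here is a small sketch that computes it by bisection. It assumes $R>0$ and relies on the fact that $\bar{L}\mapsto R^{-1}f_{\bar{L}}(R)$ is continuous and increasing; the function name `L_from_H2_prime` and the tolerance-free bisection loop are illustrative choices.

```python
import math


def f_L(r, L):
    """f_L(r) = 2 sqrt(L) tanh(r sqrt(L) / 2)."""
    root = math.sqrt(L)
    return 2.0 * root * math.tanh(r * root / 2.0)


def L_from_H2_prime(L_prime, R):
    """Smallest L such that f_L(R) / R >= L_prime, cf. (7).

    Assumes R > 0. For L_prime <= 0 the infimum in (7) is 0.
    """
    if L_prime <= 0.0:
        return 0.0

    def ratio(L):
        return f_L(R, L) / R

    # Bracket the root: ratio(0) = 0 and ratio grows without bound in L.
    hi = max(L_prime, 1.0)
    while ratio(hi) < L_prime:
        hi *= 2.0
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if ratio(mid) >= L_prime:
            hi = mid
        else:
            lo = mid
    return hi


L = L_from_H2_prime(L_prime=1.0, R=2.0)
print(L, f_L(2.0, L) / 2.0)
```

Since $f_{L}$ is concave with $f_{L}(0)=0$, the ratio $r^{-1}f_{L}(r)$ is non-increasing in $r$, so the value returned above also satisfies $r^{-1}f_{L}(r)\geq L^{\prime}$ for all $r\leq R$, which is the inequality needed to pass from the piecewise bound to (6).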

Remark 1.2.

The requirement that the density of $\nu$ is strictly positive everywhere could be dropped at the price of additional technicalities. For $\mu$ , this requirement is a consequence of (5).

The Schrödinger system

Let $(P_{t})_{t\geq 0}$ be the semigroup generated by a $d$ -dimensional standard Brownian motion. For given marginals $\mu,\nu$ and $T>0$ , the Schrödinger system is the following system of coupled non-linear equations

\begin{cases}\varphi(x)=U^{\mu}(x)+\log P_{T}\exp(-\psi)(x),\quad x\in\mathbb{R}^{d},\\ \psi(y)=U^{\nu}(y)+\log P_{T}\exp(-\varphi)(y),\quad y\in\mathbb{R}^{d}.\end{cases}

(8)

Under Assumption 1.1, it is known that the Schrödinger system admits a solution $(\varphi,\psi)$ , and that if $(\bar{\varphi},\bar{\psi})$ is another solution, then there exists $c\in\mathbb{R}$ such that $(\varphi,\psi)=(\bar{\varphi}+c,\bar{\psi}-c)$ , see [35, sec. 2][31] and references therein. The potentials $\varphi,\psi$ are known as Schrödinger potentials or entropic Brenier potentials in the literature.

Weak semiconvexity and semiconcavity bounds for Schrödinger potentials

In the rest of the article, given a scalar function $U$ , any pointwise lower bound on $\kappa_{U}$ implying in particular that

\liminf_{r\rightarrow+\infty}\kappa_{U}(r)>-\infty

shall be called a weak semiconvexity bound for $U$ . Next, in analogy with (3) we introduce for a differentiable $U:\mathbb{R}^{d}\longrightarrow\mathbb{R}$ the function $\ell_{U}$ as follows:

\ell_{U}:(0,+\infty)\longrightarrow\mathbb{R},\quad\ell_{U}(r)=\sup\{|x-y|^{-2}\langle\nabla U(x)-\nabla U(y),x-y\rangle:|x-y|=r\},

and call a weak semiconcavity bound for $U$ any pointwise upper bound for $\ell_{U}$ implying in particular that

\limsup_{r\rightarrow+\infty}\ell_{U}(r)<+\infty.

Our first main result is about weak semiconvexity and weak semiconcavity bounds for Schrödinger potentials.

Theorem 1.2.

Let Assumption 1.1 hold and $(\varphi,\psi)$ be solutions of the Schrödinger system. Then $\varphi,\psi$ are twice differentiable and for all $r>0$ we have

\kappa_{\psi}(r)\geq\alpha_{\psi}-r^{-1}f_{L}(r),

(9)

\ell_{\varphi}(r)\leq\beta_{\mu}-\frac{\alpha_{\psi}}{(1+T\alpha_{\psi})}+\frac{r^{-1}f_{L}(r)}{(1+T\alpha_{\psi})^{2}},

(10)

where $\alpha_{\psi}>\alpha_{\nu}-1/T$ can be taken to be the smallest solution of the fixed point equation

\alpha=\alpha_{\nu}-\frac{1}{T}+\frac{G(\alpha,2)}{2T^{2}},\quad\alpha\in(\alpha_{\nu}-1/T,+\infty)

(11)

where for all $\alpha\geq\alpha_{\nu}-1/T$ :

\begin{split}G(\alpha,u)&=\inf\{s\geq 0:F(\alpha,s)\geq u\},\quad u>0,\\ F(\alpha,s)&=\beta_{\mu}s+\frac{s}{T(1+T\alpha)}+\frac{s^{1/2}f_{L}(s^{1/2})}{(1+T\alpha)^{2}},\quad s>0.\end{split}

(12)

There seems to be no closed form expression for the solutions of the fixed point equation (11). However, it is possible to obtain explicit non trivial upper and lower bounds.

Corollary 1.1.

Let $\bar{\alpha}$ be a fixed point solution of (11). Then we have

\frac{\alpha_{\nu}}{2}-\frac{1}{T}-\frac{L}{2T^{2}\alpha_{\nu}\beta_{\mu}}+\frac{1}{2}\sqrt{\big{(}\alpha_{\nu}+\frac{L}{T^{2}\beta_{\mu}\alpha_{\nu}}\big{)}^{2}+\frac{4\alpha_{\nu}}{T^{2}\beta_{\mu}}}\leq\bar{\alpha}\leq\frac{\alpha_{\nu}}{2}-\frac{1}{T}+\frac{1}{2}\sqrt{\alpha^{2}_{\nu}+\frac{4\alpha_{\nu}}{T^{2}\beta_{\mu}}}.
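Although (11) has no closed form solution, it is easy to approximate numerically. The sketch below inverts $F(\alpha,\cdot)$ by bisection to evaluate $G(\alpha,2)$ and then runs a plain fixed point iteration started at $\alpha_{\nu}-1/T$; because $\alpha\mapsto G(\alpha,2)$ is non decreasing (Lemma 3.2), this iteration is monotone and is expected to converge to the smallest solution. The parameter values and all function names are hypothetical choices made for illustration.

```python
import math


def f_L(r, L):
    """f_L(r) = 2 sqrt(L) tanh(r sqrt(L) / 2)."""
    root = math.sqrt(L)
    return 2.0 * root * math.tanh(r * root / 2.0)


def F(alpha, s, beta_mu, T, L):
    """F(alpha, s) from (12); requires 1 + T * alpha > 0."""
    c = 1.0 + T * alpha
    root_s = math.sqrt(s)
    return beta_mu * s + s / (T * c) + root_s * f_L(root_s, L) / c**2


def G(alpha, u, beta_mu, T, L):
    """Inverse of s -> F(alpha, s) at level u, computed by bisection.

    Since F(alpha, s) >= beta_mu * s, the root lies in [0, u / beta_mu].
    """
    lo, hi = 0.0, u / beta_mu
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if F(alpha, mid, beta_mu, T, L) >= u:
            hi = mid
        else:
            lo = mid
    return hi


def solve_alpha_psi(alpha_nu, beta_mu, T, L, n_iter=500):
    """Monotone fixed point iteration for (11), started at alpha_nu - 1/T."""
    alpha = alpha_nu - 1.0 / T
    for _ in range(n_iter):
        alpha = alpha_nu - 1.0 / T + G(alpha, 2.0, beta_mu, T, L) / (2.0 * T**2)
    return alpha


def corollary_bounds(alpha_nu, beta_mu, T, L):
    """Lower and upper bounds of Corollary 1.1."""
    c = L / (T**2 * beta_mu * alpha_nu)
    q = 4.0 * alpha_nu / (T**2 * beta_mu)
    lower = alpha_nu / 2 - 1 / T - c / 2 + 0.5 * math.sqrt((alpha_nu + c) ** 2 + q)
    upper = alpha_nu / 2 - 1 / T + 0.5 * math.sqrt(alpha_nu**2 + q)
    return lower, upper


params = dict(alpha_nu=1.0, beta_mu=1.0, T=1.0, L=0.5)
print(solve_alpha_psi(**params), corollary_bounds(**params))
```

When $L=0$ the two bounds of Corollary 1.1 coincide, so the iteration can be checked against an explicit value in that case.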

Remark 1.3.

It is proven at Lemma 3.2 that $F(\alpha,\cdot)$ is increasing on $(0,+\infty)$ for all $\alpha>-1/T$ . $G(\alpha,\cdot)$ is therefore its inverse.

Remark 1.4.

It is possible to check that if (H2) holds with $L=0$ , Theorem 1.2 recovers the conclusion of [11, Theorem 4], after a change of variable. To be more precise, the potentials $(\varphi_{\varepsilon},\psi_{\varepsilon})$ considered there are related to the couple $(\varphi,\psi)$ appearing in (8) by choosing $\varepsilon=T$ and setting

\varphi_{\varepsilon}=\varepsilon\Big{(}\varphi-U^{\mu}+\frac{|\cdot|^{2}}{2\varepsilon}\Big{)},\quad\psi_{\varepsilon}=\varepsilon\Big{(}\psi-U^{\nu}+\frac{|\cdot|^{2}}{2\varepsilon}\Big{)}.

Remark 1.5.

The rescaled potential $T\varphi$ converges to the Brenier potential in the small noise limit [35]. As explained in the introduction, one cannot deduce from Theorem 1.2 an improvement over Caffarelli’s Theorem [8] by letting $T\rightarrow 0$ in Theorem 1.2.

Our second main result is that the static Schrödinger bridge $\hat{\pi}$ satisfies LSI with an explicit constant. We recall here that a probability measure $\rho$ on $\mathbb{R}^{d}$ satisfies LSI with constant $C$ if and only if for all positive differentiable functions $f$

\mathrm{Ent}_{\rho}(f)\leq\frac{C}{2}\int\frac{|\nabla f|^{2}}{f}\mathrm{d}\rho,\quad\text{where}\quad\mathrm{Ent}_{\rho}(f)=\int f\log f\mathrm{d}\rho-\int f\mathrm{d}\rho\,\log\Big{(}\int f\mathrm{d}\rho\Big{)}.

Theorem 1.3.

Let Assumption 1.1 hold and assume furthermore that $\mu$ satisfies LSI with constant $C_{\mu}$ . Then the static Schrödinger bridge $\hat{\pi}$ satisfies LSI with constant

\max\left\{{2}\,C_{\mu},{2}\,C_{\mu}C_{0,T}+\int_{0}^{T}C_{t,T}\,\mathrm{d}t\right\},

where for all $t\leq T$

C_{t,T}:=\exp\Big{(}-\int_{t}^{T}\alpha^{\psi}_{s}\mathrm{d}s\Big{)},\quad\alpha^{\psi}_{t}:=\frac{\alpha_{\psi}}{1+(T-t)\alpha_{\psi}}-\frac{L}{(1+(T-t)\alpha_{\psi})^{2}},

and $\alpha_{\psi}$ is as in Theorem 1.2.
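The constant of Theorem 1.3 can be evaluated numerically once $\alpha_{\psi}$ is known. The sketch below uses the elementary closed form $C_{t,T}=(1+(T-t)\alpha_{\psi})^{-1}\exp\big(L(T-t)/(1+(T-t)\alpha_{\psi})\big)$, obtained by integrating $\alpha^{\psi}_{s}$ explicitly, and a composite Simpson rule for the time integral. The value of $\alpha_{\psi}$ passed in, the quadrature rule, and the function names are assumptions made for illustration.

```python
import math


def alpha_psi_t(t, T, alpha_psi, L):
    """alpha^psi_t from Theorem 1.3."""
    c = 1.0 + (T - t) * alpha_psi
    return alpha_psi / c - L / c**2


def C_tT(t, T, alpha_psi, L):
    """Closed form of exp(-int_t^T alpha^psi_s ds).

    Requires 1 + (T - t) * alpha_psi > 0, which holds for alpha_psi > -1/T.
    """
    u = T - t
    c = 1.0 + u * alpha_psi
    return math.exp(L * u / c) / c


def simpson(func, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * h)
    return total * h / 3.0


def lsi_constant(C_mu, T, alpha_psi, L):
    """max{2 C_mu, 2 C_mu C_{0,T} + int_0^T C_{t,T} dt}."""
    integral = simpson(lambda t: C_tT(t, T, alpha_psi, L), 0.0, T)
    return max(2.0 * C_mu, 2.0 * C_mu * C_tT(0.0, T, alpha_psi, L) + integral)


# Hypothetical inputs: C_mu = 1 (standard Gaussian), T = 1, alpha_psi = 0.54, L = 0.5.
print(lsi_constant(C_mu=1.0, T=1.0, alpha_psi=0.54, L=0.5))
```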

It is well known that LSI has a number of remarkable consequences including, but certainly not limited to, spectral gaps and concentration of measure inequalities for Lipschitz observables.

Remark 1.6.

It is worth noticing that if $U^{\nu}$ is the sum of a strongly convex potential and a Lipschitz perturbation with second derivative bounded below, then (6) holds. Moreover, the perturbation need not have bounded support, covering many interesting scenarios such as double-well or multiple-well potentials. At the moment of writing, it is not clear whether or not (6) implies that $\nu$ is a bounded or log-Lipschitz perturbation of a strongly log-concave probability measure, a situation where the results of [28][1] already ensure that $\nu$ satisfies a logarithmic Sobolev inequality.

Remark 1.7.

By taking $\mu$ to be a Gaussian distribution, we obtain as a corollary of Theorem 1.3 that any probability $\nu$ fulfilling (6) satisfies a logarithmic Sobolev inequality, though the constant we exhibit here is not optimal. Indeed, the LSI constant for $\nu$ is deduced by marginalization from the LSI constant of $\hat{\pi}$ . Obviously, estimating the LSI constant for $\hat{\pi}$ is a much more difficult task than estimating the LSI constant of its marginal, in particular because $\hat{\pi}$ does not admit an explicit expression. However, looking at limiting cases, Schrödinger potentials become explicit, and the LSI constant for $\nu$ can be more precisely estimated by constructing Lipschitz maps between some nice distribution and $\nu$ arguing on the basis of Theorem 2.1. In particular, setting $T=1$ and choosing $\mu=\delta_{0}$ allows one to recover the setting in which the “Brownian transport map” [33] is constructed. Changing the reference measure into the stationary Ornstein-Uhlenbeck process, choosing $\mu$ to be the standard Gaussian distribution and setting $T=+\infty$ allows one to deploy the technique of heat flow maps [34]. These limiting scenarios are in some sense orthogonal to the scope of this work, which is to gain a better understanding of potentials when they cannot be computed in closed form. They are nevertheless of clear interest and will be analyzed in detail in forthcoming work.

2 Invariant sets of weakly convex functions for the HJB flow

We introduce the notation

U^{{T},{g}}_{t}(x):=-\log P_{T-t}\exp(-g)(x)=-\log\left(\frac{1}{(2\pi(T-t))^{d/2}}\int\exp\Big{(}-\frac{|y-x|^{2}}{2(T-t)}-g(y)\Big{)}\mathrm{d}y\right).

(13)

With this notation at hand, (8) rewrites as follows:

\begin{cases}\varphi=U^{\mu}-U^{{T},{\psi}}_{0},\\ \psi=U^{\nu}-U^{{T},{\varphi}}_{0}.\end{cases}

(14)

It is well known that under mild conditions on $g$ , the map $[0,T]\times\mathbb{R}^{d}\ni(t,x)\mapsto U^{{T},{g}}_{t}(x)$ is a classical solution of the HJB equation

\begin{cases}\partial_{t}\varphi_{t}(x)+\frac{1}{2}\Delta\varphi_{t}(x)-\frac{1}{2}|\nabla\varphi_{t}|^{2}(x)=0,\\ \varphi_{T}(x)=g(x).\end{cases}

(15)
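As a sanity check of (13) and (15) in a case where everything is explicit, consider the hypothetical terminal condition $g(y)=a|y|^{2}/2$ with $a>0$ in dimension one. A Gaussian computation gives $U^{T,g}_{t}(x)=\frac{a x^{2}}{2(1+(T-t)a)}+\frac{1}{2}\log(1+(T-t)a)$. The sketch below compares this expression with a direct quadrature of (13) and evaluates the residual of the HJB equation by central finite differences; the step sizes and integration window are ad hoc choices.

```python
import math


def U_closed(t, x, T, a):
    """Explicit U^{T,g}_t(x) for g(y) = a y^2 / 2 in dimension one."""
    c = 1.0 + (T - t) * a
    return a * x * x / (2.0 * c) + 0.5 * math.log(c)


def U_quadrature(t, x, T, a, y_max=15.0, n=6001):
    """Evaluate (13) with the trapezoidal rule on [-y_max, y_max]."""
    s = T - t
    step = 2.0 * y_max / (n - 1)
    total = 0.0
    for i in range(n):
        y = -y_max + i * step
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * math.exp(-((y - x) ** 2) / (2.0 * s) - a * y * y / 2.0)
    return -math.log(total * step / math.sqrt(2.0 * math.pi * s))


def hjb_residual(t, x, T, a, h=1e-3):
    """Finite-difference value of d_t U + (1/2) U'' - (1/2) |U'|^2."""
    dt = (U_closed(t + h, x, T, a) - U_closed(t - h, x, T, a)) / (2.0 * h)
    dx = (U_closed(t, x + h, T, a) - U_closed(t, x - h, T, a)) / (2.0 * h)
    dxx = (
        U_closed(t, x + h, T, a) - 2.0 * U_closed(t, x, T, a) + U_closed(t, x - h, T, a)
    ) / (h * h)
    return dt + 0.5 * dxx - 0.5 * dx * dx


print(U_closed(0.4, 0.7, 1.0, 1.0), U_quadrature(0.4, 0.7, 1.0, 1.0))
print(hjb_residual(0.4, 0.7, 1.0, 1.0))
```

In this example $\nabla U^{T,g}_{t}(x)=a x/(1+(T-t)a)$, so the drift of the characteristic SDE is linear and convexity of $g$ is visibly preserved by the flow.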

In the next theorem, we construct for any $L>0$ a set of weakly convex functions $\mathcal{F}_{L}$ that is shown to be invariant for the HJB flow. In the proof, and in the rest of the paper we shall denote by $[\cdot,\cdot]$ the quadratic covariation of two Itô processes.

Theorem 2.1.

Fix $L>0$ and define

\mathcal{F}_{L}=\{g\in C^{1}(\mathbb{R}^{d}):\kappa_{g}(r)\geq-r^{-1}f_{L}(r)\quad\forall r>0\}.

Then for all $0\leq t\leq T<+\infty$ we have

g\in\mathcal{F}_{L}\Rightarrow U^{{T},{g}}_{t}\in\mathcal{F}_{L}.

(16)

The fact that convexity of the terminal condition in the HJB equation (15) implies convexity of the solution at all times is equivalent to the fact that the heat flow preserves log-concavity and has been known for a long time, see [7]. Theorem 2.1 offers a significant generalization of this property, by showing that there exist weaker properties than pointwise convexity that are transferred from the terminal condition to the solutions of the HJB equation. It can be checked that $f_{L}$ solves the ODE

ff^{\prime}(r)+2f^{\prime\prime}(r)=0\quad\forall r>0,\quad f(0)=0,f^{\prime}(0)=L.

(17)

To verify the above, it suffices to compute

f_{L}^{\prime}(r)=\frac{L}{\cosh^{2}(rL^{1/2}/2)},\quad f^{\prime\prime}_{L}(r)=-L^{3/2}\frac{\sinh(rL^{1/2}/2)}{\cosh^{3}(rL^{1/2}/2)}

Moreover, we recall here some useful properties of $f_{L}$ :

f_{L}(r)>0,\,f^{\prime}_{L}(r)>0,\,f^{\prime\prime}_{L}(r)<0,\,f_{L}(r)\geq rf^{\prime}_{L}(r)\quad\forall r>0.

(18)
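The identities (17) and (18) are easy to confirm numerically. The following sketch evaluates the closed forms of $f_{L}$, $f^{\prime}_{L}$ and $f^{\prime\prime}_{L}$ given above, computes the residual of the ODE, and compares $f^{\prime}_{L}$ with a central finite difference; the sample values of $L$ and $r$ are arbitrary.

```python
import math


def f(r, L):
    """f_L(r) = 2 sqrt(L) tanh(r sqrt(L) / 2)."""
    return 2.0 * math.sqrt(L) * math.tanh(r * math.sqrt(L) / 2.0)


def df(r, L):
    """f_L'(r) = L / cosh^2(r sqrt(L) / 2)."""
    return L / math.cosh(r * math.sqrt(L) / 2.0) ** 2


def d2f(r, L):
    """f_L''(r) = -L^{3/2} sinh(z) / cosh^3(z), with z = r sqrt(L) / 2."""
    z = r * math.sqrt(L) / 2.0
    return -(L**1.5) * math.sinh(z) / math.cosh(z) ** 3


def ode_residual(r, L):
    """Left-hand side of (17): f f' + 2 f''."""
    return f(r, L) * df(r, L) + 2.0 * d2f(r, L)


for L in (0.5, 1.0, 4.0):
    for r in (0.1, 1.0, 5.0):
        fd = (f(r + 1e-5, L) - f(r - 1e-5, L)) / 2e-5
        print(L, r, ode_residual(r, L), abs(fd - df(r, L)))
```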

The condition $ff^{\prime}(r)+2f^{\prime\prime}(r)\leq 0$ appears naturally in the main coupling argument of Theorem 2.1 and we have defined the functions $f_{L}$ ad hoc in order to saturate this differential inequality. We are now in a position to prove Theorem 2.1. As anticipated above, the proof relies on the analysis of coupling by reflection along the characteristics of the HJB equation. In doing so, we heavily rely on a connection with stochastic control. More precisely, the HJB characteristic

\mathrm{d}X_{t}=-\nabla U^{{T},{g}}_{t}(X_{t})\mathrm{d}t+\mathrm{d}B_{t},\quad X_{0}=x,

is the optimal process for the stochastic control problem

\begin{split}\inf_{(u_{s})_{s\in[0,T]}}\,\,&\mathbb{E}\Big{[}\int_{0}^{T}\frac{1}{2}|u_{s}|^{2}\mathrm{d}s+g(X^{u,x}_{T})\Big{]}\\ &\text{s.t}\quad\mathrm{d}X^{u,x}_{s}=u_{s}\mathrm{d}s+\mathrm{d}B_{s},\quad X^{u,x}_{0}=x.\end{split}

In particular, the stochastic maximum principle [36] for this control problem grants that the process $(\nabla U^{{T},{g}}_{t}(X_{t}))_{t\in[0,T]}$ is a martingale, and we will use this fact in the proof of Theorem 2.1, giving a self-contained proof for the reader’s convenience. In the recent article [14, Thm 1.3] Hessian bounds for HJB equations originating from stochastic control problems are obtained by means of coupling techniques. These are two-sided bounds that require an a priori knowledge of global Lipschitz bounds on solutions of the HJB equation to hold. The one-sided estimates of Theorem 2.1 do not require any Lipschitz property of solutions and their proof requires finer arguments than those used in [14].

Proof.

We first assume w.l.o.g. that $t=0$ and work under the additional assumption that

g\in C^{3}(\mathbb{R}^{d}),\quad\sup_{x\in\mathbb{R}^{d}}|\nabla^{2}g|(x)<+\infty.

(19)

Combining the above with $g\in\mathcal{F}_{L}$ , we can justify differentiation under the integral sign in (13) and establish that

[0,T]\times\mathbb{R}^{d}\ni(t,x)\mapsto U^{T,g}_{t}(x)

is a classical solution of (15) such that

[0,T]\times\mathbb{R}^{d}\ni(t,x)\mapsto\nabla U^{T,g}_{t}(x)

is continuously differentiable in $t$ as well as twice continuously differentiable and uniformly Lipschitz in $x$ . Under these regularity assumptions, for given $x,\hat{x}\in\mathbb{R}^{d}$ , coupling by reflection of two diffusions started at $x$ and $\hat{x}$ respectively and whose drift field is $-\nabla U^{{T},{g}}_{t}$ is well defined, see [22]. That is to say, there exist a stochastic process $(X_{t},\hat{X}_{t})_{0\leq t\leq T}$ with $(X_{0},\hat{X}_{0})=(x,\hat{x})$ and two Brownian motions $(B_{t},\hat{B}_{t})_{0\leq t\leq T}$ , all defined on the same probability space, such that

\begin{cases}\mathrm{d}X_{t}=-\nabla U^{{T},{g}}_{t}(X_{t})\mathrm{d}t+\mathrm{d}B_{t},\quad&\mbox{for $0\leq t\leq T$,}\\ \mathrm{d}\hat{X}_{t}=-\nabla U^{{T},{g}}_{t}(\hat{X}_{t})\mathrm{d}t+\mathrm{d}\hat{B}_{t},\quad&\mbox{for $0\leq t\leq\tau$, $X_{t}=\hat{X}_{t}$ for $t>\tau$,}\\ \end{cases}

where

\mathrm{e}_{t}=r^{-1}_{t}(X_{t}-\hat{X}_{t}),\quad r_{t}=|X_{t}-\hat{X}_{t}|,\quad\mathrm{d}\hat{B}_{t}=\mathrm{d}B_{t}-2\mathrm{e}_{t}\langle\mathrm{e}_{t},\mathrm{d}B_{t}\rangle

and

\tau=\inf\{t\in[0,T]:X_{t}=\hat{X}_{t}\}\wedge T.
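To visualize the construction, here is a rough Euler-Maruyama sketch of coupling by reflection. It assumes the hypothetical terminal condition $g(y)=a|y|^{2}/2$, for which $\nabla U^{T,g}_{t}(x)=a x/(1+(T-t)a)$ is explicit, and it declares the two paths coupled as soon as a discrete step overshoots the meeting point along $\mathrm{e}_{t}$. Both the discretization and this coalescence rule are illustrative simplifications and are not part of the argument in the text.

```python
import math
import random


def grad_U(t, x, T, a):
    """Gradient of U^{T,g}_t for the hypothetical choice g(y) = a |y|^2 / 2."""
    c = a / (1.0 + (T - t) * a)
    return [c * xi for xi in x]


def reflect(e, dB):
    """Return dB - 2 e <e, dB> for a unit vector e (Householder reflection)."""
    proj = sum(ei * bi for ei, bi in zip(e, dB))
    return [bi - 2.0 * ei * proj for ei, bi in zip(e, dB)]


def simulate(x, x_hat, T=1.0, a=1.0, n_steps=1000, seed=0):
    """Euler scheme for (X, X_hat) driven by reflected Brownian increments."""
    rng = random.Random(seed)
    dt = T / n_steps
    X, Y = list(x), list(x_hat)
    coupled = X == Y
    for k in range(n_steps):
        t = k * dt
        dB = [rng.gauss(0.0, math.sqrt(dt)) for _ in X]
        gX = grad_U(t, X, T, a)
        X_new = [xi - gi * dt + bi for xi, gi, bi in zip(X, gX, dB)]
        if coupled:
            # After the coupling time both processes move together.
            X, Y = X_new, list(X_new)
            continue
        diff = [xi - yi for xi, yi in zip(X, Y)]
        r = math.sqrt(sum(v * v for v in diff))
        e = [v / r for v in diff]
        gY = grad_U(t, Y, T, a)
        dB_hat = reflect(e, dB)
        Y_new = [yi - gi * dt + bi for yi, gi, bi in zip(Y, gY, dB_hat)]
        # Discrete surrogate for tau: the difference changed sign along e.
        along_e = sum((xn - yn) * ei for xn, yn, ei in zip(X_new, Y_new, e))
        if along_e <= 0.0:
            coupled = True
            Y_new = list(X_new)
        X, Y = X_new, Y_new
    return X, Y, coupled


print(simulate([1.0, 0.0], [-1.0, 0.5]))
```

The map $v\mapsto v-2\mathrm{e}\langle\mathrm{e},v\rangle$ is an orthogonal reflection, which is the reason why the reflected increments again have the law of Brownian increments.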

In particular, $(\hat{B}_{t})_{0\leq t\leq T}$ is a Brownian motion by Lévy’s characterization. We now define

\mathcal{U}:[0,T]\times\mathbb{R}^{d}\times\mathbb{R}^{d}\longrightarrow\mathbb{R},\quad\mathcal{U}_{t}(x,\hat{x})=\begin{cases}|x-\hat{x}|^{-1}\langle\nabla U^{{T},{g}}_{t}(x)-\nabla U^{{T},{g}}_{t}(\hat{x}),x-\hat{x}\rangle,\quad&\mbox{if $x\neq\hat{x}$,}\\ 0\quad&\mbox{if $x=\hat{x}$,}\end{cases}

and proceed to prove that $(\mathcal{U}_{t}(X_{t},\hat{X}_{t}))_{0\leq t\leq T}$ is a supermartingale. To this aim, we first deduce from (15) and Itô’s formula that

\mathrm{d}\nabla U^{{T},{g}}_{t}(X_{t})=\mathrm{d}M_{t},\quad\mathrm{d}\nabla U^{{T},{g}}_{t}(\hat{X}_{t})=\mathrm{d}\hat{M}_{t}

(20)

where $M_{\cdot},\hat{M}_{\cdot}$ are square integrable martingales. Indeed we find from Itô’s formula

\begin{split}\mathrm{d}\nabla U^{{T},{g}}_{t}(X_{t})&=\Big{(}\partial_{t}\nabla U^{{T},{g}}_{t}(X_{t})-\nabla^{2}U^{{T},{g}}_{t}\nabla U^{{T},{g}}_{t}(X_{t})+\frac{1}{2}\Delta\nabla U^{{T},{g}}_{t}(X_{t})\Big{)}\mathrm{d}t+\nabla^{2}U^{{T},{g}}_{t}(X_{t})\cdot\mathrm{d}B_{t}\\ &\stackrel{{\scriptstyle(15)}}{{=}}\nabla^{2}U^{{T},{g}}_{t}(X_{t})\cdot\mathrm{d}B_{t},\end{split}

and a completely analogous argument shows that $\nabla U^{{T},{g}}_{t}(\hat{X}_{t})$ is a square integrable martingale. We shall also prove separately at Lemma 2.1 that

\mathrm{d}\mathrm{e}_{t}=-r^{-1}_{t}\mathrm{proj}_{\mathrm{e}^{\bot}_{t}}(\nabla U^{{T},{g}}_{t}(X_{t})-\nabla U^{{T},{g}}_{t}(\hat{X}_{t}))\mathrm{d}t\quad\forall t<\tau,

(21)

where $\mathrm{proj}_{\mathrm{e}^{\bot}_{t}}$ denotes the orthogonal projection on the orthogonal complement of the linear subspace generated by $\mathrm{e}_{t}$ . Combining together (20) and (21) we find that $\mathrm{d}\mathcal{U}_{t}(X_{t},\hat{X}_{t})=0$ for $t\geq\tau$ , whereas for $t<\tau$

\begin{split}\mathrm{d}\mathcal{U}_{t}(X_{t},\hat{X}_{t})&=\langle\nabla U^{{T},{g}}_{t}(X_{t})-\nabla U^{{T},{g}}_{t}(\hat{X}_{t}),\mathrm{d}\mathrm{e}_{t}\rangle\\ &+\langle\mathrm{e}_{t},\mathrm{d}(\nabla U^{{T},{g}}_{t}(X_{t})-\nabla U^{{T},{g}}_{t}(\hat{X}_{t}))\rangle+\mathrm{d}[(\nabla U^{{T},{g}}_{\cdot}(X_{\cdot})-\nabla U^{{T},{g}}_{\cdot}(\hat{X}_{\cdot})),\mathrm{e}_{\cdot}]_{t}\\ &\stackrel{{\scriptstyle(21)+(20)}}{{=}}-r^{-1}_{t}|\mathrm{proj}_{\mathrm{e}^{\bot}_{t}}(\nabla U^{{T},{g}}_{t}(X_{t})-\nabla U^{{T},{g}}_{t}(\hat{X}_{t}))|^{2}\mathrm{d}t+\mathrm{d}\tilde{M}_{t},\end{split}

proving that $(\mathcal{U}_{t}(X_{t},\hat{X}_{t}))_{0\leq t\leq T}$ is a supermartingale. In the above, $\tilde{M}_{\cdot}$ denotes a square integrable martingale and to obtain the last equality we used that the quadratic variation term vanishes because of (21). Next, arguing exactly as in [22, Eq. 60] (see also (25) below for more details) on the basis of Itô’s formula and invoking (17) we get

\begin{split}\mathrm{d}f_{L}(r_{t})&=[-f^{\prime}_{L}(r_{t})\mathcal{U}_{t}(X_{t},\hat{X}_{t})+2f^{\prime\prime}_{L}(r_{t})]\mathrm{d}t+\mathrm{d}N_{t}\\ &\stackrel{{\scriptstyle(17)}}{{=}}-f^{\prime}_{L}(r_{t})[\mathcal{U}_{t}(X_{t},\hat{X}_{t})+f_{L}(r_{t})]\mathrm{d}t+\mathrm{d}N_{t},\end{split}

where $N_{\cdot}$ is a square integrable martingale. It then follows that

\mathrm{d}\big{(}\mathcal{U}_{t}(X_{t},\hat{X}_{t})+f_{L}(r_{t})\big{)}\leq-f^{\prime}_{L}(r_{t})\Big{(}\mathcal{U}_{t}(X_{t},\hat{X}_{t})+f_{L}(r_{t})\Big{)}\mathrm{d}t+\mathrm{d}N_{t}+\mathrm{d}\tilde{M}_{t},

(22)

from which we deduce that the process

\Gamma_{t}=\exp\Big{(}\int_{0}^{t}f^{\prime}_{L}(r_{s})\mathrm{d}s\Big{)}\big{(}\mathcal{U}_{t}(X_{t},\hat{X}_{t})+f_{L}(r_{t})\big{)}

is a supermartingale and in particular is decreasing on average. This gives

\begin{split}&|x-\hat{x}|^{-1}\langle\nabla U^{{T},{g}}_{0}(x)-\nabla U^{{T},{g}}_{0}(\hat{x}),x-\hat{x}\rangle+f_{L}(|x-\hat{x}|)=\mathbb{E}[\Gamma_{0}]\\ &\geq\mathbb{E}[\Gamma_{T}]\geq\mathbb{E}\Big{[}\exp\Big{(}\int_{0}^{T}f^{\prime}_{L}(r_{s})\mathrm{d}s\Big{)}\big{(}|X_{T}-\hat{X}_{T}|\kappa_{g}(|X_{T}-\hat{X}_{T}|)+f_{L}(|X_{T}-\hat{X}_{T}|)\big{)}\Big{]}\geq 0,\end{split}

where the last inequality follows from $g\in\mathcal{F}_{L}$ . We have thus completed the proof under the additional assumption (19). In order to remove it, consider any $g\in\mathcal{F}_{L}$ . Then there exists a sequence $(g^{n})\subseteq\mathcal{F}_{L}$ such that (19) holds for each $g^{n}$ , $g^{n}\rightarrow g$ pointwise and $(g^{n})$ is uniformly bounded below. From this, one can prove that $\nabla U^{{T},{g^{n}}}_{0}\rightarrow\nabla U^{{T},{g}}_{0}$ pointwise by differentiating (13) under the integral sign. Using this result in combination with the fact that (16) holds for each $g^{n}$ allows us to reach the desired conclusion. ∎

Lemma 2.1.

Under the same assumptions and notations of Theorem 2.1 we have

\mathrm{d}\mathrm{e}_{t}=-r^{-1}_{t}\mathrm{proj}_{\mathrm{e}^{\bot}_{t}}(\nabla U^{{T},{g}}_{t}(X_{t})-\nabla U^{{T},{g}}_{t}(\hat{X}_{t}))\mathrm{d}t\quad\forall t<\tau.

Proof.

Recall that if $\theta:\mathbb{R}^{d}\rightarrow\mathbb{R}$ is the map $z\mapsto|z|$ , then we have

\nabla\theta(z)=\frac{z}{|z|},\quad\nabla^{2}\theta(z)=\frac{\mathrm{I}}{|z|}-\frac{zz^{\top}}{|z|^{3}},\quad z\neq 0.

(23)

The proof consists of several applications of Itô’s formula. We first observe that for $t<\tau$

\mathrm{d}(X_{t}-\hat{X}_{t})=-(\nabla U^{{T},{g}}_{t}(X_{t})-\nabla U^{{T},{g}}_{t}(\hat{X}_{t}))\mathrm{d}t+2\mathrm{e}_{t}\mathrm{d}W_{t},\quad\text{with}\quad\mathrm{d}W_{t}=\langle\mathrm{e}_{t},\mathrm{d}B_{t}\rangle.

(24)

Note that $(W_{t})_{0\leq t\leq T}$ is a Brownian motion by Lévy’s characterization. Thus, invoking (23) (or referring directly to [22, Eq. 60]) we obtain

\mathrm{d}r_{t}=-\langle\nabla U^{{T},{g}}_{t}(X_{t})-\nabla U^{{T},{g}}_{t}(\hat{X}_{t}),\mathrm{e}_{t}\rangle\mathrm{d}t+2\mathrm{d}W_{t},

(25)

whence

\begin{split}\mathrm{d}r^{-1}_{t}&=-r^{-2}_{t}\mathrm{d}r_{t}+r^{-3}_{t}\mathrm{d}[r]_{t}\\ &=\Big{(}r^{-2}_{t}\langle\nabla U^{{T},{g}}_{t}(X_{t})-\nabla U^{{T},{g}}_{t}(\hat{X}_{t}),\mathrm{e}_{t}\rangle+4r^{-3}_{t}\Big{)}\mathrm{d}t-2r^{-2}_{t}\mathrm{d}W_{t}.\end{split}

(26)

Combining (24) with (26) we find that for $t<\tau$

\begin{split}\mathrm{d}\mathrm{e}_{t}&=\mathrm{d}\big{(}r^{-1}_{t}(X_{t}-\hat{X}_{t}))\\ &=r^{-1}_{t}\mathrm{d}(X_{t}-\hat{X}_{t})+(X_{t}-\hat{X}_{t})\mathrm{d}(r^{-1}_{t})+\mathrm{d}[X_{\cdot}-\hat{X}_{\cdot},r^{-1}_{\cdot}]_{t}\\ &=-r^{-1}_{t}(\nabla U^{{T},{g}}_{t}(X_{t})-\nabla U^{{T},{g}}_{t}(\hat{X}_{t}))\mathrm{d}t+2r^{-1}_{t}\mathrm{e}_{t}\mathrm{d}W_{t}\\ &+\Big{(}r^{-2}_{t}\langle\nabla U^{{T},{g}}_{t}(X_{t})-\nabla U^{{T},{g}}_{t}(\hat{X}_{t}),\mathrm{e}_{t}\rangle+4r^{-3}_{t}\Big{)}(X_{t}-\hat{X}_{t})\mathrm{d}t\\ &-2r^{-2}_{t}(X_{t}-\hat{X}_{t})\mathrm{d}W_{t}-4r^{-2}_{t}\mathrm{e}_{t}\mathrm{d}t\\ &=-r^{-1}_{t}\Big{(}\nabla U^{{T},{g}}_{t}(X_{t})-\nabla U^{{T},{g}}_{t}(\hat{X}_{t})-\langle\nabla U^{{T},{g}}_{t}(X_{t})-\nabla U^{{T},{g}}_{t}(\hat{X}_{t}),\mathrm{e}_{t}\rangle\mathrm{e}_{t}\Big{)}\mathrm{d}t\\ &=-(r^{-1}_{t})\mathrm{proj}_{\mathrm{e}_{t}^{\bot}}(\nabla U^{{T},{g}}_{t}(X_{t})-\nabla U^{{T},{g}}_{t}(\hat{X}_{t}))\mathrm{d}t.\end{split}

∎

3 Second order bounds for Schrödinger potentials

From now on Assumption 1.1 is in force, even if we do not specify it. Moreover, since we show at Proposition 5.1 in the appendix that $(\mathbf{H2}^{\prime})$ implies $(\mathbf{H2})$ , we shall always assume that $(\mathbf{H2})$ holds in the sequel. The next two subsections are devoted to establishing the key estimates needed in the proof of Theorem 1.2, which is carried out immediately afterwards.

3.1 Weak semiconvexity of $\psi$ implies weak semiconcavity of $\varphi$

We begin this section with a useful reminder of the definition of $F$ , first given at (12).

F(\alpha,s)=\beta_{\mu}s+\frac{s}{T(1+T\alpha)}+\frac{s^{1/2}f_{L}(s^{1/2})}{(1+T\alpha)^{2}},\quad s>0.

Lemma 3.1.

Assume that $\alpha>-1/T$ exists such that

\kappa_{\psi}(r)\geq\alpha-r^{-1}f_{L}(r)\quad\forall r>0.

Then we have

\kappa_{U^{T,\psi}_{0}}(r)\geq\frac{\alpha}{1+T\alpha}-\frac{r^{-1}f_{L}(r)}{(1+T\alpha)^{2}}\quad\forall r>0.

(27)

In particular,

\ell_{\varphi}(r)\leq\beta_{\mu}-\frac{\alpha}{1+T\alpha}+\frac{r^{-1}f_{L}(r)}{(1+T\alpha)^{2}}=r^{-2}F(\alpha,r^{2})-\frac{1}{T}\quad\forall r>0.

(28)

Proof.

We define

\hat{\psi}(\cdot)=\psi(\cdot)-\frac{\alpha}{2}|\cdot|^{2}.

and note that $\hat{\psi}\in\mathcal{F}_{L}$ by construction. We claim that

U^{T,\psi}_{0}(x)=\frac{\alpha}{2(1+T\alpha)}|x|^{2}+U^{T/(1+T\alpha),\hat{\psi}}_{0}((1+T\alpha)^{-1}x)+C,

(29)

where $C$ is some constant independent of $x$ . Indeed we have

\begin{split}U^{T,\psi}_{0}(x)-\frac{d}{2}\log(2\pi T)&=-\log\int\exp\Big{(}-\frac{|y-x|^{2}}{2T}-\frac{\alpha}{2}|y|^{2}-\hat{\psi}(y)\Big{)}\mathrm{d}y\\ &=-\log\int\exp\Big{(}-\frac{\alpha|x|^{2}}{2(1+T\alpha)}-\frac{1+T\alpha}{2T}|y-(1+T\alpha)^{-1}x|^{2}-\hat{\psi}(y)\Big{)}\mathrm{d}y\\ &=\frac{\alpha|x|^{2}}{2(1+T\alpha)}+U^{T/(1+T\alpha),\hat{\psi}}_{0}((1+T\alpha)^{-1}x)-\frac{d}{2}\log(2\pi T/(1+T\alpha))\end{split}

Since $\hat{\psi}\in\mathcal{F}_{L}$ , we can invoke Theorem 2.1 in (29) to prove (27). The estimate (28) is immediately deduced from (27) recalling the relation (14) and using Assumption 1.1. ∎
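The second equality in the computation leading to (29) rests on completing the square in $y$. A sketch of that step, using only the symbols of the proof above, reads as follows.

```latex
\begin{align*}
\frac{|y-x|^2}{2T}+\frac{\alpha}{2}|y|^2
 &= \frac{1+T\alpha}{2T}|y|^2-\frac{\langle x,y\rangle}{T}+\frac{|x|^2}{2T}\\
 &= \frac{1+T\alpha}{2T}\Big|y-\frac{x}{1+T\alpha}\Big|^2
    -\frac{|x|^2}{2T(1+T\alpha)}+\frac{|x|^2}{2T}\\
 &= \frac{1+T\alpha}{2T}\Big|y-\frac{x}{1+T\alpha}\Big|^2
    +\frac{\alpha|x|^2}{2(1+T\alpha)} .
\end{align*}
% Comparing the normalising constants in the last display of the proof,
% the constant in (29) is
%   C = \frac{d}{2}\log(2\pi T)-\frac{d}{2}\log\Big(\frac{2\pi T}{1+T\alpha}\Big)
%     = \frac{d}{2}\log(1+T\alpha).
```

The condition $\alpha>-1/T$ is what makes the coefficient $(1+T\alpha)/(2T)$ positive, so that the Gaussian kernel with variance $T/(1+T\alpha)$ is well defined.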

3.2 Weak semiconcavity of $\varphi$ implies weak semiconvexity of $\psi$

We begin by recording some useful properties of the functions $F(\cdot,\cdot)$ and $G(\cdot,\cdot)$ .

Lemma 3.2.

Let $T,\beta_{\mu}>0,L\geq 0$ be given.

(i)

For any $\alpha>-1/T$ the function

$s\mapsto F(\alpha,s)$

is concave and increasing on $[0,+\infty)$ .
(ii)

$\alpha\mapsto G(\alpha,2)$ is positive and non decreasing over $(-\frac{1}{T},+\infty)$ . Moreover,

$\sup_{\alpha>-1/T}G(\alpha,2)\leq\frac{2}{\beta_{\mu}}.$ (30)
(iii)

The fixed point equation (11) admits at least one solution on $(\alpha_{\nu}-1/T,+\infty)$ and $\alpha_{\nu}-1/T$ is not an accumulation point for the set of solutions.

Proof.

We begin with the proof of (i). To this aim, we observe that $f_{L}$ is increasing on $[0,+\infty)$ and therefore so is $s\mapsto s^{1/2}f_{L}(s^{1/2})$ . Therefore

\frac{\mathrm{d}}{\mathrm{d}s}F(\alpha,s)\geq\beta_{\mu}+\frac{1}{T(1+T\alpha)}>0,

where we used $\alpha>-1/T$ in the last inequality. To prove concavity, we observe that

\frac{\mathrm{d}^{2}}{\mathrm{d}u^{2}}\Big{(}u^{1/2}f_{L}(u^{1/2})\Big{)}\Big{|}_{u=s}=\frac{s^{-1/2}}{4}f^{\prime\prime}_{L}(s^{1/2})+\frac{s^{-3/2}}{4}(f^{\prime}_{L}(s^{1/2})s^{1/2}-f_{L}(s^{1/2}))\stackrel{(18)}{<}0.

Thus $s\mapsto s^{1/2}f_{L}(s^{1/2})$ is concave and so is $F(\alpha,\cdot)$ . We now move on to the proof of (ii) by first showing that $G(\cdot,2)$ is positive and then showing that it is non decreasing. If positivity failed, then $G(\alpha,2)=0$ for some $\alpha>-1/T$ and therefore there exists a sequence $(s_{n})_{n\geq 0}$ such that $s_{n}\rightarrow 0$ and $F(\alpha,s_{n})\geq 2$ . But this is impossible since $\lim_{s\downarrow 0}F(\alpha,s)=0$ . Next, we observe that $F(\alpha,s)$ is increasing in $s$ by item (i) and decreasing in $\alpha$ for $\alpha\in(-1/T,+\infty)$ . For this reason, for any $u$ and $\alpha^{\prime}\geq\alpha$ we have

\{s:F(\alpha^{\prime},s)\geq u\}\subseteq\{s:F(\alpha,s)\geq u\}

and therefore

G(\alpha^{\prime},u)\geq G(\alpha,u).

We complete the proof of (ii) by showing that (30) holds. To see this, using $f_{L}(r)\geq 0$ we obtain that for any $\alpha>-1/T$

F(\alpha,s)\geq\beta_{\mu}s\quad\forall s>0.

But then $F(\alpha,2/\beta_{\mu})\geq 2$ , and we obtain directly from (12) that $G(\alpha,2)\leq 2/\beta_{\mu}$ , thus proving (30). To prove (iii), we introduce

h:[\alpha_{\nu}-\frac{1}{T},+\infty)\longrightarrow\mathbb{R},\quad h(\alpha):=\alpha-\Big{(}\alpha_{\nu}-\frac{1}{T}+\frac{G(\alpha,2)}{2T^{2}}\Big{)}

Note that $h$ is continuous on its domain since $G(\cdot,2)$ is so. Therefore, to reach the conclusion it suffices to show that

h\Big{(}\alpha_{\nu}-\frac{1}{T}\Big{)}<0,\quad\lim_{\alpha\rightarrow+\infty}h(\alpha)=+\infty.

(31)

The first inequality is a direct consequence of $G(\alpha_{\nu}-1/T,2)>0$ , that we have already proven. Finally, the fact that $h$ diverges at infinity is a consequence of (30). ∎
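The second derivative used in the proof of (i) can be verified in two steps. The following is a sketch; the name $g$ is a shorthand introduced only here.

```latex
% g is a hypothetical shorthand for u \mapsto u^{1/2} f_L(u^{1/2})
\begin{align*}
g(u) &= u^{1/2}f_L(u^{1/2}),\\
g'(u) &= \tfrac{1}{2}u^{-1/2}f_L(u^{1/2})+\tfrac{1}{2}f_L'(u^{1/2}),\\
g''(u) &= -\tfrac{1}{4}u^{-3/2}f_L(u^{1/2})+\tfrac{1}{4}u^{-1}f_L'(u^{1/2})
          +\tfrac{1}{4}u^{-1/2}f_L''(u^{1/2})\\
       &= \frac{u^{-1/2}}{4}f_L''(u^{1/2})
          +\frac{u^{-3/2}}{4}\big(f_L'(u^{1/2})\,u^{1/2}-f_L(u^{1/2})\big).
\end{align*}
```

Both terms in the last line are non positive as soon as $f_{L}$ is concave and $r\mapsto r^{-1}f_{L}(r)$ is non increasing, which are the properties of $f_{L}$ invoked in the proof.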

We shall now introduce the modified potential $\bar{\psi}$ as follows

\bar{\psi}(y)=T\Big{(}\psi(y)-U^{\nu}(y)+\frac{|y|^{2}}{2T}\Big{)},

(32)

It has been proven at [11, Lemma 1] that the Hessian of $\bar{\psi}$ relates to the covariance matrix of the conditional distributions of the static Schrödinger bridge $\hat{\pi}$ . That is to say,

\nabla^{2}\bar{\psi}(y)=\frac{1}{T}\mathrm{Cov}_{X\sim\hat{\pi}^{y}}(X)

(33)

where $\hat{\pi}^{y}$ is (a version of) the conditional distribution of $\hat{\pi}$ that, in view of (8) has the following form:

\hat{\pi}^{y}(\mathrm{d}x)=\frac{\exp(-V^{\hat{\pi}^{y}}(x))\mathrm{d}x}{\int\exp(-V^{\hat{\pi}^{y}}(\bar{x}))\mathrm{d}\bar{x}},\quad V^{\hat{\pi}^{y}}(x):=\varphi(x)+\frac{|x|^{2}}{2T}-\frac{xy}{T}.

(34)

We shall give an independent proof of (33) under additional regularity assumptions at Proposition 5.2 in the Appendix for the readers’ convenience. A consequence of (33) is that $\bar{\psi}$ is convex and we obtain from (32) that

\kappa_{\psi}(r)\geq\alpha_{\nu}-\frac{1}{T}-r^{-1}f_{L}(r)\quad\forall r>0.

(35)

This is a first crude weak semiconvexity bound on $\psi$ upon which Theorem 1.2 improves by means of a recursive argument. We show in the next lemma how to deduce weak semiconvexity of $\psi$ from weak semiconcavity of $\varphi$ . In the $L=0$ setting, this step is carried out in [11] by invoking the Cramér-Rao inequality, whose application is not justified in the present more general setup.

Lemma 3.3.

Assume that $\alpha>-1/T$ exists such that

\ell_{\varphi}(r)\leq-\frac{1}{T}+r^{-2}F(\alpha,r^{2})\quad\forall r>0.

(36)

Then

\kappa_{\psi}(r)\geq\alpha_{\nu}-\frac{1}{T}+\frac{G(\alpha,2)}{2T^{2}}-r^{-1}f_{L}(r)\quad\forall r>0.

Proof.

Recalling the definition of $V^{\hat{\pi}^{y}}$ given at (34) we observe that the standing assumptions imply

\ell_{V^{\hat{\pi}^{y}}}(r)\stackrel{(34)}{\leq}\ell_{\varphi}(r)+\frac{1}{T}\stackrel{(36)}{\leq}r^{-2}F(\alpha,r^{2})\quad\forall r>0.

(37)

In view of (33), we now proceed to bound $\mathrm{Var}_{X\sim\hat{\pi}^{y}}(X_{1})$ from below for a given $y$ , where we adopted the notational convention $X=(X_{1},\ldots,X_{d})$ for the components of random vectors. We first observe that the variance-bias decomposition formula implies

\mathrm{Var}_{X\sim\hat{\pi}^{y}}(X_{1})\geq\mathbb{E}_{X\sim\hat{\pi}^{y}}[\mathrm{Var}_{X\sim\hat{\pi}^{y}}(X_{1}|X_{2},\ldots,X_{d})].

(38)

Moreover, we define for any $z=(z_{2},\ldots,z_{d})$

V^{\hat{\pi}^{y,z}}(\cdot):=V^{\hat{\pi}^{y}}(\cdot,z),\quad\hat{\pi}^{y,z}(\mathrm{d}x)=\frac{\exp(-V^{\hat{\pi}^{y,z}}(x))\mathrm{d}x}{\int\exp(-V^{\hat{\pi}^{y,z}}(\bar{x}))\mathrm{d}\bar{x}}

and observe that if $X\sim\hat{\pi}^{y}$ , then the conditional distribution of $X_{1}$ given $\{(X_{2},\ldots,X_{d})=(z_{2},\ldots,z_{d})\}$ is precisely $\hat{\pi}^{y,z}$ . This gives the formula

\mathrm{Var}_{X\sim\hat{\pi}^{y}}(X_{1}|X_{2}=z_{2},\ldots,X_{d}=z_{d})=\frac{1}{2}\int|x-\hat{x}|^{2}\hat{\pi}^{y,z}(\mathrm{d}x)\hat{\pi}^{y,z}(\mathrm{d}\hat{x})

With this notation at hand, and with the help of the following identities, that can be obtained by one-dimensional integration by parts,

1=\int\partial_{x}V^{\hat{\pi}^{y,z}}(x)\,x\,\,\hat{\pi}^{y,z}(\mathrm{d}x),\quad 0=\int\partial_{x}V^{\hat{\pi}^{y,z}}(x)\hat{\pi}^{y,z}(\mathrm{d}x)

we find that, uniformly in $z\in\mathbb{R}^{d-1}$ ,

\begin{split}1&=\frac{1}{2}\int(\partial_{x}V^{\hat{\pi}^{y,z}}(x)-\partial_{x}V^{\hat{\pi}^{y,z}}(\hat{x}))(x-\hat{x})\hat{\pi}^{y,z}(\mathrm{d}x)\hat{\pi}^{y,z}(\mathrm{d}\hat{x})\\ &=\frac{1}{2}\int\langle\nabla V^{\hat{\pi}^{y}}(x,z)-\nabla V^{\hat{\pi}^{y}}(\hat{x},z),(x,z)-(\hat{x},z)\rangle\hat{\pi}^{y,z}(\mathrm{d}x)\hat{\pi}^{y,z}(\mathrm{d}\hat{x})\\ &\stackrel{(37)}{\leq}\frac{1}{2}\int F(\alpha,|x-\hat{x}|^{2})\hat{\pi}^{y,z}(\mathrm{d}x)\hat{\pi}^{y,z}(\mathrm{d}\hat{x})\\ &\leq\frac{1}{2}F(\alpha,2\mathrm{Var}_{X\sim\hat{\pi}^{y}}(X_{1}|X_{2}=z_{2},\ldots,X_{d}=z_{d}))\end{split}

where to establish the last inequality we used the concavity of $F$ (see Lemma 3.2 (i)) and Jensen’s inequality. Since $\alpha>-1/T$ , invoking again Lemma 3.2 (i) we have that $s\mapsto F(\alpha,s)$ is non decreasing. But then, we get from (38) and the last bound that

\mathrm{Var}_{X\sim\hat{\pi}^{y}}(X_{1})\geq\frac{1}{2}G(\alpha,2),\quad\forall y\in\mathbb{R}^{d}.

Next, we observe that if $\varphi(\cdot)$ satisfies (36) then so does $\varphi(\mathrm{O}\cdot)$ for any orthogonal matrix $\mathrm{O}$ . Therefore, repeating the argument above yields

\mathrm{Var}_{X\sim\hat{\pi}^{y}}(\langle v,X\rangle)\geq\frac{1}{2}G(\alpha,2),\quad\forall y,v\in\mathbb{R}^{d}\,\,\text{s.t.}\,\,|v|=1.

Recalling that $\nabla^{2}\bar{\psi}(y)=\frac{1}{T}\mathrm{Cov}_{X\sim\hat{\pi}^{y}}(X)$ with $\bar{\psi}$ defined by (32), we find

\langle v,\nabla^{2}\bar{\psi}(y)v\rangle\geq\frac{G(\alpha,2)}{2T}|v|^{2}\quad\forall v,y\in\mathbb{R}^{d}.

(39)

But then, rewriting (32) as

\psi(\cdot)=U^{\nu}(\cdot)-\frac{|\cdot|^{2}}{2T}+\frac{\bar{\psi}(\cdot)}{T},

we immediately obtain that for all $r>0$

\begin{split}\kappa_{\psi}(r)&\geq\kappa_{U^{\nu}}(r)-\frac{1}{T}+\frac{1}{T}\kappa_{\bar{\psi}}(r)\\ &\geq\alpha_{\nu}-\frac{1}{T}+\frac{G(\alpha,2)}{2T^{2}}-r^{-1}f_{L}(r),\end{split}

where we used (39) and hypothesis (6) to obtain the last inequality. ∎
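The first identity in the chain of inequalities above follows by expanding the product and applying the two integration by parts identities. As a sketch, writing $p$ for $\hat{\pi}^{y,z}$ and $V$ for $V^{\hat{\pi}^{y,z}}$ (shorthands introduced only here):

```latex
% p and V are hypothetical shorthands for \hat{\pi}^{y,z} and V^{\hat{\pi}^{y,z}}
\begin{align*}
\frac{1}{2}\iint\big(\partial_xV(x)-\partial_xV(\hat{x})\big)(x-\hat{x})\,p(\mathrm{d}x)\,p(\mathrm{d}\hat{x})
 &= \frac{1}{2}\Big[\int\partial_xV(x)\,x\,p(\mathrm{d}x)
    +\int\partial_xV(\hat{x})\,\hat{x}\,p(\mathrm{d}\hat{x})\Big]\\
 &\quad-\frac{1}{2}\Big[\int\partial_xV(x)\,p(\mathrm{d}x)\int\hat{x}\,p(\mathrm{d}\hat{x})
    +\int\partial_xV(\hat{x})\,p(\mathrm{d}\hat{x})\int x\,p(\mathrm{d}x)\Big]\\
 &= \frac{1}{2}\big[1+1\big]-\frac{1}{2}\big[0+0\big]=1 .
\end{align*}
% Jensen's inequality is then applied with
%   \iint |x-\hat{x}|^2\,p(\mathrm{d}x)\,p(\mathrm{d}\hat{x}) = 2\,\mathrm{Var}_{X\sim p}(X),
% which is the variance formula stated before the integration by parts identities.
```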

3.3 Proof of Theorem 1.2

The proof is obtained by combining the results of the two preceding subsections through a fixed point argument.

Proof of Theorem 1.2.

We define a sequence $(\alpha^{n})_{n\geq 0}$ via

\alpha^{0}=\alpha_{\nu}-\frac{1}{T},\quad\alpha^{n}=\alpha_{\nu}-\frac{1}{T}+\frac{G(\alpha^{n-1},2)}{2T^{2}},\quad n\geq 1.

Using Lemma 3.2 (ii) and an induction argument, we obtain that $\alpha^{1}\geq\alpha^{0}$ and $(\alpha^{n})_{n\geq 0}$ is a non decreasing sequence. Moreover, $(\alpha^{n})_{n\geq 0}$ is a bounded sequence by (30) and therefore it admits a finite limit $\alpha^{*}$ . By continuity of $G(\cdot,2)$ , we know that $\alpha^{*}>\alpha_{\nu}-1/T$ and $\alpha^{*}$ satisfies the fixed point equation (11). To conclude the proof, we show by induction that

\kappa_{\psi}(r)\geq\alpha^{n}-r^{-1}f_{L}(r)\quad\forall r>0,\,n\geq 0.

(40)

The case $n=0$ is (35). For the inductive step, suppose (40) holds for a given $n$ . Then Lemma 3.1 gives that

\ell_{\varphi}(r)\leq r^{-2}F(\alpha^{n},r^{2})-\frac{1}{T}\quad\forall r>0.

But then, an application of Lemma 3.3 proves that for all $r>0$ we have

\kappa_{\psi}(r)\geq\alpha_{\nu}-\frac{1}{T}+\frac{G(\alpha^{n},2)}{2T^{2}}-r^{-1}f_{L}(r)=\alpha^{n+1}-r^{-1}f_{L}(r).

Letting $n\rightarrow+\infty$ in (40) completes the proof of (9). To conclude, we observe that (10) follows directly from (9) and Lemma 3.1.

∎
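The monotonicity of $(\alpha^{n})_{n\geq 0}$ used in the proof can be spelled out as follows. This is only a sketch of the induction the proof refers to, relying on the definition of $\alpha^{n}$ and on Lemma 3.2 (ii).

```latex
\begin{align*}
\alpha^{1}-\alpha^{0} &= \frac{G(\alpha^{0},2)}{2T^{2}} > 0
  && \text{(positivity of } G(\cdot,2)\text{)},\\
\alpha^{n+1}-\alpha^{n} &= \frac{G(\alpha^{n},2)-G(\alpha^{n-1},2)}{2T^{2}} \geq 0
  && \text{whenever } \alpha^{n}\geq\alpha^{n-1}
     \text{ (monotonicity of } G(\cdot,2)\text{)}.
\end{align*}
```

Since every term satisfies $\alpha^{n}\leq\alpha_{\nu}-1/T+\sup_{\alpha>-1/T}G(\alpha,2)/(2T^{2})$, the sequence is non decreasing and bounded, hence convergent.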

Let us now prove Corollary 1.1.

Proof of Corollary 1.1.

We first prove the upper bound. To do so, we observe that

F(\alpha,s)\geq(\beta_{\mu}+\frac{1}{T(1+T\alpha)})s,\quad\forall\alpha\geq\alpha_{\nu}-1/T,s\geq 0.

But then,

\frac{G(\alpha,2)}{2T^{2}}\leq\frac{1}{T^{2}\beta_{\mu}+\frac{1}{(\alpha+1/T)}}.

Since $\bar{\alpha}$ is a fixed point, we obtain

\bar{\alpha}+\frac{1}{T}\leq\alpha_{\nu}+\frac{1}{T^{2}\beta_{\mu}+\frac{1}{(\bar{\alpha}+1/T)}}.

If we now define $\bar{a}=\bar{\alpha}+1/T$ , the above implies

\bar{a}\leq\alpha_{\nu}+\frac{\bar{a}}{T^{2}\beta_{\mu}\bar{a}+1}.

Since $\bar{a}>0$ , we can rewrite the last inequality in the equivalent form

T^{2}\beta_{\mu}\bar{a}^{2}-T^{2}\alpha_{\nu}\beta_{\mu}\bar{a}-\alpha_{\nu}\leq 0.

Solving this quadratic inequality yields

\bar{a}\leq\frac{\alpha_{\nu}}{2}+\frac{1}{2}\sqrt{\alpha^{2}_{\nu}+\frac{4\alpha_{\nu}}{T^{2}\beta_{\mu}}}.

The desired result follows from $\bar{\alpha}=\bar{a}-\frac{1}{T}$ . We now move to the proof of the lower bound. First, recalling that $f_{L}(r)\leq Lr$ , we obtain

F(\alpha,s)\leq\big{(}\beta_{\mu}+\frac{1}{T(1+T\alpha)}+\frac{L}{(1+T\alpha)^{2}}\big{)}s.

Using that $\bar{\alpha}\geq\alpha_{\nu}-1/T$ , we obtain that for all $s>0$

F(\bar{\alpha},s)\leq\big{(}\beta_{\mu}+\frac{1}{T(1+T\bar{\alpha})}+\frac{L}{T\alpha_{\nu}(1+T\bar{\alpha})}\big{)}s.

But then,

\frac{G(\bar{\alpha},2)}{2T^{2}}\geq\frac{1}{T^{2}\beta_{\mu}+\frac{(1+L/\alpha_{\nu})}{(\bar{\alpha}+1/T)}}.

Setting $\bar{a}=\bar{\alpha}+1/T$ , we deduce from the fact that $\bar{\alpha}$ is a fixed point that

\bar{a}\geq\alpha_{\nu}+\frac{\bar{a}}{\bar{a}T^{2}\beta_{\mu}+(1+L/\alpha_{\nu})}.

Using $\bar{a}>0$ we rewrite the last inequality in the equivalent form

T^{2}\beta_{\mu}\bar{a}^{2}+(L/\alpha_{\nu}-T^{2}\alpha_{\nu}\beta_{\mu})\bar{a}-(\alpha_{\nu}+L)\geq 0.

Solving this quadratic inequality yields

\bar{a}\geq\frac{\alpha_{\nu}}{2}-\frac{L}{2T^{2}\alpha_{\nu}\beta_{\mu}}+\frac{1}{2}\sqrt{\big{(}\alpha_{\nu}+\frac{L}{T^{2}\alpha_{\nu}\beta_{\mu}}\big{)}^{2}+\frac{4\alpha_{\nu}}{T^{2}\beta_{\mu}}}\,\,.

The desired conclusion follows from $\bar{\alpha}=\bar{a}-1/T$ . ∎
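As an illustration, consider the case $L=0$. Since $0\leq f_{L}(r)\leq Lr$, we then have $f_{0}\equiv 0$, and the upper and lower bounds on $G$ used in the proof coincide. The following sketch, under that assumption only, shows that the fixed point is then explicit.

```latex
% Case L = 0: F(\alpha,\cdot) is linear, so both bounds on G in the proof are equalities.
\begin{align*}
F(\alpha,s) &= \Big(\beta_\mu+\frac{1}{T(1+T\alpha)}\Big)s,
 \qquad
 \frac{G(\alpha,2)}{2T^{2}} = \frac{1}{T^{2}\beta_\mu+\frac{1}{\alpha+1/T}},\\
\bar{a} &= \alpha_\nu+\frac{\bar{a}}{T^{2}\beta_\mu\bar{a}+1}
 \;\Longleftrightarrow\;
 T^{2}\beta_\mu\bar{a}^{2}-T^{2}\alpha_\nu\beta_\mu\bar{a}-\alpha_\nu=0,\\
\bar{a} &= \frac{\alpha_\nu}{2}+\frac{1}{2}\sqrt{\alpha_\nu^{2}+\frac{4\alpha_\nu}{T^{2}\beta_\mu}},
 \qquad \bar{\alpha}=\bar{a}-\frac{1}{T}.
\end{align*}
```

Only the positive root is retained because $\bar{a}>0$. In this case the two bounds of the corollary agree.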

4 Logarithmic Sobolev inequality for Schrödinger bridges

This section is devoted to the proof of Theorem 1.3 and is structured as follows: we first recall known facts about logarithmic Sobolev inequalities and gradient estimates for diffusion semigroups, whose proofs can be found e.g. in [2], and then prove in Lemma 4.1 a sufficient condition for the two-times distribution of a diffusion process to satisfy LSI. Though such a result may not appear surprising, we could not find it in this form in the existing literature. We then elucidate the connection between Schrödinger bridges and Doob $h$ -transforms in Lemma 4.2, and finally prove Theorem 1.3.

Local LSIs and gradient estimates

Let $[0,T^{\prime}]\times\mathbb{R}^{d}\ni(t,x)\mapsto U_{t}(x)$ be continuous in the time variable, twice differentiable and uniformly Lipschitz in the space variable. We consider the time-inhomogeneous semigroup $(P_{s,t})_{0\leq s\leq t\leq T^{\prime}}$ generated by the diffusion process whose generator at time $t$ acts on smooth functions with bounded support as follows

f\mapsto\frac{1}{2}\Delta f-\langle\nabla U_{t},\nabla f\rangle.

Moreover, we define for all $t\in[0,T^{\prime}]$

\alpha_{t}=\inf_{x,v\in\mathbb{R}^{d},|v|=1}\langle v,\nabla^{2}U_{t}(x)\,v\rangle.

We now recall some basic facts about gradient estimates and local LSIs for the semigroup $(P_{s,t})_{0\leq s\leq t\leq T^{\prime}}$ . For time-homogeneous semigroups these facts are well known and can be found e.g. in [2]: the adaptation to the time-inhomogeneous setting is straightforward. The first result we shall need afterwards is the gradient estimate (see [2, Thm. 3.3.18])

|\nabla P_{t,T^{\prime}}f|(x)\leq C_{t,T^{\prime}}\,P_{t,T^{\prime}}(|\nabla f|)(x),\quad C_{t,T^{\prime}}=\exp\Big{(}-\int_{t}^{T^{\prime}}\alpha_{s}\mathrm{d}s\Big{)},

(41)

that holds for all $(t,x)\in[0,T^{\prime}]\times\mathbb{R}^{d}$ and any continuously differentiable $f$ . Moreover, the local logarithmic Sobolev inequalities (see [2, Thm. 5.5.2])

(P_{0,T^{\prime}}f\log f)(x)-(P_{0,T^{\prime}}f)(x)\log(P_{0,T^{\prime}}f)(x)\leq\frac{\tilde{C}_{0,T^{\prime}}}{2}P_{0,T^{\prime}}(|\nabla f|^{2}/f)(x),\quad\tilde{C}_{0,T^{\prime}}=\int_{0}^{T^{\prime}}C_{t,T^{\prime}}\,\mathrm{d}t

(42)

hold for all $x\in\mathbb{R}^{d}$ and all positive continuously differentiable $f$ . In the next Lemma we show how to obtain LSI for the joint law at times $0$ and $T^{\prime}$ of a diffusion process with initial distribution $\mu$ and drift $-\nabla U_{t}$ , that is to say for the coupling $\pi$ defined by

\int_{\mathbb{R}^{d}\times\mathbb{R}^{d}}f(x,y)\pi(\mathrm{d}x\mathrm{d}y)=\int_{\mathbb{R}^{d}}P_{0,T^{\prime}}f(x,\cdot)(x)\mu(\mathrm{d}x)\quad\forall f>0.

(43)

Lemma 4.1.

Assume that $\mu$ satisfies LSI with constant $C_{\mu}$ and let $\pi$ be as in (43). Then $\pi$ satisfies LSI with constant

\max\{2C_{\mu},{2}C_{\mu}C_{0,T^{\prime}}+\int_{0}^{T^{\prime}}C_{t,T^{\prime}}\mathrm{d}t\}.

The proof is carried out by carefully "mixing" the local (conditional) LSIs (42) with the help of gradient estimates. Similar arguments and ideas can be found e.g. in [6, 27].

Proof.

Let $f>0$ be continuously differentiable. We recall the entropy decomposition formula (see [30, Thm. 2.4])

\mathrm{Ent}_{\pi}(f)=\mathrm{Ent}_{\mu}(f_{0})+\int_{\mathbb{R}^{d}}\mathrm{Ent}_{\pi^{x}}(f^{x})f_{0}(x)\mu(\mathrm{d}x),

(44)

where we adopted the following conventions

f_{0}(x)=(P_{0,T^{\prime}}f(x,\cdot))(x),\quad f^{x}(y)=f(x,y)/f_{0}(x),\quad\int g(y)\pi^{x}(\mathrm{d}y)=\Big{(}P_{0,T^{\prime}}g\Big{)}(x)\,\,\forall g>0.

(45)

The proof is carried out in two steps. In a first step, we bound the second term in (44) by means of the conditional LSIs. In the second step, we bound the first term using the LSI for $\mu$ and gradient estimates.

•

Step 1 The local logarithmic Sobolev inequalities (42) imply that

\begin{split}\mathrm{Ent}_{\pi^{x}}(f^{x})&=P_{0,T^{\prime}}\big{(}f^{x}\log f^{x}\big{)}(x)-\big{(}P_{0,T^{\prime}}f^{x}\log P_{0,T^{\prime}}f^{x}\big{)}(x)\\ &\leq\frac{\tilde{C}_{0,T^{\prime}}}{2}\,\int|\nabla_{y}f^{x}(y)|^{2}/f^{x}(y)\pi^{x}(\mathrm{d}y)\\ &\stackrel{(45)}{=}\frac{\tilde{C}_{0,T^{\prime}}}{2f_{0}(x)}\,\int|\nabla_{y}f(x,y)|^{2}/f(x,y)\pi^{x}(\mathrm{d}y)\end{split}

uniformly in $x\in\mathbb{R}^{d}$ . Integrating this inequality against $f_{0}(x)\mu(\mathrm{d}x)$ and using (43) gives

\int\mathrm{Ent}_{\pi^{x}}(f^{x})f_{0}(x)\mu(\mathrm{d}x)\leq\frac{\tilde{C}_{0,T^{\prime}}}{2}\,\int\frac{|\nabla_{y}f(x,y)|^{2}}{f(x,y)}\pi(\mathrm{d}x\mathrm{d}y).

(46)

•

Step 2 We start with the observation that

\nabla_{x}f_{0}(x)\stackrel{(45)}{=}P_{0,T^{\prime}}\nabla_{x}f(x,\cdot)(x)+\nabla_{z}\big{(}P_{0,T^{\prime}}f(x,\cdot)(z)\big{)}\Big{|}_{z=x}.

But then, using the LSI for $\mu$ and Young’s inequality we obtain

\begin{split}\mathrm{Ent}_{\mu}(f_{0})&\leq\frac{C_{\mu}}{2}\int|\nabla_{x}f_{0}(x)|^{2}/f_{0}(x)\mu(\mathrm{d}x)\\ &\leq C_{\mu}\int|P_{0,T^{\prime}}(\nabla_{x}f(x,\cdot))(x)|^{2}(P_{0,T^{\prime}}f(x,\cdot))^{-1}(x)\,\mu(\mathrm{d}x)\\ &+C_{\mu}\int|\nabla_{z}P_{0,T^{\prime}}(f(x,\cdot))(z)|^{2}\Big{|}_{z=x}(P_{0,T^{\prime}}f(x,\cdot))^{-1}(x)\,\mu(\mathrm{d}x).\end{split}

(47)

For the first summand on the rhs of (47), we can argue on the basis of Jensen’s inequality applied to the convex function $(a,b)\mapsto|a|^{2}/b$ to obtain

\begin{split}&C_{\mu}\int|P_{0,T^{\prime}}(\nabla_{x}f(x,\cdot))(x)|^{2}(P_{0,T^{\prime}}f(x,\cdot))^{-1}(x)\mu(\mathrm{d}x)\\ &\leq C_{\mu}\int P_{0,T^{\prime}}\Big{(}|\nabla_{x}f(x,\cdot)|^{2}/f(x,\cdot)\Big{)}(x)\mu(\mathrm{d}x)\\ &=C_{\mu}\int|\nabla_{x}f(x,y)|^{2}/f(x,y)\pi(\mathrm{d}x\mathrm{d}y),\end{split}

(48)

where we used (43) to obtain the last identity. For the second summand on the rhs of (47), we first invoke the gradient estimate (41) and then apply Jensen’s inequality again, as we did in the previous calculation, to obtain

\begin{split}&C_{\mu}\int|\nabla_{z}P_{0,T^{\prime}}(f(x,\cdot))(z)|^{2}\Big{|}_{z=x}(P_{0,T^{\prime}}f(x,\cdot))^{-1}(x)\mu(\mathrm{d}x)\\ &\stackrel{(41)}{\leq}C_{\mu}C_{0,T^{\prime}}\int(P_{0,T^{\prime}}(|\nabla_{y}f(x,\cdot)|)(x))^{2}(P_{0,T^{\prime}}f(x,\cdot))^{-1}(x)\mu(\mathrm{d}x)\\ &\stackrel{\text{Jensen}}{\leq}C_{\mu}C_{0,T^{\prime}}\int P_{0,T^{\prime}}\Big{(}|\nabla_{y}f(x,\cdot)|^{2}/f(x,\cdot)\Big{)}(x)\mu(\mathrm{d}x)\\ &\stackrel{(43)}{=}C_{\mu}C_{0,T^{\prime}}\int|\nabla_{y}f(x,y)|^{2}/f(x,y)\,\pi(\mathrm{d}x\mathrm{d}y).\end{split}

(49)

Plugging (48)-(49) into (47) we obtain

\begin{split}\mathrm{Ent}_{\mu}(f_{0})&\leq C_{\mu}\int|\nabla_{x}f(x,y)|^{2}/f(x,y)\pi(\mathrm{d}x\mathrm{d}y)\\ &+C_{\mu}C_{0,T^{\prime}}\int|\nabla_{y}f(x,y)|^{2}/f(x,y)\,\pi(\mathrm{d}x\mathrm{d}y).\end{split}

(50)

To conclude the proof, we combine the strength of the bounds (46) and (50) with the entropy decomposition formula (44) to obtain

\begin{split}\mathrm{Ent}_{\pi}(f)&\leq C_{\mu}\int|\nabla_{x}f(x,y)|^{2}/f(x,y)\pi(\mathrm{d}x\mathrm{d}y)\\ &+(\tilde{C}_{0,T^{\prime}}/2+C_{\mu}C_{0,T^{\prime}})\int|\nabla_{y}f(x,y)|^{2}/f(x,y)\,\pi(\mathrm{d}x\mathrm{d}y)\\ &\leq\max\{C_{\mu},C_{\mu}C_{0,T^{\prime}}+\tilde{C}_{0,T^{\prime}}/2\}\int|\nabla f|^{2}/f\,\mathrm{d}\pi,\end{split}

which is the desired result. ∎
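The Jensen step used in (48) and (49) can also be seen through the Cauchy-Schwarz inequality. The following sketch assumes only that $P$ is a Markov operator (integration against a probability kernel), with $g$ a vector-valued function and $f>0$; the letters $P$ and $g$ are generic placeholders introduced here.

```latex
% P: generic Markov operator, g: vector-valued, f > 0 (placeholders for this sketch)
\begin{align*}
|Pg|^{2}
 &\leq \big(P|g|\big)^{2}
  = \Big(P\Big(\frac{|g|}{\sqrt{f}}\,\sqrt{f}\Big)\Big)^{2}
  \leq P\Big(\frac{|g|^{2}}{f}\Big)\,P(f),\\
\frac{|Pg|^{2}}{Pf} &\leq P\Big(\frac{|g|^{2}}{f}\Big).
\end{align*}
```

Taking $P=P_{0,T^{\prime}}$ evaluated at $x$, with $g=\nabla_{x}f(x,\cdot)$ or $|g|=|\nabla_{y}f(x,\cdot)|$, gives the inequalities used in the proof.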

In the next lemma, we represent an approximate version of the static Schrödinger bridge (2) through a diffusion process. This is a classical result stating that Schrödinger bridges are Doob’s $h$ -transforms, see e.g. [31, Sec. 4] and [16].

Lemma 4.2.

Let Assumption 1.1 hold and $\hat{\pi}$ be the static Schrödinger bridge (2). For any $\varepsilon\in(0,T)$ define

\hat{\pi}^{\varepsilon}(\mathrm{d}x\,\mathrm{d}y):=(2\pi(T-\varepsilon))^{-d/2}\exp\Big{(}-\varphi(x)-U^{{T},{\psi}}_{T-\varepsilon}(y)-\frac{|y-x|^{2}}{2(T-\varepsilon)}\Big{)}\mathrm{d}x\,\mathrm{d}y.

(51)

Then $\hat{\pi}^{\varepsilon}$ has the form (43) for $T^{\prime}=T-\varepsilon$ , where $(P_{s,t})_{0\leq s\leq t\leq T-\varepsilon}$ is the time-inhomogeneous semigroup associated with the generator acting on smooth test functions as follows

f\mapsto\frac{1}{2}\Delta f-\langle\nabla U^{{T},{\psi}}_{t},\nabla f\rangle,\quad\,\,t\in[0,T-\varepsilon].

(52)

Proof.

Let $\psi$ be the Schrödinger potential in (8). Invoking Theorem 1.2 we obtain that (27) holds with $\alpha=\alpha_{\psi}$ . Moreover, it is well known that (see [33, Eq 3.3] for example) $\ell_{U^{{T},{\psi}}_{t}}\leq(T-t)^{-1}$ , i.e. the Hessian of $U^{{T},{\psi}}_{t}$ is bounded above by $(T-t)^{-1}$ . Therefore, the vector field $[0,T-\varepsilon]\times\mathbb{R}^{d}\ni(t,x)\mapsto-\nabla U^{{T},{\psi}}_{t}(x)$ is uniformly Lipschitz w.r.t. the space variable for any $\varepsilon\in(0,T)$ . This classically implies existence and uniqueness of strong solutions for the stochastic differential equation

\mathrm{d}X_{t}=-\nabla U^{{T},{\psi}}_{t}(X_{t})\mathrm{d}t+\mathrm{d}B_{t},\quad X_{0}\sim\mu

(53)

over any time interval $[0,T-\varepsilon]$ and we shall denote by $\mathbb{Q}^{\varepsilon}$ the law of the solution on $C([0,T-\varepsilon];\mathbb{R}^{d})$ . Next, we denote by $\mathbb{P}^{\varepsilon}$ the law on $C([0,T-\varepsilon];\mathbb{R}^{d})$ of a Brownian motion with initial law $\mu$ . By Girsanov’s Theorem, see [29] for a version that applies in the current setting, we know that

\frac{\mathrm{d}\mathbb{Q}^{\varepsilon}}{\mathrm{d}\mathbb{P}^{\varepsilon}}(\omega)=\exp\Big{(}-\int_{0}^{T-\varepsilon}\nabla U^{{T},{\psi}}_{t}(\omega_{t})\mathrm{d}\omega_{t}-\frac{1}{2}\int_{0}^{T-\varepsilon}|\nabla U^{{T},{\psi}}_{t}(\omega_{t})|^{2}\mathrm{d}t\Big{)}\quad\mathbb{P}^{\varepsilon}-\text{a.s.},

where we denote by $\omega$ the typical element of the canonical space $C([0,T-\varepsilon];\mathbb{R}^{d})$ . Using Itô formula we rewrite the above as

\begin{split}\frac{\mathrm{d}\mathbb{Q}^{\varepsilon}}{\mathrm{d}\mathbb{P}^{\varepsilon}}(\omega)&=\exp\Big{(}U^{{T},{\psi}}_{0}(\omega_{0})-U^{{T},{\psi}}_{T-\varepsilon}(\omega_{T-\varepsilon})+\int_{0}^{T-\varepsilon}\Big{(}\partial_{t}U^{{T},{\psi}}_{t}+\frac{1}{2}\Delta U^{{T},{\psi}}_{t}-\frac{1}{2}|\nabla U^{{T},{\psi}}_{t}|^{2}\Big{)}(\omega_{t})\mathrm{d}t\Big{)}\\ &=\exp(U^{\mu}(\omega_{0})-\varphi(\omega_{0})-U^{{T},{\psi}}_{T-\varepsilon}(\omega_{T-\varepsilon}))\end{split}

where we used the Schrödinger system (8) and the HJB equation (15) to obtain the last expression. Indeed because of Theorem 1.2 one can deduce that $[0,T]\times\mathbb{R}^{d}\ni(t,x)\mapsto U^{{T},{\psi}}_{t}(x)$ , is a classical solution of (15) by differentiating under the integral sign in (13). From this, we deduce that

\frac{\mathrm{d}\mathbb{Q}^{\varepsilon}_{0,T-\varepsilon}}{\mathrm{d}\mathbb{P}^{\varepsilon}_{0,T-\varepsilon}}(x,y)=\exp\big{(}U^{\mu}(x)-\varphi(x)-U^{{T},{\psi}}_{T-\varepsilon}(y)\big{)}\quad\mathbb{P}^{\varepsilon}_{0,T-\varepsilon}-\text{a.s.},

where $\mathbb{Q}^{\varepsilon}_{0,T-\varepsilon}$ (resp. $\mathbb{P}^{\varepsilon}_{0,T-\varepsilon}$ ) denotes the joint distribution of $\mathbb{Q}^{\varepsilon}$ (resp. $\mathbb{P}^{\varepsilon}$ ) at times $0$ and $T-\varepsilon$ . Since

\mathbb{P}^{\varepsilon}_{0,T-\varepsilon}(\mathrm{d}x\,\mathrm{d}y)=(2\pi(T-\varepsilon))^{-d/2}\exp(-U^{\mu}(x))\exp\Big{(}-\frac{|y-x|^{2}}{2(T-\varepsilon)}\Big{)}\mathrm{d}x\,\mathrm{d}y,

we conclude that

\mathbb{Q}^{\varepsilon}_{0,T-\varepsilon}(\mathrm{d}x\mathrm{d}y)=(2\pi(T-\varepsilon))^{-d/2}\exp\Big{(}-\varphi(x)-U^{{T},{\psi}}_{T-\varepsilon}(y)-\frac{|y-x|^{2}}{2(T-\varepsilon)}\Big{)}\mathrm{d}x\,\mathrm{d}y.

But then $\mathbb{Q}^{\varepsilon}_{0,T-\varepsilon}=\hat{\pi}^{\varepsilon}$ , where $\hat{\pi}^{\varepsilon}$ is defined at (51). To conclude, we recall that $\mathbb{Q}^{\varepsilon}_{0,T-\varepsilon}$ has the desired form (43) where $(P_{s,t})_{0\leq s\leq t\leq T-\varepsilon}$ is indeed the semigroup generated by (52). ∎
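The passage from Girsanov's density to the expression involving only the endpoints relies on Itô's formula under $\mathbb{P}^{\varepsilon}$, where the canonical process is a Brownian motion. A sketch of that step follows, writing $U_{t}$ for $U^{T,\psi}_{t}$ (a shorthand used only here) and using the HJB equation in the form in which the proof applies it.

```latex
% U_t is a shorthand for U^{T,\psi}_t; \omega is a Brownian motion under \mathbb{P}^\varepsilon
\begin{align*}
U_{T-\varepsilon}(\omega_{T-\varepsilon})-U_{0}(\omega_{0})
 &= \int_{0}^{T-\varepsilon}\Big(\partial_tU_t+\frac{1}{2}\Delta U_t\Big)(\omega_t)\,\mathrm{d}t
    +\int_{0}^{T-\varepsilon}\nabla U_t(\omega_t)\,\mathrm{d}\omega_t,\\
-\int_{0}^{T-\varepsilon}\nabla U_t(\omega_t)\,\mathrm{d}\omega_t
 &= U_{0}(\omega_{0})-U_{T-\varepsilon}(\omega_{T-\varepsilon})
    +\int_{0}^{T-\varepsilon}\Big(\partial_tU_t+\frac{1}{2}\Delta U_t\Big)(\omega_t)\,\mathrm{d}t .
\end{align*}
% Subtracting \frac{1}{2}\int_0^{T-\varepsilon}|\nabla U_t(\omega_t)|^2\,\mathrm{d}t, the time
% integral has integrand \partial_tU_t+\frac{1}{2}\Delta U_t-\frac{1}{2}|\nabla U_t|^2,
% which vanishes by the HJB equation, leaving only the two endpoint terms.
```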

Proof of Theorem 1.3.

We know by Lemma 4.2 that $\hat{\pi}^{\varepsilon}$ has the form (43) for $T^{\prime}=T-\varepsilon$ and the inhomogeneous semigroup generated by (52). We now set for $t\in[0,T]$

\alpha^{\psi}_{t}=\inf_{x,v\in\mathbb{R}^{d},|v|=1}\langle v,\nabla^{2}U^{{T},{\psi}}_{t}(x)\,v\rangle

and proceed to estimate $\alpha^{\psi}_{t}$ from below. Invoking Theorem 1.2 we obtain that (27) holds with $\alpha=\alpha_{\psi}$ . That is to say, the estimate

\kappa_{U^{T,\psi}_{t}}(r)\geq\frac{\alpha^{\psi}}{1+(T-t)\alpha^{\psi}}-\frac{r^{-1}f_{L}(r)}{(1+(T-t)\alpha^{\psi})^{2}}

holds uniformly in $r>0$ and $0\leq t\leq T$ . From here, using the concavity of $f_{L}$ and $f^{\prime}_{L}(0)=L$ we obtain

\alpha^{\psi}_{t}\geq\frac{\alpha^{\psi}}{1+(T-t)\alpha^{\psi}}-\frac{L}{(1+(T-t)\alpha^{\psi})^{2}}.

We can now apply Lemma 4.1 to obtain that $\hat{\pi}^{\varepsilon}$ satisfies LSI with constant given by

\eta_{\varepsilon}:=\max\{2C_{\mu},{2}C_{\mu}C_{0,T-\varepsilon}+\int_{0}^{T-\varepsilon}C_{t,T-\varepsilon}\mathrm{d}t\}.

Next, observe that the weak convexity bounds on $\psi$ of Theorem 1.2 imply that $U^{{T},{\psi}}_{T-\varepsilon}$ converges to $\psi$ pointwise as $\varepsilon\rightarrow 0$ . But then, we have that $\hat{\pi}^{\varepsilon}$ converges in total variation to $\hat{\pi}$ by Scheffé’s Lemma. Take now any continuously differentiable function $f$ bounded above and below by positive constants and with bounded derivative. Letting $\varepsilon\rightarrow 0$ in

\mathrm{Ent}_{\hat{\pi}^{\varepsilon}}(f)\leq\frac{\eta_{\varepsilon}}{2}\int\frac{|\nabla f|^{2}}{f}(x,y)\,\hat{\pi}^{\varepsilon}(\mathrm{d}x\,\mathrm{d}y)

and using the convergence in total variation of $\hat{\pi}^{\varepsilon}$ , we obtain that LSI holds for $f$ under $\hat{\pi}$ with the desired constant. The extension to general positive and continuously differentiable functions is achieved through a standard approximation argument where $f(\cdot)$ is approximated by $f\,\chi(N^{-1}\cdot)+N^{-1}$ with $\chi(\cdot)$ a smooth cutoff function.

∎
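The step from the bound on $\kappa_{U^{T,\psi}_{t}}$ to the bound on $\alpha^{\psi}_{t}$ can be sketched as follows. It uses $f_{L}(0)=0$, which follows from $0\leq f_{L}(r)\leq Lr$, and it assumes that a lower bound on $\kappa_{U}(r)$ valid for all $r>0$ yields the same lower bound on the Hessian of $U$ (the definition of $\kappa$ lies outside this section, so this reading is an assumption here).

```latex
\begin{align*}
f_L(r) &\leq f_L(0)+f_L'(0)\,r = Lr
 \quad\Longrightarrow\quad r^{-1}f_L(r)\leq L \quad\forall r>0,\\
\kappa_{U^{T,\psi}_t}(r)
 &\geq \frac{\alpha^{\psi}}{1+(T-t)\alpha^{\psi}}
      -\frac{L}{(1+(T-t)\alpha^{\psi})^{2}} \quad\forall r>0 .
\end{align*}
```

The first line is the tangent line inequality for the concave function $f_{L}$ at $r=0$; the second line no longer depends on $r$, which is what allows it to be turned into a bound on $\alpha^{\psi}_{t}$.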

5 Appendix

Proposition 5.1.

Assume that $U$ satisfies (4) for some $\alpha>0,L^{\prime},R\geq 0$ . Then

\kappa_{U}(r)\geq\alpha-r^{-1}f_{L}(r)\quad\forall r>0.

with $L$ given by (7).

Proof.

If $r>R$ the claim is a simple consequence of $f_{L}(r)\geq 0$ . If $r\leq R$ , using (18) to get that $r^{\prime}\mapsto r^{\prime-1}f_{L}(r^{\prime})$ is non increasing on $(0,+\infty)$ , we obtain

r^{-1}f_{L}(r)\geq R^{-1}f_{L}(R)=L^{\prime},

from which the conclusion follows. ∎

Proposition 5.2.

Let Assumption 1.1 hold and assume furthermore that there exist $\varepsilon,\gamma^{\prime}>0$ such that

\int\exp(\gamma^{\prime}|x|^{1+\varepsilon})\mu(\mathrm{d}x)<+\infty.

(54)

Moreover, let $\bar{\psi}$ be as in (32). Then $\bar{\psi}$ is twice differentiable and we have

\nabla^{2}\bar{\psi}(y)=\frac{1}{T}\mathrm{Cov}_{X\sim\hat{\pi}^{y}}(X)\quad\forall y\in\mathbb{R}^{d},

where $\hat{\pi}^{y}$ is given by (34).

Proof.

From (8) we obtain that

\bar{\psi}(y)+\frac{d}{2}\log(\pi)=T\log\int_{\mathbb{R}^{d}}\exp\Big{(}-\varphi(x)-\frac{|x|^{2}}{2T}+\frac{\langle x,y\rangle}{T}\Big{)}\mathrm{d}x.

(55)

From Assumption 1.1, (8) and (54) it follows that

\int_{\mathbb{R}^{d}\times\mathbb{R}^{d}}\exp\Big{(}\gamma^{\prime}|x|^{1+\varepsilon}-\varphi(x)-\psi(y)-\frac{|x-y|^{2}}{2T}\Big{)}\mathrm{d}x\,\mathrm{d}y<+\infty,

whence the existence of some $y^{\prime}$ such that

\int_{\mathbb{R}^{d}}\exp\left(\gamma^{\prime}|x|^{1+\varepsilon}-\varphi(x)-\frac{|x|^{2}}{2T}+\frac{\langle x,y^{\prime}\rangle}{T}\right)\mathrm{d}x<+\infty.

From this, we easily obtain that for all $\gamma<\gamma^{\prime}$

\int_{\mathbb{R}^{d}}\exp\left(\gamma|x|^{1+\varepsilon}-\varphi(x)-\frac{|x|^{2}}{2T}+\frac{\langle x,y\rangle}{T}\right)\mathrm{d}x<+\infty\quad\forall y\in\mathbb{R}^{d}.

(56)

Thanks to (56) we can apply the dominated convergence theorem and differentiate under the integral sign in (55) to obtain that $\bar{\psi}$ is differentiable and

\nabla\bar{\psi}(y)=\frac{\int x\exp(-\varphi(x)-\frac{|x|^{2}}{2T}+\frac{\langle x,y\rangle}{T})\mathrm{d}x}{\int\exp(-\varphi(\bar{x})-\frac{|\bar{x}|^{2}}{2T}+\frac{\langle\bar{x},y\rangle}{T})\mathrm{d}\bar{x}}\stackrel{(34)}{=}\mathbb{E}_{X\sim\hat{\pi}^{y}}[X].

Using once again (56) to differentiate under the integral sign in (55) we conclude that $\bar{\psi}$ is twice differentiable and that (33) holds. ∎
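The second differentiation can be written out as follows. This is a sketch in which $Z(y)$ denotes the integral on the right-hand side of (55), a name introduced only here, and differentiation under the integral sign is taken for granted as in the proof.

```latex
% Z(y) := \int \exp\big(-\varphi(x)-\tfrac{|x|^2}{2T}+\tfrac{\langle x,y\rangle}{T}\big)\,\mathrm{d}x
% (hypothetical shorthand), so that \bar{\psi} = T\log Z + \text{const}.
\begin{align*}
\nabla Z(y) &= \frac{1}{T}\,Z(y)\,\mathbb{E}_{X\sim\hat{\pi}^{y}}[X],
 \qquad
 \nabla^{2}Z(y) = \frac{1}{T^{2}}\,Z(y)\,\mathbb{E}_{X\sim\hat{\pi}^{y}}[XX^{\top}],\\
\nabla^{2}\bar{\psi}(y)
 &= T\Big(\frac{\nabla^{2}Z(y)}{Z(y)}-\frac{\nabla Z(y)\nabla Z(y)^{\top}}{Z(y)^{2}}\Big)
  = \frac{1}{T}\Big(\mathbb{E}_{X\sim\hat{\pi}^{y}}[XX^{\top}]
     -\mathbb{E}_{X\sim\hat{\pi}^{y}}[X]\,\mathbb{E}_{X\sim\hat{\pi}^{y}}[X]^{\top}\Big)
  = \frac{1}{T}\mathrm{Cov}_{X\sim\hat{\pi}^{y}}(X).
\end{align*}
```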

References

[1] Shigeki Aida and Ichiro Shigekawa. Logarithmic Sobolev inequalities and spectral gaps: perturbation theory. Journal of Functional Analysis, 126(2):448–475, 1994.
[2] Dominique Bakry, Ivan Gentil, and Michel Ledoux. Analysis and geometry of Markov diffusion operators, volume 348. Springer Science & Business Media, 2013.
[3] Erhan Bayraktar, Stephan Eckstein, and Xin Zhang. Stability and sample complexity of divergence regularized optimal transport. arXiv preprint arXiv:2212.00367, 2022.
[4] Jean-David Benamou. Optimal transportation, modelling and numerical simulation. Acta Numerica, 30:249–325, 2021.
[5] Jean-David Benamou, Guillaume Carlier, Marco Cuturi, Luca Nenna, and Gabriel Peyré. Iterative Bregman projections for regularized transportation problems. SIAM Journal on Scientific Computing, 37(2):A1111–A1138, 2015.
[6] Th. Bodineau and B. Helffer. The log-Sobolev inequality for unbounded spin systems. Journal of Functional Analysis, 166(1):168–178, 1999.
[7] H.J. Brascamp and E.H. Lieb. On extensions of the Brunn-Minkowski and Prékopa-Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation. Journal of Functional Analysis, 22(4):366–389, 1976.
[8] Luis A Caffarelli. Monotonicity properties of optimal transportation and the FKG and related inequalities. Communications in Mathematical Physics, 214(3):547–563, 2000.
[9] Y. Chen, T. Georgiou, and M. Pavon. On the relation between optimal transport and Schrödinger bridges: A stochastic control viewpoint. preprint arXiv:1412.4430, 2014.
[10] Yongxin Chen, Tryphon T Georgiou, and Michele Pavon. Stochastic Control Liaisons: Richard Sinkhorn Meets Gaspard Monge on a Schrodinger Bridge. SIAM Review, 63(2):249–313, 2021.
[11] Sinho Chewi and Aram-Alexandre Pooladian. An entropic generalization of Caffarelli’s contraction theorem via covariance inequalities. arXiv preprint arXiv:2203.04954, 2022.
[12] Alberto Chiarini, Giovanni Conforti, Giacomo Greco, and Luca Tamanini. Gradient estimates for the Schrödinger potentials: convergence to the Brenier map and quantitative stability. arXiv preprint arXiv:2207.14262, 2022.
[13] Giovanni Conforti. A second order equation for Schrödinger bridges with applications to the hot gas experiment and entropic transportation cost. Probability Theory and Related Fields, 174(1-2):1–47, 2019.
[14] Giovanni Conforti. Coupling by reflection for controlled diffusion processes: turnpike property and large time behavior of Hamilton Jacobi Bellman equations. Annals of Applied Probability (to appear), 2022.
[15] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, pages 2292–2300, 2013.
[16] P. Dai Pra and M. Pavon. On the Markov processes of Schrödinger, the Feynman-Kac formula and stochastic control. In Realization and Modelling in System Theory, volume 3, pages 497–504. Springer, 1990.
[17] Valentin De Bortoli, Arnaud Doucet, Jeremy Heng, and James Thornton. Simulating diffusion bridges with score matching. arXiv preprint arXiv:2111.07243, 2021.
[18] Valentin De Bortoli, James Thornton, Jeremy Heng, and Arnaud Doucet. Diffusion Schrödinger bridge with applications to score-based generative modeling. Advances in Neural Information Processing Systems, 34, 2021.
[19] George Deligiannidis, Valentin De Bortoli, and Arnaud Doucet. Quantitative uniform stability of the iterative proportional fitting procedure. arXiv preprint arXiv:2108.08129, 2021.
[20] Hacene Djellout, Arnaud Guillin, and Liming Wu. Transportation cost-information inequalities and applications to random dynamical systems and diffusions. The Annals of Probability, 32(3B):2702–2732, 2004.
[21] Joseph Doob. Conditional Brownian motion and the boundary limits of harmonic functions. Bulletin de la Société Mathématique de France, 85:431–458, 1957.
[22] Andreas Eberle. Reflection couplings and contraction rates for diffusions. Probability Theory and Related Fields, 166(3-4):851–886, 2016.
[23] Stephan Eckstein and Marcel Nutz. Quantitative stability of regularized optimal transport and convergence of Sinkhorn’s algorithm. arXiv preprint arXiv:2110.06798, 2021.
[24] Max Fathi, Nathael Gozlan, and Maxime Prodhomme. A proof of the Caffarelli contraction theorem via entropic regularization. Calculus of Variations and Partial Differential Equations, 59(96), 2020.
[25] Ivan Gentil, Christian Léonard, and Luigia Ripani. Dynamical aspects of the generalized Schrödinger problem via Otto calculus–a heuristic point of view. Revista Matemática Iberoamericana, 36(4):1071–1112, 2020.
[26] Promit Ghosal and Marcel Nutz. On the convergence rate of Sinkhorn’s algorithm. arXiv preprint arXiv:2212.06000, 2022.
[27] Natalie Grunewald, Felix Otto, Cédric Villani, and Maria G Westdickenberg. A two-scale approach to logarithmic Sobolev inequalities and the hydrodynamic limit. In Annales de l’IHP Probabilités et statistiques, volume 45, pages 302–351, 2009.
[28] Richard Holley and Daniel W Stroock. Logarithmic Sobolev inequalities and stochastic Ising models. 1986.
[29] Christian Léonard. Girsanov theory under a finite entropy condition. In C. Donati-Martin, A. Lejay, and A. Rouault, editors, Séminaire de Probabilités XLIV, volume 2046 of Lecture Notes in Mathematics, pages 429–465. Springer, 2012.
[30] Christian Léonard. Some properties of path measures. In Séminaire de Probabilités XLVI, pages 207–230. Springer, 2014.
[31] Christian Léonard. A survey of the Schrödinger problem and some of its connections with optimal transport. Discrete and Continuous Dynamical Systems, 34(4):1533–1574, 2014.
[32] Torgny Lindvall and L Cris G Rogers. Coupling of multidimensional diffusions by reflection. The Annals of Probability, pages 860–872, 1986.
[33] Dan Mikulincer and Yair Shenfeld. The Brownian transport map. arXiv preprint arXiv:2111.11521, 2021.
[34] Dan Mikulincer and Yair Shenfeld. On the Lipschitz properties of transportation along heat flows. arXiv preprint arXiv:2201.01382, 2022.
[35] Marcel Nutz and Johannes Wiesel. Entropic optimal transport: Convergence of potentials. Probability Theory and Related Fields, 184(1):401–424, 2022.
[36] Shige Peng. A general stochastic maximum principle for optimal control problems. SIAM J. Control Optim., 28(4):966–979, 1990.
[37] Gabriel Peyré and Marco Cuturi. Computational optimal transport. Foundations and Trends in Machine Learning, 11(5-6):355–607, 2019.
[38] Erwin Schrödinger. Über die Umkehrung der Naturgesetze. Sitzungsberichte Preuss. Akad. Wiss. Berlin. Phys. Math., 144:144–153, 1931.
[39] Yuyang Shi, Valentin De Bortoli, George Deligiannidis, and Arnaud Doucet. Conditional simulation using diffusion Schrödinger bridges. arXiv preprint arXiv:2202.13460, 2022.
