
Weak semiconvexity estimates for Schrödinger potentials and logarithmic Sobolev inequality for Schrödinger bridges

Giovanni Conforti CMAP, CNRS, Ecole polytechnique, Institut Polytechnique de Paris, 91120 Palaiseau, France E-mail address: giovanni.conforti@polytechnique.edu. Research supported by the ANR project ANR-20-CE40-0014.
Abstract

We investigate the quadratic Schrödinger bridge problem, a.k.a. Entropic Optimal Transport problem, and obtain weak semiconvexity and semiconcavity bounds on Schrödinger potentials under mild assumptions on the marginals that are substantially weaker than log-concavity. We deduce from these estimates that Schrödinger bridges satisfy a logarithmic Sobolev inequality on the product space. Our proof strategy is based on a second order analysis of coupling by reflection on the characteristics of the Hamilton-Jacobi-Bellman equation that reveals the existence of new classes of invariant functions for the corresponding flow.

Mathematics Subject Classification (2020)

49Q22,49L12,35G50,60J60,39B62

1 Introduction and statement of the main results

The Schrödinger problem [38] (SP) is a statistical mechanics problem that consists in finding the most likely evolution of a cloud of independent Brownian particles conditionally on observations. Also known as the Entropic Optimal Transport (EOT) problem and formulated with the help of large deviations theory as a constrained entropy minimization problem, it stands nowadays at the crossroads of several research lines including functional inequalities [13, 25], statistical machine learning [15, 37], control engineering [9, 10], and numerics for PDEs [5, 4]. Given two probability distributions $\mu,\nu$ on $\mathbb{R}^{d}$, the corresponding (quadratic) Schrödinger problem is

$$\inf_{\pi\in\Pi(\mu,\nu)}\mathcal{H}(\pi|R_{0T}), \tag{1}$$

where $\Pi(\mu,\nu)$ represents the set of couplings of $\mu$ and $\nu$ and $\mathcal{H}(\pi|R_{0T})$ is the relative entropy of a coupling $\pi$ computed against the joint law $R_{0T}$ at times $0$ and $T$ of a Brownian motion with initial law $\mu$. It is well known that under mild conditions on the marginals, the optimal coupling $\hat{\pi}$, called (static) Schrödinger bridge, is unique and admits the representation

$$\hat{\pi}(\mathrm{d}x\,\mathrm{d}y)=\exp(-\varphi(x)-\psi(y))\exp\Big(-\frac{|x-y|^{2}}{2T}\Big)\mathrm{d}x\,\mathrm{d}y \tag{2}$$

where $\varphi,\psi$ are two functions, known as Schrödinger potentials [31], that can be regarded as proxies for the Brenier potentials of optimal transport, which are recovered in the short-time ($T\rightarrow 0$) limit [35, 12]. In this article we seek convexity and concavity estimates for Schrödinger potentials. Such estimates have been recently established in [11] and [24] working under a set of assumptions that implies in particular log-concavity of at least one of the two marginals. This assumption is crucial therein, as it allows one to profit from classical functional inequalities such as the Prékopa-Leindler inequality and the Brascamp-Lieb inequality. In particular, the estimates obtained in the above-mentioned works yield alternative proofs of Caffarelli’s contraction Theorem [8] in the short-time limit. The purpose of this work is twofold: in the first place we show in Theorem 1.2 that, for any fixed $T>0$, it is possible to leverage the probabilistic interpretation of (1) to establish lower and upper bounds on the functions

$$\langle\nabla\varphi(x)-\nabla\varphi(y),x-y\rangle\quad\text{and}\quad\langle\nabla\psi(x)-\nabla\psi(y),x-y\rangle$$

that are valid for all $x,y\in\mathbb{R}^{d}$ and do not require strict log-concavity of the marginals to hold, but still allow one to recover the results of [11] as a special case. The second main contribution is to apply these bounds to prove, in Theorem 1.3, that static Schrödinger bridges satisfy the logarithmic Sobolev inequality (LSI for short). In our main results we shall quantify the weak semiconvexity of a potential $U:\mathbb{R}^{d}\longrightarrow\mathbb{R}$ by appealing to the function $\kappa_{U}$, defined as follows:

$$\kappa_{U}:(0,+\infty)\longrightarrow\mathbb{R},\quad\kappa_{U}(r)=\inf\{|x-y|^{-2}\langle\nabla U(x)-\nabla U(y),x-y\rangle:|x-y|=r\}. \tag{3}$$
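To make definition (3) concrete, the following sketch evaluates $\kappa_{U}$ numerically in dimension one for the double-well potential $U(x)=x^{4}/4-x^{2}/2$. The potential, the grid and the helper name `kappa_1d` are illustrative assumptions and do not come from the paper. In one dimension, (3) reduces to $\kappa_{U}(r)=\inf_{x}(U'(x+r)-U'(x))/r$, and for this particular $U$ a direct computation gives $\kappa_{U}(r)=r^{2}/4-1$: negative at short range, positive for $r>2$.

```python
import numpy as np

def kappa_1d(grad_U, r, x_grid):
    """Approximate kappa_U(r) in 1D: inf over x of (U'(x + r) - U'(x)) / r.

    The infimum is taken over the finite grid x_grid only, so the result
    approximates the true infimum from above.
    """
    return np.min((grad_U(x_grid + r) - grad_U(x_grid)) / r)

# Illustrative double-well potential U(x) = x^4/4 - x^2/2 (our choice).
grad_U = lambda x: x**3 - x

x_grid = np.linspace(-10.0, 10.0, 20001)
radii = np.array([0.5, 1.0, 2.0, 3.0, 5.0])
kappa_num = np.array([kappa_1d(grad_U, r, x_grid) for r in radii])
kappa_exact = radii**2 / 4.0 - 1.0  # closed form valid for this U only

for r, k in zip(radii, kappa_num):
    print(f"r = {r:3.1f}   kappa_U(r) ~ {k:+.4f}")
```

This potential is not convex, yet $\kappa_{U}(r)$ becomes positive at large distances, which is the kind of behaviour that non-uniform lower bounds on $\kappa_{U}$ are meant to capture.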

$\kappa_{U}(r)$ may be regarded as an averaged or integrated convexity lower bound for $U$ for points that are at distance $r$. This function is often encountered in applications of the coupling method to the study of the long time behavior of Fokker-Planck equations [22, 32]. Obviously $\kappa_{U}\geq 0$ is equivalent to the convexity of $U$, but working with non-uniform lower bounds on $\kappa_{U}$ allows one to design efficient generalizations of the classical notion of convexity. A commonly encountered sufficient condition on $\kappa_{U}$ ensuring the exponential trend to equilibrium of the Fokker-Planck equation

$$\partial_{t}\mu_{t}-\frac{1}{2}\Delta\mu_{t}-\nabla\cdot\big(\nabla U\,\mu_{t}\big)=0$$

is the following

$$\kappa_{U}(r)\geq\begin{cases}\alpha,\quad&\text{if $r>R$,}\\ \alpha-L^{\prime},\quad&\text{if $r\leq R$,}\end{cases} \tag{4}$$

for some $\alpha>0$ and $L^{\prime},R\geq 0$. In this work, we refer to assumptions of the form (4) and variants thereof as weak convexity assumptions. Our main results require an assumption of this kind, namely (6) below, which is shown to be no more demanding than (4) (see Proposition 5.1) and is expressed through a rescaled version of the hyperbolic tangent function. These functions play a special role in this work since, as we show in Theorem 2.1, they define a weak convexity property that propagates backward along the flow of the Hamilton-Jacobi-Bellman (HJB) equation

$$\partial_{t}\varphi_{t}+\frac{1}{2}\Delta\varphi_{t}-\frac{1}{2}|\nabla\varphi_{t}|^{2}=0.$$

Such invariance property represents the main innovation in our proof strategy: the propagation of the classical notion of convexity along the HJB equation used in [24] as well as the Brascamp-Lieb inequality employed in [11] are both consequences of the Prékopa-Leindler inequality, see [7]. In the framework considered here, such a powerful tool becomes ineffective due to the possible lack of log-concavity in both marginals. To overcome this obstacle we develop a probabilistic approach based on a second order analysis of coupling by reflection on the solutions of the SDE

$$\mathrm{d}X_{t}=-\nabla\varphi_{t}(X_{t})\mathrm{d}t+\mathrm{d}B_{t},$$

also known as characteristics of the HJB equation, revealing the existence of novel classes of weakly convex functions that are invariant for the HJB flow. This property, besides being a key ingredient in the proof of Theorem 1.2, is interesting on its own, and can be generalized in several directions. Remarkably, Theorem 1.2 can be applied to show that Schrödinger bridges satisfy LSI: this is not a trivial task since static Schrödinger bridges are not known to be log-concave probability measures in general, not even in the case when both marginals are strongly log-concave. For this reason, one cannot infer LSI directly from Theorem 1.2 and the Bakry-Émery criterion. However, reintroducing a dynamical viewpoint and representing Schrödinger bridges as Doob $h$-transforms of Brownian motion [21] reveals all the effectiveness of Theorem 1.2, which gives at once gradient estimates and local (or conditional, or heat kernel) logarithmic Sobolev inequalities for the $h$-transform semigroup. By carefully mixing the local inequalities with the help of gradient estimates, we finally establish in Theorem 1.3 LSI for $\hat{\pi}$, which is our second main contribution. It is worth noticing that in the $T\rightarrow+\infty$ asymptotic regime, our approach to LSI can be related to the techniques recently developed in [34] to construct Lipschitz transports between the Gaussian distribution and probability measures that are approximately log-concave in a suitable sense. Because of the intrinsic probabilistic nature of our proof strategy, our ability to compensate for the lack of log-concavity in the marginals depends on the size of the regularization parameter $T$, and indeed vanishes as $T\rightarrow 0$. Thus, our main results do not yield any sensible convexity/concavity estimate on Brenier potentials that improves on Caffarelli’s Theorem. On the other hand, the semiconvexity bounds of Theorem 1.2 find applications beyond LSI, which we shall address in future works.
For example, following classical arguments put forward in [20], they can be shown to imply transport-entropy (a.k.a. Talagrand) inequalities on path space for dynamic Schrödinger bridges. Moreover, building on the results of [13], they imply new semiconvexity estimates for the Fisher information along entropic interpolations. It is also natural to conjecture that these bounds will provide new stability estimates for Schrödinger bridges under marginal perturbations, thus addressing a question that has recently drawn quite some attention, see [19, 12, 23, 26, 3] for example. Finally, we point out that Hessian bounds for potentials can play a relevant role in providing theoretical guarantees for learning algorithms that make use of dynamic Schrödinger bridges and conditional processes. In this framework, leveraging Doob’s $h$-transform theory and time reversal arguments, they directly translate into various kinds of quantitative stability estimates for the diffusion processes used for sampling, see e.g. [18, 17, 39].

Organization

The document is organized as follows. In the remainder of the first section we state and discuss our main hypotheses and results. In Section 2 we study invariant sets for the HJB flow. Sections 3 and 4 are devoted to the proofs of our two main results, Theorem 1.2 and Theorem 1.3. Technical results and background material are collected in the Appendix.

Assumption 1.1.

We assume that $\mu,\nu$ admit a positive density against the Lebesgue measure which can be written in the form $\exp(-U^{\mu})$ and $\exp(-U^{\nu})$ respectively. $U^{\mu},U^{\nu}$ are of class $C^{2}(\mathbb{R}^{d})$.

  1. $(\mathbf{H1})$

    $\mu$ has finite second moment and finite relative entropy against the Lebesgue measure. Moreover, there exists $\beta_{\mu}>0$ such that

    $$\langle v,\nabla^{2}U^{\mu}(x)v\rangle\leq\beta_{\mu}|v|^{2}\quad\forall x,v\in\mathbb{R}^{d}. \tag{5}$$

One of the following holds

  1. $(\mathbf{H2})$

    There exist $\alpha_{\nu},L>0$ such that

    $$\kappa_{U^{\nu}}(r)\geq\alpha_{\nu}-r^{-1}f_{L}(r)\quad\forall r>0, \tag{6}$$

    where the function $f_{L}$ is defined for any $L>0$ by:

    $$f_{L}:[0,+\infty]\longrightarrow[0,+\infty],\quad f_{L}(r)=2\,L^{1/2}\tanh\Big((rL^{1/2})/2\Big).$$
  2. $(\mathbf{H2^{\prime}})$

    There exist $\alpha_{\nu}>0$ and $R,L^{\prime}\geq 0$ such that

    $$\kappa_{U^{\nu}}(r)\geq\begin{cases}\alpha_{\nu},\quad&\text{if $r>R$,}\\ \alpha_{\nu}-L^{\prime},\quad&\text{if $r\leq R$.}\end{cases}$$

    In this case, we set

    $$L=\inf\{\bar{L}:R^{-1}f_{\bar{L}}(R)\geq L^{\prime}\}. \tag{7}$$

Clearly, imposing (6) is less restrictive than asking that $\nu$ is strongly log-concave.
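As a rough numerical companion to (6) and (7), the sketch below computes the constant $L$ associated with a pair $(L^{\prime},R)$ by bisection and checks on a grid that $r^{-1}f_{L}(r)\geq L^{\prime}$ for $r\leq R$, which is the inequality needed to pass from the step bound in $(\mathbf{H2^{\prime}})$ to (6). The function names, the bracket and the numerical values are our own illustrative choices. The bisection assumes $L^{\prime},R>0$ and relies on the elementary facts that $\bar{L}\mapsto R^{-1}f_{\bar{L}}(R)$ is increasing and that $f_{L}(R)/R\leq L$ because $\tanh(x)\leq x$.

```python
import numpy as np

def f_L(r, L):
    """f_L(r) = 2 sqrt(L) tanh(r sqrt(L) / 2), as in Assumption 1.1."""
    return 2.0 * np.sqrt(L) * np.tanh(0.5 * r * np.sqrt(L))

def L_from_step_bound(L_prime, R, n_bisect=200):
    """Solve (7) numerically: smallest L_bar with f_{L_bar}(R) / R >= L_prime.

    Sketch for L_prime > 0 and R > 0. Since f_L(R) / R <= L, the solution
    is at least L_prime, which gives the lower end of the bracket.
    """
    lo, hi = L_prime, max(2.0 * L_prime, 1.0)
    while f_L(R, hi) / R < L_prime:   # enlarge the bracket if needed
        hi *= 2.0
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if f_L(R, mid) / R >= L_prime:
            hi = mid
        else:
            lo = mid
    return hi

L_prime, R = 1.0, 2.0                 # illustrative values
L = L_from_step_bound(L_prime, R)
r = np.linspace(1e-3, R, 500)
print("L =", L, "  min over r <= R of f_L(r)/r =", np.min(f_L(r, L) / r))
```

Because $f_{L}$ is concave with $f_{L}(0)=0$, the ratio $f_{L}(r)/r$ is nonincreasing in $r$, so its minimum over $(0,R]$ is attained at $r=R$, where it equals $L^{\prime}$ by construction.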

Remark 1.1.

We show in Proposition 5.1 that $(\mathbf{H2^{\prime}})$ implies $(\mathbf{H2})$. However, since $(\mathbf{H2^{\prime}})$ is more familiar to most readers, we prefer to keep a statement of Theorem 1.2 that makes use of this assumption.

Remark 1.2.

The requirement that the density of $\nu$ is strictly positive everywhere could be dropped at the price of additional technicalities. For $\mu$, such a requirement is a consequence of (5).

The Schrödinger system

Let $(P_{t})_{t\geq 0}$ be the semigroup generated by a $d$-dimensional standard Brownian motion. For given marginals $\mu,\nu$ and $T>0$, the Schrödinger system is the following system of coupled non-linear equations

$$\begin{cases}\varphi(x)=U^{\mu}(x)+\log P_{T}\exp(-\psi)(x),\quad x\in\mathbb{R}^{d},\\ \psi(y)=U^{\nu}(y)+\log P_{T}\exp(-\varphi)(y),\quad y\in\mathbb{R}^{d}.\end{cases} \tag{8}$$

Under Assumption 1.1, it is known that the Schrödinger system admits a solution $(\varphi,\psi)$, and that if $(\bar{\varphi},\bar{\psi})$ is another solution, then there exists $c\in\mathbb{R}$ such that $(\varphi,\psi)=(\bar{\varphi}+c,\bar{\psi}-c)$, see [35, sec. 2], [31] and references therein. The potentials $\varphi,\psi$ are known as Schrödinger potentials or entropic Brenier potentials in the literature.
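A discretized version of the system (8) can be solved by alternating the two equations, which is the classical Sinkhorn (iterative proportional fitting) scheme. The sketch below does this in dimension one on a uniform grid, with $P_{T}$ replaced by a Riemann sum against the Gaussian kernel. The grid, the choice of a Gaussian $\mu$ and a double-well $\nu$, the iteration count and all function names are assumptions made for illustration only, and the scheme is not claimed to be the method used in the paper.

```python
import numpy as np

def logsumexp_rows(a):
    """Row-wise log(sum(exp(a))), computed stably."""
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]

def normalize(V, h):
    """Shift V so that exp(-V) sums to one against the grid weight h."""
    return V + np.log(np.sum(np.exp(-V)) * h)

def solve_schrodinger_system(U_mu, U_nu, x, T, n_iter=500):
    """Alternate the two equations of (8) on a uniform 1D grid."""
    h = x[1] - x[0]
    # log of the heat kernel p_T(x_i, x_j) times the quadrature weight h
    log_K = (-(x[:, None] - x[None, :]) ** 2 / (2.0 * T)
             - 0.5 * np.log(2.0 * np.pi * T) + np.log(h))
    phi = np.zeros_like(x)
    psi = np.zeros_like(x)
    for _ in range(n_iter):
        phi = U_mu + logsumexp_rows(log_K - psi[None, :])  # first line of (8)
        psi = U_nu + logsumexp_rows(log_K - phi[None, :])  # second line of (8)
    return phi, psi, log_K

x = np.linspace(-6.0, 6.0, 241)
h = x[1] - x[0]
T = 1.0
U_mu = normalize(0.5 * x**2, h)                   # Gaussian marginal (illustrative)
U_nu = normalize(0.25 * x**4 - 0.5 * x**2, h)     # double-well marginal (illustrative)
phi, psi, log_K = solve_schrodinger_system(U_mu, U_nu, x, T)

# Marginal densities of the discrete coupling exp(-phi(x) - psi(y)) p_T(x, y).
x_marginal = np.exp(-phi + logsumexp_rows(log_K - psi[None, :]))
y_marginal = np.exp(-psi + logsumexp_rows(log_K - phi[None, :]))
err_mu = np.max(np.abs(x_marginal - np.exp(-U_mu)))
err_nu = np.max(np.abs(y_marginal - np.exp(-U_nu)))
print("marginal errors:", err_mu, err_nu)
```

The kernel here includes the normalizing factor $(2\pi T)^{-1/2}$ of $P_{T}$, so the coupling differs from the expression in (2) only by a constant that can be absorbed into the potentials. Adding a constant to `phi` and subtracting it from `psi` leaves both equations satisfied, in line with the uniqueness statement above.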

Weak semiconvexity and semiconcavity bounds for Schrödinger potentials

In the rest of the article, given a scalar function $U$, any pointwise lower bound on $\kappa_{U}$ implying in particular that

$$\liminf_{r\rightarrow+\infty}\kappa_{U}(r)>-\infty$$

shall be called a weak semiconvexity bound for $U$. Next, in analogy with (3) we introduce for a differentiable $U:\mathbb{R}^{d}\longrightarrow\mathbb{R}$ the function $\ell_{U}$ as follows:

$$\ell_{U}:(0,+\infty)\longrightarrow\mathbb{R},\quad\ell_{U}(r)=\sup\{|x-y|^{-2}\langle\nabla U(x)-\nabla U(y),x-y\rangle:|x-y|=r\},$$

and call a weak semiconcavity bound for $U$ any pointwise upper bound for $\ell_{U}$ implying in particular that

$$\limsup_{r\rightarrow+\infty}\ell_{U}(r)<+\infty.$$

Our first main result is about weak semiconvexity and weak semiconcavity bounds for Schrödinger potentials.

Theorem 1.2.

Let Assumption 1.1 hold and $(\varphi,\psi)$ be solutions of the Schrödinger system. Then $\varphi,\psi$ are twice differentiable and for all $r>0$ we have

$$\kappa_{\psi}(r)\geq\alpha_{\psi}-r^{-1}f_{L}(r), \tag{9}$$
$$\ell_{\varphi}(r)\leq\beta_{\mu}-\frac{\alpha_{\psi}}{(1+T\alpha_{\psi})}+\frac{r^{-1}f_{L}(r)}{(1+T\alpha_{\psi})^{2}}, \tag{10}$$

where $\alpha_{\psi}>\alpha_{\nu}-1/T$ can be taken to be the smallest solution of the fixed point equation

$$\alpha=\alpha_{\nu}-\frac{1}{T}+\frac{G(\alpha,2)}{2T^{2}},\quad\alpha\in(\alpha_{\nu}-1/T,+\infty) \tag{11}$$

where for all $\alpha\geq\alpha_{\nu}-1/T$:

$$\begin{split}G(\alpha,u)&=\inf\{s\geq 0:F(\alpha,s)\geq u\},\quad u>0,\\ F(\alpha,s)&=\beta_{\mu}s+\frac{s}{T(1+T\alpha)}+\frac{s^{1/2}f_{L}(s^{1/2})}{(1+T\alpha)^{2}},\quad s>0.\end{split} \tag{12}$$

There seems to be no closed form expression for the solutions of the fixed point equation (11). However, it is possible to obtain explicit non trivial upper and lower bounds.

Corollary 1.1.

Let $\bar{\alpha}$ be a fixed point solution of (11). Then we have

$$\frac{\alpha_{\nu}}{2}-\frac{1}{T}-\frac{L}{2T^{2}\alpha_{\nu}\beta_{\mu}}+\frac{1}{2}\sqrt{\Big(\alpha_{\nu}+\frac{L}{T^{2}\beta_{\mu}\alpha_{\nu}}\Big)^{2}+\frac{4\alpha_{\nu}}{T^{2}\beta_{\mu}}}\leq\bar{\alpha}\leq\frac{\alpha_{\nu}}{2}-\frac{1}{T}+\frac{1}{2}\sqrt{\alpha^{2}_{\nu}+\frac{4\alpha_{\nu}}{T^{2}\beta_{\mu}}}.$$
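The fixed point equation (11) and the bounds of Corollary 1.1 can be explored numerically. The sketch below inverts $F(\alpha,\cdot)$ by bisection to obtain $G(\alpha,2)$ and then iterates the map $\alpha\mapsto\alpha_{\nu}-1/T+G(\alpha,2)/(2T^{2})$ starting from the left endpoint $\alpha_{\nu}-1/T$. This iteration is our own choice, not a procedure taken from the paper: it assumes that the map is nondecreasing in $\alpha$ (which follows from $F$ being nonincreasing in $\alpha$ when $1+T\alpha>0$), in which case the iterates increase and, if they converge, their limit is the smallest fixed point. The parameter values are illustrative.

```python
import numpy as np

def f_L(r, L):
    return 2.0 * np.sqrt(L) * np.tanh(0.5 * r * np.sqrt(L))

def F(alpha, s, beta_mu, L, T):
    """F(alpha, s) from (12); requires 1 + T * alpha > 0."""
    c = 1.0 + T * alpha
    return beta_mu * s + s / (T * c) + np.sqrt(s) * f_L(np.sqrt(s), L) / c**2

def G(alpha, u, beta_mu, L, T, n_bisect=100):
    """Inverse of the increasing map s -> F(alpha, s), by bisection.

    Since F(alpha, s) >= beta_mu * s, the root lies in [0, u / beta_mu].
    """
    lo, hi = 0.0, u / beta_mu
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if F(alpha, mid, beta_mu, L, T) >= u:
            hi = mid
        else:
            lo = mid
    return hi

def fixed_point_map(alpha, alpha_nu, beta_mu, L, T):
    return alpha_nu - 1.0 / T + G(alpha, 2.0, beta_mu, L, T) / (2.0 * T**2)

def smallest_fixed_point(alpha_nu, beta_mu, L, T, tol=1e-13, max_iter=10000):
    """Monotone iteration from the left endpoint alpha_nu - 1/T (a sketch)."""
    alpha = alpha_nu - 1.0 / T
    for _ in range(max_iter):
        new = fixed_point_map(alpha, alpha_nu, beta_mu, L, T)
        if abs(new - alpha) < tol:
            return new
        alpha = new
    return alpha

alpha_nu, beta_mu, L, T = 1.0, 2.0, 0.5, 1.0      # illustrative values
alpha_psi = smallest_fixed_point(alpha_nu, beta_mu, L, T)

# Bounds of Corollary 1.1
c = L / (T**2 * beta_mu * alpha_nu)
lower = (alpha_nu / 2 - 1 / T - c / 2
         + 0.5 * np.sqrt((alpha_nu + c) ** 2 + 4 * alpha_nu / (T**2 * beta_mu)))
upper = alpha_nu / 2 - 1 / T + 0.5 * np.sqrt(alpha_nu**2 + 4 * alpha_nu / (T**2 * beta_mu))
print(f"lower = {lower:.6f}, alpha_psi ~ {alpha_psi:.6f}, upper = {upper:.6f}")
```

When the term involving $f_{L}$ is dropped from $F$, the function $G(\alpha,2)$ is explicit and (11) reduces to a quadratic equation whose relevant root is exactly the upper bound of Corollary 1.1.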
Remark 1.3.

It is proven in Lemma 3.2 that $F(\alpha,\cdot)$ is increasing on $(0,+\infty)$ for all $\alpha>-1/T$. $G(\alpha,\cdot)$ is therefore its inverse.

Remark 1.4.

It is possible to check that if (H2) holds with $L=0$, Theorem 1.2 recovers the conclusion of [11, Theorem 4], after a change of variable. To be more precise, the potentials $(\varphi_{\varepsilon},\psi_{\varepsilon})$ considered there are related to the couple $(\varphi,\psi)$ appearing in (8) by choosing $\varepsilon=T$ and setting

$$\varphi_{\varepsilon}=\varepsilon\Big(\varphi-U^{\mu}+\frac{|\cdot|^{2}}{2\varepsilon}\Big),\quad\psi_{\varepsilon}=\varepsilon\Big(\psi-U^{\nu}+\frac{|\cdot|^{2}}{2\varepsilon}\Big).$$
Remark 1.5.

The rescaled potential $T\varphi$ converges to the Brenier potential in the small noise limit [35]. As explained in the introduction, one cannot deduce from Theorem 1.2 an improvement over Caffarelli’s Theorem [8] by letting $T\rightarrow 0$.

Our second main result is that the static Schrödinger bridge $\hat{\pi}$ satisfies LSI with an explicit constant. We recall here that a probability measure $\rho$ on $\mathbb{R}^{d}$ satisfies LSI with constant $C$ if and only if for all positive differentiable functions $f$

$$\mathrm{Ent}_{\rho}(f)\leq\frac{C}{2}\int\frac{|\nabla f|^{2}}{f}\mathrm{d}\rho,\quad\text{where}\quad\mathrm{Ent}_{\rho}(f)=\int f\log f\,\mathrm{d}\rho-\int f\,\mathrm{d}\rho\,\log\Big(\int f\,\mathrm{d}\rho\Big).$$
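To fix the normalization of the constant $C$ in this definition, here is a short worked computation. It is our own illustration and is not taken from the paper. Take $\rho$ to be the standard Gaussian on $\mathbb{R}$, which is classically known to satisfy LSI with $C=1$ in this normalization, and $f(x)=e^{ax}$. Both sides of the inequality can be computed in closed form and they coincide, so exponential functions saturate the inequality.

```latex
% Worked example (illustration only): \rho = N(0,1) on \mathbb{R}, f(x) = e^{ax}, a \in \mathbb{R}.
\begin{align*}
\int f\,\mathrm{d}\rho &= e^{a^{2}/2},
\qquad
\int f\log f\,\mathrm{d}\rho = a\int x\,e^{ax}\,\mathrm{d}\rho = a^{2}e^{a^{2}/2},\\
\mathrm{Ent}_{\rho}(f) &= a^{2}e^{a^{2}/2}-e^{a^{2}/2}\cdot\frac{a^{2}}{2}
 = \frac{a^{2}}{2}\,e^{a^{2}/2},\\
\frac{1}{2}\int\frac{|f'|^{2}}{f}\,\mathrm{d}\rho &= \frac{a^{2}}{2}\int e^{ax}\,\mathrm{d}\rho
 = \frac{a^{2}}{2}\,e^{a^{2}/2}.
\end{align*}
```

The second identity in the first line uses Gaussian integration by parts, $\int x\,g\,\mathrm{d}\rho=\int g'\,\mathrm{d}\rho$.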
Theorem 1.3.

Let Assumption 1.1 hold and assume furthermore that $\mu$ satisfies LSI with constant $C_{\mu}$. Then the static Schrödinger bridge $\hat{\pi}$ satisfies LSI with constant

$$\max\left\{2\,C_{\mu},\ 2\,C_{\mu}C_{0,T}+\int_{0}^{T}C_{t,T}\,\mathrm{d}t\right\},$$

where for all $t\leq T$

$$C_{t,T}:=\exp\Big(-\int_{t}^{T}\alpha^{\psi}_{s}\mathrm{d}s\Big),\quad\alpha^{\psi}_{t}:=\frac{\alpha_{\psi}}{1+(T-t)\alpha_{\psi}}-\frac{L}{(1+(T-t)\alpha_{\psi})^{2}},$$

and $\alpha_{\psi}$ is as in Theorem 1.2.
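The constant in Theorem 1.3 is straightforward to evaluate once $\alpha_{\psi}$ is known. Integrating $\alpha^{\psi}_{s}$ explicitly gives, with $u=T-t$, the closed form $C_{t,T}=(1+u\alpha_{\psi})^{-1}\exp\big(Lu/(1+u\alpha_{\psi})\big)$. This formula is our own computation, valid when $1+u\alpha_{\psi}>0$, and the sketch below cross-checks it against a trapezoidal quadrature of the definition. The numerical values of $\alpha_{\psi}$, $L$, $T$ and $C_{\mu}$ are placeholders, not outputs of the paper.

```python
import numpy as np

def alpha_t(t, T, alpha_psi, L):
    """alpha^psi_t as defined in Theorem 1.3."""
    c = 1.0 + (T - t) * alpha_psi
    return alpha_psi / c - L / c**2

def C_closed_form(t, T, alpha_psi, L):
    """Closed form of C_{t,T} = exp(-int_t^T alpha^psi_s ds) (our computation)."""
    u = T - t
    c = 1.0 + u * alpha_psi
    return np.exp(L * u / c) / c

def trapezoid(y, x):
    """Composite trapezoidal rule on the nodes x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def lsi_constant(C_mu, T, alpha_psi, L, n=4001):
    """Evaluate max{2 C_mu, 2 C_mu C_{0,T} + int_0^T C_{t,T} dt}."""
    t = np.linspace(0.0, T, n)
    integral = trapezoid(C_closed_form(t, T, alpha_psi, L), t)
    return max(2.0 * C_mu, 2.0 * C_mu * C_closed_form(0.0, T, alpha_psi, L) + integral)

T, alpha_psi, L, C_mu = 1.0, 0.33, 0.5, 1.0       # placeholder values
s = np.linspace(0.0, T, 20001)
C_quadrature = np.exp(-trapezoid(alpha_t(s, T, alpha_psi, L), s))
C_exact = C_closed_form(0.0, T, alpha_psi, L)
constant = lsi_constant(C_mu, T, alpha_psi, L)
print("C_{0,T}:", C_exact, " quadrature:", C_quadrature, " LSI constant:", constant)
```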

It is well known that LSI has a number of remarkable consequences including, but certainly not limited to, spectral gaps and concentration of measure inequalities for Lipschitz observables.

Remark 1.6.

It is worth noticing that if $U^{\nu}$ is the sum of a strongly convex potential and a Lipschitz perturbation with second derivative bounded below, then (6) holds. Moreover, the perturbation need not be of bounded support, covering many interesting scenarios such as double-well or multiple-well potentials. At the moment of writing, it is not clear whether or not (6) implies that $\nu$ is a bounded or log-Lipschitz perturbation of a strongly log-concave probability measure, a situation where the results of [28], [1] already ensure that $\nu$ satisfies a logarithmic Sobolev inequality.

Remark 1.7.

By taking $\mu$ to be a Gaussian distribution, we obtain as a corollary of Theorem 1.3 that any probability measure $\nu$ fulfilling (6) satisfies a logarithmic Sobolev inequality, though the constant we exhibit here is not optimal. Indeed, the LSI constant for $\nu$ is deduced by marginalization from the LSI constant of $\hat{\pi}$. Obviously, estimating the LSI constant for $\hat{\pi}$ is a much more difficult task than estimating the LSI constant of its marginal, in particular because $\hat{\pi}$ does not admit an explicit expression. However, looking at limiting cases, Schrödinger potentials become explicit, and the LSI constant for $\nu$ can be more precisely estimated by constructing Lipschitz maps between some nice distribution and $\nu$, arguing on the basis of Theorem 2.1. In particular, setting $T=1$ and choosing $\mu=\delta_{0}$ allows one to recover the setting in which the “Brownian transport map” [33] is constructed. Changing the reference measure into the stationary Ornstein-Uhlenbeck process, choosing $\mu$ to be the standard Gaussian distribution and setting $T=+\infty$ allows one to deploy the technique of heat flow maps [34]. These limiting scenarios are in some sense orthogonal to the scope of this work, which is to gain a better understanding of potentials when they cannot be computed in closed form. They are nevertheless of clear interest and will be analyzed in detail in forthcoming work.

2 Invariant sets of weakly convex functions for the HJB flow

We introduce the notation

$$U^{T,g}_{t}(x):=-\log P_{T-t}\exp(-g)(x)=-\log\left(\frac{1}{(2\pi(T-t))^{d/2}}\int\exp\Big(-\frac{|y-x|^{2}}{2(T-t)}-g(y)\Big)\mathrm{d}y\right). \tag{13}$$

With this notation at hand, (8) rewrites as follows:

$$\begin{cases}\varphi=U^{\mu}-U^{T,\psi}_{0},\\ \psi=U^{\nu}-U^{T,\varphi}_{0}.\end{cases} \tag{14}$$

It is well known that under mild conditions on $g$, the map $[0,T]\times\mathbb{R}^{d}\ni(t,x)\mapsto U^{T,g}_{t}(x)$ is a classical solution of the HJB equation

$$\begin{cases}\partial_{t}\varphi_{t}(x)+\frac{1}{2}\Delta\varphi_{t}(x)-\frac{1}{2}|\nabla\varphi_{t}|^{2}(x)=0,\\ \varphi_{T}(x)=g(x).\end{cases} \tag{15}$$
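As a sanity check of (13) and (15), consider the quadratic terminal condition $g(y)=\frac{a}{2}|y|^{2}$. This worked example is ours, is not part of the paper, and assumes $1+(T-t)a>0$ so that the Gaussian integral in (13) converges. Writing $s=T-t$, the integral can be computed explicitly and the three terms of the HJB equation cancel.

```latex
% Worked example (illustration only): g(y) = \frac{a}{2}|y|^{2}, s := T - t, 1 + sa > 0.
\begin{align*}
U^{T,g}_{t}(x) &= \frac{a|x|^{2}}{2(1+sa)}+\frac{d}{2}\log(1+sa),\\
\partial_{t}U^{T,g}_{t}(x) &= -\partial_{s}U^{T,g}_{t}(x)
 = \frac{a^{2}|x|^{2}}{2(1+sa)^{2}}-\frac{d\,a}{2(1+sa)},\\
\tfrac{1}{2}\Delta U^{T,g}_{t}(x) &= \frac{d\,a}{2(1+sa)},
\qquad
\tfrac{1}{2}|\nabla U^{T,g}_{t}(x)|^{2} = \frac{a^{2}|x|^{2}}{2(1+sa)^{2}},\\
\partial_{t}U^{T,g}_{t}+\tfrac{1}{2}\Delta U^{T,g}_{t}-\tfrac{1}{2}|\nabla U^{T,g}_{t}|^{2} &= 0,
\qquad U^{T,g}_{T}=g.
\end{align*}
```

In this example the Hessian of $U^{T,g}_{0}$ equals $\frac{a}{1+Ta}\,\mathrm{I}$, which has the same form as the term $\alpha_{\psi}/(1+T\alpha_{\psi})$ appearing in (10).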

In the next theorem, we construct for any $L>0$ a set of weakly convex functions $\mathcal{F}_{L}$ that is shown to be invariant for the HJB flow. In the proof, and in the rest of the paper, we shall denote by $[\cdot,\cdot]$ the quadratic covariation of two Itô processes.

Theorem 2.1.

Fix $L>0$ and define

$$\mathcal{F}_{L}=\{g\in C^{1}(\mathbb{R}^{d}):\kappa_{g}(r)\geq-r^{-1}f_{L}(r)\quad\forall r>0\}.$$

Then for all $0\leq t\leq T<+\infty$ we have

$$g\in\mathcal{F}_{L}\Rightarrow U^{T,g}_{t}\in\mathcal{F}_{L}. \tag{16}$$
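The invariance (16) can be checked numerically on an explicit one-dimensional example. This is an illustration under our own assumptions and in no way a substitute for the proof. We take $g(x)=-2\log\cosh(\sqrt{L}\,x/2)$, which is not convex and satisfies $g'(x)=-\sqrt{L}\tanh(\sqrt{L}\,x/2)$. One can verify by hand that $g\in\mathcal{F}_{L}$, and since $\exp(-g)=(1+\cosh(\sqrt{L}\,x))/2$ the heat semigroup acts on it explicitly, giving with $s=T-t$ the formula $\partial_{x}U^{T,g}_{t}(x)=-\sqrt{L}\,E\sinh(\sqrt{L}x)/(1+E\cosh(\sqrt{L}x))$, where $E=e^{Ls/2}$. In dimension one, membership in $\mathcal{F}_{L}$ amounts to $U'(x)-U'(y)+f_{L}(x-y)\geq 0$ for all $x>y$, which is what the helper `worst_gap` (a hypothetical name) evaluates on a grid.

```python
import numpy as np

def f_L(r, L):
    return 2.0 * np.sqrt(L) * np.tanh(0.5 * r * np.sqrt(L))

def grad_U(x, s, L):
    """x-derivative of U^{T,g}_t for g(x) = -2 log cosh(sqrt(L) x / 2), s = T - t.

    Equal to -sqrt(L) E sinh(z) / (1 + E cosh(z)) with z = sqrt(L) x and
    E = exp(L s / 2), written with tanh and 1/cosh to avoid overflow.
    """
    E = np.exp(0.5 * L * s)
    z = np.sqrt(L) * x
    return -np.sqrt(L) * E * np.tanh(z) / (E + 1.0 / np.cosh(z))

def worst_gap(s, L, x):
    """Minimum over grid pairs x_i > x_j of U'(x_i) - U'(x_j) + f_L(x_i - x_j).

    A nonnegative value means the defining inequality of F_L holds on the grid.
    """
    gU = grad_U(x, s, L)
    r = x[:, None] - x[None, :]
    gap = gU[:, None] - gU[None, :] + f_L(np.abs(r), L)
    return np.min(gap[r > 0])

L = 1.0
x = np.linspace(-8.0, 8.0, 401)
gaps = [worst_gap(s, L, x) for s in (0.0, 0.5, 2.0, 10.0)]
print("worst gaps for s = 0, 0.5, 2, 10:", gaps)
```

As $s$ grows, the derivative tends to $-\sqrt{L}\tanh(\sqrt{L}\,x)$, for which the inequality becomes an equality on symmetric pairs $(x,-x)$. In this example, then, the bound defining $\mathcal{F}_{L}$ is approached but not crossed along the flow.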

The fact that convexity of the terminal condition in the HJB equation (15) implies convexity of the solution at all times is equivalent to the fact that the heat flow preserves log-concavity and has been known for a long time, see [7]. Theorem 2.1 offers a significant generalization of this property, by showing that there exist weaker properties than pointwise convexity that are transferred from the terminal condition to the solutions of the HJB equation. It can be checked that $f_{L}$ solves the ODE

$$ff^{\prime}(r)+2f^{\prime\prime}(r)=0\quad\forall r>0,\quad f(0)=0,\ f^{\prime}(0)=L. \tag{17}$$

To verify the above, it suffices to compute

$$f_{L}^{\prime}(r)=\frac{L}{\cosh^{2}(rL^{1/2}/2)},\quad f^{\prime\prime}_{L}(r)=-L^{3/2}\frac{\sinh(rL^{1/2}/2)}{\cosh^{3}(rL^{1/2}/2)}.$$

Moreover, we recall here some useful properties of $f_{L}$:

$$f_{L}(r)>0,\,f^{\prime}_{L}(r)>0,\,f^{\prime\prime}_{L}(r)<0,\,f_{L}(r)\geq rf^{\prime}_{L}(r)\quad\forall r>0. \tag{18}$$
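The identities (17) and (18) are easy to confirm numerically from the closed-form derivatives given above. The short check below uses an arbitrary value of $L$ and a finite grid of radii, both our own choices. It also compares the stated formula for $f^{\prime}_{L}$ with a central finite difference of $f_{L}$.

```python
import numpy as np

L = 1.7                                   # arbitrary illustrative value
r = np.linspace(1e-3, 20.0, 4000)
a = 0.5 * r * np.sqrt(L)

f = 2.0 * np.sqrt(L) * np.tanh(a)                    # f_L
fp = L / np.cosh(a) ** 2                             # f_L'
fpp = -L**1.5 * np.sinh(a) / np.cosh(a) ** 3         # f_L''

# Residual of the ODE (17): f f' + 2 f'' should vanish identically.
ode_residual = np.max(np.abs(f * fp + 2.0 * fpp))

# Central finite difference of f_L, compared with the closed-form f_L'.
eps = 1e-5
f_of = lambda rr: 2.0 * np.sqrt(L) * np.tanh(0.5 * rr * np.sqrt(L))
fd_error = np.max(np.abs((f_of(r + eps) - f_of(r - eps)) / (2.0 * eps) - fp))

print("ODE residual:", ode_residual, " finite-difference error:", fd_error)
```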

The condition $ff^{\prime}(r)+2f^{\prime\prime}(r)\leq 0$ appears naturally in the main coupling argument of Theorem 2.1 and we have defined the functions $f_{L}$ ad hoc in order to saturate this differential inequality. We are now in position to prove Theorem 2.1. As anticipated above, the proof relies on the analysis of coupling by reflection along the characteristics of the HJB equation. In doing so, we heavily rely on a connection with stochastic control. More precisely, the HJB characteristic

$$\mathrm{d}X_{t}=-\nabla U^{T,g}_{t}(X_{t})\mathrm{d}t+\mathrm{d}B_{t},\quad X_{0}=x,$$

is the optimal process for the stochastic control problem

$$\begin{split}\inf_{(u_{s})_{s\in[0,T]}}\,\,&\mathbb{E}\Big[\int_{0}^{T}\frac{1}{2}|u_{s}|^{2}\mathrm{d}s+g(X^{u,x}_{T})\Big]\\ &\text{s.t.}\quad\mathrm{d}X^{u,x}_{s}=u_{s}\mathrm{d}s+\mathrm{d}B_{s},\quad X^{u,x}_{0}=x.\end{split}$$

In particular, the stochastic maximum principle [36] for this control problem grants that the process $(\nabla U^{T,g}_{t}(X_{t}))_{t\in[0,T]}$ is a martingale, and we will use this fact in the proof of Theorem 2.1, giving a self-contained proof for the reader’s convenience. In the recent article [14, Thm 1.3], Hessian bounds for HJB equations originating from stochastic control problems are obtained by means of coupling techniques. These are two-sided bounds that require an a priori knowledge of global Lipschitz bounds on solutions of the HJB equation to hold. The one-sided estimates of Theorem 2.1 do not require any Lipschitz property of solutions and their proof requires finer arguments than those used in [14].

Proof.

We first assume w.l.o.g. that $t=0$ and work under the additional assumption that

$$g\in C^{3}(\mathbb{R}^{d}),\quad\sup_{x\in\mathbb{R}^{d}}|\nabla^{2}g|(x)<+\infty. \tag{19}$$

Combining the above with $g\in\mathcal{F}_{L}$, we can justify differentiation under the integral sign in (13) and establish that

$$[0,T]\times\mathbb{R}^{d}\ni(t,x)\mapsto U^{T,g}_{t}(x)$$

is a classical solution of (15) such that

$$[0,T]\times\mathbb{R}^{d}\ni(t,x)\mapsto\nabla U^{T,g}_{t}(x)$$

is continuously differentiable in $t$ as well as twice continuously differentiable and uniformly Lipschitz in $x$. Under these regularity assumptions, for given $x,\hat{x}\in\mathbb{R}^{d}$, coupling by reflection of two diffusions started at $x$ and $\hat{x}$ respectively and whose drift field is $-\nabla U^{T,g}_{t}$ is well defined, see [22]. That is to say, there exist a stochastic process $(X_{t},\hat{X}_{t})_{0\leq t\leq T}$ with $(X_{0},\hat{X}_{0})=(x,\hat{x})$ and two Brownian motions $(B_{t},\hat{B}_{t})_{0\leq t\leq T}$, all defined on the same probability space and such that

$$\begin{cases}\mathrm{d}X_{t}=-\nabla U^{T,g}_{t}(X_{t})\mathrm{d}t+\mathrm{d}B_{t},\quad&\text{for $0\leq t\leq T$,}\\ \mathrm{d}\hat{X}_{t}=-\nabla U^{T,g}_{t}(\hat{X}_{t})\mathrm{d}t+\mathrm{d}\hat{B}_{t},\quad&\text{for $0\leq t\leq\tau$, $X_{t}=\hat{X}_{t}$ for $t>\tau$,}\end{cases}$$

where

$$\mathrm{e}_{t}=r^{-1}_{t}(X_{t}-\hat{X}_{t}),\quad r_{t}=|X_{t}-\hat{X}_{t}|,\quad\mathrm{d}\hat{B}_{t}=\mathrm{d}B_{t}-2\mathrm{e}_{t}\langle\mathrm{e}_{t},\mathrm{d}B_{t}\rangle$$

and

$$\tau=\inf\{t\in[0,T]:X_{t}=\hat{X}_{t}\}\wedge T.$$

In particular, $(\hat{B}_{t})_{0\leq t\leq T}$ is a Brownian motion by Lévy’s characterization. We now define

$$\mathcal{U}:[0,T]\times\mathbb{R}^{d}\times\mathbb{R}^{d}\longrightarrow\mathbb{R},\quad\mathcal{U}_{t}(x,\hat{x})=\begin{cases}|x-\hat{x}|^{-1}\langle\nabla U^{T,g}_{t}(x)-\nabla U^{T,g}_{t}(\hat{x}),x-\hat{x}\rangle,\quad&\text{if $x\neq\hat{x}$,}\\ 0\quad&\text{if $x=\hat{x}$,}\end{cases}$$

and proceed to prove that $(\mathcal{U}_{t}(X_{t},\hat{X}_{t}))_{0\leq t\leq T}$ is a supermartingale. To this aim, we first deduce from (15) and Itô’s formula that

$$\mathrm{d}\nabla U^{T,g}_{t}(X_{t})=\mathrm{d}M_{t},\quad\mathrm{d}\nabla U^{T,g}_{t}(\hat{X}_{t})=\mathrm{d}\hat{M}_{t} \tag{20}$$

where $M_{\cdot},\hat{M}_{\cdot}$ are square integrable martingales. Indeed we find from Itô’s formula

$$\begin{split}\mathrm{d}\nabla U^{T,g}_{t}(X_{t})&=\Big(\partial_{t}\nabla U^{T,g}_{t}(X_{t})-\nabla^{2}U^{T,g}_{t}\nabla U^{T,g}_{t}(X_{t})+\frac{1}{2}\Delta\nabla U^{T,g}_{t}(X_{t})\Big)\mathrm{d}t+\nabla^{2}U^{T,g}_{t}(X_{t})\cdot\mathrm{d}B_{t}\\ &\stackrel{(15)}{=}\nabla^{2}U^{T,g}_{t}(X_{t})\cdot\mathrm{d}B_{t},\end{split}$$

and a completely analogous argument shows that $\nabla U^{T,g}_{t}(\hat{X}_{t})$ is a square integrable martingale. We shall also prove separately in Lemma 2.1 that

$$\mathrm{d}\mathrm{e}_{t}=-r^{-1}_{t}\mathrm{proj}_{\mathrm{e}^{\bot}_{t}}(\nabla U^{T,g}_{t}(X_{t})-\nabla U^{T,g}_{t}(\hat{X}_{t}))\mathrm{d}t\quad\forall t<\tau, \tag{21}$$

where $\mathrm{proj}_{\mathrm{e}^{\bot}_{t}}$ denotes the orthogonal projection on the orthogonal complement of the linear subspace generated by $\mathrm{e}_{t}$. Combining together (20) and (21) we find that $\mathrm{d}\mathcal{U}_{t}(X_{t},\hat{X}_{t})=0$ for $t\geq\tau$, whereas for $t<\tau$

$$\begin{split}\mathrm{d}\mathcal{U}_{t}(X_{t},\hat{X}_{t})&=\langle\nabla U^{T,g}_{t}(X_{t})-\nabla U^{T,g}_{t}(\hat{X}_{t}),\mathrm{d}\mathrm{e}_{t}\rangle\\ &+\langle\mathrm{e}_{t},\mathrm{d}(\nabla U^{T,g}_{t}(X_{t})-\nabla U^{T,g}_{t}(\hat{X}_{t}))\rangle+\mathrm{d}[(\nabla U^{T,g}_{\cdot}(X_{\cdot})-\nabla U^{T,g}_{\cdot}(\hat{X}_{\cdot})),\mathrm{e}_{\cdot}]_{t}\\ &\stackrel{(20)+(21)}{=}-r^{-1}_{t}|\mathrm{proj}_{\mathrm{e}^{\bot}_{t}}(\nabla U^{T,g}_{t}(X_{t})-\nabla U^{T,g}_{t}(\hat{X}_{t}))|^{2}\mathrm{d}t+\mathrm{d}\tilde{M}_{t},\end{split}$$

proving that $(\mathcal{U}_{t}(X_{t},\hat{X}_{t}))_{0\leq t\leq T}$ is a supermartingale. In the above, $\tilde{M}_{\cdot}$ denotes a square integrable martingale and to obtain the last equality we used that the quadratic variation term vanishes because of (21). Next, arguing exactly as in [22, Eq. 60] (see also (25) below for more details) on the basis of Itô’s formula and invoking (17) we get

$$\begin{split}\mathrm{d}f_{L}(r_{t})&=[-f^{\prime}_{L}(r_{t})\mathcal{U}_{t}(X_{t},\hat{X}_{t})+2f^{\prime\prime}_{L}(r_{t})]\mathrm{d}t+\mathrm{d}N_{t}\\ &\stackrel{(17)}{=}-f^{\prime}_{L}(r_{t})[\mathcal{U}_{t}(X_{t},\hat{X}_{t})+f_{L}(r_{t})]\mathrm{d}t+\mathrm{d}N_{t},\end{split}$$

where $N_{\cdot}$ is a square integrable martingale. It then follows that

$$\mathrm{d}\big(\mathcal{U}_{t}(X_{t},\hat{X}_{t})+f_{L}(r_{t})\big)\leq-f^{\prime}_{L}(r_{t})\Big(\mathcal{U}_{t}(X_{t},\hat{X}_{t})+f_{L}(r_{t})\Big)\mathrm{d}t+\mathrm{d}N_{t}+\mathrm{d}\tilde{M}_{t}, \tag{22}$$

from which we deduce that the process

$$\Gamma_{t}=\exp\Big(\int_{0}^{t}f^{\prime}_{L}(r_{s})\mathrm{d}s\Big)\big(\mathcal{U}_{t}(X_{t},\hat{X}_{t})+f_{L}(r_{t})\big)$$

is a supermartingale and in particular is decreasing on average. This gives

$$\begin{split}&|x-\hat{x}|^{-1}\langle\nabla U^{T,g}_{0}(x)-\nabla U^{T,g}_{0}(\hat{x}),x-\hat{x}\rangle+f_{L}(|x-\hat{x}|)=\mathbb{E}[\Gamma_{0}]\\ &\geq\mathbb{E}[\Gamma_{T}]\geq\mathbb{E}\Big[\exp\Big(\int_{0}^{T}f^{\prime}_{L}(r_{s})\mathrm{d}s\Big)\big(|X_{T}-\hat{X}_{T}|\kappa_{g}(|X_{T}-\hat{X}_{T}|)+f_{L}(|X_{T}-\hat{X}_{T}|)\big)\Big]\geq 0,\end{split}$$

where the last inequality follows from $g\in\mathcal{F}_{L}$. We have thus completed the proof under the additional assumption (19). In order to remove it, consider any $g\in\mathcal{F}_{L}$. Then there exists a sequence $(g^{n})\subseteq\mathcal{F}_{L}$ such that (19) holds for each $g^{n}$, $g^{n}\rightarrow g$ pointwise and $g^{n}$ is uniformly bounded below. From this, one can prove that $\nabla U^{T,g^{n}}_{0}\rightarrow\nabla U^{T,g}_{0}$ pointwise by differentiating (13) under the integral sign. Using this result in combination with the fact that (16) holds for each $g^{n}$ allows us to reach the desired conclusion. ∎
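The coupling by reflection used in the proof can be simulated with a simple Euler-Maruyama scheme. The sketch below is a discrete-time approximation under our own assumptions and is not the construction of [22]. The reflected noise is $\mathrm{d}B_{t}-2\mathrm{e}_{t}\langle\mathrm{e}_{t},\mathrm{d}B_{t}\rangle$ as in the proof. Because exact meeting never occurs in discrete time, the coupling time $\tau$ is replaced by the first step at which the difference $X-\hat{X}$ crosses the hyperplane orthogonal to $\mathrm{e}_{t}$. The function names, the step count and the example drift (the explicit gradient $ax/(1+(T-t)a)$ associated with a quadratic terminal condition) are illustrative choices, and the two starting points are assumed to be distinct.

```python
import numpy as np

def reflect(e, v):
    """Apply the reflection I - 2 e e^T to v, where e is a unit vector."""
    return v - 2.0 * e * np.dot(e, v)

def reflection_coupling(drift, x, x_hat, T, n_steps, rng):
    """Euler-Maruyama sketch of coupling by reflection (requires x != x_hat).

    Returns both paths and the index of the first step at which the two
    copies are declared coupled (None if this never happens before T).
    """
    dt = T / n_steps
    X = np.array(x, dtype=float)
    Xh = np.array(x_hat, dtype=float)
    X_path, Xh_path = [X.copy()], [Xh.copy()]
    tau_idx = None
    for k in range(n_steps):
        t = k * dt
        dB = np.sqrt(dt) * rng.standard_normal(X.shape)
        X_new = X + drift(t, X) * dt + dB
        if tau_idx is None:
            e = (X - Xh) / np.linalg.norm(X - Xh)
            Xh_new = Xh + drift(t, Xh) * dt + reflect(e, dB)
            # Discrete-time proxy for tau: the difference crossed the mirror.
            if np.dot(X_new - Xh_new, e) <= 0.0:
                tau_idx = k + 1
                Xh_new = X_new.copy()
        else:
            Xh_new = X_new.copy()      # after coupling the two copies coincide
        X, Xh = X_new, Xh_new
        X_path.append(X.copy())
        Xh_path.append(Xh.copy())
    return np.array(X_path), np.array(Xh_path), tau_idx

# Example drift -grad U_t(x) = -a x / (1 + (T - t) a), from a quadratic g.
a, T = 1.0, 1.0
drift = lambda t, x: -a * x / (1.0 + (T - t) * a)
rng = np.random.default_rng(0)
X_path, Xh_path, tau_idx = reflection_coupling(drift, [1.0, 0.5], [-1.0, 0.0], T, 1000, rng)
print("coupled at step:", tau_idx, " final distance:", np.linalg.norm(X_path[-1] - Xh_path[-1]))
```

With zero drift the scheme only moves the difference $X-\hat{X}$ along the initial direction $\mathrm{e}_{0}$, which is the discrete counterpart of (21) in that trivial case.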

Lemma 2.1.

Under the same assumptions and notations of Theorem 2.1 we have

$$\mathrm{d}\mathrm{e}_{t}=-r^{-1}_{t}\mathrm{proj}_{\mathrm{e}^{\bot}_{t}}(\nabla U^{T,g}_{t}(X_{t})-\nabla U^{T,g}_{t}(\hat{X}_{t}))\mathrm{d}t\quad\forall t<\tau.$$
Proof.

Recall that if θ:d\theta:\mathbb{R}^{d}\rightarrow\mathbb{R} is the map z|z|z\mapsto|z|, then we have

θ(z)=z|z|,2θ(z)=I|z|zz|z|3,z0.\nabla\theta(z)=\frac{z}{|z|},\quad\nabla^{2}\theta(z)=\frac{\mathrm{I}}{|z|}-\frac{zz^{\top}}{|z|^{3}},\quad z\neq 0. (23)

The proof consist of several applications of Itô’s formula. We first observe that for t<τt<\tau

d(XtX^t)=(UtT,g(Xt)UtT,g(X^t))dt+2etdWt,withdWt=et,dBt.\mathrm{d}(X_{t}-\hat{X}_{t})=-(\nabla U^{{T},{g}}_{t}(X_{t})-\nabla U^{{T},{g}}_{t}(\hat{X}_{t}))\mathrm{d}t+2\mathrm{e}_{t}\mathrm{d}W_{t},\quad\text{with}\quad\mathrm{d}W_{t}=\langle\mathrm{e}_{t},\mathrm{d}B_{t}\rangle. (24)

Note that $(W_{t})_{0\leq t\leq T}$ is a Brownian motion by Lévy's characterization. Thus, invoking (23) (or referring directly to [22, Eq. 60]) we obtain

drt=UtT,g(Xt)UtT,g(X^t),etdt+2dWt,\mathrm{d}r_{t}=-\langle\nabla U^{{T},{g}}_{t}(X_{t})-\nabla U^{{T},{g}}_{t}(\hat{X}_{t}),\mathrm{e}_{t}\rangle\mathrm{d}t+2\mathrm{d}W_{t}, (25)

whence

drt1=rt2drt+rt3d[r]t=(rt2UtT,g(Xt)UtT,g(X^t),et+4rt3)dt2rt2dWt.\begin{split}\mathrm{d}r^{-1}_{t}&=-r^{-2}_{t}\mathrm{d}r_{t}+r^{-3}_{t}\mathrm{d}[r]_{t}\\ &=\Big{(}r^{-2}_{t}\langle\nabla U^{{T},{g}}_{t}(X_{t})-\nabla U^{{T},{g}}_{t}(\hat{X}_{t}),\mathrm{e}_{t}\rangle+4r^{-3}_{t}\Big{)}\mathrm{d}t-2r^{-2}_{t}\mathrm{d}W_{t}.\end{split} (26)

Combining (24) with (26) we find that for t<τt<\tau

det=d(rt1(XtX^t))=rt1d(XtX^t)+(XtX^t)d(rt1)+d[XX^,r1]t=rt1(UtT,g(Xt)UtT,g(X^t))dt+2rt1etdWt+(rt2UtT,g(Xt)UtT,g(X^t),et+4rt3)(XtX^t)dt2rt2(XtX^t)dWt4rt2etdt=rt1(UtT,g(Xt)UtT,g(X^t)UtT,g(Xt)UtT,g(X^t),etet)dt=(rt1)projet(UtT,g(Xt)UtT,g(X^t))dt.\begin{split}\mathrm{d}\mathrm{e}_{t}&=\mathrm{d}\big{(}r^{-1}_{t}(X_{t}-\hat{X}_{t}))\\ &=r^{-1}_{t}\mathrm{d}(X_{t}-\hat{X}_{t})+(X_{t}-\hat{X}_{t})\mathrm{d}(r^{-1}_{t})+\mathrm{d}[X_{\cdot}-\hat{X}_{\cdot},r^{-1}_{\cdot}]_{t}\\ &=-r^{-1}_{t}(\nabla U^{{T},{g}}_{t}(X_{t})-\nabla U^{{T},{g}}_{t}(\hat{X}_{t}))\mathrm{d}t+2r^{-1}_{t}\mathrm{e}_{t}\mathrm{d}W_{t}\\ &+\Big{(}r^{-2}_{t}\langle\nabla U^{{T},{g}}_{t}(X_{t})-\nabla U^{{T},{g}}_{t}(\hat{X}_{t}),\mathrm{e}_{t}\rangle+4r^{-3}_{t}\Big{)}(X_{t}-\hat{X}_{t})\mathrm{d}t\\ &-2r^{-2}_{t}(X_{t}-\hat{X}_{t})\mathrm{d}W_{t}-4r^{-2}_{t}\mathrm{e}_{t}\mathrm{d}t\\ &=-r^{-1}_{t}\Big{(}\nabla U^{{T},{g}}_{t}(X_{t})-\nabla U^{{T},{g}}_{t}(\hat{X}_{t})-\langle\nabla U^{{T},{g}}_{t}(X_{t})-\nabla U^{{T},{g}}_{t}(\hat{X}_{t}),\mathrm{e}_{t}\rangle\mathrm{e}_{t}\Big{)}\mathrm{d}t\\ &=-(r^{-1}_{t})\mathrm{proj}_{\mathrm{e}_{t}^{\bot}}(\nabla U^{{T},{g}}_{t}(X_{t})-\nabla U^{{T},{g}}_{t}(\hat{X}_{t}))\mathrm{d}t.\end{split}

3 Second order bounds for Schrödinger potentials

From now on Assumption 1.1 is in force, even when we do not state it explicitly. Moreover, since we show in Proposition 5.1 in the appendix that $(\mathbf{H2}^{\prime})$ implies $(\mathbf{H2})$, we shall always assume that $(\mathbf{H2})$ holds in the sequel. The next two subsections are devoted to establishing the key estimates needed in the proof of Theorem 1.2, which is carried out immediately afterwards.

3.1 Weak semiconvexity of $\psi$ implies weak semiconcavity of $\varphi$

We begin this section with a useful reminder of the definition of FF, first given at (12).

F(α,s)=βμs+sT(1+Tα)+s1/2fL(s1/2)(1+Tα)2,s>0.F(\alpha,s)=\beta_{\mu}s+\frac{s}{T(1+T\alpha)}+\frac{s^{1/2}f_{L}(s^{1/2})}{(1+T\alpha)^{2}},\quad s>0.
Lemma 3.1.

Assume that there exists $\alpha>-1/T$ such that

$$\kappa_{\psi}(r)\geq\alpha-r^{-1}f_{L}(r)\quad\forall r>0.$$

Then we have

$$\kappa_{U^{T,\psi}_{0}}(r)\geq\frac{\alpha}{1+T\alpha}-\frac{r^{-1}f_{L}(r)}{(1+T\alpha)^{2}}\quad\forall r>0. \tag{27}$$

In particular,

$$\ell_{\varphi}(r)\leq\beta_{\mu}-\frac{\alpha}{1+T\alpha}+\frac{r^{-1}f_{L}(r)}{(1+T\alpha)^{2}}=r^{-2}F(\alpha,r^{2})-\frac{1}{T}\quad\forall r>0. \tag{28}$$
Proof.

We define

$$\hat{\psi}(\cdot)=\psi(\cdot)-\frac{\alpha}{2}|\cdot|^{2}$$

and note that $\hat{\psi}\in\mathcal{F}_{L}$ by construction. We claim that

$$U^{T,\psi}_{0}(x)=\frac{\alpha}{2(1+T\alpha)}|x|^{2}+U^{T/(1+T\alpha),\hat{\psi}}_{0}\big((1+T\alpha)^{-1}x\big)+C, \tag{29}$$

where $C$ is a constant independent of $x$. Indeed, we have

$$\begin{split}U^{T,\psi}_{0}(x)-\frac{d}{2}\log(2\pi T)&=-\log\int\exp\Big(-\frac{|y-x|^{2}}{2T}-\frac{\alpha}{2}|y|^{2}-\hat{\psi}(y)\Big)\mathrm{d}y\\ &=-\log\int\exp\Big(-\frac{\alpha|x|^{2}}{2(1+T\alpha)}-\frac{1+T\alpha}{2T}|y-(1+T\alpha)^{-1}x|^{2}-\hat{\psi}(y)\Big)\mathrm{d}y\\ &=\frac{\alpha|x|^{2}}{2(1+T\alpha)}+U^{T/(1+T\alpha),\hat{\psi}}_{0}((1+T\alpha)^{-1}x)-\frac{d}{2}\log(2\pi T/(1+T\alpha)).\end{split}$$

Since $\hat{\psi}\in\mathcal{F}_{L}$, we can invoke Theorem 2.1 in (29) to prove (27). The estimate (28) follows immediately from (27), recalling the relation (14) and using Assumption 1.1. ∎
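The completion of the square behind the second line of the computation above can be sanity-checked numerically. The following Python sketch works in dimension $d=1$; the function names and the test grid are illustrative choices made here, not part of the paper. It compares the exponent $\frac{|y-x|^{2}}{2T}+\frac{\alpha}{2}|y|^{2}$ with its rewritten form for $\alpha>-1/T$.

```python
def lhs_exponent(x, y, T, alpha):
    """Exponent before completing the square: |y-x|^2/(2T) + alpha*|y|^2/2 (d = 1)."""
    return (y - x) ** 2 / (2 * T) + alpha * y ** 2 / 2


def rhs_exponent(x, y, T, alpha):
    """Exponent after completing the square, as in the second line of the display."""
    c = 1 + T * alpha  # positive because alpha > -1/T
    return alpha * x ** 2 / (2 * c) + c / (2 * T) * (y - x / c) ** 2


def max_gap(T, alpha, points):
    """Largest absolute difference between the two exponents over a grid of (x, y)."""
    return max(
        abs(lhs_exponent(x, y, T, alpha) - rhs_exponent(x, y, T, alpha))
        for x in points
        for y in points
    )


# Hypothetical test grid and parameters (alpha = -0.4 > -1/T = -1.25).
grid = [-2.0, -0.5, 0.0, 0.3, 1.7]
gap_negative_alpha = max_gap(T=0.8, alpha=-0.4, points=grid)
gap_positive_alpha = max_gap(T=2.0, alpha=3.0, points=grid)
```

Both gaps are of the order of floating-point rounding, which is consistent with the identity holding for every admissible $\alpha$, including negative values.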

3.2 Weak semiconcavity of $\varphi$ implies weak semiconvexity of $\psi$

We begin by recording some useful properties of the functions F(,)F(\cdot,\cdot) and G(,)G(\cdot,\cdot).

Lemma 3.2.

Let T,βμ>0,L0T,\beta_{\mu}>0,L\geq 0 be given.

  1. (i)

    For any α>1/T\alpha>-1/T the function

    sF(α,s)s\mapsto F(\alpha,s)

    is concave and increasing on $[0,+\infty)$.

  2. (ii)

    αG(α,2)\alpha\mapsto G(\alpha,2) is positive and non decreasing over (1T,+)(-\frac{1}{T},+\infty). Moreover,

    supα>1/TG(α,2)12βμ.\sup_{\alpha>-1/T}G(\alpha,2)\leq\frac{1}{2\beta_{\mu}}. (30)
  3. (iii)

    The fixed point equation (11) admits at least one solution on (αν1/T,+)(\alpha_{\nu}-1/T,+\infty) and αν1/T\alpha_{\nu}-1/T is not an accumulation point for the set of solutions.

Proof.

We begin with the proof of (i). To this aim, we observe that fLf_{L} is increasing on [0,+)[0,+\infty) and therefore so is ss1/2fL(s1/2)s\mapsto s^{1/2}f_{L}(s^{1/2}). Therefore

ddsF(α,s)βμ+1T(1+Tα)>0,\frac{\mathrm{d}}{\mathrm{d}s}F(\alpha,s)\geq\beta_{\mu}+\frac{1}{T(1+T\alpha)}>0,

where we used α>1/T\alpha>-1/T in the last inequality. To prove concavity, we observe that

d2du2(u1/2fL(u1/2))|u=s=s1/24fL′′(s1/2)+s3/24(fL(s1/2)s1/2fL(s1/2))<(18)0.\frac{\mathrm{d}^{2}}{\mathrm{d}u^{2}}\Big{(}u^{1/2}f_{L}(u^{1/2})\Big{)}\Big{|}_{u=s}=\frac{s^{-1/2}}{4}f^{{}^{\prime\prime}}_{L}(s^{1/2})+\frac{s^{-3/2}}{4}(f^{{}^{\prime}}_{L}(s^{1/2})s^{1/2}-f_{L}(s^{1/2}))\stackrel{{\scriptstyle\eqref{eq:basic properties}}}{{<}}0.

Thus $s\mapsto s^{1/2}f_{L}(s^{1/2})$ is concave and so is $F(\alpha,\cdot)$. We now move on to the proof of (ii), first showing that $G(\cdot,2)$ is positive and then that it is nondecreasing. If positivity failed, then $G(\alpha,2)=0$ for some $\alpha>-1/T$, and therefore there would exist a sequence $(s_{n})_{n\geq 0}$ such that $s_{n}\rightarrow 0$ and $F(\alpha,s_{n})\geq 2$. But this is impossible since $\lim_{n\rightarrow\infty}F(\alpha,s_{n})=0$. Next, we observe that $F(\alpha,s)$ is increasing in $s$ by item (i) and decreasing in $\alpha$ for $\alpha\in(-1/T,+\infty)$. For this reason, for any $u$ and $\alpha^{\prime}\geq\alpha$ we have

{s:F(α,s)u}{s:F(α,s)u}\{s:F(\alpha^{\prime},s)\geq u\}\subseteq\{s:F(\alpha,s)\geq u\}

and therefore

G(α,u)G(α,u).G(\alpha^{\prime},u)\geq G(\alpha,u).

We complete the proof of (ii) by showing that (30) holds. To see this, using fL(r)0f_{L}(r)\geq 0 we obtain that for any α>1/T\alpha>-1/T

F(α,s)βμss>0.F(\alpha,s)\geq\beta_{\mu}s\quad\forall s>0.

But then we obtain directly from (12) that G(α,2)1/(2βμ)G(\alpha,2)\leq 1/(2\beta_{\mu}) thus proving (30). To prove (iii), we introduce

h:[αν1T,+),h(α):=α(αν1T+G(α,2)2T2)h:[\alpha_{\nu}-\frac{1}{T},+\infty)\longrightarrow\mathbb{R},\quad h(\alpha):=\alpha-\Big{(}\alpha_{\nu}-\frac{1}{T}+\frac{G(\alpha,2)}{2T^{2}}\Big{)}

Note that $h$ is continuous on its domain since $G(\cdot,2)$ is. Therefore, to reach the conclusion it suffices to show that

h(αν1T)<0,limα+h(α)=+.h\Big{(}\alpha_{\nu}-\frac{1}{T}\Big{)}<0,\quad\lim_{\alpha\rightarrow+\infty}h(\alpha)=+\infty. (31)

The first inequality is a direct consequence of G(αν1/T,2)>0G(\alpha_{\nu}-1/T,2)>0, that we have already proven. Finally, the fact that hh diverges at infinity is a consequence of (30). ∎
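To make the objects of Lemma 3.2 concrete, here is a small Python sketch that evaluates $F$ and approximates $G(\cdot,2)$ by bisection. It rests on two assumptions that are not restated in this section: that $G(\alpha,u)=\inf\{s\geq 0:F(\alpha,s)\geq u\}$, which is how $G$ is used in the proof above, and that $f_{L}(r)=2L^{1/2}\tanh(rL^{1/2}/2)$, an illustrative profile that is nonnegative, increasing, concave and bounded by $Lr$. The parameter values are arbitrary.

```python
import math


def f_L(r, L):
    """ASSUMED profile f_L(r) = 2*sqrt(L)*tanh(r*sqrt(L)/2); identically 0 when L = 0."""
    if L == 0:
        return 0.0
    return 2.0 * math.sqrt(L) * math.tanh(r * math.sqrt(L) / 2.0)


def F(alpha, s, T, beta_mu, L):
    """F(alpha, s) as recalled at the start of Section 3.1 (requires alpha > -1/T)."""
    c = 1.0 + T * alpha
    root = math.sqrt(s)
    return beta_mu * s + s / (T * c) + root * f_L(root, L) / c ** 2


def G(alpha, u, T, beta_mu, L, tol=1e-12):
    """ASSUMED definition G(alpha, u) = inf{s >= 0 : F(alpha, s) >= u}, via bisection."""
    # F(alpha, s) >= beta_mu * s, so s = u / beta_mu is a valid upper bracket.
    lo, hi = 0.0, u / beta_mu
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(alpha, mid, T, beta_mu, L) >= u:
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical parameters; every alpha below satisfies alpha > -1/T.
T, beta_mu, L = 1.0, 0.5, 2.0
alphas = (-0.5, 0.0, 1.0, 5.0)
values = [G(a, 2.0, T, beta_mu, L) for a in alphas]
```

Under these assumptions the computed values are positive and nondecreasing in $\alpha$, in line with item (ii). Bisection is used only because $s\mapsto F(\alpha,s)$ is increasing by item (i); any monotone root finder would do.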

We now introduce the modified potential $\bar{\psi}$ defined by

$$\bar{\psi}(y)=T\Big(\psi(y)-U^{\nu}(y)+\frac{|y|^{2}}{2T}\Big). \tag{32}$$

It has been proven in [11, Lemma 1] that the Hessian of $\bar{\psi}$ is related to the covariance matrix of the conditional distributions of the static Schrödinger bridge $\hat{\pi}$. That is to say,

$$\nabla^{2}\bar{\psi}(y)=\frac{1}{T}\mathrm{Cov}_{X\sim\hat{\pi}^{y}}(X) \tag{33}$$

where $\hat{\pi}^{y}$ is (a version of) the conditional distribution of $\hat{\pi}$ that, in view of (8), has the following form:

$$\hat{\pi}^{y}(\mathrm{d}x)=\frac{\exp(-V^{\hat{\pi}^{y}}(x))\mathrm{d}x}{\int\exp(-V^{\hat{\pi}^{y}}(\bar{x}))\mathrm{d}\bar{x}},\quad V^{\hat{\pi}^{y}}(x):=\varphi(x)+\frac{|x|^{2}}{2T}-\frac{\langle x,y\rangle}{T}. \tag{34}$$

We give an independent proof of (33) under additional regularity assumptions in Proposition 5.2 in the Appendix for the reader's convenience. A consequence of (33) is that $\bar{\psi}$ is convex, and we obtain from (32) that

$$\kappa_{\psi}(r)\geq\alpha_{\nu}-\frac{1}{T}-r^{-1}f_{L}(r)\quad\forall r>0. \tag{35}$$

This is a first crude weak semiconvexity bound on $\psi$, upon which Theorem 1.2 improves by means of a recursive argument. The next lemma shows how to deduce weak semiconvexity of $\psi$ from weak semiconcavity of $\varphi$. In the $L=0$ setting, this step is carried out in [11] by invoking the Cramér-Rao inequality, whose application is not justified in the present more general setup.

Lemma 3.3.

Assume that there exists $\alpha>-1/T$ such that

$$\ell_{\varphi}(r)\leq-\frac{1}{T}+r^{-2}F(\alpha,r^{2})\quad\forall r>0. \tag{36}$$

Then

$$\kappa_{\psi}(r)\geq\alpha_{\nu}-\frac{1}{T}+\frac{G(\alpha,2)}{2T^{2}}-r^{-1}f_{L}(r)\quad\forall r>0.$$
Proof.

Recalling the definition of $V^{\hat{\pi}^{y}}$ given in (34), we observe that the standing assumptions imply

$$\ell_{V^{\hat{\pi}^{y}}}(r)\stackrel{(34)}{\leq}\ell_{\varphi}(r)+\frac{1}{T}\stackrel{(36)}{\leq}r^{-2}F(\alpha,r^{2})\quad\forall r>0. \tag{37}$$

In view of (33), we now proceed to bound $\mathrm{Var}_{X\sim\hat{\pi}^{y}}(X_{1})$ from below for a given $y$, where we adopt the notational convention $X=(X_{1},\ldots,X_{d})$ for the components of random vectors. We first observe that the law of total variance implies

$$\mathrm{Var}_{X\sim\hat{\pi}^{y}}(X_{1})\geq\mathbb{E}_{X\sim\hat{\pi}^{y}}[\mathrm{Var}_{X\sim\hat{\pi}^{y}}(X_{1}|X_{2},\ldots,X_{d})]. \tag{38}$$

Moreover, we define for any z=(z2,,zd)z=(z_{2},\ldots,z_{d})

Vπ^y,z():=Vπ^y(,z),π^y,z(dx)=exp(Vπ^y,z(x))dxexp(Vπ^y,z(x¯))dx¯V^{\hat{\pi}^{y,z}}(\cdot):=V^{\hat{\pi}^{y}}(\cdot,z),\quad\hat{\pi}^{y,z}(\mathrm{d}x)=\frac{\exp(-V^{\hat{\pi}^{y,z}}(x))\mathrm{d}x}{\int\exp(-V^{\hat{\pi}^{y,z}}(\bar{x}))\mathrm{d}\bar{x}}

and observe that if $X\sim\hat{\pi}^{y}$, then the conditional distribution of $X_{1}$ given $\{(X_{2},\ldots,X_{d})=(z_{2},\ldots,z_{d})\}$ is precisely $\hat{\pi}^{y,z}$. This gives the formula

VarXπ^y(X1|X2=z2,,Xd=zd)=12|xx^|2π^y,z(dx)π^y,z(dx^)\mathrm{Var}_{X\sim\hat{\pi}^{y}}(X_{1}|X_{2}=z_{2},\ldots,X_{d}=z_{d})=\frac{1}{2}\int|x-\hat{x}|^{2}\hat{\pi}^{y,z}(\mathrm{d}x)\hat{\pi}^{y,z}(\mathrm{d}\hat{x})

With this notation at hand, and with the help of the following identities, that can be obtained by one-dimensional integration by parts,

1=xVπ^y,z(x)xπ^y,z(dx),0=xVπ^y,z(x)π^y,z(dx)1=\int\partial_{x}V^{\hat{\pi}^{y,z}}(x)\,x\,\,\hat{\pi}^{y,z}(\mathrm{d}x),\quad 0=\int\partial_{x}V^{\hat{\pi}^{y,z}}(x)\hat{\pi}^{y,z}(\mathrm{d}x)

we find that, uniformly in zd1z\in\mathbb{R}^{d-1},

1=12(xVπ^y,z(x)xVπ^y,z(x^))(xx^)π^y,z(dx)π^y,z(dx^)=12Vπ^y(x,z)Vπ^y(x^,z),(x,z)(x^,z)π^y,z(dx)π^y,z(dx^)(37)12F(α,|xx^|2)π^y,z(dx)π^y,z(dx^)12F(α,2VarXπ^y(X1|X2=z2,,Xd=zd))\begin{split}1&=\frac{1}{2}\int(\partial_{x}V^{\hat{\pi}^{y,z}}(x)-\partial_{x}V^{\hat{\pi}^{y,z}}(\hat{x}))(x-\hat{x})\hat{\pi}^{y,z}(\mathrm{d}x)\hat{\pi}^{y,z}(\mathrm{d}\hat{x})\\ &=\frac{1}{2}\int\langle\nabla V^{\hat{\pi}^{y}}(x,z)-\nabla V^{\hat{\pi}^{y}}(\hat{x},z),(x,z)-(\hat{x},z)\rangle\hat{\pi}^{y,z}(\mathrm{d}x)\hat{\pi}^{y,z}(\mathrm{d}\hat{x})\\ &\stackrel{{\scriptstyle\eqref{eq:convexity_propagation_1}}}{{\leq}}\frac{1}{2}\int F(\alpha,|x-\hat{x}|^{2})\hat{\pi}^{y,z}(\mathrm{d}x)\hat{\pi}^{y,z}(\mathrm{d}\hat{x})\\ &\leq\frac{1}{2}F(\alpha,2\mathrm{Var}_{X\sim\hat{\pi}^{y}}(X_{1}|X_{2}=z_{2},\ldots,X_{d}=z_{d}))\end{split}

where to establish the last inequality we used the concavity of FF (see Lemma 3.2(i)) and Jensen’s inequality. Since α>1/T\alpha>-1/T, invoking again Lemma 3.2(i) we have that sF(α,s)s\mapsto F(\alpha,s) is non decreasing. But then, we get from (38) and the last bound that

VarXπ^y(X1)12G(α,2),yd.\mathrm{Var}_{X\sim\hat{\pi}^{y}}(X_{1})\geq\frac{1}{2}G(\alpha,2),\quad\forall y\in\mathbb{R}^{d}.

Next, we observe that if $\varphi(\cdot)$ satisfies (36), then so does $\varphi(\mathrm{O}\,\cdot)$ for any orthogonal matrix $\mathrm{O}$. Repeating the argument above therefore yields

VarXπ^y(v,X)12G(α,2),y,vds.t.|v|=1.\mathrm{Var}_{X\sim\hat{\pi}^{y}}(\langle v,X\rangle)\geq\frac{1}{2}G(\alpha,2),\quad\forall y,v\in\mathbb{R}^{d}\,\,\text{s.t.}\,\,|v|=1.

Recalling that 2ψ¯(y)=1TCovXπ^y(X)\nabla^{2}\bar{\psi}(y)=\frac{1}{T}\mathrm{Cov}_{X\sim\hat{\pi}^{y}}(X) with ψ¯\bar{\psi} defined by (32), we find

v,2ψ¯(y)vG(α,2)2T|v|2v,yd.\langle v,\nabla^{2}\bar{\psi}(y)v\rangle\geq\frac{G(\alpha,2)}{2T}|v|^{2}\quad\forall v,y\in\mathbb{R}^{d}. (39)

But then, rewriting (32) as

ψ()=Uν()||22T+ψ¯()T,\psi(\cdot)=U^{\nu}(\cdot)-\frac{|\cdot|^{2}}{2T}+\frac{\bar{\psi}(\cdot)}{T},

we immediately obtain that for all $r>0$

$$\begin{split}\kappa_{\psi}(r)&\geq\kappa_{U^{\nu}}(r)-\frac{1}{T}+\frac{1}{T}\kappa_{\bar{\psi}}(r)\\ &\geq\alpha_{\nu}-\frac{1}{T}+\frac{G(\alpha,2)}{2T^{2}}-r^{-1}f_{L}(r),\end{split}$$

where we used (39) and hypothesis (6) to obtain the last inequality. ∎
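The pairwise representation of the variance used in the proof, $\mathrm{Var}(X)=\frac{1}{2}\int|x-\hat{x}|^{2}\,\pi(\mathrm{d}x)\pi(\mathrm{d}\hat{x})$, holds for any probability measure $\pi$ on the real line with finite second moment. A minimal Python check on a hypothetical four-point distribution (the points and weights are made up for illustration):

```python
def variance(points, weights):
    """Variance of a discrete distribution given by points and probability weights."""
    mean = sum(w * x for x, w in zip(points, weights))
    return sum(w * (x - mean) ** 2 for x, w in zip(points, weights))


def half_mean_square_gap(points, weights):
    """(1/2) * E|X - X'|^2 for X, X' independent with the same discrete law."""
    return 0.5 * sum(
        wx * wy * (x - y) ** 2
        for x, wx in zip(points, weights)
        for y, wy in zip(points, weights)
    )


# Hypothetical distribution; the weights sum to one.
points = [-1.0, 0.5, 2.0, 3.5]
weights = [0.1, 0.4, 0.3, 0.2]
var_direct = variance(points, weights)
var_pairwise = half_mean_square_gap(points, weights)
```

The two quantities agree up to rounding. This representation is what allows Jensen's inequality to be applied to the concave map $s\mapsto F(\alpha,s)$ in the proof.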

3.3 Proof of Theorem 1.2

The proof is obtained by combining the results of the previous two subsections through a fixed point argument.

Proof of Theorem 1.2.

We define a sequence (αn)n0(\alpha^{n})_{n\geq 0} via

α0=αν1T,αn=αν1T+G(αn1,2)2T2,n1.\alpha^{0}=\alpha_{\nu}-\frac{1}{T},\quad\alpha^{n}=\alpha_{\nu}-\frac{1}{T}+\frac{G(\alpha^{n-1},2)}{2T^{2}},\quad n\geq 1.

Using Lemma 3.2(ii) and an induction argument, we obtain that $\alpha^{1}\geq\alpha^{0}$ and that $(\alpha^{n})_{n\geq 0}$ is a nondecreasing sequence. Moreover, $(\alpha^{n})_{n\geq 0}$ is bounded by (30) and therefore admits a finite limit $\alpha^{*}$. By continuity of $G(\cdot,2)$, we know that $\alpha^{*}>\alpha_{\nu}-1/T$ and that $\alpha^{*}$ satisfies the fixed point equation (11). To conclude the proof, we show by induction that

$$\kappa_{\psi}(r)\geq\alpha^{n}-r^{-1}f_{L}(r)\quad\forall r>0,\ n\geq 0. \tag{40}$$

The case $n=0$ is (35). For the inductive step, suppose (40) holds for a given $n$. Then Lemma 3.1 gives that

φ(r)r2F(αn,r2)1Tr>0.\ell_{\varphi}(r)\leq r^{-2}F(\alpha^{n},r^{2})-\frac{1}{T}\quad\forall r>0.

But then, an application of Lemma 3.3 proves that for all r>0r>0 we have

κψ(r)αν1T+G(αn,2)2T2r1fL(r)=αn+1r1fL(r).\kappa_{\psi}(r)\geq\alpha_{\nu}-\frac{1}{T}+\frac{G(\alpha^{n},2)}{2T^{2}}-r^{-1}f_{L}(r)=\alpha^{n+1}-r^{-1}f_{L}(r).

The proof of (9) is now finished. To conclude, we observe that (10) follows directly from (9) and Lemma 3.3.
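As an illustration of the recursion, consider the case $L=0$, where $f_{L}\equiv 0$ (since $0\leq f_{L}(r)\leq Lr$) and $F(\alpha,\cdot)$ is linear. Assuming that $G(\alpha,u)$ denotes the smallest $s$ with $F(\alpha,s)\geq u$, one gets $G(\alpha,2)/(2T^{2})=1/(T^{2}\beta_{\mu}+1/(\alpha+1/T))$, so the iteration can be run explicitly. The Python sketch below uses arbitrarily chosen parameter values with $\alpha_{\nu}>0$; it iterates $\alpha^{n}$ and compares the limit with the positive root of the corresponding quadratic equation.

```python
import math


def iterate_alpha(alpha_nu, beta_mu, T, n_steps=200):
    """Run alpha^n = alpha_nu - 1/T + G(alpha^{n-1}, 2) / (2 T^2) in the case L = 0.

    ASSUMPTION: G(alpha, u) is the smallest s with F(alpha, s) >= u, so that for
    f_L = 0 one has G(alpha, 2) / (2 T^2) = 1 / (T^2 beta_mu + 1 / (alpha + 1/T)).
    """
    alphas = [alpha_nu - 1.0 / T]  # alpha^0
    for _ in range(n_steps):
        a = alphas[-1] + 1.0 / T  # a = alpha + 1/T, positive when alpha_nu > 0
        alphas.append(alpha_nu - 1.0 / T + 1.0 / (T ** 2 * beta_mu + 1.0 / a))
    return alphas


def closed_form_limit(alpha_nu, beta_mu, T):
    """Positive root of T^2 beta a^2 - T^2 alpha_nu beta a - alpha_nu = 0, minus 1/T."""
    disc = alpha_nu ** 2 + 4.0 * alpha_nu / (T ** 2 * beta_mu)
    a_bar = alpha_nu / 2.0 + 0.5 * math.sqrt(disc)
    return a_bar - 1.0 / T


# Hypothetical parameters: here a = alpha + 1/T converges to 2, so alpha converges to 0.
alphas = iterate_alpha(alpha_nu=1.0, beta_mu=2.0, T=0.5)
limit = closed_form_limit(alpha_nu=1.0, beta_mu=2.0, T=0.5)
```

With these values the sequence increases from $\alpha^{0}=-1$ towards $0$, which matches the monotonicity and boundedness used in the proof.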

Let us now prove Corollary 1.1.

Proof of Corollary 1.1.

We first prove the upper bound. To do so, we observe that

F(α,s)(βμ+1T(1+Tα))s,ααν1/T,s0.F(\alpha,s)\geq(\beta_{\mu}+\frac{1}{T(1+T\alpha)})s,\quad\forall\alpha\geq\alpha_{\nu}-1/T,s\geq 0.

But then,

G(α,2)2T21T2βμ+1(α+1/T).\frac{G(\alpha,2)}{2T^{2}}\leq\frac{1}{T^{2}\beta_{\mu}+\frac{1}{(\alpha+1/T)}}.

Since α¯\bar{\alpha} is a fixed point, we obtain

α¯+1Tαν+1T2βμ+1(α¯+1/T).\bar{\alpha}+\frac{1}{T}\leq\alpha_{\nu}+\frac{1}{T^{2}\beta_{\mu}+\frac{1}{(\bar{\alpha}+1/T)}}.

If we now define a¯=α¯+1/T\bar{a}=\bar{\alpha}+1/T, the above implies

a¯αν+a¯T2βμa¯+1.\bar{a}\leq\alpha_{\nu}+\frac{\bar{a}}{T^{2}\beta_{\mu}\bar{a}+1}.

Since a¯>0\bar{a}>0, we can rewrite the last inequality in the equivalent form

T2βμa¯2T2ανβμa¯αν0.T^{2}\beta_{\mu}\bar{a}^{2}-T^{2}\alpha_{\nu}\beta_{\mu}\bar{a}-\alpha_{\nu}\leq 0.

Solving this quadratic inequality yields

a¯αν2+12αν2+4ανT2βμ.\bar{a}\leq\frac{\alpha_{\nu}}{2}+\frac{1}{2}\sqrt{\alpha^{2}_{\nu}+\frac{4\alpha_{\nu}}{T^{2}\beta_{\mu}}}.

The desired result follows from $\bar{\alpha}=\bar{a}-\frac{1}{T}$. We now move to the proof of the lower bound. First, recalling that $f_{L}(r)\leq Lr$, we obtain

F(α,s)(βμ+1T(1+Tα)+L(1+Tα)2)s.F(\alpha,s)\leq\big{(}\beta_{\mu}+\frac{1}{T(1+T\alpha)}+\frac{L}{(1+T\alpha)^{2}}\big{)}s.

Using that α¯αν1/T\bar{\alpha}\geq\alpha_{\nu}-1/T, we obtain that for all s>0s>0

F(α¯,s)(βμ+1T(1+Tα¯)+LTαν(1+Tα¯))s.F(\bar{\alpha},s)\leq\big{(}\beta_{\mu}+\frac{1}{T(1+T\bar{\alpha})}+\frac{L}{T\alpha_{\nu}(1+T\bar{\alpha})}\big{)}s.

But then,

G(α¯,2)2T21T2βμ+(1+L/αν)(α¯+1/T).\frac{G(\bar{\alpha},2)}{2T^{2}}\geq\frac{1}{T^{2}\beta_{\mu}+\frac{(1+L/\alpha_{\nu})}{(\bar{\alpha}+1/T)}}.

Setting $\bar{a}=\bar{\alpha}+1/T$, we deduce from the fact that $\bar{\alpha}$ is a fixed point that

a¯αν+a¯a¯T2βμ+(1+L/αν).\bar{a}\geq\alpha_{\nu}+\frac{\bar{a}}{\bar{a}T^{2}\beta_{\mu}+(1+L/\alpha_{\nu})}.

Using a¯>0\bar{a}>0 we rewrite the last inequality in the equivalent form

T2βμa¯2+(L/ανT2ανβμ)a¯(αν+L)0.T^{2}\beta_{\mu}\bar{a}^{2}+(L/\alpha_{\nu}-T^{2}\alpha_{\nu}\beta_{\mu})\bar{a}-(\alpha_{\nu}+L)\geq 0.

Solving this quadratic inequality yields

$$\bar{a}\geq\frac{\alpha_{\nu}}{2}-\frac{L}{2T^{2}\alpha_{\nu}\beta_{\mu}}+\frac{1}{2}\sqrt{\Big(\alpha_{\nu}+\frac{L}{T^{2}\alpha_{\nu}\beta_{\mu}}\Big)^{2}+\frac{4\alpha_{\nu}}{T^{2}\beta_{\mu}}}.$$

The desired conclusion follows from α¯=a¯1/T\bar{\alpha}=\bar{a}-1/T. ∎
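The two quadratic inequalities for $\bar{a}=\bar{\alpha}+1/T$ derived in the proof can also be solved numerically rather than by hand. The following Python sketch (function names and parameter values are illustrative) computes the largest root of each quadratic. For $L=0$ the two thresholds coincide, and for $L>0$ the lower threshold sits below the upper one.

```python
import math


def largest_root(A, B, C):
    """Largest real root of A a^2 + B a + C = 0, assuming A > 0 and C < 0."""
    return (-B + math.sqrt(B * B - 4.0 * A * C)) / (2.0 * A)


def a_upper(alpha_nu, beta_mu, T):
    """Threshold from T^2 beta a^2 - T^2 alpha_nu beta a - alpha_nu <= 0."""
    A = T ** 2 * beta_mu
    return largest_root(A, -A * alpha_nu, -alpha_nu)


def a_lower(alpha_nu, beta_mu, T, L):
    """Threshold from T^2 beta a^2 + (L/alpha_nu - T^2 alpha_nu beta) a - (alpha_nu + L) >= 0."""
    A = T ** 2 * beta_mu
    return largest_root(A, L / alpha_nu - A * alpha_nu, -(alpha_nu + L))


# Hypothetical parameters with alpha_nu > 0.
up = a_upper(1.0, 2.0, 0.5)
lo_L0 = a_lower(1.0, 2.0, 0.5, L=0.0)
lo_L3 = a_lower(1.0, 2.0, 0.5, L=3.0)
```

Subtracting $1/T$ from each threshold gives the corresponding bound on $\bar{\alpha}$.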

4 Logarithmic Sobolev inequality for Schrödinger bridges

This section is devoted to the proof of Theorem 1.3 and is structured as follows. We first recall known facts about logarithmic Sobolev inequalities and gradient estimates for diffusion semigroups, whose proofs can be found e.g. in [2], and then prove in Lemma 4.1 a sufficient condition for the two-times distribution of a diffusion process to satisfy LSI. Though such a result may not appear surprising, we could not find it in this form in the existing literature. We then elucidate the connection between Schrödinger bridges and Doob $h$-transforms in Lemma 4.2, and finally prove Theorem 1.3.

Local LSIs and gradient estimates

Let [0,T]×d(t,x)Ut(x)[0,T^{\prime}]\times\mathbb{R}^{d}\ni(t,x)\mapsto U_{t}(x) be continuous in the time variable, twice differentiable and uniformly Lipschitz in the space variable. We consider the time-inhomogeneous semigroup (Ps,t)0stT(P_{s,t})_{0\leq s\leq t\leq T^{\prime}} generated by the diffusion process whose generator at time tt acts on smooth functions with bounded support as follows

f12ΔfUt,f.f\mapsto\frac{1}{2}\Delta f-\langle\nabla U_{t},\nabla f\rangle.

Moreover, we define for all t[0,T]t\in[0,T^{\prime}]

$$\alpha_{t}=\inf_{x,v\in\mathbb{R}^{d},|v|=1}\langle v,\nabla^{2}U_{t}(x)v\rangle.$$

We now recall some basic facts about gradient estimates and local LSIs for the semigroup $(P_{s,t})_{0\leq s\leq t\leq T^{\prime}}$. For time-homogeneous semigroups these facts are well known and can be found e.g. in [2]; the adaptation to the time-inhomogeneous setting is straightforward. The first result we shall need is the gradient estimate (see [2, Thm. 3.3.18])

|Pt,Tf|(x)Ct,TPt,T(|f|)(x),Ct,T=exp(tTαsds),|\nabla P_{t,T^{\prime}}f|(x)\leq C_{t,T^{\prime}}\,P_{t,T^{\prime}}(|\nabla f|)(x),\quad C_{t,T^{\prime}}=\exp\Big{(}-\int_{t}^{T^{\prime}}\alpha_{s}\mathrm{d}s\Big{)}, (41)

that holds for all (t,x)[0,T]×d(t,x)\in[0,T^{\prime}]\times\mathbb{R}^{d} and any continuously differentiable ff. Moreover, the local logarithmic Sobolev inequalities (see [2, Thm. 5.5.2])

(P0,Tflogf)(x)(P0,Tf)(x)log(P0,Tf)(x)C~0,T2P0,T(|f|2/f)(x),C~0,T=0TCt,Tdt(P_{0,T^{\prime}}f\log f)(x)-(P_{0,T^{\prime}}f)(x)\log(P_{0,T^{\prime}}f)(x)\leq\frac{\tilde{C}_{0,T^{\prime}}}{2}P_{0,T^{\prime}}(|\nabla f|^{2}/f)(x),\quad\tilde{C}_{0,T^{\prime}}=\int_{0}^{T^{\prime}}C_{t,T^{\prime}}\,\mathrm{d}t (42)

hold for all xdx\in\mathbb{R}^{d} and all positive continuously differentiable ff. In the next Lemma we show how to obtain LSI for the joint law at times 0 and TT^{\prime} of a diffusion process with initial distribution μ\mu and drift Ut-\nabla U_{t}, that is to say for the coupling π\pi defined by

d×df(x,y)π(dxdy)=dP0,Tf(x,)(x)μ(dx)f>0.\int_{\mathbb{R}^{d}\times\mathbb{R}^{d}}f(x,y)\pi(\mathrm{d}x\mathrm{d}y)=\int_{\mathbb{R}^{d}}P_{0,T^{\prime}}f(x,\cdot)(x)\mu(\mathrm{d}x)\quad\forall f>0. (43)
Lemma 4.1.

Assume that μ\mu satisfies LSI with constant CμC_{\mu} and let π\pi be as in (43). Then π\pi satisfies LSI with constant

max{2Cμ,2CμC0,T+0TCt,Tdt}.\max\{2C_{\mu},{2}C_{\mu}C_{0,T^{\prime}}+\int_{0}^{T^{\prime}}C_{t,T^{\prime}}\mathrm{d}t\}.

The proof is carried out by carefully "mixing" the local (conditional) LSIs (42) with the help of gradient estimates. Similar arguments and ideas can be found e.g. in [6, 27].

Proof.

Let $f>0$ be continuously differentiable. We recall the entropy decomposition formula (see [30, Thm. 2.4])

Entπ(f)=Entμ(f0)+dEntπx(fx)f0(x)μ(dx),\mathrm{Ent}_{\pi}(f)=\mathrm{Ent}_{\mu}(f_{0})+\int_{\mathbb{R}^{d}}\mathrm{Ent}_{\pi^{x}}(f^{x})f_{0}(x)\mu(\mathrm{d}x), (44)

where we adopted the following conventions

f0(x)=(P0,Tf(x,))(x),fx(y)=f(x,y)/f0(x),g(y)πx(dy)=(P0,Tg)(x)g>0.f_{0}(x)=(P_{0,T^{\prime}}f(x,\cdot))(x),\quad f^{x}(y)=f(x,y)/f_{0}(x),\quad\int g(y)\pi^{x}(\mathrm{d}y)=\Big{(}P_{0,T^{\prime}}g\Big{)}(x)\,\,\forall g>0. (45)

The proof is carried out in two steps. In a first step, we bound the second term in (44) by means of the conditional LSIs. In the second step, we bound the first term using the LSI for μ\mu and gradient estimates.

  • Step 1 The local logarithmic Sobolev inequalities (42) imply that

    $$\begin{split}\mathrm{Ent}_{\pi^{x}}(f^{x})&=P_{0,T^{\prime}}\big(f^{x}\log f^{x}\big)(x)-\big(P_{0,T^{\prime}}f^{x}\log P_{0,T^{\prime}}f^{x}\big)(x)\\ &\leq\frac{\tilde{C}_{0,T^{\prime}}}{2}\int|\nabla_{y}f^{x}(y)|^{2}/f^{x}(y)\,\pi^{x}(\mathrm{d}y)\\ &\stackrel{(45)}{=}\frac{\tilde{C}_{0,T^{\prime}}}{2f_{0}(x)}\int|\nabla_{y}f(x,y)|^{2}/f(x,y)\,\pi^{x}(\mathrm{d}y)\end{split}$$

    uniformly in $x\in\mathbb{R}^{d}$. Integrating this inequality against $f_{0}(x)\mu(\mathrm{d}x)$ and using (43) gives

    Entπx(fx)f0(x)μ(dx)C~0,T2|yf(x,y)|2f(x,y)π(dxdy).\int\mathrm{Ent}_{\pi^{x}}(f^{x})f_{0}(x)\mu(\mathrm{d}x)\leq\frac{\tilde{C}_{0,T^{\prime}}}{2}\,\int\frac{|\nabla_{y}f(x,y)|^{2}}{f(x,y)}\pi(\mathrm{d}x\mathrm{d}y). (46)
  • Step 2 We start with the observation that

    xf0(x)=(45)P0,Txf(x,)(x)+z(P0,Tf(x,)(z))|z=x.\nabla_{x}f_{0}(x)\stackrel{{\scriptstyle\eqref{eq:conventions}}}{{=}}P_{0,T^{\prime}}\nabla_{x}f(x,\cdot)(x)+\nabla_{z}\big{(}P_{0,T^{\prime}}f(x,\cdot)(z)\big{)}\Big{|}_{z=x}.

    But then, using the LSI for μ\mu and Young’s inequality we obtain

    Entμ(f0)Cμ2|xf0(x)|2/f0(x)μ(dx)Cμ|P0,T(xf(x,))(x)|2(P0,Tf(x,))1(x)μ(dx)+Cμ|zP0,T(f(x,))(z)|2|z=x(P0,Tf(x,))1(x)μ(dx).\begin{split}\mathrm{Ent}_{\mu}(f_{0})&\leq\frac{C_{\mu}}{2}\int|\nabla_{x}f_{0}(x)|^{2}/f_{0}(x)\mu(\mathrm{d}x)\\ &\leq C_{\mu}\int|P_{0,T^{\prime}}(\nabla_{x}f(x,\cdot))(x)|^{2}(P_{0,T^{\prime}}f(x,\cdot))^{-1}(x)\,\mu(\mathrm{d}x)\\ &+C_{\mu}\int|\nabla_{z}P_{0,T^{\prime}}(f(x,\cdot))(z)|^{2}\Big{|}_{z=x}(P_{0,T^{\prime}}f(x,\cdot))^{-1}(x)\,\mu(\mathrm{d}x).\end{split} (47)

    For the first summand on the rhs of (47), Jensen's inequality applied to the convex function $(a,b)\mapsto|a|^{2}/b$ gives

    $$\begin{split}&C_{\mu}\int|P_{0,T^{\prime}}(\nabla_{x}f(x,\cdot))(x)|^{2}(P_{0,T^{\prime}}f(x,\cdot))^{-1}(x)\mu(\mathrm{d}x)\\ &\leq C_{\mu}\int P_{0,T^{\prime}}\Big(|\nabla_{x}f(x,\cdot)|^{2}/f(x,\cdot)\Big)(x)\mu(\mathrm{d}x)\\ &=C_{\mu}\int|\nabla_{x}f(x,y)|^{2}/f(x,y)\pi(\mathrm{d}x\mathrm{d}y),\end{split} \tag{48}$$

    where we used (43) to obtain the last identity. For the second summand on the rhs of (47), we first invoke the gradient estimate (41) and then apply Jensen's inequality again, as in the previous calculation, to obtain

    $$\begin{split}&C_{\mu}\int|\nabla_{z}P_{0,T^{\prime}}(f(x,\cdot))(z)|^{2}\Big|_{z=x}(P_{0,T^{\prime}}f(x,\cdot))^{-1}(x)\mu(\mathrm{d}x)\\ &\stackrel{(41)}{\leq}C_{\mu}C_{0,T^{\prime}}\int(P_{0,T^{\prime}}(|\nabla_{y}f(x,\cdot)|)(x))^{2}(P_{0,T^{\prime}}f(x,\cdot))^{-1}(x)\mu(\mathrm{d}x)\\ &\stackrel{\text{Jensen}}{\leq}C_{\mu}C_{0,T^{\prime}}\int P_{0,T^{\prime}}\Big(|\nabla_{y}f(x,\cdot)|^{2}/f(x,\cdot)\Big)(x)\mu(\mathrm{d}x)\\ &\stackrel{(43)}{=}C_{\mu}C_{0,T^{\prime}}\int|\nabla_{y}f(x,y)|^{2}/f(x,y)\,\pi(\mathrm{d}x\mathrm{d}y).\end{split} \tag{49}$$

    Plugging (48)-(49) into (47) we obtain

    Entμ(f0)Cμ|xf(x,y)|2/f(x,y)π(dxdy)+CμC0,T|yf(x,y)|2/f(x,y)π(dxdy).\begin{split}\mathrm{Ent}_{\mu}(f_{0})&\leq C_{\mu}\int|\nabla_{x}f(x,y)|^{2}/f(x,y)\pi(\mathrm{d}x\mathrm{d}y)\\ &+C_{\mu}C_{0,T^{\prime}}\int|\nabla_{y}f(x,y)|^{2}/f(x,y)\,\pi(\mathrm{d}x\mathrm{d}y).\end{split} (50)

To conclude the proof, we combine the bounds (46) and (50) with the entropy decomposition formula (44) to obtain

Entπ(f)Cμ|xf(x,y)|2/f(x,y)π(dxdy)+(C~0,T/2+CμC0,T)|yf(x,y)|2/f(x,y)π(dxdy)max{Cμ,CμC0,T+C~0,T/2}|f|2/fdπ,\begin{split}\mathrm{Ent}_{\pi}(f)&\leq C_{\mu}\int|\nabla_{x}f(x,y)|^{2}/f(x,y)\pi(\mathrm{d}x\mathrm{d}y)\\ &+(\tilde{C}_{0,T^{\prime}}/2+C_{\mu}C_{0,T^{\prime}})\int|\nabla_{y}f(x,y)|^{2}/f(x,y)\,\pi(\mathrm{d}x\mathrm{d}y)\\ &\leq\max\{C_{\mu},C_{\mu}C_{0,T^{\prime}}+\tilde{C}_{0,T^{\prime}}/2\}\int|\nabla f|^{2}/f\,\mathrm{d}\pi,\end{split}

which is the desired result. ∎
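The constant of Lemma 4.1 is straightforward to evaluate once a curvature profile $t\mapsto\alpha_{t}$ is available. Below is a minimal Python sketch, assuming $\alpha_{t}$ is supplied as a callable and using the trapezoidal rule; the discretization and the names lsi_constant and rho are choices made here for illustration. For a constant profile $\alpha_{t}\equiv\rho$ the result can be compared with the closed form obtained from $C_{t,T^{\prime}}=e^{-\rho(T^{\prime}-t)}$.

```python
import math


def lsi_constant(alpha, T_prime, C_mu, n=2000):
    """Constant of Lemma 4.1: max{2 C_mu, 2 C_mu C_{0,T'} + int_0^{T'} C_{t,T'} dt}.

    `alpha` is a callable t -> alpha_t; integrals use the trapezoidal rule on n cells.
    """
    h = T_prime / n
    ts = [i * h for i in range(n + 1)]
    a = [alpha(t) for t in ts]
    # tail[i] approximates int_{t_i}^{T'} alpha_s ds, accumulated backwards from T'.
    tail = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        tail[i] = tail[i + 1] + 0.5 * h * (a[i] + a[i + 1])
    C = [math.exp(-x) for x in tail]  # C[i] approximates C_{t_i, T'}
    C_tilde = sum(0.5 * h * (C[i] + C[i + 1]) for i in range(n))
    return max(2.0 * C_mu, 2.0 * C_mu * C[0] + C_tilde)


# Hypothetical constant curvature profile alpha_t = rho.
rho, T_prime, C_mu = 1.5, 2.0, 0.3
numeric = lsi_constant(lambda t: rho, T_prime, C_mu)
decay = math.exp(-rho * T_prime)
exact = max(2.0 * C_mu, 2.0 * C_mu * decay + (1.0 - decay) / rho)
```

The numerical value agrees with the closed form up to the quadrature error, which is of order $h^{2}$ for the trapezoidal rule.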

In the next lemma, we represent an approximate version of the static Schrödinger bridge (2) through a diffusion process. This is the classical result that Schrödinger bridges are Doob $h$-transforms, see e.g. [31, Sec. 4] and [16].

Lemma 4.2.

Let Assumption 1.1 hold and π^\hat{\pi} be the static Schrödinger bridge (2). For any ε(0,T)\varepsilon\in(0,T) define

π^ε(dxdy):=(2π(Tε))d/2exp(φ(x)UTεT,ψ(y)|yx|22(Tε))dxdy.\hat{\pi}^{\varepsilon}(\mathrm{d}x\,\mathrm{d}y):=(2\pi(T-\varepsilon))^{-d/2}\exp\Big{(}-\varphi(x)-U^{{T},{\psi}}_{T-\varepsilon}(y)-\frac{|y-x|^{2}}{2(T-\varepsilon)}\Big{)}\mathrm{d}x\,\mathrm{d}y. (51)

Then π^ε\hat{\pi}^{\varepsilon} has the form (43) for T=TεT^{\prime}=T-\varepsilon, where (Ps,t)0stTε(P_{s,t})_{0\leq s\leq t\leq T-\varepsilon} is the time-inhomogeneous semigroup associated with the generator acting on smooth test functions as follows

f12ΔfUtT,ψ,f,t[0,Tε].f\mapsto\frac{1}{2}\Delta f-\langle\nabla U^{{T},{\psi}}_{t},\nabla f\rangle,\quad\,\,t\in[0,T-\varepsilon]. (52)
Proof.

Let $\psi$ be the Schrödinger potential in (8). Invoking Theorem 1.2 we obtain that (27) holds with $\alpha=\alpha_{\psi}$. Moreover, it is well known (see [33, Eq 3.3] for example) that $\ell_{U^{T,\psi}_{t}}\leq(T-t)^{-1}$, i.e. the Hessian of $U^{T,\psi}_{t}$ is bounded above by $(T-t)^{-1}$. Therefore, the vector field $[0,T-\varepsilon]\times\mathbb{R}^{d}\ni(t,x)\mapsto-\nabla U^{T,\psi}_{t}(x)$ is uniformly Lipschitz w.r.t. the space variable for any $\varepsilon\in(0,T)$. This classically implies existence and uniqueness of strong solutions for the stochastic differential equation

dXt=UtT,ψ(Xt)dt+dBt,X0μ\mathrm{d}X_{t}=-\nabla U^{{T},{\psi}}_{t}(X_{t})\mathrm{d}t+\mathrm{d}B_{t},\quad X_{0}\sim\mu (53)

over any time interval $[0,T-\varepsilon]$, and we denote by $\mathbb{Q}^{\varepsilon}$ the law of the solution on $C([0,T-\varepsilon];\mathbb{R}^{d})$. Next, we denote by $\mathbb{P}^{\varepsilon}$ the law on $C([0,T-\varepsilon];\mathbb{R}^{d})$ of a Brownian motion started at $\mu$. By Girsanov's Theorem, see [29] for a version that applies in the current setting, we know that

dεdε(ω)=exp(0TεUtT,ψ(ωt)dωt120Tε|UtT,ψ(ωt)|2dt)εa.s.,\frac{\mathrm{d}\mathbb{Q}^{\varepsilon}}{\mathrm{d}\mathbb{P}^{\varepsilon}}(\omega)=\exp\Big{(}-\int_{0}^{T-\varepsilon}\nabla U^{{T},{\psi}}_{t}(\omega_{t})\mathrm{d}\omega_{t}-\frac{1}{2}\int_{0}^{T-\varepsilon}|\nabla U^{{T},{\psi}}_{t}(\omega_{t})|^{2}\mathrm{d}t\Big{)}\quad\mathbb{P}^{\varepsilon}-\text{a.s.},

where we denote by $\omega$ the typical element of the canonical space $C([0,T-\varepsilon];\mathbb{R}^{d})$. Using Itô's formula we rewrite the above as

dεdε(ω)=exp(U0T,ψ(ω0)UTεT,ψ(ωTε)+0Tε(tUtT,ψ+12ΔUtT,ψ12|UtT,ψ|2)(ωt)dt)=exp(Uμ(ω0)φ(ω0)UTεT,ψ(ωTε))\begin{split}\frac{\mathrm{d}\mathbb{Q}^{\varepsilon}}{\mathrm{d}\mathbb{P}^{\varepsilon}}(\omega)&=\exp\Big{(}U^{{T},{\psi}}_{0}(\omega_{0})-U^{{T},{\psi}}_{T-\varepsilon}(\omega_{T-\varepsilon})+\int_{0}^{T-\varepsilon}\Big{(}\partial_{t}U^{{T},{\psi}}_{t}+\frac{1}{2}\Delta U^{{T},{\psi}}_{t}-\frac{1}{2}|\nabla U^{{T},{\psi}}_{t}|^{2}\Big{)}(\omega_{t})\mathrm{d}t\Big{)}\\ &=\exp(U^{\mu}(\omega_{0})-\varphi(\omega_{0})-U^{{T},{\psi}}_{T-\varepsilon}(\omega_{T-\varepsilon}))\end{split}

where we used the Schrödinger system (8) and the HJB equation (15) to obtain the last expression. Indeed, because of Theorem 1.2 one can deduce that $[0,T]\times\mathbb{R}^{d}\ni(t,x)\mapsto U^{T,\psi}_{t}(x)$ is a classical solution of (15) by differentiating under the integral sign in (13). From this, we deduce that

$$\frac{\mathrm{d}\mathbb{Q}^{\varepsilon}_{0,T-\varepsilon}}{\mathrm{d}\mathbb{P}^{\varepsilon}_{0,T-\varepsilon}}(x,y)=\exp\big(U^{\mu}(x)-\varphi(x)-U^{T,\psi}_{T-\varepsilon}(y)\big)\quad\mathbb{P}^{\varepsilon}_{0,T-\varepsilon}\text{-a.s.},$$

where 0,Tεε\mathbb{Q}^{\varepsilon}_{0,T-\varepsilon} (resp. 0,Tεε\mathbb{P}^{\varepsilon}_{0,T-\varepsilon}) denotes the joint distribution of ε\mathbb{Q}^{\varepsilon} (resp. ε\mathbb{P}^{\varepsilon}) at times 0 and TεT-\varepsilon. Since

0,Tεε(dxdy)=(2π(Tε))d/2exp(Uμ(x))exp(|yx|22(Tε))dxdy,\mathbb{P}^{\varepsilon}_{0,T-\varepsilon}(\mathrm{d}x\,\mathrm{d}y)=(2\pi(T-\varepsilon))^{-d/2}\exp(-U^{\mu}(x))\exp\Big{(}-\frac{|y-x|^{2}}{2(T-\varepsilon)}\Big{)}\mathrm{d}x\,\mathrm{d}y,

we conclude that

0,Tεε(dxdy)=(2π(Tε))d/2exp(φ(x)UTεT,ψ(y)|yx|22(Tε))dxdy.\mathbb{Q}^{\varepsilon}_{0,T-\varepsilon}(\mathrm{d}x\mathrm{d}y)=(2\pi(T-\varepsilon))^{-d/2}\exp\Big{(}-\varphi(x)-U^{{T},{\psi}}_{T-\varepsilon}(y)-\frac{|y-x|^{2}}{2(T-\varepsilon)}\Big{)}\mathrm{d}x\,\mathrm{d}y.

But then $\mathbb{Q}^{\varepsilon}_{0,T-\varepsilon}=\hat{\pi}^{\varepsilon}$, where $\hat{\pi}^{\varepsilon}$ is defined in (51). To conclude, we recall that $\mathbb{Q}^{\varepsilon}_{0,T-\varepsilon}$ has the desired form (43), where $(P_{s,t})_{0\leq s\leq t\leq T-\varepsilon}$ is indeed the semigroup generated by (52). ∎

Proof of Theorem 1.3.

We know by Lemma 4.2 that $\hat{\pi}^{\varepsilon}$ has the form (43) for $T^{\prime}=T-\varepsilon$ and the inhomogeneous semigroup generated by (52). We now set for $t\in[0,T]$

$$\alpha^{\psi}_{t}=\inf_{x,v\in\mathbb{R}^{d},\,|v|=1}\langle v,\nabla^{2}U^{T,\psi}_{t}(x)\,v\rangle$$

and proceed to estimate $\alpha^{\psi}_{t}$ from below. Invoking Theorem 1.2 we obtain that (27) holds with $\alpha=\alpha_{\psi}$. That is to say, the estimate

$$\kappa_{U^{T,\psi}_{t}}(r)\geq\frac{\alpha^{\psi}}{1+(T-t)\alpha^{\psi}}-\frac{r^{-1}f_{L}(r)}{(1+(T-t)\alpha^{\psi})^{2}}$$

holds uniformly in $r>0$ and $0\leq t\leq T$. From here, using the concavity of $f_{L}$ and $f^{\prime}_{L}(0)=L$ we obtain

$$\alpha^{\psi}_{t}\geq\frac{\alpha^{\psi}}{1+(T-t)\alpha^{\psi}}-\frac{L}{(1+(T-t)\alpha^{\psi})^{2}}.$$
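The step from the bound on $\kappa_{U^{T,\psi}_{t}}$ to the bound on $\alpha^{\psi}_{t}$ can be sketched as follows. The sketch assumes that $f_{L}(0)=0$ and that $\kappa_{U}(r)$ denotes the infimum of $\langle\nabla U(x)-\nabla U(y),x-y\rangle/|x-y|^{2}$ over pairs with $|x-y|=r$; both facts come from earlier sections and are not restated here. For any unit vector $v$,

$$f_{L}(r)\leq f_{L}(0)+f^{\prime}_{L}(0)\,r=Lr\quad\forall r\geq 0,\qquad\langle v,\nabla^{2}U^{T,\psi}_{t}(x)\,v\rangle=\lim_{h\downarrow 0}\frac{\langle\nabla U^{T,\psi}_{t}(x+hv)-\nabla U^{T,\psi}_{t}(x),hv\rangle}{h^{2}}\geq\inf_{r>0}\kappa_{U^{T,\psi}_{t}}(r).$$

The first inequality gives $r^{-1}f_{L}(r)\leq L$ for every $r>0$, so the subtracted term in the lower bound on $\kappa_{U^{T,\psi}_{t}}(r)$ is at most $L/(1+(T-t)\alpha^{\psi})^{2}$. Taking the infimum over $x$ and unit vectors $v$ in the second relation then yields the displayed lower bound on $\alpha^{\psi}_{t}$.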

We can now apply Lemma 4.1 to obtain that $\hat{\pi}^{\varepsilon}$ satisfies LSI with constant given by

$$\eta_{\varepsilon}:=\max\Big\{2C_{\mu},\,2C_{\mu}C_{0,T-\varepsilon}+\int_{0}^{T-\varepsilon}C_{t,T-\varepsilon}\,\mathrm{d}t\Big\}.$$

Next, observe that the weak convexity bounds on $\psi$ of Theorem 1.2 imply that $U^{T,\psi}_{T-\varepsilon}$ converges to $\psi$ pointwise as $\varepsilon\rightarrow 0$. But then, we have that $\hat{\pi}^{\varepsilon}$ converges in total variation to $\hat{\pi}$ by Scheffé’s Lemma. Take now any continuously differentiable function $f$ bounded above and below by positive constants and with bounded derivative. Letting $\varepsilon\rightarrow 0$ in

$$\mathrm{Ent}_{\hat{\pi}^{\varepsilon}}(f)\leq\frac{\eta_{\varepsilon}}{2}\int\frac{|\nabla f|^{2}}{f}(x,y)\,\hat{\pi}^{\varepsilon}(\mathrm{d}x\,\mathrm{d}y)$$

and using the convergence in total variation of $\hat{\pi}^{\varepsilon}$, we obtain that LSI holds for $f$ under $\hat{\pi}$ with the desired constant. The extension to general positive and continuously differentiable functions is achieved through a standard approximation argument where $f(\cdot)$ is approximated by $f\,\chi(N^{-1}\cdot)+N^{-1}$ with $\chi(\cdot)$ a smooth cutoff function.

5 Appendix

Proposition 5.1.

Assume that $U$ satisfies (4) for some $\alpha>0$, $L^{\prime},R\geq 0$. Then

$$\kappa_{U}(r)\geq\alpha-r^{-1}f_{L}(r)\quad\forall r>0,$$

with $L$ given by (7).

Proof.

If $r>R$ the claim is a simple consequence of $f_{L}(r)\geq 0$. If $r\leq R$, using (18) to get that $r^{\prime}\mapsto r^{\prime-1}f_{L}(r^{\prime})$ is non-increasing on $(0,+\infty)$, we obtain

$$r^{-1}f_{L}(r)\geq R^{-1}f_{L}(R)=L^{\prime},$$

from which the conclusion follows. ∎
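The case $r\leq R$ can be written out in one line. The sketch assumes that (4) reads $\kappa_{U}(r)\geq\alpha-L^{\prime}$ for $r\leq R$ and $\kappa_{U}(r)\geq\alpha$ for $r>R$, which is how the proof uses it; (4) itself is not restated in this section. Combining this with the identity $R^{-1}f_{L}(R)=L^{\prime}$ and the monotonicity of $r^{\prime}\mapsto r^{\prime-1}f_{L}(r^{\prime})$ gives

$$\kappa_{U}(r)\geq\alpha-L^{\prime}=\alpha-R^{-1}f_{L}(R)\geq\alpha-r^{-1}f_{L}(r),\qquad 0<r\leq R.$$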

Proposition 5.2.

Let Assumption 1.1 hold and assume furthermore that there exist $\varepsilon,\gamma^{\prime}>0$ such that

$$\int\exp(\gamma^{\prime}|x|^{1+\varepsilon})\,\mu(\mathrm{d}x)<+\infty.\tag{54}$$

Moreover, let $\bar{\psi}$ be as in (32). Then $\bar{\psi}$ is twice differentiable and we have

$$\nabla^{2}\bar{\psi}(y)=\frac{1}{T}\mathrm{Cov}_{X\sim\hat{\pi}^{y}}(X)\quad\forall y\in\mathbb{R}^{d},$$

where $\hat{\pi}^{y}$ is given by (34).

Proof.

From (8) we obtain that

$$\bar{\psi}(y)+\frac{d}{2}\log(\pi)=T\log\int_{\mathbb{R}^{d}}\exp\Big(-\varphi(x)-\frac{|x|^{2}}{2T}+\frac{\langle x,y\rangle}{T}\Big)\mathrm{d}x.\tag{55}$$

From Assumption 1.1, (8) and (54) it follows that

$$\int_{\mathbb{R}^{d}\times\mathbb{R}^{d}}\exp\Big(\gamma^{\prime}|x|^{1+\varepsilon}-\varphi(x)-\psi(y)-\frac{|x-y|^{2}}{2T}\Big)\mathrm{d}x\,\mathrm{d}y<+\infty,$$

whence the existence of some $y^{\prime}$ such that

$$\int_{\mathbb{R}^{d}}\exp\Big(\gamma^{\prime}|x|^{1+\varepsilon}-\varphi(x)-\frac{|x|^{2}}{2T}+\frac{\langle x,y^{\prime}\rangle}{T}\Big)\mathrm{d}x<+\infty.$$

From this, we easily obtain that for all $\gamma<\gamma^{\prime}$

$$\int_{\mathbb{R}^{d}}\exp\Big(\gamma|x|^{1+\varepsilon}-\varphi(x)-\frac{|x|^{2}}{2T}+\frac{\langle x,y\rangle}{T}\Big)\mathrm{d}x<+\infty\quad\forall y\in\mathbb{R}^{d}.\tag{56}$$
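The passage from integrability at the single point $y^{\prime}$ to (56) can be sketched as follows; the constant $C$ is introduced only for this illustration and depends on $y$, $y^{\prime}$, $\gamma$ and $\gamma^{\prime}$. Fix $y\in\mathbb{R}^{d}$ and $\gamma<\gamma^{\prime}$. By the Cauchy-Schwarz inequality $\langle x,y-y^{\prime}\rangle\leq|x|\,|y-y^{\prime}|$, and since $1+\varepsilon>1$ the superlinear term dominates the linear one, so that

$$C:=\sup_{x\in\mathbb{R}^{d}}\Big(\frac{|x|\,|y-y^{\prime}|}{T}-(\gamma^{\prime}-\gamma)|x|^{1+\varepsilon}\Big)<+\infty,\qquad\gamma|x|^{1+\varepsilon}+\frac{\langle x,y\rangle}{T}\leq\gamma^{\prime}|x|^{1+\varepsilon}+\frac{\langle x,y^{\prime}\rangle}{T}+C.$$

Exponentiating, multiplying by $\exp(-\varphi(x)-|x|^{2}/(2T))$ and integrating in $x$ bounds the integral in (56) by $e^{C}$ times the integral at $y^{\prime}$, which is finite.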

Thanks to (56) we can apply the dominated convergence theorem and differentiate under the integral sign in (55) to obtain that $\bar{\psi}$ is differentiable and

$$\nabla\bar{\psi}(y)=\frac{\int x\exp\big(-\varphi(x)-\frac{|x|^{2}}{2T}+\frac{\langle x,y\rangle}{T}\big)\mathrm{d}x}{\int\exp\big(-\varphi(\bar{x})-\frac{|\bar{x}|^{2}}{2T}+\frac{\langle\bar{x},y\rangle}{T}\big)\mathrm{d}\bar{x}}\stackrel{(34)}{=}\mathbb{E}_{X\sim\hat{\pi}^{y}}[X].$$

Using once again (56) to differentiate under the integral sign in (55) we conclude that $\bar{\psi}$ is twice differentiable and that (33) holds. ∎
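The second differentiation can be sketched as follows. The notation $Z(y)$ is introduced only for this illustration, and the sketch assumes that $\hat{\pi}^{y}$ from (34) has density proportional to $\exp(-\varphi(x)-|x|^{2}/(2T)+\langle x,y\rangle/T)$, consistently with the formula for $\nabla\bar{\psi}$ above. Writing $Z(y)$ for the integral on the right-hand side of (55), so that $\bar{\psi}$ equals $T\log Z$ up to an additive constant, each derivative in $y$ brings down a factor $x/T$ and therefore

$$\nabla^{2}\bar{\psi}(y)=T\Big(\frac{\nabla^{2}Z(y)}{Z(y)}-\frac{\nabla Z(y)\otimes\nabla Z(y)}{Z(y)^{2}}\Big)=\frac{1}{T}\Big(\mathbb{E}_{X\sim\hat{\pi}^{y}}[X\otimes X]-\mathbb{E}_{X\sim\hat{\pi}^{y}}[X]\otimes\mathbb{E}_{X\sim\hat{\pi}^{y}}[X]\Big)=\frac{1}{T}\mathrm{Cov}_{X\sim\hat{\pi}^{y}}(X).$$

The factor $T$ in front of the logarithm cancels one of the two factors $1/T$ produced by differentiating twice, which explains the prefactor $1/T$ in the statement.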

References

  • [1] Shigeki Aida and Ichiro Shigekawa. Logarithmic Sobolev inequalities and spectral gaps: perturbation theory. Journal of Functional Analysis, 126(2):448–475, 1994.
  • [2] Dominique Bakry, Ivan Gentil, and Michel Ledoux. Analysis and geometry of Markov diffusion operators, volume 348. Springer Science & Business Media, 2013.
  • [3] Erhan Bayraktar, Stephan Eckstein, and Xin Zhang. Stability and sample complexity of divergence regularized optimal transport. arXiv preprint arXiv:2212.00367, 2022.
  • [4] Jean-David Benamou. Optimal transportation, modelling and numerical simulation. Acta Numerica, 30:249–325, 2021.
  • [5] Jean-David Benamou, Guillaume Carlier, Marco Cuturi, Luca Nenna, and Gabriel Peyré. Iterative Bregman projections for regularized transportation problems. SIAM Journal on Scientific Computing, 37(2):A1111–A1138, 2015.
  • [6] Th Bodineau and B Helffer. The log-Sobolev inequality for unbounded spin systems. Journal of Functional Analysis, 166(1):168–178, 1999.
  • [7] H.J. Brascamp and E.H. Lieb. On extensions of the Brunn-Minkowski and Prékopa-Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation. Journal of Functional Analysis, 22(4):366–389, 1976.
  • [8] Luis A Caffarelli. Monotonicity properties of optimal transportation and the FKG and related inequalities. Communications in Mathematical Physics, 214(3):547–563, 2000.
  • [9] Y. Chen, T. Georgiou, and M. Pavon. On the relation between optimal transport and Schrödinger bridges: A stochastic control viewpoint. preprint arXiv:1412.4430, 2014.
  • [10] Yongxin Chen, Tryphon T Georgiou, and Michele Pavon. Stochastic Control Liaisons: Richard Sinkhorn Meets Gaspard Monge on a Schrödinger Bridge. SIAM Review, 63(2):249–313, 2021.
  • [11] Sinho Chewi and Aram-Alexandre Pooladian. An entropic generalization of Caffarelli’s contraction theorem via covariance inequalities. arXiv preprint arXiv:2203.04954, 2022.
  • [12] Alberto Chiarini, Giovanni Conforti, Giacomo Greco, and Luca Tamanini. Gradient estimates for the Schrödinger potentials: convergence to the Brenier map and quantitative stability. arXiv preprint arXiv:2207.14262, 2022.
  • [13] Giovanni Conforti. A second order equation for Schrödinger bridges with applications to the hot gas experiment and entropic transportation cost. Probability Theory and Related Fields, 174(1-2):1–47, 2019.
  • [14] Giovanni Conforti. Coupling by reflection for controlled diffusion processes: turnpike property and large time behavior of Hamilton Jacobi Bellman equations. Annals of Applied Probability (to appear), 2022.
  • [15] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, pages 2292–2300, 2013.
  • [16] P. Dai Pra and M. Pavon. On the Markov processes of Schrödinger, the Feynman-Kac formula and stochastic control. In Realization and Modelling in System Theory, volume 3, pages 497–504. Springer, 1990.
  • [17] Valentin De Bortoli, Arnaud Doucet, Jeremy Heng, and James Thornton. Simulating diffusion bridges with score matching. arXiv preprint arXiv:2111.07243, 2021.
  • [18] Valentin De Bortoli, James Thornton, Jeremy Heng, and Arnaud Doucet. Diffusion Schrödinger bridge with applications to score-based generative modeling. Advances in Neural Information Processing Systems, 34, 2021.
  • [19] George Deligiannidis, Valentin De Bortoli, and Arnaud Doucet. Quantitative uniform stability of the iterative proportional fitting procedure. arXiv preprint arXiv:2108.08129, 2021.
  • [20] Hacene Djellout, Arnaud Guillin, and Liming Wu. Transportation cost-information inequalities and applications to random dynamical systems and diffusions. The Annals of Probability, 32(3B):2702–2732, 2004.
  • [21] Joseph Doob. Conditional Brownian motion and the boundary limits of harmonic functions. Bulletin de la Société Mathématique de France, 85:431–458, 1957.
  • [22] Andreas Eberle. Reflection couplings and contraction rates for diffusions. Probability Theory and Related Fields, 166(3-4):851–886, 2016.
  • [23] Stephan Eckstein and Marcel Nutz. Quantitative stability of regularized optimal transport and convergence of Sinkhorn’s algorithm. arXiv preprint arXiv:2110.06798, 2021.
  • [24] Max Fathi, Nathael Gozlan, and Maxime Prodhomme. A proof of the Caffarelli contraction theorem via entropic regularization. Calculus of Variations and Partial Differential Equations, 59(96), 2020.
  • [25] Ivan Gentil, Christian Léonard, and Luigia Ripani. Dynamical aspects of the generalized Schrödinger problem via Otto calculus–a heuristic point of view. Revista Matemática Iberoamericana, 36(4):1071–1112, 2020.
  • [26] Promit Ghosal and Marcel Nutz. On the convergence rate of Sinkhorn’s algorithm. arXiv preprint arXiv:2212.06000, 2022.
  • [27] Natalie Grunewald, Felix Otto, Cédric Villani, and Maria G Westdickenberg. A two-scale approach to logarithmic Sobolev inequalities and the hydrodynamic limit. In Annales de l’IHP Probabilités et statistiques, volume 45, pages 302–351, 2009.
  • [28] Richard Holley and Daniel W Stroock. Logarithmic Sobolev inequalities and stochastic Ising models. 1986.
  • [29] Christian Léonard. Girsanov theory under a finite entropy condition. In C. Donati-Martin, A. Lejay, and A. Rouault, editors, Séminaire de Probabilités XLIV, volume 2046 of Lecture Notes in Mathematics, pages 429–465. Springer, 2012.
  • [30] Christian Léonard. Some properties of path measures. In Séminaire de Probabilités XLVI, pages 207–230. Springer, 2014.
  • [31] Christian Léonard. A survey of the Schrödinger problem and some of its connections with optimal transport. Discrete and Continuous Dynamical Systems, 34(4):1533–1574, 2014.
  • [32] Torgny Lindvall and L Cris G Rogers. Coupling of multidimensional diffusions by reflection. The Annals of Probability, pages 860–872, 1986.
  • [33] Dan Mikulincer and Yair Shenfeld. The Brownian transport map. arXiv preprint arXiv:2111.11521, 2021.
  • [34] Dan Mikulincer and Yair Shenfeld. On the Lipschitz properties of transportation along heat flows. arXiv preprint arXiv:2201.01382, 2022.
  • [35] Marcel Nutz and Johannes Wiesel. Entropic optimal transport: Convergence of potentials. Probability Theory and Related Fields, 184(1):401–424, 2022.
  • [36] Shige Peng. A general stochastic maximum principle for optimal control problems. SIAM J. Control Optim., 28(4):966–979, 1990.
  • [37] Gabriel Peyré and Marco Cuturi. Computational optimal transport. Foundations and Trends in Machine Learning, 11(5-6):355–607, 2019.
  • [38] Erwin Schrödinger. Über die Umkehrung der Naturgesetze. Sitzungsberichte Preuss. Akad. Wiss. Berlin. Phys. Math., 144:144–153, 1931.
  • [39] Yuyang Shi, Valentin De Bortoli, George Deligiannidis, and Arnaud Doucet. Conditional simulation using diffusion Schrödinger bridges. arXiv preprint arXiv:2202.13460, 2022.