
A dual formula for the noncommutative transport distance

Melchior Wirth IST Austria
Am Campus 1
3400 Klosterneuburg
Austria
melchior.wirth@ist.ac.at
Abstract.

In this article we study the noncommutative transport distance introduced by Carlen and Maas and its entropic regularization defined by Becker and Li. We prove a duality formula that can be understood as a quantum version of the dual Benamou–Brenier formulation of the Wasserstein distance in terms of subsolutions of the Hamilton–Jacobi–Bellman equation.

Introduction

The theory of optimal transport [Vil03, Vil09] has experienced rapid growth in recent years with applications in diverse fields across pure and applied mathematics. Along with this growth came a lot of interest in extending the methods of optimal transport beyond the scope of its original formulation as an optimization problem for the transport cost between two probability measures.

One such extension deals with “quantum spaces”, where the probability measures are replaced by density matrices or density operators. Most of the work dealing with quantum optimal transport in this sense can be grouped into one of the following two categories. The first approach (see e.g. [CM14, CM17, MM17, CGT18, RD19, BV20, CM20]) relies on a noncommutative analog of the Benamou–Brenier formulation [BB00] of the Wasserstein distance for probability measures on Euclidean space

$$W_{2}^{2}(\mu,\nu)=\inf\left\{\int_{0}^{1}\int_{\mathbb{R}^{n}}\lvert v_{t}\rvert^{2}\,d\rho_{t}\,dt:\rho_{0}=\mu,\ \rho_{1}=\nu,\ \dot{\rho}_{t}+\nabla\cdot(\rho_{t}v_{t})=0\right\}.$$

This approach has proven fruitful in applications to noncommutative functional inequalities, similar in spirit to the heuristics known as Otto calculus [CM17, CM20, DR20, WZ20].

The second approach (see e.g. [NGT15, GMP16, Pey+19, Duv20, Pal+20, DPT21]) seeks to find a suitable noncommutative analog of the Monge–Kantorovich formulation [Kan42] of the Wasserstein distance via couplings (or transport plans):

$$W_{p}^{p}(\mu,\nu)=\inf\left\{\int_{X\times X}d^{p}(x,y)\,d\pi(x,y):(\mathrm{pr}_{1})_{\#}\pi=\mu,\ (\mathrm{pr}_{2})_{\#}\pi=\nu\right\}.$$

This approach also allows one to consider a quantum version of the Monge–Kantorovich problem for arbitrary cost functions. So far, possible connections between these two approaches in the quantum setting remain elusive.

The focus of this article lies on the noncommutative transport distance $\mathcal{W}$ introduced in the first approach. More precisely, we prove a dual formula that is a noncommutative analog of the expression of the classical $L^{2}$-Wasserstein distance in terms of subsolutions of the Hamilton–Jacobi equation [OV00, BGL01]

$$\frac{1}{2}W_{2}^{2}(\mu,\nu)=\sup\left\{\int_{\mathbb{R}^{n}}u_{1}\,d\nu-\int_{\mathbb{R}^{n}}u_{0}\,d\mu:\dot{u}_{t}+\frac{1}{2}\lvert\nabla u_{t}\rvert^{2}\leq 0\right\}.$$

This result yields a noncommutative version of the dual formula obtained independently by Erbar, Maas and the author [EMW19] and Gangbo, Li and Mou [GLM19] for the Wasserstein-like transport distance on graphs. In fact, we prove a dual formula that is not only valid for the metric $\mathcal{W}$, but also for the entropic regularization recently introduced by Becker–Li [BL21].

With the notation introduced in the next section, the main result of this article reads as follows.

Theorem.

Let $\sigma\in M_{n}(\mathbb{C})$ be an invertible density matrix and $(P_{t})$ an ergodic quantum Markov semigroup on $M_{n}(\mathbb{C})$ that satisfies the $\sigma$-DBC. The entropic regularization $\mathcal{W}_{\varepsilon}$ of the noncommutative transport distance induced by $(P_{t})$ satisfies the following dual formula:

$$\frac{1}{2}\mathcal{W}_{\varepsilon}^{2}(\rho_{0},\rho_{1})=\sup\{\tau(A(1)\rho_{1}-A(0)\rho_{0})\mid A\in\mathsf{HJB}^{1}_{\varepsilon}\}.$$

Here $\mathsf{HJB}^{1}_{\varepsilon}$ stands for the set of all Hamilton–Jacobi–Bellman subsolutions, a suitable noncommutative variant of solutions of the differential inequality

$$\dot{u}(t)+\frac{1}{2}\lvert\nabla u(t)\rvert^{2}+\varepsilon\Delta u(t)\leq 0.$$

Other metrics similar to $\mathcal{W}$ also occur in the literature, most notably the one called the “anticommutator case” in [CGT18, Che+20, BL21]. In [Wir18, CM20], a class of such metrics was studied in a systematic way, and our main theorem applies in fact to this wider class of metrics. For the anticommutator case, this duality formula was obtained before in [Che+20].

There are still some very natural questions left open. For one, we do not discuss the existence of optimizers. While for the primal problem this follows from a standard compactness argument, this question is more delicate for the dual problem, even when dealing with probability densities on discrete spaces instead of density matrices, and one has to relax the problem to obtain maximizers (see [GLM19, Sections 6–7]).

Another interesting direction would be to extend the duality result from matrix algebras to infinite-dimensional systems. While a definition of the metric $\mathcal{W}$ for quantum Markov semigroups on semi-finite von Neumann algebras is available [Hor18, Wir18], the problem of duality seems to be much harder to address. Even for abstract diffusion semigroups, the best known result only shows that the primal distance is the upper length distance associated with the dual distance and leaves the question of equality open [AES16, Proposition 10.11].

Acknowledgments

The author wants to thank Jan Maas for helpful comments. He acknowledges support from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 716117) and from the Austrian Science Fund (FWF) through project F65.

1. Setting and basic definitions

In this section we introduce basic facts and definitions about quantum Markov semigroups that will be used later on. In particular, we review the definition of the noncommutative transport distance from [CM17] and its entropic regularization introduced in [BL21]. Our notation mostly follows [CM17, CM20].

Let $M_{n}(\mathbb{C})$ denote the complex $n\times n$ matrices and let $\mathcal{A}$ be a unital $\ast$-subalgebra of $M_{n}(\mathbb{C})$. Let $\mathcal{A}_{h}$ and $\mathcal{A}_{+}$ denote the self-adjoint part and the positive part of $\mathcal{A}$, respectively. We write $\tau$ for the normalized trace on $M_{n}(\mathbb{C})$ and $\mathfrak{H}_{\mathcal{A}}$ for the Hilbert space formed by equipping $\mathcal{A}$ with the GNS inner product

$$\langle\cdot,\cdot\rangle_{\mathfrak{H}_{\mathcal{A}}}\colon\mathcal{A}\times\mathcal{A}\to\mathbb{C},\quad(A,B)\mapsto\tau(A^{\ast}B).$$

The adjoint of a linear operator $\mathscr{K}\colon\mathfrak{H}_{\mathcal{A}}\to\mathfrak{H}_{\mathcal{A}}$ is denoted by $\mathscr{K}^{\dagger}$.

We write $\mathfrak{S}(\mathcal{A})$ for the set of all density matrices on $\mathcal{A}$, that is, all positive elements $\rho\in\mathcal{A}$ with $\tau(\rho)=1$. The subset of invertible density matrices is denoted by $\mathfrak{S}_{+}(\mathcal{A})$.

A quantum Markov semigroup (QMS) on $\mathcal{A}$ is a family $(P_{t})_{t\geq 0}$ of linear operators on $\mathcal{A}$ that satisfy the following conditions:

  • $P_{t}$ is unital and completely positive for every $t\geq 0$,

  • $P_{0}=\mathrm{id}_{\mathcal{A}}$ and $P_{s+t}=P_{s}P_{t}$ for all $s,t\geq 0$,

  • $t\mapsto P_{t}$ is continuous.

We consider a QMS $(P_{t})$ on $\mathcal{A}$ which extends to a QMS on $M_{n}(\mathbb{C})$ satisfying the $\sigma$-DBC for some density matrix $\sigma\in\mathfrak{S}_{+}(\mathcal{A})$, that is,

$$\tau((P_{t}A)^{\ast}B\sigma)=\tau(A^{\ast}(P_{t}B)\sigma)$$

for $A,B\in\mathcal{A}$ and $t\geq 0$. Let $\mathscr{L}$ denote the generator of $(P_{t})$, that is, the linear operator on $\mathcal{A}$ given by

$$\mathscr{L}(A)=\lim_{t\searrow 0}\frac{P_{t}A-A}{t}.$$

We further assume that $(P_{t})$ is ergodic (or primitive), that is, the kernel of $\mathscr{L}$ is one-dimensional.

By Alicki’s theorem [Ali76, Theorem 3], [CM17, Theorem 3.1] there exist a finite set $\mathcal{J}$, real numbers $\omega_{j}$ for $j\in\mathcal{J}$ and matrices $V_{j}\in M_{n}(\mathbb{C})$ for $j\in\mathcal{J}$ with the following properties:

  • $\tau(V_{j}^{\ast}V_{k})=\delta_{jk}$ for $j,k\in\mathcal{J}$,

  • $\tau(V_{j})=0$ for $j\in\mathcal{J}$,

  • $\{V_{j}\mid j\in\mathcal{J}\}=\{V_{j}^{\ast}\mid j\in\mathcal{J}\}$,

  • $\sigma V_{j}\sigma^{-1}=e^{-\omega_{j}}V_{j}$ for $j\in\mathcal{J}$,

such that

$$\mathscr{L}(A)=\sum_{j\in\mathcal{J}}\left(e^{-\omega_{j}/2}V_{j}^{\ast}[A,V_{j}]-e^{\omega_{j}/2}[A,V_{j}]V_{j}^{\ast}\right)$$

for $A\in\mathcal{A}$.

The numbers $\omega_{j}$ are called Bohr frequencies of $\mathscr{L}$ and are uniquely determined by $(P_{t})$.

The matrices $V_{j}$ are not uniquely determined by $(P_{t})$ and $\sigma$, but in the following we will fix a set $\{V_{j}\mid j\in\mathcal{J}\}$ that satisfies the preceding conditions.
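A quick numerical sanity check can make Alicki's form concrete. The sketch below is only an illustration with a hypothetical qubit example that is not taken from the text: $\sigma=\operatorname{diag}(s_1,s_2)$ with $s_1+s_2=2$ (so that $\tau(\sigma)=1$ for the normalized trace) and the single pair of jump operators $V=\sqrt{2}E_{12}$ and $V^\ast$. It builds $\mathscr{L}$ from the formula above and checks that $\mathscr{L}1=0$, that $\ker\mathscr{L}$ is one-dimensional, that $\tau(\sigma\mathscr{L}A)=0$, and the symmetry $\tau((\mathscr{L}A)^\ast B\sigma)=\tau(A^\ast(\mathscr{L}B)\sigma)$, which is the $\sigma$-DBC differentiated at $t=0$. The helper names (`tau`, `generator`, `L_mat`) are our own.

```python
import numpy as np

n = 2
s1, s2 = 1.5, 0.5                        # hypothetical; s1 + s2 = 2 so that tau(sigma) = 1
sigma = np.diag([s1, s2]).astype(complex)


def tau(X):
    """Normalized trace tau(X) = Tr(X) / n."""
    return np.trace(X) / n


def comm(A, B):
    """Commutator [A, B] = AB - BA."""
    return A @ B - B @ A


# V = sqrt(2) E_12 has tau(V^* V) = 1 and tau(V) = 0; {V, V^*} is closed under adjoints.
V = np.sqrt(2) * np.array([[0, 1], [0, 0]], dtype=complex)
jumps = [V, V.conj().T]
# Bohr frequencies read off from sigma V_j sigma^{-1} = exp(-omega_j) V_j.
omegas = [np.log(s2 / s1), np.log(s1 / s2)]
for Vj, wj in zip(jumps, omegas):
    assert np.allclose(sigma @ Vj @ np.linalg.inv(sigma), np.exp(-wj) * Vj)


def generator(A):
    """L(A) = sum_j e^{-w_j/2} V_j^* [A, V_j] - e^{w_j/2} [A, V_j] V_j^*."""
    out = np.zeros((n, n), dtype=complex)
    for Vj, wj in zip(jumps, omegas):
        C = comm(A, Vj)
        out += np.exp(-wj / 2) * Vj.conj().T @ C - np.exp(wj / 2) * C @ Vj.conj().T
    return out


# Matrix of L with respect to the matrix units E_kl (row-major vectorization).
units = [np.outer(np.eye(n)[k], np.eye(n)[l]).astype(complex) for k in range(n) for l in range(n)]
L_mat = np.column_stack([generator(E).reshape(-1) for E in units])
kernel_dim = n * n - np.linalg.matrix_rank(L_mat)

# Symmetry of L (sigma-DBC at t = 0) and invariance of sigma, on random complex matrices.
rng = np.random.default_rng(0)
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
dbc_gap = abs(tau(generator(A).conj().T @ B @ sigma) - tau(A.conj().T @ generator(B) @ sigma))
stationarity_gap = abs(tau(sigma @ generator(A)))
```

A one-dimensional kernel here means the toy semigroup is ergodic in the sense used above.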

Let

$$\mathfrak{H}_{\mathcal{A},\mathcal{J}}=\bigoplus_{j\in\mathcal{J}}\mathfrak{H}_{\mathcal{A}}^{(j)},$$

where $\mathfrak{H}_{\mathcal{A}}^{(j)}$ is a copy of $\mathfrak{H}_{\mathcal{A}}$ for $j\in\mathcal{J}$. We write $\partial_{j}$ for $[V_{j},\cdot\,]$ and

$$\nabla\colon\mathfrak{H}_{\mathcal{A}}\to\mathfrak{H}_{\mathcal{A},\mathcal{J}},\quad\nabla(A)=(\partial_{j}(A))_{j\in\mathcal{J}}.$$

We write $\operatorname{div}$ for the adjoint of $-\nabla$, that is,

$$\operatorname{div}\colon\mathfrak{H}_{\mathcal{A},\mathcal{J}}\to\mathfrak{H}_{\mathcal{A}},\quad\operatorname{div}\mathbf{V}=-\sum_{j\in\mathcal{J}}\partial_{j}^{\dagger}\mathbf{V}_{j}.$$
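Using only the cyclicity of $\tau$, the adjoint $\partial_{j}^{\dagger}$ can be made explicit; the following short computation is ours and is included for convenience. For $A,B\in\mathfrak{H}_{\mathcal{A}}$,

$$\begin{aligned}
\langle\partial_{j}A,B\rangle_{\mathfrak{H}_{\mathcal{A}}}&=\tau\big((A^{\ast}V_{j}^{\ast}-V_{j}^{\ast}A^{\ast})B\big)=\tau(A^{\ast}V_{j}^{\ast}B)-\tau(A^{\ast}BV_{j}^{\ast})\\
&=\langle A,[V_{j}^{\ast},B]\rangle_{\mathfrak{H}_{\mathcal{A}}}.
\end{aligned}$$

Hence $\partial_{j}^{\dagger}=[V_{j}^{\ast},\cdot\,]=\partial_{j^{\ast}}$, where $j^{\ast}$ is the index with $V_{j^{\ast}}=V_{j}^{\ast}$ (it exists because $\{V_{j}\}$ is closed under adjoints), and $\operatorname{div}\mathbf{V}=-\sum_{j\in\mathcal{J}}[V_{j}^{\ast},\mathbf{V}_{j}]$.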

For $X\in\mathcal{A}_{+}$ and $\alpha\in\mathbb{R}$ define

$$[X]_{\alpha}\colon\mathfrak{H}_{\mathcal{A}}\to\mathfrak{H}_{\mathcal{A}},\quad[X]_{\alpha}(A)=\int_{0}^{1}e^{\alpha(s-1/2)}X^{s}AX^{1-s}\,ds.$$

Given $\vec{\alpha}=(\alpha_{j})_{j\in\mathcal{J}}$, we define

$$[X]_{\vec{\alpha}}\colon\mathfrak{H}_{\mathcal{A},\mathcal{J}}\to\mathfrak{H}_{\mathcal{A},\mathcal{J}},\quad(A_{j})_{j\in\mathcal{J}}\mapsto([X]_{\alpha_{j}}A_{j})_{j\in\mathcal{J}}.$$
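To make the definition of $[X]_\alpha$ tangible, the following sketch (our illustration; the matrices and the value $\alpha=0.7$ are hypothetical) evaluates the integral by the midpoint rule, computing $X^s$ from an eigendecomposition of $X$. Since $X^s1X^{1-s}=X$, one expects $[X]_\alpha(1)=\frac{2\sinh(\alpha/2)}{\alpha}X$, and $\langle A,[X]_\alpha A\rangle_{\mathfrak{H}_{\mathcal{A}}}$ should be positive because each integrand $\tau(A^\ast X^sAX^{1-s})$ is the squared norm of $X^{s/2}AX^{(1-s)/2}$.

```python
import numpy as np


def bracket_quad(X, A, alpha, m=2000):
    """Approximate [X]_alpha(A) = int_0^1 e^{alpha(s-1/2)} X^s A X^{1-s} ds
    by the midpoint rule with m nodes (X positive definite Hermitian)."""
    w, U = np.linalg.eigh(X)
    total = np.zeros((X.shape[0], X.shape[0]), dtype=complex)
    for s in (np.arange(m) + 0.5) / m:
        Xs = (U * w**s) @ U.conj().T          # X^s
        Xr = (U * w**(1 - s)) @ U.conj().T    # X^{1-s}
        total += np.exp(alpha * (s - 0.5)) * Xs @ A @ Xr
    return total / m


# Hypothetical test data (not from the text).
X = np.array([[1.2, 0.3], [0.3, 0.8]])       # positive definite, Tr(X)/2 = 1
alpha = 0.7
identity_image = bracket_quad(X, np.eye(2), alpha)
expected = (2 * np.sinh(alpha / 2) / alpha) * X

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
# <A, [X]_alpha A> with the normalized trace tau = Tr / 2
quad_form = np.trace(A.conj().T @ bracket_quad(X, A, alpha)) / 2
```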

Now let $\vec{\omega}=(\omega_{j})_{j\in\mathcal{J}}$ with the Bohr frequencies $\omega_{j}$ of $\mathscr{L}$. For $\varepsilon\geq 0$ we write $\mathsf{CE}_{\varepsilon}(\rho_{0},\rho_{1})$ for the set of all pairs $(\rho,\mathbf{V})$ such that $\rho\in H^{1}([0,1];\mathfrak{S}_{+}(\mathcal{A}))$ with $\rho(0)=\rho_{0}$, $\rho(1)=\rho_{1}$, $\mathbf{V}\in L^{2}([0,1];\mathfrak{H}_{\mathcal{A},\mathcal{J}})$ and

$$\dot{\rho}(t)+\operatorname{div}\mathbf{V}(t)=\varepsilon\mathscr{L}^{\dagger}\rho(t)$$

for a.e. $t\in[0,1]$.

We define a metric $\mathcal{W}_{\varepsilon}$ on $\mathfrak{S}_{+}(\mathcal{A})$ by

$$\mathcal{W}_{\varepsilon}^{2}(\rho_{0},\rho_{1})=\inf_{(\rho,\mathbf{V})\in\mathsf{CE}_{\varepsilon}(\rho_{0},\rho_{1})}\int_{0}^{1}\langle\mathbf{V}(t),[\rho(t)]_{\vec{\omega}}^{-1}\mathbf{V}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt.$$

A standard mollification argument shows that the infimum can equivalently be taken over $(\rho,\mathbf{V})\in\mathsf{CE}_{\varepsilon}(\rho_{0},\rho_{1})$ with $\rho\in C^{\infty}([0,1];\mathfrak{S}_{+}(\mathcal{A}))$.

For $\varepsilon=0$, this is the noncommutative transport distance $\mathcal{W}$ introduced in [CM17] (as the distance function associated with a Riemannian metric on $\mathfrak{S}_{+}(\mathcal{A})$), and for $\varepsilon>0$, this is the entropic regularization of $\mathcal{W}$ introduced in [BL21].

By a substitution one can reformulate the minimization problem for $\mathcal{W}_{\varepsilon}$ in such a way that the constraint becomes independent of $\varepsilon$. For that purpose define the relative entropy of $\rho\in\mathfrak{S}_{+}(\mathcal{A})$ with respect to $\sigma$ by

$$D(\rho\|\sigma)=\tau(\rho(\log\rho-\log\sigma))$$

and the Fisher information of $\rho\in\mathfrak{S}_{+}(\mathcal{A})$ by

$$\mathcal{I}(\rho)=\langle[\rho]_{\vec{\omega}}\nabla(\log\rho-\log\sigma),\nabla(\log\rho-\log\sigma)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}.$$
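The definitions of $D$ and $\mathcal{I}$ can be made concrete numerically. The sketch below is our illustration with hypothetical data (a qubit with $\sigma=\operatorname{diag}(3/2,1/2)$ and jump operators $\sqrt{2}E_{12}$, $\sqrt{2}E_{21}$). It computes $[\rho]_{\omega_j}$ through the spectral formula $\sum_{k,l}\Lambda_{\omega_j}(\lambda_k,\lambda_l)E_kAE_l$, uses $\partial_j^\dagger=[V_j^\ast,\cdot\,]$ (a consequence of the cyclicity of $\tau$), and obtains $\mathscr{L}^\dagger$ as the conjugate transpose of the matrix of $\mathscr{L}$, which is its adjoint for $\tau(A^\ast B)$ up to the irrelevant factor $1/n$. It then tests the chain-rule identity $\mathscr{L}^\dagger\rho=\operatorname{div}([\rho]_{\vec\omega}\nabla(\log\rho-\log\sigma))$, which we expect from the structure of $\mathscr{L}$ but only check numerically here, together with the entropy-production identity $\mathcal{I}(\rho)=-\tau((\mathscr{L}^\dagger\rho)(\log\rho-\log\sigma))$, i.e. the rate at which $D(\cdot\|\sigma)$ decreases along $\dot\rho=\mathscr{L}^\dagger\rho$.

```python
import numpy as np

# Hypothetical qubit model (not from the text).
n = 2
s1, s2 = 1.5, 0.5
sigma = np.diag([s1, s2]).astype(complex)
V = np.sqrt(2) * np.array([[0, 1], [0, 0]], dtype=complex)
jumps = [V, V.conj().T]
omegas = [np.log(s2 / s1), np.log(s1 / s2)]


def tau(X):
    return np.trace(X) / n


def logm_h(X):
    """Logarithm of a positive definite Hermitian matrix via eigh."""
    w, U = np.linalg.eigh(X)
    return (U * np.log(w)) @ U.conj().T


def lam(a, b, alpha):
    """Scalar Lambda_alpha(a, b) = int_0^1 e^{alpha(s-1/2)} a^s b^{1-s} ds (closed form)."""
    c = alpha + np.log(a) - np.log(b)
    return b * np.exp(-alpha / 2) * (np.expm1(c) / c if abs(c) > 1e-12 else 1.0)


def bracket(rho, Z, alpha):
    """[rho]_alpha(Z) = sum_{k,l} Lambda_alpha(l_k, l_l) E_k Z E_l in the eigenbasis of rho."""
    w, U = np.linalg.eigh(rho)
    M = np.array([[lam(wk, wl, alpha) for wl in w] for wk in w])
    return U @ (M * (U.conj().T @ Z @ U)) @ U.conj().T


def grad(A):
    """nabla A = (partial_j A)_j with partial_j A = [V_j, A]."""
    return [Vj @ A - A @ Vj for Vj in jumps]


def div(Ws):
    """div W = -sum_j partial_j^dagger W_j = -sum_j [V_j^*, W_j]."""
    return -sum(Vj.conj().T @ Wj - Wj @ Vj.conj().T for Vj, Wj in zip(jumps, Ws))


def generator(A):
    out = np.zeros((n, n), dtype=complex)
    for Vj, wj in zip(jumps, omegas):
        C = A @ Vj - Vj @ A
        out += np.exp(-wj / 2) * Vj.conj().T @ C - np.exp(wj / 2) * C @ Vj.conj().T
    return out


# L^dagger for <X, Y> = tau(X^* Y): conjugate transpose of the matrix of L.
units = [np.outer(np.eye(n)[k], np.eye(n)[l]).astype(complex) for k in range(n) for l in range(n)]
L_mat = np.column_stack([generator(E).reshape(-1) for E in units])


def L_dagger(rho):
    return (L_mat.conj().T @ rho.reshape(-1)).reshape(n, n)


rho = np.array([[1.1, 0.2 + 0.1j], [0.2 - 0.1j, 0.9]])    # invertible, tau(rho) = 1
X = logm_h(rho) - logm_h(sigma)
flux = [bracket(rho, g, w) for g, w in zip(grad(X), omegas)]   # [rho]_omega nabla X

rel_entropy = tau(rho @ X).real
fisher = sum(tau(f.conj().T @ g) for f, g in zip(flux, grad(X))).real
chain_rule_gap = np.abs(L_dagger(rho) - div(flux)).max()
entropy_production = -tau(L_dagger(rho).conj().T @ X).real
```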

According to [BL21, Theorem 1], one has

$$\begin{aligned}
\mathcal{W}_{\varepsilon}^{2}(\rho_{0},\rho_{1})&=\inf_{(\rho,\mathbf{W})\in\mathsf{CE}_{0}(\rho_{0},\rho_{1})}\int_{0}^{1}\big(\langle\mathbf{W}(t),[\rho(t)]_{\vec{\omega}}^{-1}\mathbf{W}(t)\rangle+\varepsilon^{2}\mathcal{I}(\rho(t))\big)\,dt\\
&\quad+2\varepsilon\big(D(\rho_{1}\|\sigma)-D(\rho_{0}\|\sigma)\big).
\end{aligned}$$
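A sketch of where this substitution comes from may help. It is our reconstruction, not the argument of [BL21], and it assumes the chain-rule identity $\mathscr{L}^{\dagger}\rho=\operatorname{div}([\rho]_{\vec{\omega}}\nabla(\log\rho-\log\sigma))$ for invertible $\rho$, which can be checked directly from the explicit form of $\mathscr{L}$. Given $(\rho,\mathbf{V})\in\mathsf{CE}_{\varepsilon}(\rho_{0},\rho_{1})$, put $X_{t}=\log\rho(t)-\log\sigma$ and $\mathbf{W}(t)=\mathbf{V}(t)-\varepsilon[\rho(t)]_{\vec{\omega}}\nabla X_{t}$. Then $(\rho,\mathbf{W})\in\mathsf{CE}_{0}(\rho_{0},\rho_{1})$, and, suppressing $t$,

$$\begin{aligned}
\langle\mathbf{V},[\rho]_{\vec{\omega}}^{-1}\mathbf{V}\rangle&=\langle\mathbf{W},[\rho]_{\vec{\omega}}^{-1}\mathbf{W}\rangle+2\varepsilon\langle\mathbf{W},\nabla X_{t}\rangle+\varepsilon^{2}\mathcal{I}(\rho),\\
\langle\mathbf{W},\nabla X_{t}\rangle&=\langle-\operatorname{div}\mathbf{W},X_{t}\rangle_{\mathfrak{H}_{\mathcal{A}}}=\tau(\dot{\rho}X_{t})=\frac{d}{dt}D(\rho(t)\|\sigma),
\end{aligned}$$

where the last equality uses $\tau(\dot{\rho})=0$. Integrating over $[0,1]$ turns the cross term into $2\varepsilon(D(\rho_{1}\|\sigma)-D(\rho_{0}\|\sigma))$, and since $\mathbf{V}\mapsto\mathbf{W}$ is a bijection between the velocity fields admissible for $\mathsf{CE}_{\varepsilon}$ and for $\mathsf{CE}_{0}$ along a fixed curve $\rho$, the two infima agree.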

The metric $\mathcal{W}$ is intimately connected to the relative entropy and is therefore well suited to study its decay along the quantum Markov semigroup. For other applications, variants of $\mathcal{W}$ in which the operator $[\rho]_{\vec{\omega}}$ is replaced by a different one have also proven useful (e.g. [CGT18, Che+20]). A systematic framework for these metrics has been developed in [Wir18, CM20]. It can be conveniently phrased in terms of so-called operator connections.

Let $H$ be an infinite-dimensional Hilbert space. A map $\Lambda\colon B(H)_{+}\times B(H)_{+}\to B(H)_{+}$ is called an operator connection [KA80] if

  • $A\leq C$ and $B\leq D$ imply $\Lambda(A,B)\leq\Lambda(C,D)$,

  • $C\Lambda(A,B)C\leq\Lambda(CAC,CBC)$,

  • $A_{n}\searrow A$, $B_{n}\searrow B$ imply $\Lambda(A_{n},B_{n})\searrow\Lambda(A,B)$.

For example, for every $\alpha\in\mathbb{R}$ there is an operator connection $\Lambda_{\alpha}$ given by

$$\Lambda_{\alpha}(A,B)=\int_{0}^{1}e^{\alpha(s-1/2)}A^{s}B^{1-s}\,ds$$

for commuting $A,B\in B(H)_{+}$, which is the only case needed below.
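For scalars, $\Lambda_\alpha$ has a closed form: with $c=\alpha+\log a-\log b$ one gets $\Lambda_\alpha(a,b)=b\,e^{-\alpha/2}(e^{c}-1)/c$, read as $b\,e^{-\alpha/2}$ when $c=0$. These scalar values are what enters the spectral formula for $[\rho]_\Lambda$. The pure-Python sketch below (sample values are arbitrary) checks the closed form against a midpoint-rule quadrature, recovers the logarithmic mean for $\alpha=0$, and checks homogeneity and the swap rule $\Lambda_\alpha(b,a)=\Lambda_{-\alpha}(a,b)$. Taking adjoints in $\sigma V_j\sigma^{-1}=e^{-\omega_j}V_j$ shows $\omega_{j^\ast}=-\omega_j$, so for $\Lambda_j=\Lambda_{\omega_j}$ this swap rule is the scalar form of the symmetry condition $\Lambda_{j^\ast}(A,B)=\Lambda_j(B,A)$ used later.

```python
import math


def lam_closed(a, b, alpha):
    """Lambda_alpha(a, b) = int_0^1 e^{alpha(s-1/2)} a^s b^{1-s} ds for scalars a, b > 0."""
    c = alpha + math.log(a) - math.log(b)
    if abs(c) < 1e-12:
        return b * math.exp(-alpha / 2)
    return b * math.exp(-alpha / 2) * math.expm1(c) / c


def lam_quad(a, b, alpha, m=20000):
    """The same integral by the midpoint rule, as an independent check."""
    h = 1.0 / m
    return h * sum(math.exp(alpha * (s - 0.5)) * a**s * b**(1 - s)
                   for s in ((k + 0.5) * h for k in range(m)))


a, b, alpha = 2.0, 0.5, 0.8
quad_err = abs(lam_closed(a, b, alpha) - lam_quad(a, b, alpha))
# alpha = 0 gives the logarithmic mean (a - b) / (log a - log b).
log_mean_err = abs(lam_closed(a, b, 0.0) - (a - b) / (math.log(a) - math.log(b)))
# Swapping the arguments flips the sign of alpha.
swap_err = abs(lam_closed(b, a, alpha) - lam_closed(a, b, -alpha))
# Positive homogeneity: Lambda(t a, t b) = t Lambda(a, b).
homog_err = abs(lam_closed(3 * a, 3 * b, alpha) - 3 * lam_closed(a, b, alpha))
```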

It can be shown that every operator connection $\Lambda$ satisfies

$$U^{\ast}\Lambda(A,B)U=\Lambda(U^{\ast}AU,U^{\ast}BU)$$

for $A,B\in B(H)_{+}$ and unitary $U\in B(H)$ [KA80, Section 2]. Embedding $\mathbb{C}^{n}$ into $H$, one can view $A,B\in M_{n}(\mathbb{C})$ as bounded linear operators on $H$, and the unitary invariance of $\Lambda$ ensures that $\Lambda(A,B)$ does not depend on the embedding of $\mathbb{C}^{n}$ into $H$.

For $X\in\mathcal{A}$ define

$$\begin{aligned}
L(X)&\colon\mathfrak{H}_{\mathcal{A}}\to\mathfrak{H}_{\mathcal{A}},\quad A\mapsto XA,\\
R(X)&\colon\mathfrak{H}_{\mathcal{A}}\to\mathfrak{H}_{\mathcal{A}},\quad A\mapsto AX.
\end{aligned}$$

With this notation we can write

$$[\rho]_{\Lambda}=\Lambda(L(\rho),R(\rho)).$$

Since $L(X)$ and $R(X)$ commute, we have

$$\Lambda(L(X),R(X))A=\sum_{k,l=1}^{m}\Lambda(\lambda_{k},\lambda_{l})E_{k}AE_{l}\tag{$\ast$}$$

for $X\in\mathcal{A}_{+}$ and $A\in\mathfrak{H}_{\mathcal{A}}$, where $\lambda_{1},\dots,\lambda_{m}$ are the distinct eigenvalues of $X$, $E_{1},\dots,E_{m}$ are the corresponding spectral projections, and $\Lambda(\lambda_{k},\lambda_{l})$ is the value of $\Lambda$ on the scalars $\lambda_{k},\lambda_{l}$.
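Formula $(\ast)$ can be tested against a direct evaluation of $\Lambda(L(X),R(X))$. In the sketch below (our illustration with random hypothetical data), $L(X)$ and $R(X)$ are represented as $n^2\times n^2$ Kronecker-product matrices acting on row-major vectorizations, and $\Lambda$ is taken to be the harmonic mean $\Lambda(A,B)=2(A^{-1}+B^{-1})^{-1}$, an operator connection which for commuting invertible arguments reduces to $2AB(A+B)^{-1}$. The right-hand side uses rank-one eigenprojections of $X$, which gives the same sum as grouping by distinct eigenvalues.

```python
import numpy as np

# Hypothetical data (not from the text): a positive definite X and an arbitrary A.
n = 3
rng = np.random.default_rng(2)
G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
X = G @ G.conj().T + 0.5 * np.eye(n)          # positive definite
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))


def harmonic(a, b):
    """Scalar harmonic mean 2ab / (a + b)."""
    return 2 * a * b / (a + b)


# Left side: vec(XA) = kron(X, I) vec(A) and vec(AX) = kron(I, X^T) vec(A) (row-major vec).
I = np.eye(n)
LX = np.kron(X, I)
RX = np.kron(I, X.T)
# LX and RX commute, so their harmonic mean is 2 LX RX (LX + RX)^{-1}.
lhs = (2 * LX @ RX @ np.linalg.inv(LX + RX) @ A.reshape(-1)).reshape(n, n)

# Right side: formula (*) in an orthonormal eigenbasis of X.
w, U = np.linalg.eigh(X)
M = harmonic(w[:, None], w[None, :])          # M[k, l] = Lambda(lambda_k, lambda_l)
rhs = U @ (M * (U.conj().T @ A @ U)) @ U.conj().T

spectral_gap = np.max(np.abs(lhs - rhs))
```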

More generally, let $\vec{\Lambda}=(\Lambda_{j})_{j\in\mathcal{J}}$ be a family of operator connections and define

$$\begin{aligned}
{}[\rho]_{\Lambda_{j}}&=\Lambda_{j}(L(\rho),R(\rho)),\\
{}[\rho]_{\vec{\Lambda}}&=\bigoplus_{j\in\mathcal{J}}[\rho]_{\Lambda_{j}}.
\end{aligned}$$

Then one can define a distance $\mathcal{W}_{\vec{\Lambda},\varepsilon}$ by

$$\mathcal{W}_{\vec{\Lambda},\varepsilon}^{2}(\rho_{0},\rho_{1})=\inf_{(\rho,\mathbf{V})\in\mathsf{CE}_{\varepsilon}(\rho_{0},\rho_{1})}\int_{0}^{1}\langle[\rho(t)]_{\vec{\Lambda}}^{-1}\mathbf{V}(t),\mathbf{V}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt.$$

If $\Lambda_{j}=\Lambda_{\omega_{j}}$ as above, we recover the original metric $\mathcal{W}_{\varepsilon}$, while for $\Lambda_{j}(A,B)=\frac{1}{2}(A+B)$ (and $\varepsilon=0$) one obtains the distance studied in [CGT18, Che+20].

Later we will make the additional assumption that $\Lambda_{j^{\ast}}(A,B)=\Lambda_{j}(B,A)$, where $j^{\ast}$ denotes the index with $V_{j^{\ast}}=V_{j}^{\ast}$. It follows from the representation theorem of operator means [KA80] that the class of metrics $\mathcal{W}_{\vec{\Lambda},0}$ with $\vec{\Lambda}$ subject to this symmetry condition is exactly the class of metrics satisfying Assumptions 7.2 and 9.5 in [CM20].

For technical reasons, it can be useful to allow for curves of density matrices that are not necessarily invertible. For this purpose, we make the following convention: If $\mathcal{K}\colon\mathfrak{H}_{\mathcal{A},\mathcal{J}}\to\mathfrak{H}_{\mathcal{A},\mathcal{J}}$ is a positive operator and $\mathbf{V}\in\mathfrak{H}_{\mathcal{A},\mathcal{J}}$, we define

$$\langle\mathbf{V},\mathcal{K}^{-1}\mathbf{V}\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}=\begin{cases}\langle\mathcal{K}\mathbf{W},\mathbf{W}\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}&\text{if }\mathbf{V}\in(\ker\mathcal{K})^{\perp},\ \mathbf{W}\in(\ker\mathcal{K})^{\perp},\ \mathcal{K}\mathbf{W}=\mathbf{V},\\\infty&\text{otherwise}.\end{cases}$$

Since $(\ker\mathcal{K})^{\perp}=\operatorname{ran}\mathcal{K}$ and $\mathcal{K}$ is injective on $(\ker\mathcal{K})^{\perp}$, the element $\mathbf{W}$ in this definition exists and is unique. Moreover, this convention is clearly consistent with the usual definition if $\mathcal{K}$ is invertible.

Alternatively, as a direct consequence of the spectral theorem, $\langle\mathbf{V},\mathcal{K}^{-1}\mathbf{V}\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}$ can be expressed as

$$\langle\mathbf{V},\mathcal{K}^{-1}\mathbf{V}\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}=\sum_{k=1}^{m}\frac{1}{\lambda_{k}}\lvert\langle\mathbf{V},\mathbf{W}_{k}\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\rvert^{2},$$

where $\lambda_{1},\dots,\lambda_{m}$ are the eigenvalues of $\mathcal{K}$, $\mathbf{W}_{1},\dots,\mathbf{W}_{m}$ is an orthonormal basis of corresponding eigenvectors, and we use the conventions $0/0=0$ and $a/0=\infty$ for $a>0$.
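The convention can be implemented directly from the spectral expression. The following sketch works in $\mathbb{C}^d$ with the standard inner product as a stand-in for $\mathfrak{H}_{\mathcal{A},\mathcal{J}}$ (an assumption made only for illustration); the function name `inv_quadratic_form` and the tolerance `tol` are hypothetical choices. For a vector orthogonal to $\ker\mathcal{K}$ the value agrees with the one obtained from the Moore–Penrose pseudo-inverse, while any component in the kernel gives $+\infty$.

```python
import numpy as np


def inv_quadratic_form(K, v, tol=1e-12):
    """<v, K^{-1} v> for a positive semidefinite Hermitian K, following the convention
    above: +inf if v has a component in ker K, else sum_k |<v, W_k>|^2 / lambda_k."""
    lam, W = np.linalg.eigh(K)
    coeffs = W.conj().T @ v          # coordinates of v in an orthonormal eigenbasis
    total = 0.0
    for lk, ck in zip(lam, coeffs):
        if lk > tol:
            total += abs(ck) ** 2 / lk
        elif abs(ck) > tol:          # component in the kernel of K
            return np.inf
    return total


K = np.diag([2.0, 1.0, 0.0])         # hypothetical positive, non-invertible operator
v_in = np.array([1.0, 1.0, 0.0])     # orthogonal to ker K
v_out = np.array([1.0, 0.0, 1.0])    # has a component in ker K
value_in = inv_quadratic_form(K, v_in)
value_out = inv_quadratic_form(K, v_out)
pinv_value = (v_in.conj() @ np.linalg.pinv(K) @ v_in).real
```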

Lemma 1.1.

If $\mathcal{K}_{n}\colon\mathfrak{H}_{\mathcal{A},\mathcal{J}}\to\mathfrak{H}_{\mathcal{A},\mathcal{J}}$, $n\in\mathbb{N}$, are positive invertible operators that decrease monotonically to $\mathcal{K}$, then

$$\langle\mathbf{V},\mathcal{K}_{n}^{-1}\mathbf{V}\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\nearrow\langle\mathbf{V},\mathcal{K}^{-1}\mathbf{V}\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}$$

for all $\mathbf{V}\in\mathfrak{H}_{\mathcal{A},\mathcal{J}}$.

Proof.

From the spectral expression it is easy to see that

$$\langle\mathbf{V},\mathcal{K}_{n}^{-1}\mathbf{V}\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}=\sup_{\delta>0}\langle\mathbf{V},(\mathcal{K}_{n}+\delta)^{-1}\mathbf{V}\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}$$

and the same holds with $\mathcal{K}_{n}$ replaced by $\mathcal{K}$. Moreover, since $\mathcal{K}_{n}\searrow\mathcal{K}$, we have $(\mathcal{K}_{n}+\delta)^{-1}\nearrow(\mathcal{K}+\delta)^{-1}$. Thus

$$\begin{aligned}
\langle\mathbf{V},\mathcal{K}^{-1}\mathbf{V}\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}&=\sup_{\delta>0}\langle\mathbf{V},(\mathcal{K}+\delta)^{-1}\mathbf{V}\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\\
&=\sup_{\delta>0}\sup_{n\in\mathbb{N}}\langle\mathbf{V},(\mathcal{K}_{n}+\delta)^{-1}\mathbf{V}\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\\
&=\sup_{n\in\mathbb{N}}\sup_{\delta>0}\langle\mathbf{V},(\mathcal{K}_{n}+\delta)^{-1}\mathbf{V}\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\\
&=\sup_{n\in\mathbb{N}}\langle\mathbf{V},\mathcal{K}_{n}^{-1}\mathbf{V}\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}.
\end{aligned}$$

Since the sequence $(\langle\mathbf{V},\mathcal{K}_{n}^{-1}\mathbf{V}\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}})_{n}$ is monotonically increasing, this settles the claim. ∎
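Lemma 1.1 can be illustrated with the simplest decreasing family $\mathcal{K}_n=\mathcal{K}+\frac{1}{n}$. In this pure-Python sketch (hypothetical data; $\mathcal{K}$ is taken diagonal, so the spectral expression applies coordinatewise), the values increase to the finite limit $\frac{1}{2}+1=\frac{3}{2}$ when $\mathbf{V}\perp\ker\mathcal{K}$, and grow without bound otherwise, matching the convention $\langle\mathbf{V},\mathcal{K}^{-1}\mathbf{V}\rangle=\infty$.

```python
import math


def inv_form_diag(eigs, coords):
    """<V, K^{-1} V> for K diagonal in an orthonormal basis (eigenvalues eigs,
    coordinates coords of V); a component in ker K gives +infinity."""
    total = 0.0
    for lam, c in zip(eigs, coords):
        if lam > 0:
            total += abs(c) ** 2 / lam
        elif c != 0:
            return math.inf
    return total


K = [2.0, 1.0, 0.0]              # hypothetical singular limit operator
in_range = [1.0, 1.0, 0.0]       # orthogonal to ker K
not_in_range = [1.0, 0.0, 1.0]   # has a component in ker K

# K_n = K + 1/n is invertible and decreases monotonically to K.
vals_in = [inv_form_diag([k + 1.0 / n for k in K], in_range) for n in range(1, 2001)]
vals_out = [inv_form_diag([k + 1.0 / n for k in K], not_in_range) for n in range(1, 2001)]
limit_in = inv_form_diag(K, in_range)
limit_out = inv_form_diag(K, not_in_range)
```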

Write $\mathsf{CE}_{\varepsilon}^{\prime}(\rho_{0},\rho_{1})$ for the set of all pairs $(\rho,\mathbf{V})$ such that $\rho\in H^{1}([0,1];\mathfrak{S}(\mathcal{A}))$ with $\rho(0)=\rho_{0}$, $\rho(1)=\rho_{1}$, $\mathbf{V}\in L^{2}([0,1];\mathfrak{H}_{\mathcal{A},\mathcal{J}})$ and

$$\dot{\rho}(t)+\operatorname{div}\mathbf{V}(t)=\varepsilon\mathscr{L}^{\dagger}\rho(t)$$

for a.e. $t\in[0,1]$. The only difference from the definition of $\mathsf{CE}_{\varepsilon}(\rho_{0},\rho_{1})$ is that $\rho(t)$ is not assumed to be invertible.

Proposition 1.2.

For $\rho_{0},\rho_{1}\in\mathfrak{S}_{+}(\mathcal{A})$ we have

$$\mathcal{W}_{\vec{\Lambda},\varepsilon}^{2}(\rho_{0},\rho_{1})=\inf_{(\rho,\mathbf{V})\in\mathsf{CE}_{\varepsilon}^{\prime}(\rho_{0},\rho_{1})}\int_{0}^{1}\langle\mathbf{V}(t),[\rho(t)]_{\vec{\Lambda}}^{-1}\mathbf{V}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt.$$

Proof.

Since $\mathsf{CE}_{\varepsilon}(\rho_{0},\rho_{1})\subset\mathsf{CE}_{\varepsilon}^{\prime}(\rho_{0},\rho_{1})$, the inequality “$\geq$” is clear. For the converse, it suffices to show that every $(\rho,\mathbf{V})\in\mathsf{CE}_{\varepsilon}^{\prime}(\rho_{0},\rho_{1})$ with finite action can be approximated by elements of $\mathsf{CE}_{\varepsilon}(\rho_{0},\rho_{1})$ such that the action integrals converge.

For that purpose fix $\delta\in(0,1/2)$ and let

$$\rho^{\delta}\colon[0,1]\to\mathfrak{S}_{+}(\mathcal{A}),\quad t\mapsto\begin{cases}(1-t)\rho_{0}+t1_{\mathcal{A}}&\text{if }t\in[0,\delta],\\(1-\delta)\rho\big((1-2\delta)^{-1}(t-\delta)\big)+\delta 1_{\mathcal{A}}&\text{if }t\in(\delta,1-\delta),\\t\rho_{1}+(1-t)1_{\mathcal{A}}&\text{if }t\in[1-\delta,1].\end{cases}$$

Since $(P_{t})$ is ergodic, the kernel of $\nabla$ is spanned by $1_{\mathcal{A}}$ (every $A$ with $\nabla A=0$ satisfies $\mathscr{L}A=0$ by the formula for $\mathscr{L}$). Hence $\operatorname{div}\nabla=-\nabla^{\dagger}\nabla$ is invertible on the trace-zero elements of $\mathfrak{H}_{\mathcal{A}}$. As $\mathscr{L}1_{\mathcal{A}}=0$, the element $\varepsilon\mathscr{L}^{\dagger}\rho^{\delta}(t)-1_{\mathcal{A}}+\rho_{0}$ has trace zero, so for $t\in[0,\delta]$ there exists $\mathbf{V}^{\delta}(t)=\nabla U^{\delta}(t)$ such that

$$\dot{\rho}^{\delta}(t)+\operatorname{div}\mathbf{V}^{\delta}(t)=1_{\mathcal{A}}-\rho_{0}+\operatorname{div}\mathbf{V}^{\delta}(t)=\varepsilon\mathscr{L}^{\dagger}\rho^{\delta}(t)$$

and

$$\lVert\mathbf{V}^{\delta}(t)\rVert_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\leq C\lVert\varepsilon\mathscr{L}^{\dagger}\rho^{\delta}(t)-1_{\mathcal{A}}+\rho_{0}\rVert_{\mathfrak{H}_{\mathcal{A}}}$$

with a constant $C>0$ depending only on the spectral gap of $-\operatorname{div}\nabla$. In particular, $\lVert\mathbf{V}^{\delta}(t)\rVert_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}$ is bounded independently of $\delta$.

Moreover, if $\lambda$ is the smallest eigenvalue of $\rho_{0}$, which is strictly positive by assumption and satisfies $\lambda\leq\tau(\rho_{0})=1$, then $\rho^{\delta}(t)\geq((1-t)\lambda+t)1_{\mathcal{A}}\geq\lambda 1_{\mathcal{A}}$ for $t\in[0,\delta]$.

Thus, since $[\cdot]_{\vec{\Lambda}}$ is monotone,

$$\begin{aligned}
\int_{0}^{\delta}\langle\mathbf{V}^{\delta}(t),[\rho^{\delta}(t)]_{\vec{\Lambda}}^{-1}\mathbf{V}^{\delta}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt&\leq\int_{0}^{\delta}\langle\mathbf{V}^{\delta}(t),[\lambda 1_{\mathcal{A}}]_{\vec{\Lambda}}^{-1}\mathbf{V}^{\delta}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt\\
&\leq\lVert[\lambda 1_{\mathcal{A}}]_{\vec{\Lambda}}^{-1}\rVert\int_{0}^{\delta}\lVert\mathbf{V}^{\delta}(t)\rVert_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}^{2}\,dt\\
&\to 0
\end{aligned}$$

as $\delta\to 0$. Similarly one can show

$$\lim_{\delta\to 0}\int_{1-\delta}^{1}\langle\mathbf{V}^{\delta}(t),[\rho^{\delta}(t)]_{\vec{\Lambda}}^{-1}\mathbf{V}^{\delta}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt=0.$$

Since the range of $\mathscr{L}^{\dagger}$ consists of trace-zero elements, the same argument as above shows that for a.e. $t\in(\delta,1-\delta)$ there exists a unique gradient $\mathbf{W}^{\delta}(t)$ such that

$$\operatorname{div}\mathbf{W}^{\delta}(t)=-\frac{2\delta(1-\delta)\varepsilon}{1-2\delta}\mathscr{L}^{\dagger}\rho\big((1-2\delta)^{-1}(t-\delta)\big)+\delta\varepsilon\mathscr{L}^{\dagger}1_{\mathcal{A}}$$

and

$$\lVert\mathbf{W}^{\delta}(t)\rVert_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\leq C\left(\frac{2\delta(1-\delta)\varepsilon}{1-2\delta}\big\lVert\mathscr{L}^{\dagger}\rho\big((1-2\delta)^{-1}(t-\delta)\big)\big\rVert_{\mathfrak{H}_{\mathcal{A}}}+\delta\varepsilon\lVert\mathscr{L}^{\dagger}1_{\mathcal{A}}\rVert_{\mathfrak{H}_{\mathcal{A}}}\right).$$

Since $\rho\in H^{1}([0,1];\mathfrak{S}(\mathcal{A}))\subset C([0,1];\mathfrak{S}(\mathcal{A}))$, the norms on the right side are bounded independently of $\delta$, so that (for $\delta\leq 1/4$, say)

$$\lVert\mathbf{W}^{\delta}(t)\rVert_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\leq\tilde{C}\delta$$

with a constant $\tilde{C}>0$ independent of $\delta$. As $\rho^{\delta}(t)\geq\delta 1_{\mathcal{A}}$ for $t\in(\delta,1-\delta)$ and $[\delta 1_{\mathcal{A}}]_{\vec{\Lambda}}=\delta[1_{\mathcal{A}}]_{\vec{\Lambda}}$, this implies

$$\begin{aligned}
\int_{\delta}^{1-\delta}\langle\mathbf{W}^{\delta}(t),[\rho^{\delta}(t)]_{\vec{\Lambda}}^{-1}\mathbf{W}^{\delta}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt&\leq\frac{1}{\delta}\int_{\delta}^{1-\delta}\langle\mathbf{W}^{\delta}(t),[1_{\mathcal{A}}]_{\vec{\Lambda}}^{-1}\mathbf{W}^{\delta}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt\\
&\leq\tilde{C}^{2}\lVert[1_{\mathcal{A}}]_{\vec{\Lambda}}^{-1}\rVert\delta\\
&\to 0
\end{aligned}$$

as $\delta\to 0$.

With

$$\mathbf{V}^{\delta}(t)=\frac{1-\delta}{1-2\delta}\mathbf{V}\big((1-2\delta)^{-1}(t-\delta)\big)+\mathbf{W}^{\delta}(t)$$

we have

$$\dot{\rho}^{\delta}(t)+\operatorname{div}\mathbf{V}^{\delta}(t)=\varepsilon\mathscr{L}^{\dagger}\rho^{\delta}(t)$$

for a.e. $t\in(\delta,1-\delta)$, so that $(\rho^{\delta},\mathbf{V}^{\delta})\in\mathsf{CE}_{\varepsilon}(\rho_{0},\rho_{1})$.

Furthermore, since operator connections are positively homogeneous,

$$\begin{aligned}
&\quad\int_{\delta}^{1-\delta}\left\langle\mathbf{V}\left(\frac{t-\delta}{1-2\delta}\right),[\rho^{\delta}(t)]_{\vec{\Lambda}}^{-1}\mathbf{V}\left(\frac{t-\delta}{1-2\delta}\right)\right\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt\\
&=\frac{1}{1-\delta}\int_{\delta}^{1-\delta}\left\langle\mathbf{V}\left(\frac{t-\delta}{1-2\delta}\right),\left[\rho\left(\frac{t-\delta}{1-2\delta}\right)+\frac{\delta}{1-\delta}1_{\mathcal{A}}\right]_{\vec{\Lambda}}^{-1}\mathbf{V}\left(\frac{t-\delta}{1-2\delta}\right)\right\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt\\
&=\frac{1-2\delta}{1-\delta}\int_{0}^{1}\left\langle\mathbf{V}(s),\left[\rho(s)+\frac{\delta}{1-\delta}1_{\mathcal{A}}\right]_{\vec{\Lambda}}^{-1}\mathbf{V}(s)\right\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,ds,
\end{aligned}$$

where we used the substitution $s=(1-2\delta)^{-1}(t-\delta)$ in the last step.

As $\delta\searrow 0$, the operators $[\rho(s)+\frac{\delta}{1-\delta}1_{\mathcal{A}}]_{\vec{\Lambda}}$ decrease monotonically to $[\rho(s)]_{\vec{\Lambda}}$ by the properties of operator connections, and $\frac{1-2\delta}{1-\delta}\to 1$. Hence Lemma 1.1 and the monotone convergence theorem yield

$$\lim_{\delta\to 0}\int_{\delta}^{1-\delta}\left\langle\mathbf{V}\left(\frac{t-\delta}{1-2\delta}\right),[\rho^{\delta}(t)]_{\vec{\Lambda}}^{-1}\mathbf{V}\left(\frac{t-\delta}{1-2\delta}\right)\right\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt=\int_{0}^{1}\langle\mathbf{V}(s),[\rho(s)]_{\vec{\Lambda}}^{-1}\mathbf{V}(s)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,ds.$$

On $(\delta,1-\delta)$, the field $\mathbf{V}^{\delta}$ differs from $\mathbf{V}((1-2\delta)^{-1}(t-\delta))$ only by the factor $(1-\delta)/(1-2\delta)\to 1$ and the perturbation $\mathbf{W}^{\delta}$, whose action vanishes in the limit. Together with the Cauchy–Schwarz inequality for the inner products $\langle\cdot,[\rho^{\delta}(t)]_{\vec{\Lambda}}^{-1}\cdot\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}$, this implies

$$\int_{\delta}^{1-\delta}\langle\mathbf{V}^{\delta}(t),[\rho^{\delta}(t)]_{\vec{\Lambda}}^{-1}\mathbf{V}^{\delta}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt\to\int_{0}^{1}\langle\mathbf{V}(t),[\rho(t)]_{\vec{\Lambda}}^{-1}\mathbf{V}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt$$

as $\delta\to 0$.

Altogether we have shown

$$\lim_{\delta\to 0}\int_{0}^{1}\langle\mathbf{V}^{\delta}(t),[\rho^{\delta}(t)]_{\vec{\Lambda}}^{-1}\mathbf{V}^{\delta}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt=\int_{0}^{1}\langle\mathbf{V}(t),[\rho(t)]_{\vec{\Lambda}}^{-1}\mathbf{V}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt.$$

∎
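To see the regularized curve $\rho^\delta$ from the proof above at work, the sketch below (our illustration with hypothetical data, not part of the argument) evaluates $\rho^\delta$ for $2\times2$ diagonal density matrices and a piecewise linear curve $\rho$ that is singular at $s=1/2$. It checks numerically that $\tau(\rho^\delta(t))=1$, that the smallest eigenvalue of $\rho^\delta(t)$ stays above $\delta$, and that the three pieces match at $t=\delta$ and $t=1-\delta$.

```python
import numpy as np

I2 = np.eye(2)
rho0 = np.diag([1.5, 0.5])   # hypothetical invertible endpoints, tau = Tr/2 = 1
rho1 = np.diag([0.5, 1.5])


def rho(s):
    """Hypothetical piecewise linear curve from rho0 to rho1; rho(1/2) = diag(2, 0) is singular."""
    a = np.interp(s, [0.0, 0.5, 1.0], [1.5, 2.0, 0.5])
    return np.diag([a, 2.0 - a])


def rho_delta(t, delta):
    """The regularized curve rho^delta from the proof of Proposition 1.2."""
    if t <= delta:
        return (1 - t) * rho0 + t * I2
    if t < 1 - delta:
        return (1 - delta) * rho((t - delta) / (1 - 2 * delta)) + delta * I2
    return t * rho1 + (1 - t) * I2


delta = 0.05
ts = np.linspace(0.0, 1.0, 2001)
min_eig = min(np.linalg.eigvalsh(rho_delta(t, delta)).min() for t in ts)
traces = np.array([np.trace(rho_delta(t, delta)) / 2 for t in ts])
h = 1e-9
jump_left = np.abs(rho_delta(delta, delta) - rho_delta(delta + h, delta)).max()
jump_right = np.abs(rho_delta(1 - delta - h, delta) - rho_delta(1 - delta, delta)).max()
```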

2. Real subspaces

Since the proof of the main result relies on convex analysis methods for real Banach spaces, we need to identify suitable real subspaces for our purposes. For $\mathcal{A}$ this is simply $\mathcal{A}_{h}$, but for $\mathfrak{H}_{\mathcal{A},\mathcal{J}}$ this is less obvious and will be done in the following.

For $j\in\mathcal{J}$ denote by $j^{\ast}$ the unique index in $\mathcal{J}$ such that $V_{j}^{\ast}=V_{j^{\ast}}$. Let $\tilde{\mathfrak{H}}_{\mathcal{A}}^{(j)}$ be the linear span of $\{X\partial_{j}A\mid A,X\in\mathcal{A}\}$, and define an antilinear map $J\colon\tilde{\mathfrak{H}}_{\mathcal{A}}^{(j)}\to\tilde{\mathfrak{H}}_{\mathcal{A}}^{(j^{\ast})}$ by

$$J(X\partial_{j}A)=\partial_{j^{\ast}}(A^{\ast})X^{\ast}.$$

Since $\partial_{j^{\ast}}(A^{\ast})X^{\ast}=-(X\partial_{j}A)^{\ast}$, the map $J$ is the restriction of $Z\mapsto-Z^{\ast}$; in particular, it is well defined. By the product rule, $(\partial_{j}A)X$ also belongs to $\tilde{\mathfrak{H}}_{\mathcal{A}}^{(j)}$ and $J((\partial_{j}A)X)=X^{\ast}\partial_{j^{\ast}}(A^{\ast})$.

Lemma 2.1.

The map $J$ is anti-unitary.

Proof.

For $A,B,X,Y\in\mathcal{A}$ we have

$$\begin{aligned}
\langle J(X\partial_{j}A),J(Y\partial_{j}B)\rangle_{\mathfrak{H}_{\mathcal{A}}}&=\tau\big(X(AV_{j}-V_{j}A)(V_{j}^{\ast}B^{\ast}-B^{\ast}V_{j}^{\ast})Y^{\ast}\big)\\
&=\tau\big((B^{\ast}V_{j}^{\ast}-V_{j}^{\ast}B^{\ast})Y^{\ast}X(V_{j}A-AV_{j})\big)\\
&=\langle Y\partial_{j}B,X\partial_{j}A\rangle_{\mathfrak{H}_{\mathcal{A}}}.
\end{aligned}$$

Since $J$ is antilinear, this identity extends to all elements of $\tilde{\mathfrak{H}}_{\mathcal{A}}^{(j)}$. ∎
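Since $\partial_{j^\ast}(A^\ast)X^\ast=-(X\partial_jA)^\ast$, the map $J$ acts as $Z\mapsto-Z^\ast$ on generators, and Lemma 2.1 can be confirmed numerically. The sketch below uses a random matrix as a hypothetical $V_j$ (the normalization conditions on $V_j$ play no role in this identity) and random $X,A,Y,B$; the helper `J_def` implements the defining formula $J(X\partial_jA)=\partial_{j^\ast}(A^\ast)X^\ast$ with $V_{j^\ast}=V_j^\ast$.

```python
import numpy as np

n = 3
rng = np.random.default_rng(3)


def rand_mat():
    return rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))


def inner(X, Y):
    """GNS inner product <X, Y> = tau(X^* Y) with the normalized trace."""
    return np.trace(X.conj().T @ Y) / n


Vj = rand_mat()                   # hypothetical V_j; V_{j*} = V_j^*


def d_j(A):
    return Vj @ A - A @ Vj        # partial_j A = [V_j, A]


def d_jstar(A):
    Vs = Vj.conj().T
    return Vs @ A - A @ Vs        # partial_{j*} A = [V_j^*, A]


def J_def(X, A):
    """J(X partial_j A) = partial_{j*}(A^*) X^*, the defining formula."""
    return d_jstar(A.conj().T) @ X.conj().T


X, A, Y, B = (rand_mat() for _ in range(4))
U_el, W_el = X @ d_j(A), Y @ d_j(B)
wd_gap = np.abs(J_def(X, A) + U_el.conj().T).max()                  # J agrees with Z -> -Z^*
au_gap = abs(inner(J_def(X, A), J_def(Y, B)) - inner(W_el, U_el))   # <JU, JW> = <W, U>
```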

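Let

$$\mathcal{H}^{(h)}_{\mathcal{A},\mathcal{J}}=\Big\{\mathbf{V}\in\bigoplus_{j\in\mathcal{J}}\tilde{\mathfrak{H}}_{\mathcal{A}}^{(j)}\,\Big|\,J(\mathbf{V}_{j})=\mathbf{V}_{j^{\ast}}\text{ for all }j\in\mathcal{J}\Big\}.$$

For $\mathbf{V},\mathbf{W}\in\mathcal{H}^{(h)}_{\mathcal{A},\mathcal{J}}$, the previous lemma gives $\langle\mathbf{V},\mathbf{W}\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}=\sum_{j}\langle J\mathbf{W}_{j},J\mathbf{V}_{j}\rangle_{\mathfrak{H}_{\mathcal{A}}}=\sum_{j}\langle\mathbf{W}_{j^{\ast}},\mathbf{V}_{j^{\ast}}\rangle_{\mathfrak{H}_{\mathcal{A}}}=\overline{\langle\mathbf{V},\mathbf{W}\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}}$. Hence the inner product is real-valued on $\mathcal{H}^{(h)}_{\mathcal{A},\mathcal{J}}$, which is therefore a real Hilbert space.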

Lemma 2.2.

Let $(\Lambda_{j})_{j\in\mathcal{J}}$ be a family of operator connections such that $\Lambda_{j^{\ast}}(A,B)=\Lambda_{j}(B,A)$ for all $j\in\mathcal{J}$. If $A\in\mathcal{A}_{h}$ and $\rho\in\mathfrak{S}(\mathcal{A})$, then $\nabla A,[\rho]_{\vec{\Lambda}}\nabla A\in\mathcal{H}^{(h)}_{\mathcal{A},\mathcal{J}}$.

Proof.

For $\nabla A$ the statement follows directly from the definitions, since $J(\partial_{j}A)=\partial_{j^{\ast}}(A^{\ast})=\partial_{j^{\ast}}A$ for $A\in\mathcal{A}_{h}$. For $[\rho]_{\vec{\Lambda}}\nabla A$ first note that

$$J\Lambda(L(\rho),R(\rho))=\Lambda(R(\rho),L(\rho))J$$

for every operator connection $\Lambda$, as a consequence of the spectral representation $(\ast)$.

Thus

$$\begin{aligned}
J([\rho]_{\Lambda_{j}}\partial_{j}A)&=J\Lambda_{j}(L(\rho),R(\rho))\partial_{j}A\\
&=\Lambda_{j}(R(\rho),L(\rho))J\partial_{j}A\\
&=\Lambda_{j^{\ast}}(L(\rho),R(\rho))\partial_{j^{\ast}}A,
\end{aligned}$$

where the last step uses the symmetry assumption on $(\Lambda_{j})$ and $J\partial_{j}A=\partial_{j^{\ast}}A$. ∎
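For $\Lambda_j=\Lambda_{\omega_j}$, taking adjoints in $\sigma V_j\sigma^{-1}=e^{-\omega_j}V_j$ gives $\omega_{j^\ast}=-\omega_j$, so the conclusion of Lemma 2.2 reads $J([\rho]_{\omega_j}\partial_jA)=[\rho]_{-\omega_j}\partial_{j^\ast}A$ for self-adjoint $A$. The sketch below tests this with random hypothetical data ($\rho$, $V_j$, $\omega_j=0.6$ and $A$ are not from the text), computing $[\rho]_\alpha$ through the spectral formula $(\ast)$ and using that $J$ acts as $Z\mapsto-Z^\ast$ (because $\partial_{j^\ast}(A^\ast)X^\ast=-(X\partial_jA)^\ast$).

```python
import numpy as np

n = 3
rng = np.random.default_rng(4)


def lam(a, b, alpha):
    """Scalar Lambda_alpha(a, b) = int_0^1 e^{alpha(s-1/2)} a^s b^{1-s} ds (closed form)."""
    c = alpha + np.log(a) - np.log(b)
    return b * np.exp(-alpha / 2) * (np.expm1(c) / c if abs(c) > 1e-12 else 1.0)


def bracket(rho, Z, alpha):
    """[rho]_alpha(Z) via the spectral formula (*), in an orthonormal eigenbasis of rho."""
    w, U = np.linalg.eigh(rho)
    M = np.array([[lam(wk, wl, alpha) for wl in w] for wk in w])
    return U @ (M * (U.conj().T @ Z @ U)) @ U.conj().T


G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
rho = G @ G.conj().T + 0.1 * np.eye(n)
rho = rho / (np.trace(rho).real / n)            # hypothetical density matrix, tau(rho) = 1
Vj = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))   # hypothetical V_j
omega_j = 0.6                                   # hypothetical Bohr frequency
H = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = H + H.conj().T                              # self-adjoint A

flux_j = bracket(rho, Vj @ A - A @ Vj, omega_j)                           # [rho]_{w} partial_j A
flux_jstar = bracket(rho, Vj.conj().T @ A - A @ Vj.conj().T, -omega_j)   # [rho]_{-w} partial_{j*} A
lemma_gap = np.abs(-flux_j.conj().T - flux_jstar).max()                   # J(Z) = -Z^*
```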

3. Duality

In this section we prove the duality theorem announced in the introduction. Our strategy follows the same lines as the proof in the commutative case in [EMW19]. It crucially relies on the Rockafellar–Fenchel duality theorem quoted below. Throughout this section we fix a quantum Markov semigroup with generator $\mathscr{L}$ satisfying the $\sigma$-DBC for some $\sigma\in\mathfrak{S}_{+}(\mathcal{A})$ and a family $(\Lambda_{j})_{j\in\mathcal{J}}$ of operator connections such that $\Lambda_{j^{\ast}}(A,B)=\Lambda_{j}(B,A)$ for all $j\in\mathcal{J}$.

We need the following definition for the constraint of the dual problem. Here and in the following we write

$$\langle\mathbf{V},\mathbf{W}\rangle_{\rho}=\langle\mathbf{V},[\rho]_{\vec{\Lambda}}\mathbf{W}\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}},\qquad\lVert\mathbf{V}\rVert_{\rho}^{2}=\langle\mathbf{V},\mathbf{V}\rangle_{\rho}$$

for $\mathbf{V},\mathbf{W}\in\mathfrak{H}_{\mathcal{A},\mathcal{J}}$ and $\rho\in\mathcal{A}_{+}$.

Definition 3.1.

A function $A\in H^{1}((0,T);\mathcal{A}_{h})$ is said to be a Hamilton–Jacobi–Bellman subsolution if for a.e. $t\in(0,T)$ we have

$$\tau\big((\dot{A}(t)+\varepsilon\mathscr{L}A(t))\rho\big)+\frac{1}{2}\lVert\nabla A(t)\rVert_{\rho}^{2}\leq 0\qquad\text{for all }\rho\in\mathfrak{S}(\mathcal{A}).$$

The set of all Hamilton–Jacobi–Bellman subsolutions is denoted by $\mathsf{HJB}_{\vec{\Lambda},\varepsilon}$.

Our proof will establish equality between the primal and the dual problem. Before we begin, let us show that one of the two inequalities is elementary.

Proposition 3.2.

For all $\rho_{0},\rho_{1}\in\mathfrak{S}_{+}(\mathcal{A})$ we have

$$\sup_{A\in\mathsf{HJB}_{\vec{\Lambda},\varepsilon}}\tau(A(1)\rho_{1}-A(0)\rho_{0})\leq\frac{1}{2}\inf_{(\rho,\mathbf{V})\in\mathsf{CE}_{\varepsilon}(\rho_{0},\rho_{1})}\int_{0}^{1}\langle\mathbf{V}(t),[\rho(t)]_{\vec{\Lambda}}^{-1}\mathbf{V}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt.$$
Proof.

For $A\in\mathsf{HJB}_{\vec{\Lambda},\varepsilon}$ and $(\rho,\mathbf{V})\in\mathsf{CE}_{\varepsilon}(\rho_{0},\rho_{1})$ we have

$$\begin{aligned}
\tau(A(1)\rho_{1}-A(0)\rho_{0})&=\int_{0}^{1}\tau(\dot{A}(t)\rho(t)+A(t)\dot{\rho}(t))\,dt\\
&\leq-\int_{0}^{1}\Big(\varepsilon\tau((\mathscr{L}A(t))\rho(t))+\frac{1}{2}\lVert\nabla A(t)\rVert_{\rho(t)}^{2}\Big)\,dt\\
&\quad+\int_{0}^{1}\Big(\langle\nabla A(t),\mathbf{V}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}+\varepsilon\tau(A(t)\mathscr{L}^{\dagger}\rho(t))\Big)\,dt\\
&=\int_{0}^{1}\langle[\rho(t)]_{\vec{\Lambda}}^{1/2}\nabla A(t),[\rho(t)]_{\vec{\Lambda}}^{-1/2}\mathbf{V}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt\\
&\quad-\frac{1}{2}\int_{0}^{1}\langle[\rho(t)]_{\vec{\Lambda}}^{1/2}\nabla A(t),[\rho(t)]_{\vec{\Lambda}}^{1/2}\nabla A(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt\\
&\leq\frac{1}{2}\int_{0}^{1}\langle\mathbf{V}(t),[\rho(t)]_{\vec{\Lambda}}^{-1}\mathbf{V}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt,
\end{aligned}$$

where we used $A\in\mathsf{HJB}_{\vec{\Lambda},\varepsilon}$ and $(\rho,\mathbf{V})\in\mathsf{CE}_{\varepsilon}(\rho_{0},\rho_{1})$ for the first inequality, the identity $\tau((\mathscr{L}A(t))\rho(t))=\tau(A(t)\mathscr{L}^{\dagger}\rho(t))$ for the subsequent equality (the $\varepsilon$-terms cancel), and Young's inequality for the second inequality. ∎
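The last step is Young's inequality in the weighted form $\langle g,v\rangle-\frac12\langle Mg,g\rangle\le\frac12\langle M^{-1}v,v\rangle$ for a positive definite $M$, with equality at $g=M^{-1}v$; in the proof, $M$ plays the role of $[\rho(t)]_{\vec{\Lambda}}$, $g$ that of $\nabla A(t)$ and $v$ that of $\mathbf{V}(t)$. A small numerical sketch, in which a random real symmetric positive definite matrix stands in for $[\rho(t)]_{\vec{\Lambda}}$ (an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
B = rng.standard_normal((d, d))
M = B @ B.T + 0.5 * np.eye(d)        # stand-in for [rho(t)]_Lambda: symmetric positive definite
v = rng.standard_normal(d)           # stand-in for V(t)

def phi(g):
    # <M^{1/2} g, M^{-1/2} v> - (1/2) <M^{1/2} g, M^{1/2} g> simplifies to <g, v> - (1/2) <M g, g>
    return g @ v - 0.5 * g @ M @ g

bound = 0.5 * v @ np.linalg.solve(M, v)      # (1/2) <v, M^{-1} v>
samples = [phi(rng.standard_normal(d)) for _ in range(1000)]
g_opt = np.linalg.solve(M, v)                # equality case g = M^{-1} v
print(max(samples), phi(g_opt), bound)
```

The equality case indicates that the Young step is sharp precisely when $\mathbf{V}(t)=[\rho(t)]_{\vec{\Lambda}}\nabla A(t)$.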

To prove equality, our crucial tool is the Rockafellar–Fenchel duality theorem (see e.g. [Vil03, Theorem 1.9]), which we quote here for the convenience of the reader. Recall that if $E$ is a (real) normed space, the Legendre–Fenchel transform $F^{\ast}$ of a proper convex function $F\colon E\to\mathbb{R}\cup\{\infty\}$ is defined by

$$F^{\ast}\colon E^{\ast}\to\mathbb{R}\cup\{\infty\},\quad F^{\ast}(x^{\ast})=\sup_{x\in E}\big(\langle x^{\ast},x\rangle-F(x)\big).$$
Theorem 3.3.

Let $E$ be a real normed space and $F,G\colon E\to\mathbb{R}\cup\{\infty\}$ proper convex functions with Legendre–Fenchel transforms $F^{\ast},G^{\ast}$. If there exists $z_{0}\in E$ such that $G$ is continuous at $z_{0}$ and $F(z_{0}),G(z_{0})<\infty$, then

$$\sup_{z\in E}(-F(z)-G(z))=\min_{z^{\ast}\in E^{\ast}}(F^{\ast}(z^{\ast})+G^{\ast}(-z^{\ast})).$$
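As a sanity check of the statement (not of its infinite-dimensional use below), here is a one-dimensional toy instance with $E=\mathbb{R}$, $F(z)=\frac12(z-a)^2$ and $G$ the indicator function of $[-1,1]$ ($0$ on the interval, $\infty$ outside), which is continuous at $z_0=0$. Both transforms are computed by brute force on a grid directly from the definition of the Legendre–Fenchel transform; the grids and the value $a=3$ are arbitrary choices for illustration. In closed form $F^\ast(y)=ay+\frac12y^2$ and $G^\ast(y)=\lvert y\rvert$, and both sides of the duality formula equal $-2$.

```python
import numpy as np

a = 3.0                                   # illustrative parameter
xs = np.linspace(-20.0, 20.0, 2001)       # grid for E = R (spacing 0.02)
ys = np.linspace(-10.0, 10.0, 2001)       # grid for E* = R (spacing 0.01), symmetric about 0

F = 0.5 * (xs - a) ** 2
G = np.where(np.abs(xs) <= 1.0 + 1e-9, 0.0, np.inf)   # indicator of [-1, 1]

def conjugate(values):
    # grid version of H*(y) = sup_x (x*y - H(x)), evaluated for every y in ys
    return np.max(np.outer(ys, xs) - values[None, :], axis=1)

F_star = conjugate(F)
G_star = conjugate(G)

primal = np.max(-F - G)                   # sup_z (-F(z) - G(z))
dual = np.min(F_star + G_star[::-1])      # min_{z*} (F*(z*) + G*(-z*)), using ys[::-1] = -ys
print(primal, dual)
```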

Before we state the main result, we still need the following useful inequality.

Lemma 3.4.

For any operator connection $\Lambda$ the map

$$f_{\Lambda}\colon\mathcal{A}_{++}\to B(\mathfrak{H}_{\mathcal{A}}),\quad A\mapsto[A]_{\Lambda}$$

is smooth and its Fréchet derivative satisfies

$$df_{\Lambda}(B)A\geq f_{\Lambda}(A)$$

for $A,B\in\mathcal{A}_{++}$, with equality if $A=B$.

Proof.

Smoothness of $f_{\Lambda}$ is a consequence of the representation theorem for operator connections [KA80, Theorem 3.4]. For the claim about the Fréchet derivative, first note that $f_{\Lambda}$ is concave [KA80, Theorem 3.5]. Therefore $d^{2}f_{\Lambda}(X)[Y,Y]\leq 0$ for all $X\in\mathcal{A}_{++}$ and $Y\in\mathcal{A}_{h}$ by [Han97, Proposition 2.2].

The fundamental theorem of calculus implies

$$(df_{\Lambda}(A)-df_{\Lambda}(B))(A-B)=\int_{0}^{1}d^{2}f_{\Lambda}(tA+(1-t)B)[A-B,A-B]\,dt\leq 0.$$

Since $f_{\Lambda}$ is $1$-homogeneous, its derivative is $0$-homogeneous. Thus, if we replace $B$ by $sB$ and let $s\searrow 0$, we obtain

$$df_{\Lambda}(A)A\leq df_{\Lambda}(B)A.$$

Moreover, the $1$-homogeneity of $f_{\Lambda}$ implies $df_{\Lambda}(A)A=f_{\Lambda}(A)$ (Euler's identity), which settles the claim. ∎
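Lemma 3.4 can be tested numerically for one concrete connection. The sketch below assumes $\mathcal{A}=M_n(\mathbb{R})$ (real symmetric matrices, for simplicity) with the Hilbert–Schmidt inner product and takes $\Lambda$ to be the geometric mean, for which $[M]_\Lambda X=M^{1/2}XM^{1/2}$ because $L(M)$ and $R(M)$ commute. It compares the quadratic forms $\langle X,(df_\Lambda(B)A)X\rangle$ and $\langle X,f_\Lambda(A)X\rangle$ for random positive definite $A,B$, computing the derivative of the matrix square root exactly with the Daleckii–Krein formula. Matrix sizes, seeds and helper names are illustrative.

```python
import numpy as np

def sqrt_and_derivative(B, A):
    """Return B^{1/2} and the Frechet derivative of M -> M^{1/2} at B in direction A.

    Daleckii-Krein formula: in an eigenbasis of B the derivative has entries
    A_ik / (sqrt(w_i) + sqrt(w_k)).
    """
    w, U = np.linalg.eigh(B)
    r = np.sqrt(w)
    S = (U * r) @ U.T
    At = U.T @ A @ U                                  # direction A in the eigenbasis of B
    D = U @ (At / (r[:, None] + r[None, :])) @ U.T
    return S, D

def q(M, X):
    # <X, f(M) X>_HS with f(M) X = M^{1/2} X M^{1/2} (geometric mean of L(M), R(M))
    S, _ = sqrt_and_derivative(M, np.zeros_like(M))
    return np.trace(X.T @ S @ X @ S)

def dq(B, A, X):
    # <X, (df(B) A) X>_HS, using (df(B) A) X = D X S + S X D
    S, D = sqrt_and_derivative(B, A)
    return np.trace(X.T @ D @ X @ S) + np.trace(X.T @ S @ X @ D)

def random_spd(rng, n):
    Z = rng.standard_normal((n, n))
    return Z @ Z.T + np.eye(n)

rng = np.random.default_rng(2)
n = 4
gaps = []
for _ in range(200):
    A, B, X = random_spd(rng, n), random_spd(rng, n), rng.standard_normal((n, n))
    gaps.append(dq(B, A, X) - q(A, X))      # Lemma 3.4 predicts gap >= 0
euler_defect = dq(A, A, X) - q(A, X)         # equality case B = A
print(min(gaps), euler_defect)
```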

Theorem 3.5 (Duality formula).

For $\rho_{0},\rho_{1}\in\mathfrak{S}_{+}(\mathcal{A})$ we have

$$\begin{aligned}
\frac{1}{2}\mathcal{W}_{\vec{\Lambda},\varepsilon}(\rho_{0},\rho_{1})^{2}&=\sup\{\tau(A(1)\rho_{1})-\tau(A(0)\rho_{0}):A\in\mathsf{HJB}_{\vec{\Lambda},\varepsilon}\}\\
&=\sup\{\tau(A(1)\rho_{1})-\tau(A(0)\rho_{0}):A\in\mathsf{HJB}_{\vec{\Lambda},\varepsilon}\cap C^{\infty}([0,1];\mathcal{A})\}.
\end{aligned}$$
Proof.

The second equality follows easily by mollification. We will show the duality formula for Hamilton–Jacobi–Bellman subsolutions in $H^{1}$. For this purpose we use the Rockafellar–Fenchel duality theorem (Theorem 3.3).

Let $E$ be the real Banach space

$$H^{1}([0,1];\mathcal{H}^{(h)}_{\mathcal{A}})\times L^{2}([0,1];\mathcal{H}^{(h)}_{\mathcal{A},\mathcal{J}}).$$

By the theory of linear ordinary differential equations, the map

$$H^{1}([0,1];\mathcal{H}^{(h)}_{\mathcal{A}})\to\mathcal{H}^{(h)}_{\mathcal{A}}\times L^{2}([0,1];\mathcal{H}^{(h)}_{\mathcal{A}}),\quad A\mapsto(A(0),\dot{A}+\varepsilon\mathscr{L}A)$$

is a linear isomorphism.
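To make this isomorphism explicit, assume (as a sketch) that $\mathscr{L}$ acts as a bounded operator on $\mathcal{H}^{(h)}_{\mathcal{A}}$, which is automatic when $\mathcal{A}$ is finite-dimensional. The inverse map then sends $(A_0,g)\in\mathcal{H}^{(h)}_{\mathcal{A}}\times L^2([0,1];\mathcal{H}^{(h)}_{\mathcal{A}})$ to the variation-of-constants solution

```latex
A(t) = e^{-\varepsilon t\mathscr{L}}A_0 + \int_0^t e^{-\varepsilon (t-s)\mathscr{L}}\,g(s)\,ds,
\qquad
\dot A(t) + \varepsilon\mathscr{L}A(t) = g(t)\ \text{ for a.e. } t\in[0,1],
\qquad
A(0) = A_0 ,
```

and the bound $\lVert e^{-\varepsilon s\mathscr{L}}\rVert\le e^{\varepsilon s\lVert\mathscr{L}\rVert}$ shows that $A\in H^1$ with norm controlled by $\lVert A_0\rVert+\lVert g\rVert_{L^2}$.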

Thus the dual space $E^{\ast}$ can be isomorphically identified with

$$\mathcal{H}^{(h)}_{\mathcal{A}}\times L^{2}([0,1];\mathcal{H}^{(h)}_{\mathcal{A}})\times L^{2}([0,1];\mathcal{H}^{(h)}_{\mathcal{A},\mathcal{J}})$$

via the dual pairing

$$\begin{aligned}
\langle(A,\mathbf{V}),(B,C,\mathbf{W})\rangle&=\tau(A(0)B)+\int_{0}^{1}\tau((\dot{A}(t)+\varepsilon\mathscr{L}A(t))C(t))\,dt\\
&\quad+\int_{0}^{1}\langle\mathbf{V}(t),\mathbf{W}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt.
\end{aligned}$$

Define functionals $F,G\colon E\to\mathbb{R}\cup\{\infty\}$ by

$$\begin{aligned}
F(A,\mathbf{V})&=\begin{cases}-\tau(A(1)\rho_{1})+\tau(A(0)\rho_{0})&\text{if }\mathbf{V}=\nabla A,\\ \infty&\text{otherwise},\end{cases}\\
G(A,\mathbf{V})&=\begin{cases}0&\text{if }(A,\mathbf{V})\in\mathcal{D},\\ \infty&\text{otherwise}.\end{cases}
\end{aligned}$$

Here

$$\mathcal{D}=\Big\{(A,\mathbf{V}):\tau\big((\dot{A}(t)+\varepsilon\mathscr{L}A(t))\rho\big)+\frac{1}{2}\lVert\mathbf{V}(t)\rVert_{\rho}^{2}\leq 0\text{ for a.e. }t\in[0,1]\text{ and all }\rho\in\mathfrak{S}(\mathcal{A})\Big\}.$$

It is easy to see that $F$ and $G$ are convex. Moreover, for $A_{0}(t)=-t1_{\mathcal{A}}$ and $\mathbf{V}_{0}=0$ we have $\mathbf{V}_{0}=\nabla A_{0}$, hence $F(A_{0},\mathbf{V}_{0})=\tau(\rho_{1})=1<\infty$, and, since $\nabla 1_{\mathcal{A}}=0$ and $\mathscr{L}1_{\mathcal{A}}=0$,

$$\tau\big((\dot{A}_{0}(t)+\varepsilon\mathscr{L}A_{0}(t))\rho\big)+\frac{1}{2}\lVert\mathbf{V}_{0}(t)\rVert_{\rho}^{2}=-1$$

for all $t\in[0,1]$ and $\rho\in\mathfrak{S}(\mathcal{A})$, hence $G(A_{0},\mathbf{V}_{0})=0$. Furthermore, $G$ is clearly continuous at $(A_{0},\mathbf{V}_{0})$.

Moreover,

$$\sup_{(A,\mathbf{V})\in E}(-F(A,\mathbf{V})-G(A,\mathbf{V}))=\sup_{A\in\mathsf{HJB}_{\vec{\Lambda},\varepsilon}}(\tau(A(1)\rho_{1})-\tau(A(0)\rho_{0})).$$

Let us calculate the Legendre transforms of $F$ and $G$, keeping in mind the identification of $E^{\ast}$. For $F$ we obtain

$$\begin{aligned}
F^{\ast}(B,C,\mathbf{W})&=\sup_{(A,\mathbf{V})\in E}\Big\{\langle(A,\mathbf{V}),(B,C,\mathbf{W})\rangle-F(A,\mathbf{V})\Big\}\\
&=\sup_{A}\Big\{\tau(A(0)B)+\int_{0}^{1}\tau((\dot{A}(t)+\varepsilon\mathscr{L}A(t))\,C(t))\,dt\\
&\qquad+\int_{0}^{1}\langle\nabla A(t),\mathbf{W}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt+\tau(A(1)\rho_{1})-\tau(A(0)\rho_{0})\Big\}.
\end{aligned}$$

Since the last expression is linear in $A$, we have $F^{\ast}(B,C,\mathbf{W})=\infty$ unless

$$\begin{aligned}
-\tau(A(1)\rho_{1})+\tau(A(0)(\rho_{0}-B))&=\int_{0}^{1}\tau((\dot{A}(t)+\varepsilon\mathscr{L}A(t))\,C(t))\,dt\\
&\quad+\int_{0}^{1}\langle\nabla A(t),\mathbf{W}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt
\end{aligned}$$

for all $A\in H^{1}([0,1];\mathcal{H}^{(h)}_{\mathcal{A}})$, in which case $F^{\ast}(B,C,\mathbf{W})=0$.

Integrating by parts, this implies $C(0)=-(\rho_{0}-B)$, $C(1)=-\rho_{1}$ and

$$\dot{C}(t)+\operatorname{div}\mathbf{W}(t)=\varepsilon\mathscr{L}^{\dagger}C(t).$$

Thus

$$F^{\ast}(B,C,\mathbf{W})=\begin{cases}0&\text{if }(-C,-\mathbf{W})\in\mathsf{CE}_{\varepsilon}^{\prime\prime}(\rho_{0}-B,\rho_{1}),\\ \infty&\text{otherwise}.\end{cases}$$

Here $\mathsf{CE}_{\varepsilon}^{\prime\prime}(\rho_{0}-B,\rho_{1})$ denotes the set of all pairs $(X,\mathbf{U})\in H^{1}((0,1);\mathcal{H}^{(h)}_{\mathcal{A}})\times L^{2}((0,1);\mathcal{H}^{(h)}_{\mathcal{A},\mathcal{J}})$ satisfying $X(0)=\rho_{0}-B$, $X(1)=\rho_{1}$ and

$$\dot{X}(t)+\operatorname{div}\mathbf{U}(t)=\varepsilon\mathscr{L}^{\dagger}X(t).$$

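The difference between $\mathsf{CE}_{\varepsilon}^{\prime\prime}$ and the sets $\mathsf{CE}_{\varepsilon}$ and $\mathsf{CE}_{\varepsilon}^{\prime}$ is that we impose no positivity or normalization constraints. Note, however, that if $(X,\mathbf{U})\in\mathsf{CE}_{\varepsilon}^{\prime\prime}(\rho_{0}-B,\rho_{1})$, then

$$\frac{d}{dt}\tau(X(t))=\tau(\varepsilon\mathscr{L}^{\dagger}X(t)-\operatorname{div}\mathbf{U}(t))=0,$$

since $\tau(\mathscr{L}^{\dagger}X(t))=\tau((\mathscr{L}1_{\mathcal{A}})X(t))=0$ and $\tau(\operatorname{div}\mathbf{U}(t))=-\langle\nabla 1_{\mathcal{A}},\mathbf{U}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}=0$. Hence $\tau(X(t))=\tau(\rho_{1})=1$ for all $t$ (and in particular $\tau(B)=0$).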
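For completeness, here is the integration by parts behind the boundary conditions and the continuity equation for $C$ derived above from the formula for $F^{\ast}$. It assumes enough regularity of $C$ and the conventions already implicit in the proof of Proposition 3.2, namely that $\mathscr{L}^{\dagger}$ is the adjoint of $\mathscr{L}$ with respect to $(A,B)\mapsto\tau(AB)$ and that $\operatorname{div}$ is characterized by $\tau(A\operatorname{div}\mathbf{W})=-\langle\nabla A,\mathbf{W}\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}$:

```latex
\int_0^1 \tau\big((\dot A+\varepsilon\mathscr{L}A)C\big)\,dt
  + \int_0^1 \langle \nabla A,\mathbf{W}\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt
= \tau\big(A(1)C(1)\big) - \tau\big(A(0)C(0)\big)
  + \int_0^1 \tau\big(A\,(-\dot C + \varepsilon\mathscr{L}^{\dagger}C - \operatorname{div}\mathbf{W})\big)\,dt .
```

Since $A(0)$, $A(1)$ and the values of $A$ on $(0,1)$ can be chosen essentially independently, comparing with $-\tau(A(1)\rho_1)+\tau(A(0)(\rho_0-B))$ gives $C(1)=-\rho_1$, $C(0)=-(\rho_0-B)$ and $\dot C+\operatorname{div}\mathbf{W}=\varepsilon\mathscr{L}^{\dagger}C$.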

Now let us turn to the Legendre transform of $G$. We have

$$\begin{aligned}
G^{\ast}(B,C,\mathbf{W})&=\sup_{(A,\mathbf{V})\in E}\Big\{\langle(A,\mathbf{V}),(B,C,\mathbf{W})\rangle-G(A,\mathbf{V})\Big\}\\
&=\sup_{(A,\mathbf{V})\in\mathcal{D}}\Big\{\tau(A(0)B)+\int_{0}^{1}\tau((\dot{A}(t)+\varepsilon\mathscr{L}A(t))C(t))\,dt\\
&\qquad\qquad+\int_{0}^{1}\langle\mathbf{V}(t),\mathbf{W}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt\Big\}.
\end{aligned}$$

If $(A,\mathbf{V})\in\mathcal{D}$ and $X\in H^{1}([0,1];\mathcal{H}^{(h)}_{\mathcal{A}})$ solves $\dot{X}+\varepsilon\mathscr{L}X=0$ with an arbitrary initial value $X(0)\in\mathcal{H}^{(h)}_{\mathcal{A}}$, then $(A+X,\mathbf{V})\in\mathcal{D}$ and the pairing changes only by $\tau(X(0)B)$. Hence $G^{\ast}(B,C,\mathbf{W})=\infty$ unless $B=0$. Moreover, it follows from the definition of $\mathcal{D}$ that $G^{\ast}(0,C,\mathbf{W})=\infty$ unless $C(t)\geq 0$ for a.e. $t\in[0,1]$.

For $B=0$ we have

$$\begin{aligned}
G^{\ast}(0,C,\mathbf{W})&=\sup_{(A,\mathbf{V})\in\mathcal{D}}\Big\{\int_{0}^{1}\big(\tau((\dot{A}(t)+\varepsilon\mathscr{L}A(t))C(t))+\langle\mathbf{V}(t),\mathbf{W}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\big)\,dt\Big\}\\
&\leq\sup_{(A,\mathbf{V})\in\mathcal{D}}\Big\{-\int_{0}^{1}\frac{1}{2}\lVert\mathbf{V}(t)\rVert_{C(t)}^{2}\,dt+\int_{0}^{1}\langle[C(t)]_{\vec{\Lambda}}^{1/2}\mathbf{V}(t),[C(t)]_{\vec{\Lambda}}^{-1/2}\mathbf{W}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt\Big\}\\
&\leq\frac{1}{2}\int_{0}^{1}\langle[C(t)]_{\vec{\Lambda}}^{-1}\mathbf{W}(t),\mathbf{W}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt.
\end{aligned}$$

Here the first inequality uses the constraint in $\mathcal{D}$ with $\rho=C(t)$, which is admissible since both terms of the constraint are $1$-homogeneous in $\rho$, and the second inequality is Young's inequality.

We will show next that the inequalities are in fact equalities. Let $C^{\delta}=C+\delta$ and $\mathbf{V}^{\delta}(t)=[C^{\delta}(t)]_{\vec{\Lambda}}^{-1}\mathbf{W}(t)$. Moreover, let $f_{j}=f_{\Lambda_{j}}$ with the notation from Lemma 3.4. Since

$$\mathcal{H}^{(h)}_{\mathcal{A}}\to\mathbb{R},\quad B\mapsto\sum_{j\in\mathcal{J}}\langle(df_{j}(C^{\delta}(t))B)\mathbf{V}^{\delta}_{j}(t),\mathbf{V}^{\delta}_{j}(t)\rangle_{\mathfrak{H}_{\mathcal{A}}}$$

is a bounded linear map that depends continuously on $t$, the Riesz representation theorem yields a unique continuous map $X^{\delta}\colon[0,1]\to\mathcal{H}^{(h)}_{\mathcal{A}}$ such that

$$\tau(BX^{\delta}(t))=\sum_{j\in\mathcal{J}}\langle(df_{j}(C^{\delta}(t))B)\mathbf{V}^{\delta}_{j}(t),\mathbf{V}^{\delta}_{j}(t)\rangle_{\mathfrak{H}_{\mathcal{A}}}$$

for every $B\in\mathcal{H}^{(h)}_{\mathcal{A}}$ and $t\in[0,1]$.

Let $A^{\delta}\in H^{1}([0,1];\mathcal{H}^{(h)}_{\mathcal{A}})$ be the unique solution of

$$\dot{A}^{\delta}(t)+\varepsilon\mathscr{L}A^{\delta}(t)=-\frac{1}{2}X^{\delta}(t),\qquad A^{\delta}(0)=0,$$

which exists by the linear isomorphism $A\mapsto(A(0),\dot{A}+\varepsilon\mathscr{L}A)$ above (for $\varepsilon=0$ this is simply $A^{\delta}(t)=-\frac{1}{2}\int_{0}^{t}X^{\delta}(s)\,ds$). We claim that $(A^{\delta},\mathbf{V}^{\delta})\in\mathcal{D}$. Indeed,

$$\begin{aligned}
\tau\big((\dot{A}^{\delta}(t)+\varepsilon\mathscr{L}A^{\delta}(t))\rho\big)&=-\frac{1}{2}\sum_{j\in\mathcal{J}}\langle(df_{j}(C^{\delta}(t))\rho)\mathbf{V}^{\delta}_{j}(t),\mathbf{V}^{\delta}_{j}(t)\rangle_{\mathfrak{H}_{\mathcal{A}}}\\
&\leq-\frac{1}{2}\sum_{j\in\mathcal{J}}\langle[\rho]_{\Lambda_{j}}\mathbf{V}^{\delta}_{j}(t),\mathbf{V}^{\delta}_{j}(t)\rangle_{\mathfrak{H}_{\mathcal{A}}}\\
&=-\frac{1}{2}\lVert\mathbf{V}^{\delta}(t)\rVert_{\rho}^{2},
\end{aligned}$$

where the inequality follows from Lemma 3.4. Note that we have equality for $\rho=C^{\delta}(t)$.

In particular, for $\rho=C(t)$ we obtain

$$\tau\big((\dot{A}^{\delta}(t)+\varepsilon\mathscr{L}A^{\delta}(t))C(t)\big)+\langle\mathbf{V}^{\delta}(t),\mathbf{W}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\leq\frac{1}{2}\langle[C(t)]_{\vec{\Lambda}}^{-1}\mathbf{W}(t),\mathbf{W}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}.$$

On the other hand,

$$\begin{aligned}
\tau\big((\dot{A}^{\delta}(t)+\varepsilon\mathscr{L}A^{\delta}(t))C(t)\big)&=-\frac{1}{2}\sum_{j\in\mathcal{J}}\langle(df_{j}(C^{\delta}(t))(C^{\delta}(t)-\delta))\mathbf{V}_{j}^{\delta}(t),\mathbf{V}_{j}^{\delta}(t)\rangle_{\mathfrak{H}_{\mathcal{A}}}\\
&\geq-\frac{1}{2}\langle[C^{\delta}(t)]_{\vec{\Lambda}}\mathbf{V}^{\delta}(t),\mathbf{V}^{\delta}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}+\frac{1}{2}\langle[\delta]_{\vec{\Lambda}}\mathbf{V}^{\delta}(t),\mathbf{V}^{\delta}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\\
&\geq-\frac{1}{2}\langle\mathbf{V}^{\delta}(t),\mathbf{W}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}},
\end{aligned}$$

where for the first inequality we used $df_{j}(C^{\delta}(t))C^{\delta}(t)=f_{j}(C^{\delta}(t))$ and Lemma 3.4 in the form $df_{j}(C^{\delta}(t))\delta\geq f_{j}(\delta)$, and for the second $[C^{\delta}(t)]_{\vec{\Lambda}}\mathbf{V}^{\delta}(t)=\mathbf{W}(t)$.

Put together, we have

$$\begin{aligned}
\frac{1}{2}\langle[C^{\delta}(t)]_{\vec{\Lambda}}^{-1}\mathbf{W}(t),\mathbf{W}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}&\leq\tau\big((\dot{A}^{\delta}(t)+\varepsilon\mathscr{L}A^{\delta}(t))C(t)\big)+\langle\mathbf{V}^{\delta}(t),\mathbf{W}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\\
&\leq\frac{1}{2}\langle[C(t)]_{\vec{\Lambda}}^{-1}\mathbf{W}(t),\mathbf{W}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}.
\end{aligned}$$

Since operator connections are monotone, $[C^{\delta}(t)]_{\vec{\Lambda}}^{-1}$ increases as $\delta\searrow 0$, and the monotone convergence theorem yields

$$\int_{0}^{1}\Big(\tau\big((\dot{A}^{\delta}(t)+\varepsilon\mathscr{L}A^{\delta}(t))C(t)\big)+\langle\mathbf{V}^{\delta}(t),\mathbf{W}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\Big)\,dt\to\frac{1}{2}\int_{0}^{1}\langle[C(t)]_{\vec{\Lambda}}^{-1}\mathbf{W}(t),\mathbf{W}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt.$$

Hence

$$G^{\ast}(0,C,\mathbf{W})=\frac{1}{2}\int_{0}^{1}\langle[C(t)]_{\vec{\Lambda}}^{-1}\mathbf{W}(t),\mathbf{W}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt$$

if $C(t)\geq 0$ for a.e. $t\in[0,1]$. Together with the formula for $F^{\ast}$, we obtain

$$\begin{aligned}
&\inf_{(B,C,\mathbf{W})\in E^{\ast}}\big(F^{\ast}(-B,-C,-\mathbf{W})+G^{\ast}(B,C,\mathbf{W})\big)\\
&\quad=\inf_{(\rho,\mathbf{V})\in\mathsf{CE}_{\varepsilon}^{\prime}(\rho_{0},\rho_{1})}\frac{1}{2}\int_{0}^{1}\langle[\rho(t)]_{\vec{\Lambda}}^{-1}\mathbf{V}(t),\mathbf{V}(t)\rangle_{\mathfrak{H}_{\mathcal{A},\mathcal{J}}}\,dt\\
&\quad=\frac{1}{2}\mathcal{W}_{\vec{\Lambda},\varepsilon}^{2}(\rho_{0},\rho_{1}),
\end{aligned}$$

where the last equality follows from Proposition 1.2.

An application of the Rockafellar–Fenchel theorem (Theorem 3.3) yields the desired conclusion. ∎
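The monotone convergence step in the proof relies on $\delta\mapsto\langle[C+\delta]_{\vec\Lambda}^{-1}\mathbf{W},\mathbf{W}\rangle$ increasing as $\delta\searrow0$. A small numerical illustration, again assuming $\mathcal{A}=M_n(\mathbb{R})$ with the Hilbert–Schmidt inner product and a single geometric-mean connection, for which $[M]_\Lambda^{-1}W=M^{-1/2}WM^{-1/2}$; the random matrices are illustrative stand-ins for $C(t)$ and $\mathbf{W}(t)$:

```python
import numpy as np

def inv_sqrt(M):
    # M^{-1/2} for a symmetric positive definite matrix M
    w, U = np.linalg.eigh(M)
    return (U / np.sqrt(w)) @ U.T

def inverse_weighted_norm(C, W, delta):
    # <[C + delta]^{-1} W, W>_HS for the geometric mean: [M]^{-1} W = M^{-1/2} W M^{-1/2}
    R = inv_sqrt(C + delta * np.eye(C.shape[0]))
    return np.trace(W.T @ R @ W @ R)

rng = np.random.default_rng(3)
n = 4
Z = rng.standard_normal((n, n))
C = Z @ Z.T + 0.05 * np.eye(n)      # positive definite stand-in for C(t)
W = rng.standard_normal((n, n))     # stand-in for W(t)
deltas = [1.0, 0.1, 0.01, 1e-3, 1e-4, 0.0]
values = [inverse_weighted_norm(C, W, d) for d in deltas]
print(values)
```

Here $C$ is taken invertible so that the value at $\delta=0$ is finite; in the proof, $C(t)$ is only assumed positive semidefinite and the limit may be $+\infty$.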

References

  • [AES16] Luigi Ambrosio, Matthias Erbar and Giuseppe Savaré “Optimal transport, Cheeger energies and contractivity of dynamic transport distances in extended spaces” In Nonlinear Anal. 137, 2016, pp. 77–134 DOI: 10.1016/j.na.2015.12.006
  • [Ali76] Robert Alicki “On the detailed balance condition for non-Hamiltonian systems” In Rep. Math. Phys. 10.2, 1976, pp. 249–258 DOI: 10.1016/0034-4877(76)90046-X
  • [BB00] Jean-David Benamou and Yann Brenier “A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem” In Numer. Math. 84.3, 2000, pp. 375–393 DOI: 10.1007/s002110050002
  • [BGL01] Sergey G. Bobkov, Ivan Gentil and Michel Ledoux “Hypercontractivity of Hamilton-Jacobi equations” In J. Math. Pures Appl. (9) 80.7, 2001, pp. 669–696 DOI: 10.1016/S0021-7824(01)01208-9
  • [BL21] Simon Becker and Wuchen Li “Quantum Statistical Learning via Quantum Wasserstein Natural Gradient” In Journal of Statistical Physics 182.1 Springer Science+Business Media LLC, 2021 DOI: 10.1007/s10955-020-02682-1
  • [BV20] Yann Brenier and Dmitry Vorotnikov “On optimal transport of matrix-valued measures” In SIAM J. Math. Anal. 52.3, 2020, pp. 2849–2873 DOI: 10.1137/19M1274857
  • [CGT18] Yongxin Chen, Tryphon T. Georgiou and Allen Tannenbaum “Matrix optimal mass transport: a quantum mechanical approach” In IEEE Trans. Automat. Control 63.8, 2018, pp. 2612–2619 DOI: 10.1109/tac.2017.2767707
  • [Che+20] Yongxin Chen, Wilfrid Gangbo, Tryphon T. Georgiou and Allen Tannenbaum “On the matrix Monge-Kantorovich problem” In European J. Appl. Math. 31.4, 2020, pp. 574–600 DOI: 10.1017/s0956792519000172
  • [CM14] Eric A. Carlen and Jan Maas “An analog of the 2-Wasserstein metric in non-commutative probability under which the fermionic Fokker-Planck equation is gradient flow for the entropy” In Comm. Math. Phys. 331.3, 2014, pp. 887–926 DOI: 10.1007/s00220-014-2124-8
  • [CM17] Eric A. Carlen and Jan Maas “Gradient flow and entropy inequalities for quantum Markov semigroups with detailed balance” In J. Funct. Anal. 273.5, 2017, pp. 1810–1869 DOI: 10.1016/j.jfa.2017.05.003
  • [CM20] Eric A. Carlen and Jan Maas “Non-commutative Calculus, Optimal Transport and Functional Inequalities in Dissipative Quantum Systems” In J. Stat. Phys. 178.2, 2020, pp. 319–378 DOI: 10.1007/s10955-019-02434-w
  • [DPT21] Giacomo De Palma and Dario Trevisan “Quantum Optimal Transport with Quantum Channels” In Annales Henri Poincaré Springer Science+Business Media LLC, 2021 DOI: 10.1007/s00023-021-01042-3
  • [DR20] N. Datta and C. Rouzé “Relating Relative Entropy, Optimal Transport and Fisher Information: A Quantum HWI Inequality” In Ann. Henri Poincaré, 2020 DOI: 10.1007/s00023-020-00891-8
  • [Duv20] Rocco Duvenhage “Quadratic Wasserstein metrics for von Neumann algebras via transport plans” In arXiv e-prints 2012.03564, 2020
  • [EMW19] Matthias Erbar, Jan Maas and Melchior Wirth “On the geometry of geodesics in discrete optimal transport” In Calc. Var. Partial Differential Equations 58.1, 2019, pp. Art. 19, 19 DOI: 10.1007/s00526-018-1456-1
  • [GLM19] Wilfrid Gangbo, Wuchen Li and Chenchen Mou “Geodesics of minimal length in the set of probability measures on graphs” In ESAIM Control Optim. Calc. Var. 25, 2019, pp. Paper No. 78, 36 DOI: 10.1051/cocv/2018052
  • [GMP16] François Golse, Clément Mouhot and Thierry Paul “On the mean field and classical limits of quantum mechanics” In Comm. Math. Phys. 343.1, 2016, pp. 165–205 DOI: 10.1007/s00220-015-2485-7
  • [Han97] Frank Hansen “Operator convex functions of several variables” In Publ. Res. Inst. Math. Sci. 33.3, 1997, pp. 443–463 DOI: 10.2977/prims/1195145324
  • [Hor18] David F. Hornshaw “$L^{2}$-Wasserstein distances of tracial $W^{*}$-algebras and their disintegration problem” In arXiv e-prints 1806.01073, 2018
  • [KA80] Fumio Kubo and Tsuyoshi Ando “Means of positive linear operators” In Math. Ann. 246.3, 1980, pp. 205–224 DOI: 10.1007/BF01371042
  • [Kan42] Leonid Kantorovitch “On the translocation of masses” In C. R. (Doklady) Acad. Sci. URSS (N.S.) 37, 1942, pp. 199–201
  • [MM17] Markus Mittnenzweig and Alexander Mielke “An entropic gradient structure for Lindblad equations and couplings of quantum systems to macroscopic models” In J. Stat. Phys. 167.2, 2017, pp. 205–233 DOI: 10.1007/s10955-017-1756-4
  • [NGT15] Lipeng Ning, Tryphon T. Georgiou and Allen Tannenbaum “On matrix-valued Monge-Kantorovich optimal mass transport” In IEEE Trans. Automat. Control 60.2, 2015, pp. 373–382 DOI: 10.1109/TAC.2014.2350171
  • [OV00] Felix Otto and Cédric Villani “Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality” In J. Funct. Anal. 173.2, 2000, pp. 361–400 DOI: 10.1006/jfan.1999.3557
  • [Pal+20] Giacomo De Palma, Milad Marvian, Dario Trevisan and Seth Lloyd “The quantum Wasserstein distance of order 1” In arXiv e-prints 2009.04469, 2020
  • [Pey+19] Gabriel Peyré, Lénaïc Chizat, François-Xavier Vialard and Justin Solomon “Quantum entropic regularization of matrix-valued optimal transport” In European J. Appl. Math. 30.6, 2019, pp. 1079–1102 DOI: 10.1017/s0956792517000274
  • [RD19] Cambyse Rouzé and Nilanjana Datta “Concentration of quantum states from quantum functional and transportation cost inequalities” In J. Math. Phys. 60.1, 2019, pp. 012202, 22 DOI: 10.1063/1.5023210
  • [Vil03] Cédric Villani “Topics in optimal transportation” 58, Graduate Studies in Mathematics American Mathematical Society, Providence, RI, 2003, pp. xvi+370 DOI: 10.1007/b12016
  • [Vil09] Cédric Villani “Optimal transport. Old and new” 338, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences] Springer-Verlag, Berlin, 2009, pp. xxii+973 DOI: 10.1007/978-3-540-71050-9
  • [Wir18] Melchior Wirth “A Noncommutative Transport Metric and Symmetric Quantum Markov Semigroups as Gradient Flows of the Entropy” In arXiv e-prints 1808.05419, 2018
  • [WZ20] Melchior Wirth and Haonan Zhang “Complete gradient estimates of quantum Markov semigroups” In arXiv e-prints 2007.13506, 2020