
Minimum energy density steering of linear systems with Gromov-Wasserstein terminal cost

Kohei Morimoto and Kenji Kashima The authors are with the Graduate School of Informatics, Kyoto University, Kyoto, Japan kohei.morimoto.73r@st.kyoto-u.ac.jp; kk@i.kyoto-u.ac.jp. This work was supported by JSPS KAKENHI Grant Number JP21H04875 and the joint project of Kyoto University and Toyota Motor Corporation, titled “Advanced Mathematical Science for Mobility Society”.
Abstract

In this paper, we formulate and solve a novel optimal density control problem with Gromov-Wasserstein (GW) terminal cost for discrete-time linear Gaussian systems. Unlike the Wasserstein or Kullback-Leibler distances employed in existing works, the GW distance quantifies the difference in the shapes of distributions and is invariant under translation and rotation. Consequently, our formulation allows us to find low-energy inputs that achieve a desired shape of the terminal distribution, which has practical applications, e.g., in robotic swarms. We demonstrate that the problem can be reduced to a Difference of Convex (DC) programming problem, which is efficiently solvable through the DC algorithm. Numerical experiments confirm that the state distribution reaches the terminal distribution realizable with the minimum control energy among those having the specified shape.

I Introduction

Optimal density control is defined as the problem of controlling the probability distribution of state variables to the desired distribution in a dynamic system. Promising applications of optimal density control include systems in which it is important to manage errors in the state, such as quality control and aircraft control, as well as quantum systems in which the distribution of the state itself is the object of control [1].

The problem addressed in this paper is a variant of the (finite-time) covariance steering problem for discrete-time linear Gaussian systems. This line of research has a long history [2, 3]; the most closely related recent works are as follows: hard constraint formulations [4, 5, 6, 7], where the terminal state distribution is enforced as a constraint, and soft constraint formulations [8, 9, 10], where the Wasserstein distance between the terminal distribution and the target distribution is incorporated as a cost. In particular, Balci et al. [10] formulated an optimal distributional control problem for discrete-time linear Gaussian systems with a Wasserstein terminal cost as a semidefinite program (SDP), a form of convex programming, to derive globally optimal control policies.

To motivate the present work, let us view the state distribution as an ensemble of particles or a multi-robotic swarm [11]. In such applications, a particular shape of the formation must be achieved, but its location and orientation are often irrelevant. For example, the agents may need to align in a single row in a two-dimensional region (see Fig. 3 below), or only the configuration may be specified in terms of inter-agent distances [12]. The aforementioned formulations can address the realization of a configuration with a fixed orientation but cannot optimize over rotations. To tackle this issue, we propose a novel density control problem incorporating the Gromov-Wasserstein distance (GW distance) as the terminal cost [13]. The GW distance is a distance between probability distributions that measures the closeness of their shapes. By incorporating the GW distance between the state and target distributions into the terminal cost, we can formulate the problem of controlling the shape of the state distribution. This problem can be viewed as a simultaneous optimization of the dynamical steering and the rotation of the target shape, which clearly contrasts with the existing formulations.

In this study, we focus on scenarios where the initial and target distributions are Gaussian and seek the optimal control policy among linear feedback control laws. While computing the GW distance between arbitrary distributions is challenging, it has recently been shown that the Gaussian Gromov-Wasserstein (GGW) distance, a relaxation of the GW distance for normal distributions, can be easily calculated [14]. We show that the optimal density control problem with GW terminal cost can be formulated as a difference of convex (DC) programming problem. We solve the problem by the DC algorithm (DCA) [15], a technique for solving DC programming problems through iterative convex relaxation. Remarkably, the convexified problem is transformed into an SDP form, which can be efficiently solved using standard convex programming solvers.

The rest of the paper is organized as follows. In Section II, we introduce the concept of the GW distance and present the optimal density steering problem with the GW distance as the terminal cost. Section III discusses the formulation of the problem as a DC programming problem, highlighting the objective function's nature as a difference of convex functions and deriving the convexified subproblem used in the DC algorithm. The numerical simulations are presented in Section IV. Finally, we conclude our paper in Section V.

Notation      Let \mathbb{S}_n^+ and \mathbb{S}_n^{++} denote the sets of n-dimensional positive semidefinite and positive definite matrices, respectively. Let O(n) denote the set of n-dimensional orthogonal matrices. For matrices, \|\cdot\|_F denotes the Frobenius norm. For a convex function f, \partial f(x) denotes the set of subgradients of f at x. Let \mathcal{P}(\mathcal{X}) denote the set of all probability distributions over \mathcal{X}. Let \mathcal{N}_n(\mu,\Sigma) denote the multivariate normal distribution with mean \mu \in \mathbb{R}^n and covariance \Sigma \in \mathbb{S}_n^+. Let \mathscr{N}_n denote the set of all n-dimensional multivariate normal distributions.

II Problem Setting

II-A Gromov-Wasserstein distance

The optimal transport distance between two probability distributions is defined as the minimized cost of transporting one distribution to the other. The GW distance, like the Wasserstein distance, is a variant of the optimal transport distance. Given two metric spaces \mathcal{X}, \mathcal{Y}, the set of transport plans \Pi between probability distributions \mu \in \mathcal{P}(\mathcal{X}) and \nu \in \mathcal{P}(\mathcal{Y}) is defined by

\Pi(\mu,\nu) := \left\{ \pi(x,y) \,\middle|\, \int \pi(x,y)\,dx = \nu(y),\ \int \pi(x,y)\,dy = \mu(x) \right\}. (1)

Each element \pi(x,y) of \Pi(\mu,\nu) represents how the mass \mu(x) at x is transported to y, with the condition \int \pi(x,y)\,dx = \nu(y) ensuring that the transported mass indeed forms \nu(y). The GW distance is defined by

GW^2(\mu,\nu) := \inf_{\pi \in \Pi(\mu,\nu)} \iint \left( \|x - x'\|_{\mathcal{X}} - \|y - y'\|_{\mathcal{Y}} \right)^2 \pi(x,y)\,\pi(x',y')\,dx\,dy\,dx'\,dy', (2)

where \|x - x'\|_{\mathcal{X}} and \|y - y'\|_{\mathcal{Y}} denote the norms on \mathcal{X} and \mathcal{Y}, respectively. The GW distance is small when pairs of points that are close (resp. far apart) before transportation remain close (resp. far apart) after transportation, and it increases when initially close points are moved far apart. This definition therefore quantifies the difference in shape between two probability distributions. For comparison, recall that the Wasserstein distance is defined as

W^2(\mu,\nu) := \inf_{\pi \in \Pi(\mu,\nu)} \iint d(x,y)^2\, \pi(x,y)\,dx\,dy, (3)

where \mathcal{X} = \mathcal{Y} and d(\cdot,\cdot) is a suitable distance on \mathcal{X}. While the Wasserstein distance is sensitive to the absolute positions and orientations of the distributions, the GW distance is invariant under isometric transformations such as translations and rotations.
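This contrast can be checked numerically. Below is a minimal NumPy sketch (the helper names `psd_sqrt` and `w2_gaussian` are ours) using the well-known closed form W^2(\mathcal{N}(0,\Sigma_0), \mathcal{N}(0,\Sigma_1)) = \operatorname{tr}\Sigma_0 + \operatorname{tr}\Sigma_1 - 2\operatorname{tr}((\Sigma_0^{1/2}\Sigma_1\Sigma_0^{1/2})^{1/2}): rotating a covariance leaves the distribution's shape unchanged, yet the Wasserstein distance becomes strictly positive.

```python
import numpy as np

def psd_sqrt(S):
    """Symmetric PSD square root via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

def w2_gaussian(S0, S1):
    """Squared 2-Wasserstein (Bures) distance between N(0,S0) and N(0,S1)."""
    r = psd_sqrt(S0)
    return np.trace(S0) + np.trace(S1) - 2.0 * np.trace(psd_sqrt(r @ S1 @ r))

S = np.diag([2.0, 0.5])                      # an anisotropic shape
theta = np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
S_rot = R.T @ S @ R                          # same shape, rotated

print(w2_gaussian(S, S))       # 0: identical distributions
print(w2_gaussian(S, S_rot))   # > 0: the Wasserstein distance "sees" the rotation
```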

Computing the GW distance between arbitrary probability distributions is challenging, as it involves a non-convex quadratic program over the transport plan \pi. Recently, it has been shown that the Gaussian Gromov-Wasserstein (GGW) distance, in which the transport plan is constrained to be a Gaussian distribution, admits an explicit expression in terms of the parameters of the normal distributions [14]. Specifically, the GGW distance between Gaussian distributions \mu \in \mathscr{N}_m and \nu \in \mathscr{N}_n is defined by

GGW^2(\mu,\nu) := \inf_{\pi \in \Pi(\mu,\nu) \cap \mathscr{N}_{m+n}} \iint \left( \|x - x'\| - \|y - y'\| \right)^2 \pi(x,y)\,\pi(x',y')\,dx\,dy\,dx'\,dy', (4)

where \Pi(\mu,\nu) \cap \mathscr{N}_{m+n} restricts the transport plan to (m+n)-dimensional Gaussian distributions. For \mu = \mathcal{N}_m(\mu_0, \Sigma_0) and \nu = \mathcal{N}_n(\mu_1, \Sigma_1), it holds that

GGW^2(\mu,\nu) = 4\left( \operatorname{tr}(\Sigma_0) - \operatorname{tr}(\Sigma_1) \right)^2 + 8\left\| D_0 - D_1 \right\|_F^2, (5)

where D_0, D_1 are the diagonal matrices of the eigenvalues of \Sigma_0, \Sigma_1 sorted in descending order; if m \neq n, the missing entries are filled with zeros.
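The closed form (5) is straightforward to implement. The sketch below (function name ours) evaluates it from the covariance spectra and illustrates both the rotation invariance and the fact that the distance is well-defined across spaces of different dimensions.

```python
import numpy as np

def ggw2(S0, S1):
    """Squared GGW distance (5) between N(.,S0) and N(.,S1); the means are
    irrelevant by translation invariance.  Eigenvalues are sorted in
    descending order and zero-padded when the dimensions differ."""
    d0 = np.sort(np.linalg.eigvalsh(S0))[::-1]
    d1 = np.sort(np.linalg.eigvalsh(S1))[::-1]
    n = max(len(d0), len(d1))
    d0 = np.pad(d0, (0, n - len(d0)))
    d1 = np.pad(d1, (0, n - len(d1)))
    return 4.0 * (d0.sum() - d1.sum()) ** 2 + 8.0 * np.sum((d0 - d1) ** 2)

S = np.array([[2.0, 0.0], [0.0, 0.5]])
theta = 1.2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(ggw2(S, R.T @ S @ R))          # 0: invariant under rotation
print(ggw2(S, np.array([[10.0]])))   # well-defined even when m != n
```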

II-B Optimal density steering with Gromov-Wasserstein terminal cost

Let n_x be the dimension of the state and n_u the dimension of the input, and consider the following discrete-time linear Gaussian system, where the system matrices may be time-varying (we write A_k, B_k when the time dependence matters and drop the subscript otherwise):

x_{k+1} = A x_k + B u_k + w_k, (6a)
x_0 \sim \mathcal{N}(0, \Sigma_0), (6b)
w_k \sim \mathcal{N}(0, W_k). (6c)

Here, the covariance matrix of the initial Gaussian distribution is \Sigma_0 \in \mathbb{S}_{n_x}^{++} and that of the noise is W_k \in \mathbb{S}_{n_x}^+. For the control input u_k, we use a stochastic linear control policy

u_k(x) \sim \mathcal{N}(K_k x, Q_k), (7)

where K_k is the feedback gain and Q_k \succeq O is the covariance of the Gaussian distribution. We consider the problem of minimizing the sum of the control cost and the Gromov-Wasserstein distance between the terminal distribution \rho_N = \mathcal{N}(\mu_N, \Sigma_N) (\Sigma_N \in \mathbb{S}_{n_x}^{++}) and the target distribution \rho_r = \mathcal{N}(0, \Sigma_r) (\Sigma_r \in \mathbb{S}_{n_x}^+). The problem is formulated as

\min_{K_k, Q_k} \quad J(K_k, Q_k), (8a)
J(K_k, Q_k) = \lambda \mathbb{E}\left[ \sum_{k=0}^{N-1} u_k^T R_k u_k \right] + GGW^2(\rho_N, \rho_r), (8b)

where R_k \in \mathbb{S}_{n_u}^{++} denotes the weight for the control cost. Under the control policy (7) applied to system (6a), the state x_N at the terminal time N is also Gaussian. Thus, substituting (5) and (7) into (8b) and dropping the constant term 8\|D_r\|_F^2, which does not affect the minimization, we obtain

J(K_k, Q_k) = \lambda \sum_{k=0}^{N-1} \operatorname{tr}\left( R_k \left( K_k \Sigma_k K_k^T + Q_k \right) \right) + 4\left( \operatorname{tr}(\Sigma_N) - \operatorname{tr}(\Sigma_r) \right)^2 + 8\left\| \Sigma_N \right\|_F^2 - 16 \operatorname{tr}(D_N D_r), (9)

where \Sigma_k is the covariance matrix of the state x_k, and D_N, D_r are the diagonal matrices of the eigenvalues of \Sigma_N, \Sigma_r arranged in descending order (note that \|\Sigma_N\|_F = \|D_N\|_F). The dynamics of \Sigma_k are given by

\Sigma_{k+1} = A \Sigma_k A^T + B K_k \Sigma_k A^T + A \Sigma_k K_k^T B^T + B K_k \Sigma_k K_k^T B^T + B Q_k B^T + W_k. (10)
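As a sanity check of (10), note that under policy (7) the closed loop reads x_{k+1} = (A + BK_k)x_k + Bv_k + w_k with v_k \sim \mathcal{N}(0, Q_k), whose covariance update must coincide with (10) term by term. A short NumPy verification on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
nx, nu = 3, 2
A = rng.standard_normal((nx, nx))
B = rng.standard_normal((nx, nu))
K = rng.standard_normal((nu, nx))
G = rng.standard_normal((nx, nx)); Sigma = G @ G.T + np.eye(nx)  # state covariance
H = rng.standard_normal((nu, nu)); Q = H @ H.T                   # policy noise covariance
W = 0.5 * np.eye(nx)                                             # process noise covariance

# Right-hand side of (10), term by term.
step10 = (A @ Sigma @ A.T + B @ K @ Sigma @ A.T + A @ Sigma @ K.T @ B.T
          + B @ K @ Sigma @ K.T @ B.T + B @ Q @ B.T + W)

# Closed-loop form: x_{k+1} = (A + BK) x_k + B v_k + w_k, v_k ~ N(0, Q).
Acl = A + B @ K
closed_loop = Acl @ Sigma @ Acl.T + B @ Q @ B.T + W

print(np.allclose(step10, closed_loop))   # the two expressions coincide
```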

Here, we introduce the variable transformations P_k := K_k \Sigma_k and M_k := K_k \Sigma_k K_k^T + Q_k = P_k \Sigma_k^{-1} P_k^T + Q_k, as in [16, 10]. To guarantee the invertibility of this transformation, we need Q_k = M_k - P_k \Sigma_k^{-1} P_k^T \succeq O, which, together with \Sigma_k \succ O, is equivalent (by the Schur complement) to the condition

\begin{bmatrix} M_k & P_k \\ P_k^T & \Sigma_k \end{bmatrix} \succeq O. (11)

This condition is added to the optimization problem to ensure that a feasible policy can be recovered from the solution. Finally, from (9), (10), and (11), the optimization problem to be solved is written as follows:

\min_{\Sigma_k, M_k, P_k} \quad J(\Sigma_N, M_k) (12a)
J(\Sigma_N, M_k) = \lambda \sum_{k=0}^{N-1} \operatorname{tr}(R_k M_k) + 4\left( \operatorname{tr}(\Sigma_N) - \operatorname{tr}(\Sigma_r) \right)^2 + 8\left\| \Sigma_N \right\|_F^2 - 16 \operatorname{tr}(D_N D_r) (12b)
\mathrm{s.t.} \quad \Sigma_{k+1} = A_k \Sigma_k A_k^T + A_k P_k^T B_k^T + B_k P_k A_k^T + B_k M_k B_k^T + W_k (12c)
\begin{bmatrix} M_k & P_k \\ P_k^T & \Sigma_k \end{bmatrix} \succeq O (12d)
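The variable transformation and the Schur-complement argument behind (11) can be verified numerically: with M_k = K_k \Sigma_k K_k^T + Q_k and P_k = K_k \Sigma_k, constraint (12c) reproduces (10), and the block matrix in (12d) is positive semidefinite whenever Q_k \succeq O. A NumPy sketch on random data:

```python
import numpy as np

rng = np.random.default_rng(1)
nx, nu = 3, 2
A = rng.standard_normal((nx, nx))
B = rng.standard_normal((nx, nu))
K = rng.standard_normal((nu, nx))
G = rng.standard_normal((nx, nx)); Sigma = G @ G.T + np.eye(nx)  # Sigma_k > 0
H = rng.standard_normal((nu, nu)); Q = H @ H.T                   # Q_k >= 0
W = 0.5 * np.eye(nx)

# Variable transformation: P_k = K_k Sigma_k, M_k = K_k Sigma_k K_k^T + Q_k.
P = K @ Sigma
M = K @ Sigma @ K.T + Q

# Constraint (12c) in the new variables vs. the dynamics (10) in the original ones.
lhs = A @ Sigma @ A.T + A @ P.T @ B.T + B @ P @ A.T + B @ M @ B.T + W
rhs = (A @ Sigma @ A.T + B @ K @ Sigma @ A.T + A @ Sigma @ K.T @ B.T
       + B @ K @ Sigma @ K.T @ B.T + B @ Q @ B.T + W)
print(np.allclose(lhs, rhs))

# Schur complement: since Sigma > 0, the block matrix (12d) is PSD
# exactly when Q = M - P Sigma^{-1} P^T >= 0.
block = np.block([[M, P], [P.T, Sigma]])
print(np.linalg.eigvalsh(block).min() >= -1e-9)
```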

Although we assumed a stochastic strategy as a control law in (7), the optimal solution turns out to be a deterministic strategy.

Theorem 1

Suppose \{A_k\}_{k=0}^{N-1} are invertible. Then, the optimal policy in problem (12) is deterministic; that is, the optimal solution satisfies Q_k = M_k - K_k \Sigma_k K_k^T = O.

Proof:

The proof follows the same arguments as in [10] and utilizes the Karush-Kuhn-Tucker (KKT) conditions. Let E_k denote the Lagrange multiplier for constraint (12c), and let F_k denote the Lagrange multiplier for constraint (12d), partitioned as

F_k = \begin{bmatrix} F_k^{00} & F_k^{01} \\ F_k^{10} & F_k^{11} \end{bmatrix}.

From the stationarity condition, we obtain

B_k^T E_k A_k + F_k^{01} = 0, (13)
R_k - B_k^T E_k B_k + F_k^{00} = 0. (14)

The complementary slackness condition yields

\begin{bmatrix} F_k^{00} & F_k^{01} \\ F_k^{10} & F_k^{11} \end{bmatrix} \begin{bmatrix} M_k & P_k \\ P_k^T & \Sigma_k \end{bmatrix} = O, (15)

implying

\begin{bmatrix} F_k^{00} & F_k^{01} \\ F_k^{10} & F_k^{11} \end{bmatrix} \begin{bmatrix} I & P_k \Sigma_k^{-1} \\ O & I \end{bmatrix} \begin{bmatrix} M_k - P_k \Sigma_k^{-1} P_k^T & O \\ O & \Sigma_k \end{bmatrix} = O

from the positive definiteness of \Sigma_k. Expanding the first block column and using the symmetry of F_k, we obtain

F_k^{00}\left( M_k - P_k \Sigma_k^{-1} P_k^T \right) = O, (16)
(F_k^{01})^T\left( M_k - P_k \Sigma_k^{-1} P_k^T \right) = O. (17)

Subsequently, by combining (13), (17), and the invertibility of A_k, we derive

E_k B_k \left( M_k - P_k \Sigma_k^{-1} P_k^T \right) = O. (18)

Finally, from (14), (16), and (18), we conclude that

(R_k - B_k^T E_k B_k)(M_k - P_k \Sigma_k^{-1} P_k^T) = R_k (M_k - P_k \Sigma_k^{-1} P_k^T) = O,

and due to the positive definiteness of R_k, we have M_k - P_k \Sigma_k^{-1} P_k^T = O. ∎

III Formulation as Difference of Convex Programming

Figure 1: Visualization of the DCA algorithm.

In this section, we show that problem (12) is a DC programming problem and solve it using the DCA, an optimization method for DC programming. The DC programming problem is an optimization problem whose objective function is a DC function, which is expressed as the difference between two convex functions. Since the DC programming problem is a non-convex optimization problem, finding a global optimum is generally challenging. However, several optimization methods that efficiently find solutions by exploiting the properties of DC functions have been proposed. These include global optimization techniques using branch and bound methods [17], and methods for finding sub-optimal solutions, such as the DCA [15] and the Concave-Convex Procedure (CCCP) [18].

The DCA iteratively constructs a convex upper bound of the objective function, minimizes this upper bound, and updates the bound using the minimizer from the previous iteration. Assume the objective function h is expressed as h(z) = f(z) - g(z) with convex functions f and g defined on a convex set \Omega. The DCA iterates the following steps until convergence:

  1. Construct the upper bound \hat{h}(z) of h(z) as

     \hat{h}(z) = f(z) - g(z_n) - s_n^T (z - z_n),

     where s_n \in \partial g(z_n).

  2. Set z_{n+1} = \mathop{\rm arg\,min}_{z \in \Omega} \hat{h}(z).

Because \hat{h}(z) is a convex function over a convex set, this convex subproblem can be minimized efficiently. It is known [15] that when the optimal value of the problem is finite and the sequences \{z_n\}, \{s_n\} are bounded, every accumulation point z^\infty of \{z_n\} is a critical point of f - g, i.e., 0 \in \partial(f - g)(z^\infty). Figure 1 visualizes the DCA.
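As a toy illustration of these two steps (not from the paper), take the scalar DC function h(z) = z^2 - |z| with f(z) = z^2 and g(z) = |z|; the convex subproblem then has the closed-form minimizer z_{n+1} = s_n / 2:

```python
import numpy as np

# h(z) = f(z) - g(z) with f(z) = z^2 and g(z) = |z|, both convex.
f = lambda z: z * z
g = lambda z: abs(z)

def dca(z0, iters=20):
    z = z0
    for _ in range(iters):
        s = float(np.sign(z))      # s_n in the subdifferential of g at z_n
        # Minimize the convex upper bound f(z) - g(z_n) - s_n (z - z_n):
        # d/dz [z^2 - s_n z] = 0  =>  z = s_n / 2.
        z = s / 2.0
    return z

z_star = dca(2.0)
print(z_star, f(z_star) - g(z_star))   # 0.5 -0.25: a critical point of h
```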

In the next proposition and theorem, we show that problem (12) is a DC programming problem.

Proposition 1 (Anstreicher and Wolkowicz[19])

Let A and B be n \times n symmetric matrices with eigenvalue decompositions A = V \Lambda V^T and B = W \Xi W^T, respectively, where the diagonal entries of \Lambda and \Xi are arranged in descending order. Then,

\max_{U \in O(n)} \operatorname{tr}(U A U^T B)

admits the optimal solution U^* = W V^T, with optimal value \operatorname{tr}(\Lambda \Xi).
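Proposition 1 is easy to probe numerically: for random symmetric matrices, no sampled orthogonal U should beat the claimed maximizer U^* = WV^T. A NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
B = rng.standard_normal((n, n)); B = (B + B.T) / 2

lamA, V = np.linalg.eigh(A)          # eigh returns eigenvalues in ascending order
lamB, W = np.linalg.eigh(B)
V = V[:, ::-1]; lamA = lamA[::-1]    # re-sort in descending order, as in Proposition 1
W = W[:, ::-1]; lamB = lamB[::-1]

best = float(np.sum(lamA * lamB))    # claimed optimal value tr(Lambda Xi)
U_star = W @ V.T                     # claimed maximizer

assert np.isclose(np.trace(U_star @ A @ U_star.T @ B), best)

# Random orthogonal matrices never beat U_star.
for _ in range(200):
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    assert np.trace(Q @ A @ Q.T @ B) <= best + 1e-9
print("Proposition 1 verified on a random instance")
```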

Theorem 2

J(\Sigma_N, M_k) in (12b) is a DC function.

Proof:

Since \lambda \sum_{k=0}^{N-1} \operatorname{tr}(R_k M_k) + 4(\operatorname{tr}(\Sigma_N) - \operatorname{tr}(\Sigma_r))^2 + 8\|\Sigma_N\|_F^2 is clearly a convex function, it suffices to show that

g(\Sigma_N) := \operatorname{tr}(D_N D_r) (19)

is a convex function. From Proposition 1, we have

\operatorname{tr}(D_N D_r) = \max_{U \in O(n)} \operatorname{tr}(U \Sigma_N U^T \Sigma_r).

Thus, \operatorname{tr}(D_N D_r) is the pointwise maximum of linear functions of \Sigma_N and is therefore convex in \Sigma_N. Precisely, for a scalar \alpha \in [0,1] and two positive definite matrices \Sigma and \Sigma', the convex combination \alpha\Sigma + (1-\alpha)\Sigma' satisfies

\max_{U \in O(n)} \operatorname{tr}\big(U(\alpha \Sigma + (1-\alpha)\Sigma')U^T \Sigma_r\big)
= \max_{U \in O(n)} \big[ \alpha \operatorname{tr}(U \Sigma U^T \Sigma_r) + (1-\alpha) \operatorname{tr}(U \Sigma' U^T \Sigma_r) \big]
\leq \alpha \max_{U \in O(n)} \operatorname{tr}(U \Sigma U^T \Sigma_r) + (1-\alpha) \max_{U' \in O(n)} \operatorname{tr}(U' \Sigma' U'^T \Sigma_r). ∎

Furthermore, in the next theorem, we derive a subgradient of the concave part of J(\Sigma_N, M_k) to construct the upper bound used in the DCA.

Theorem 3

Let \Sigma_N = V_N D_N V_N^T and \Sigma_r = V_r D_r V_r^T be eigenvalue decompositions with eigenvalues sorted in descending order. Then, V_N D_r V_N^T \in \mathbb{S}_{n_x}^+ is a subgradient of g(\Sigma_N) in (19).

Proof:

From Proposition 1, we have

U^* := \mathop{\rm arg\,max}_{U \in O(n)} \operatorname{tr}(U \Sigma_N U^T \Sigma_r) = V_r V_N^T.

Therefore, by Danskin's theorem [20], a subgradient is obtained by differentiating the function inside the max operation with respect to \Sigma_N and substituting U^*. Hence,

{U^*}^T \Sigma_r U^* = V_N V_r^T \Sigma_r V_r V_N^T = V_N D_r V_N^T \in \partial g(\Sigma_N). ∎
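The subgradient property can be checked numerically via the inequality g(X) \geq g(\Sigma_N) + \langle S, X - \Sigma_N \rangle with S = V_N D_r V_N^T (note that \operatorname{tr}(S \Sigma_N) = \operatorname{tr}(D_N D_r)). A NumPy sketch on random positive semidefinite matrices:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3

def g(Sigma, Sigma_r):
    """g(Sigma) = tr(D D_r): inner product of descending eigenvalues."""
    d = np.sort(np.linalg.eigvalsh(Sigma))[::-1]
    dr = np.sort(np.linalg.eigvalsh(Sigma_r))[::-1]
    return float(np.sum(d * dr))

Gr = rng.standard_normal((n, n)); Sigma_r = Gr @ Gr.T
Gn = rng.standard_normal((n, n)); Sigma_N = Gn @ Gn.T

# Subgradient from Theorem 3: S = V_N D_r V_N^T.
w, V = np.linalg.eigh(Sigma_N); V = V[:, ::-1]
dr = np.sort(np.linalg.eigvalsh(Sigma_r))[::-1]
S = V @ np.diag(dr) @ V.T

# Subgradient inequality g(X) >= g(Sigma_N) + <S, X - Sigma_N> at random PSD X.
for _ in range(100):
    Gx = rng.standard_normal((n, n)); X = Gx @ Gx.T
    assert g(X, Sigma_r) >= g(Sigma_N, Sigma_r) + np.trace(S @ (X - Sigma_N)) - 1e-9
print("subgradient inequality verified")
```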

Therefore, the convex subproblem in DCA is formulated as

\min_{\Sigma_k, M_k, P_k} \quad \lambda \sum_{k=0}^{N-1} \operatorname{tr}(R_k M_k) + 4\left( \operatorname{tr}(\Sigma_N) - \operatorname{tr}(\Sigma_r) \right)^2 + 8\left\| \Sigma_N \right\|_F^2 - 16 \operatorname{tr}\big(\Sigma_N V_N^{(n)} D_r V_N^{(n)T}\big) (20a)
\mathrm{s.t.} \quad \Sigma_{k+1} = A_k \Sigma_k A_k^T + A_k P_k^T B_k^T + B_k P_k A_k^T + B_k M_k B_k^T + W_k (20b)
\begin{bmatrix} M_k & P_k \\ P_k^T & \Sigma_k \end{bmatrix} \succeq O (20c)

where V_N^{(n)} is the orthogonal matrix obtained from the eigenvalue decomposition of the optimal \Sigma_N in the n-th DCA iteration. The term \operatorname{tr}(\Sigma_N V_N^{(n)} D_r V_N^{(n)T}) is the linear lower bound of the convex function g(\Sigma_N) built from the subgradient obtained in Theorem 3. The subproblem is a semidefinite programming problem (SDP), which can be solved efficiently. We use the solution of each iteration to update V_N; by repeating this process, the iterates progressively approach a sub-optimal solution of the original problem (12).
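To see the DCA mechanics in isolation, consider the degenerate case with the dynamics and the control cost stripped out, i.e., minimizing only the GGW terms of (12b) over \Sigma \succeq O (an illustrative simplification of ours, not the paper's problem). The convexified subproblem then has a closed-form minimizer, and the iteration lands on a rotated copy of \Sigma_r:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2
Sigma_r = np.diag([2.0, 0.5])                                   # target shape
G = rng.standard_normal((n, n)); Sigma = G @ G.T + np.eye(n)    # initial iterate

dr = np.sort(np.linalg.eigvalsh(Sigma_r))[::-1]
c = dr.sum()

def objective(Sigma):
    """GGW^2 between N(0, Sigma) and N(0, Sigma_r), via (5)."""
    d = np.sort(np.linalg.eigvalsh(Sigma))[::-1]
    return 4.0 * (d.sum() - c) ** 2 + 8.0 * np.sum((d - dr) ** 2)

for _ in range(5):
    w, V = np.linalg.eigh(Sigma); V = V[:, ::-1]
    S = V @ np.diag(dr) @ V.T      # subgradient of tr(D_N D_r), Theorem 3
    # Convex subproblem: min 4 (tr Sigma - c)^2 + 8 ||Sigma||_F^2 - 16 tr(Sigma S).
    # Setting the gradient 8 (tr Sigma - c) I + 16 Sigma - 16 S to zero and
    # taking traces (tr S = c) gives the closed-form minimizer Sigma = S.
    Sigma = S

print(objective(Sigma))   # 0: the iterate is a rotated copy of Sigma_r
```

Note that the final iterate shares the spectrum of \Sigma_r but keeps the eigenbasis of the initial guess, which mirrors the rotational degree of freedom exploited in Section IV.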

In Theorem 1, we showed that the optimal policy for Problem (12) is deterministic. The following theorem shows that the proposed algorithm generates a sequence of deterministic control policies:

Theorem 4

When \{A_k\}_{k=0}^{N-1} are invertible, the optimal policy of the subproblem (20) is also deterministic.

Proof:

The KKT conditions used in the proof of Theorem 1 also hold for the subproblem (20). ∎

IV Numerical Experiments

In this section, we perform numerical optimization for problem (12) using DCA. We set the parameters in this experiment as

A_k = \begin{bmatrix} 1.0 & 0.1 \\ -0.3 & 1.0 \end{bmatrix}, \quad B_k = \begin{bmatrix} 0.7 \\ 0.4 \end{bmatrix}, \quad \Sigma_0 = \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}, \quad W_k = 0.5 I_2, \quad R_k = 1.0, \quad N = 10.

Figure 2 shows the time evolution of state covariance of the uncontrolled system. For the implementation of the convex subproblem in DCA, we used the MOSEK solver[21] and the CVXPY modeler[22].

Figure 2: Snapshots of the covariance matrices of the uncontrolled system. The colored lines indicate the 2\sigma range of \Sigma_k, while the black line represents \Sigma_r in (22).

IV-1 Line alignment

First, we consider the case where the desired density is the Gaussian \rho_r = \mathcal{N}(0, 10) on \mathbb{R}, not on \mathbb{R}^2. The problem seeks the optimal policy that aligns the terminal distribution along one line. Note that W(\rho_N, \rho_r) in (3) is not well-defined because \mathcal{X} \neq \mathcal{Y}. (One might embed \rho_r into \mathbb{R}^2, e.g., by (21), and consider the Wasserstein distance; however, there remains a rotational degree of freedom that affects the resulting distance. Our formulation optimizes this rotation in the sense of the required control energy; see Fig. 3.) In contrast, the GW distance GW(\rho_N, \rho_r) in (2) is well-defined and, thanks to (5), equivalent to taking

\Sigma_r = \begin{bmatrix} 10 & 0 \\ 0 & 0 \end{bmatrix}. (21)

Figure 3 presents the trajectories of one hundred samples of the controlled process when the target distribution is degenerate. The distribution of states indeed stretches vertically to achieve the one-line alignment.

Figure 3: One hundred sample paths of the system controlled by the optimal policy for \lambda = 1. The blue circles denote the 2\sigma ranges of \Sigma_0 and \Sigma_N, and the black line denotes the degenerate target distribution \Sigma_r in (21).

IV-2 Comparison with the Wasserstein formulation

Let us consider

\Sigma_r = \begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix}, (22)

for which

GGW^2(\rho_N, \rho_r) = 6711.44 (23)

for the uncontrolled system. Figure 4 shows the snapshots of state covariance under the optimal control input obtained by DCA. It can be observed that the shape of the terminal distribution approaches that of the target distribution as λ\lambda decreases. As shown below, the terminal distribution is the one requiring the least energy among the rotated distributions of the target due to the rotational invariance of the GW distance.

Figure 5 shows the relationship between the optimized control cost term and the GW cost term in Eq. (8b) for each \lambda. As \lambda increases, control effort is penalized more heavily, so the applied control energy decreases while the GW cost rises; conversely, as \lambda decreases, more control energy is expended and the GW cost diminishes, becoming almost zero for small \lambda. It is noteworthy that, in comparison to the uncontrolled system in (23), our algorithm achieves a significant reduction in the GW cost.

Finally, we clarify the advantage of our approach over the Wasserstein terminal cost problem [10]. In Fig. 4(a), the obtained terminal distribution is \rho_N \approx \mathcal{N}(0, \hat{\Sigma}_r(\theta_{\rm GW})) with \theta_{\rm GW} = 1.20 [rad], where \hat{\Sigma}_r(\theta) is obtained by rotating \Sigma_r by an angle \theta, i.e.,

\hat{\Sigma}_r(\theta) := R(\theta)^T \Sigma_r R(\theta), \quad R(\theta) := \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.

It is shown in [10] that we can solve

\min_{K_k, Q_k} \lambda \mathbb{E}\left[ \sum_{k=0}^{N-1} u_k^T R_k u_k \right] + W^2\big(\rho_N, \mathcal{N}(0, \hat{\Sigma}_r(\theta))\big) (24)

by an SDP. We then solved this problem for a sufficiently small \lambda (i.e., a large terminal cost). The control energy required by the resulting optimal input (i.e., the first term of (24) without \lambda) is denoted by W_{\rm opt}(\theta) and shown in Fig. 6. It is noteworthy that this function exhibits non-convexity. We can also observe that

\theta_{\rm GW} \approx \theta^* := \mathop{\rm arg\,min}_{\theta} W_{\rm opt}(\theta),

which implies that the rotation angle obtained by the GW terminal cost problem minimizes the control energy needed to realize the required shape (specified by \Sigma_r). From a computational cost point of view, finding \theta^* via the Wasserstein terminal cost approach requires solving an optimization problem to evaluate W_{\rm opt}(\theta) for each candidate \theta, whereas our GW terminal cost framework requires solving only a single optimization problem. Moreover, our approach remains computationally tractable in high-dimensional settings, where the Wasserstein terminal cost approach becomes intractable due to the exponential growth of the search space of rotation matrices.

Figure 4: Snapshots of the covariance matrices under the optimized policy for (a) \lambda = 1, (b) \lambda = 100, and (c) \lambda = 10000. The colored lines indicate the 2\sigma range of \Sigma_k, while the black line represents \Sigma_r in (22).
Figure 5: Relationship between the optimized control cost term and the GW cost term in Eq. (8b) for each \lambda.
Figure 6: The required control energy W_{\rm opt}(\theta) of the optimal input for the Wasserstein terminal cost problem (24). The green line and the red dot represent \theta_{\rm GW} and \theta^*, respectively.

V Conclusion

In this study, we addressed the optimal density control problem with the Gromov-Wasserstein distance as the terminal cost. We showed that the problem is a DC programming problem and proposed an optimization method based on the DC algorithm. Numerical experiments confirmed that the state distribution reaches the terminal distribution that can be realized with the minimum control energy among those having the specified shape.

Future work includes applying the proposed GW framework to transport between spaces equipped with different Riemannian metric structures or between point clouds. Model predictive formation control based on a fast algorithm for optimal transport [23] is also a promising direction [24]. The convergence and computational complexity of the proposed DC algorithm should also be investigated.

References

  • [1] Y. Chen, T. T. Georgiou, and M. Pavon, “Optimal steering of a linear stochastic system to a final probability distribution, part I,” IEEE Transactions on Automatic Control, vol. 61, no. 5, pp. 1158–1169, 2016.
  • [2] B. D. O. Anderson, “The inverse problem of stationary covariance generation,” Journal of Statistical Physics, vol. 1, no. 1, pp. 133–147, 1969.
  • [3] A. F. Hotz and R. E. Skelton, “A covariance control theory,” in 1985 24th IEEE Conference on Decision and Control, 1985, pp. 552–557.
  • [4] F. Liu, G. Rapakoulias, and P. Tsiotras, “Optimal covariance steering for discrete-time linear stochastic systems,” arXiv preprint arXiv:2211.00618, 2022.
  • [5] G. Rapakoulias and P. Tsiotras, “Discrete-time optimal covariance steering via semidefinite programming,” arXiv preprint arXiv:2302.14296, 2023.
  • [6] K. Ito and K. Kashima, “Maximum entropy density control of discrete-time linear systems with quadratic cost,” arXiv preprint arXiv:2309.10662, 2023.
  • [7] ——, “Maximum entropy optimal density control of discrete-time linear systems and Schrödinger bridges,” IEEE Transactions on Automatic Control, Early Access, 2024.
  • [8] A. Halder and E. D. Wendel, “Finite horizon linear quadratic Gaussian density regulator with Wasserstein terminal cost,” in 2016 American Control Conference (ACC), 2016, pp. 7249–7254.
  • [9] I. M. Balci and E. Bakolas, “Covariance steering of discrete-time stochastic linear systems based on Wasserstein distance terminal cost,” IEEE Control Systems Letters, vol. 5, no. 6, pp. 2000–2005, 2021.
  • [10] ——, “Exact SDP formulation for discrete-time covariance steering with Wasserstein terminal cost,” arXiv preprint arXiv:2205.10740, 2022.
  • [11] V. Krishnan and S. Martínez, “Distributed optimal transport for the deployment of swarms,” in 2018 IEEE Conference on Decision and Control (CDC), 2018, pp. 4583–4588.
  • [12] D. V. Dimarogonas and K. H. Johansson, “On the stability of distance-based formation control,” in 2008 47th IEEE Conference on Decision and Control, 2008, pp. 1200–1205.
  • [13] F. Mémoli, “Gromov-Wasserstein distances and the metric approach to object matching,” Foundations of Computational Mathematics, vol. 11, pp. 417–487, 2011.
  • [14] J. Delon, A. Desolneux, and A. Salmona, “Gromov-Wasserstein distances between Gaussian distributions,” Journal of Applied Probability, vol. 59, no. 4, pp. 1178–1198, 2022.
  • [15] P. D. Tao and L. H. An, “Convex analysis approach to DC programming: Theory, algorithms and applications,” Acta Mathematica Vietnamica, vol. 22, no. 1, pp. 289–355, 1997.
  • [16] Y. Chen, T. T. Georgiou, and M. Pavon, “Optimal steering of a linear stochastic system to a final probability distribution, part II,” IEEE Transactions on Automatic Control, vol. 61, no. 5, pp. 1170–1180, 2016.
  • [17] R. Horst and N. V. Thoai, “DC programming: Overview,” Journal of Optimization Theory and Applications, vol. 103, pp. 1–43, 1999.
  • [18] A. L. Yuille and A. Rangarajan, “The concave-convex procedure (CCCP),” in Advances in Neural Information Processing Systems, vol. 14.   MIT Press, 2001.
  • [19] K. Anstreicher and H. Wolkowicz, “On Lagrangian relaxation of quadratic matrix constraints,” SIAM Journal on Matrix Analysis and Applications, vol. 22, no. 1, pp. 41–55, 2000.
  • [20] D. P. Bertsekas, Nonlinear Programming: 2nd Edition.   Athena Scientific, 1999.
  • [21] MOSEK ApS, MOSEK Optimizer API for Python. Version 10.1, 2023.
  • [22] S. Diamond and S. Boyd, “CVXPY: A Python-embedded modeling language for convex optimization,” Journal of Machine Learning Research, vol. 17, no. 83, pp. 1–5, 2016.
  • [23] J. Solomon, G. Peyré, V. G. Kim, and S. Sra, “Entropic metric alignment for correspondence problems,” ACM Transactions on Graphics, vol. 35, no. 4, 2016.
  • [24] K. Ito and K. Kashima, “Entropic model predictive optimal transport over dynamical systems,” Automatica, vol. 152, p. 110980, 2023.