
Minimum energy density steering of linear systems with Gromov-Wasserstein terminal cost

Kohei Morimoto and Kenji Kashima The authors are with the Graduate School of Informatics, Kyoto University, Kyoto, Japan kohei.morimoto.73r@st.kyoto-u.ac.jp; kk@i.kyoto-u.ac.jp. This work was supported by JSPS KAKENHI Grant Number JP21H04875 and the joint project of Kyoto University and Toyota Motor Corporation, titled “Advanced Mathematical Science for Mobility Society”.
Abstract

In this paper, we formulate and solve a novel optimal density control problem with Gromov-Wasserstein (GW) terminal cost for discrete-time linear Gaussian systems. Unlike the Wasserstein or Kullback-Leibler distances employed in existing works, the GW distance quantifies the difference in the shapes of distributions and is invariant under translation and rotation. Consequently, our formulation allows us to find low-energy inputs that achieve a desired shape of the terminal distribution, which has practical applications, e.g., in robotic swarms. We demonstrate that the problem can be reduced to a Difference of Convex (DC) programming problem, which is efficiently solvable through the DC algorithm. Numerical experiments confirm that the state distribution reaches the terminal distribution realizable with the minimum control energy among those having the specified shape.

I Introduction

Optimal density control is defined as the problem of controlling the probability distribution of state variables to the desired distribution in a dynamic system. Promising applications of optimal density control include systems in which it is important to manage errors in the state, such as quality control and aircraft control, as well as quantum systems in which the distribution of the state itself is the object of control [1].

The problem addressed in this paper is a variant of the (finite-time) covariance steering problem for discrete-time linear Gaussian systems. This line of research has a long history [2, 3]; the most closely related recent works are as follows: hard constraint formulations [4, 5, 6, 7], where the terminal state distribution is enforced as a constraint, and soft constraint formulations [8, 9, 10], where the Wasserstein distance between the terminal distribution and the target distribution is incorporated as a cost. In particular, Balci et al. [10] formulated an optimal distributional control problem for discrete-time linear Gaussian systems with a Wasserstein terminal cost as a semidefinite program (SDP), a form of convex programming, to derive globally optimal control policies.

To motivate the present work, let us view the state distribution as an ensemble of particles or a multi-robotic swarm [11]. In such applications, a particular shape of the formation must be achieved, but its location and orientation are often irrelevant. For example, the agents may need to align in a single row in a two-dimensional region (see Fig. 3 below), or only the configuration may be specified in terms of inter-agent distances [12]. The aforementioned formulations can address the realization of a configuration with a fixed orientation but cannot optimize over rotations. To tackle this issue, we propose a novel density control problem incorporating the Gromov-Wasserstein distance (GW distance) as the terminal cost [13]. The GW distance is a distance between probability distributions that measures the closeness of their shapes. By incorporating the GW distance between the state and target distributions into the terminal cost, we can formulate the problem of controlling the shape of the state distribution. This problem can be viewed as a simultaneous optimization of the dynamical steering and the rotation of the target shape, which clearly contrasts with the existing formulations.

In this study, we focus on scenarios where the initial and target distributions are Gaussian and seek the optimal control policy among linear feedback control laws. While computing the GW distance between arbitrary distributions is challenging, it has recently been shown that the Gaussian Gromov-Wasserstein (GGW) distance, a relaxation of the GW distance for normal distributions, can be easily calculated [14]. We show that the optimal density control problem with GW terminal cost can be formulated as a difference of convex (DC) programming problem. We solve the problem by the DC algorithm (DCA) [15], a technique for solving DC programming problems through iterative convex relaxation. Remarkably, the convexified problem is transformed into an SDP form, which can be efficiently solved using standard convex programming solvers.

The rest of the paper is organized as follows. In Section II, we introduce the concept of the GW distance and present the optimal density steering problem with the GW distance as the terminal cost. Section III discusses the formulation of the problem as a DC programming problem, highlighting the objective function's nature as a difference of convex functions and deriving the convexified subproblem used in the DC algorithm. The numerical simulations are presented in Section IV. Finally, we conclude our paper in Section V.

Notation      Let \mathbb{S}_n^+ and \mathbb{S}_n^{++} denote the sets of n-dimensional positive semidefinite and positive definite matrices, respectively. Let O(n) denote the set of n-dimensional orthogonal matrices. For matrices, \|\cdot\|_F denotes the Frobenius norm. For a convex function f, \partial f(x) denotes the set of subgradients of f at x. Let \mathcal{P}(\mathcal{X}) denote the set of all probability distributions over \mathcal{X}. Let \mathcal{N}_n(\mu,\Sigma) denote the multivariate normal distribution with mean \mu \in \mathbb{R}^n and covariance \Sigma \in \mathbb{S}_n^+. Let \mathscr{N}_n denote the set of all n-dimensional multivariate normal distributions.

II Problem Setting

II-A Gromov-Wasserstein distance

The optimal transport distance between two probability distributions is defined as the minimized cost of transporting one distribution to the other. The GW distance, like the Wasserstein distance, is a variant of the optimal transport distance. Given two metric spaces \mathcal{X}, \mathcal{Y}, the set of transport plans \Pi between probability distributions \mu \in \mathcal{P}(\mathcal{X}) and \nu \in \mathcal{P}(\mathcal{Y}) is defined by

\Pi(\mu,\nu) := \left\{ \pi(x,y) \,\middle|\, \int \pi(x,y)\,dx = \nu(y),\ \int \pi(x,y)\,dy = \mu(x) \right\}. (1)

Each element \pi(x,y) of \Pi(\mu,\nu) represents how the mass \mu(x) at x is transported to y, with the condition \int \pi(x,y)\,dx = \nu(y) ensuring that the transported mass indeed forms \nu(y). The GW distance is defined by

GW^2(\mu,\nu) := \inf_{\pi \in \Pi(\mu,\nu)} \iint \left( \|x - x'\|_{\mathcal{X}} - \|y - y'\|_{\mathcal{Y}} \right)^2 \pi(x,y)\,\pi(x',y')\,dx\,dy\,dx'\,dy', (2)

where \|x - x'\|_{\mathcal{X}} and \|y - y'\|_{\mathcal{Y}} denote the norms on \mathcal{X} and \mathcal{Y}, respectively. The GW distance is small when pairs of points that are close (resp. far apart) before transportation remain close (resp. far apart) after transportation, and it increases when initially close points are moved far apart. This definition therefore quantifies the difference in shape between two probability distributions. For comparison, recall that the Wasserstein distance is defined as

W^2(\mu,\nu) := \inf_{\pi \in \Pi(\mu,\nu)} \iint d(x,y)^2\, \pi(x,y)\,dx\,dy, (3)

where \mathcal{X} = \mathcal{Y} and d(\cdot,\cdot) is a suitable distance on \mathcal{X}. While the Wasserstein distance is sensitive to the absolute positions and orientations of the distributions, the GW distance is invariant under isometric transformations such as translations and rotations.
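This contrast can be checked numerically. Below is a minimal NumPy sketch (the helper names `psd_sqrt` and `w2_gaussian` are ours) using the well-known closed form W^2(\mathcal{N}(0,\Sigma_0), \mathcal{N}(0,\Sigma_1)) = \operatorname{tr}\Sigma_0 + \operatorname{tr}\Sigma_1 - 2\operatorname{tr}((\Sigma_0^{1/2}\Sigma_1\Sigma_0^{1/2})^{1/2}): rotating a covariance leaves the distribution's shape unchanged, yet the Wasserstein distance becomes strictly positive.

```python
import numpy as np

def psd_sqrt(S):
    """Symmetric PSD square root via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

def w2_gaussian(S0, S1):
    """Squared 2-Wasserstein (Bures) distance between N(0,S0) and N(0,S1)."""
    r = psd_sqrt(S0)
    return np.trace(S0) + np.trace(S1) - 2.0 * np.trace(psd_sqrt(r @ S1 @ r))

S = np.diag([2.0, 0.5])                      # an anisotropic shape
theta = np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
S_rot = R.T @ S @ R                          # same shape, rotated

print(w2_gaussian(S, S))       # 0: identical distributions
print(w2_gaussian(S, S_rot))   # > 0: the Wasserstein distance "sees" the rotation
```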

Computing the GW distance between arbitrary probability distributions is challenging, as it involves a non-convex quadratic program over the transport plan \pi. Recently, it has been shown that the Gaussian Gromov-Wasserstein (GGW) distance, in which the transport plan is constrained to be a Gaussian distribution, admits an explicit expression in terms of the parameters of the normal distributions [14]. Specifically, the GGW distance between Gaussian distributions \mu \in \mathscr{N}_m and \nu \in \mathscr{N}_n is defined by

GGW^2(\mu,\nu) := \inf_{\pi \in \Pi(\mu,\nu) \cap \mathscr{N}_{m+n}} \iint \left( \|x - x'\| - \|y - y'\| \right)^2 \pi(x,y)\,\pi(x',y')\,dx\,dy\,dx'\,dy', (4)

where \Pi(\mu,\nu) \cap \mathscr{N}_{m+n} restricts the transport plan to (m+n)-dimensional Gaussian distributions. For \mu = \mathcal{N}_m(\mu_0, \Sigma_0) and \nu = \mathcal{N}_n(\mu_1, \Sigma_1), it holds that

GGW^2(\mu,\nu) = 4\left( \operatorname{tr}(\Sigma_0) - \operatorname{tr}(\Sigma_1) \right)^2 + 8\left\| D_0 - D_1 \right\|_F^2, (5)

where D_0, D_1 are the diagonal matrices of the eigenvalues of \Sigma_0, \Sigma_1 sorted in descending order; if m \neq n, the missing entries are filled with zeros.
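The closed form (5) is straightforward to implement. The sketch below (function name ours) evaluates it from the covariance spectra and illustrates both the rotation invariance and the fact that the distance is well-defined across spaces of different dimensions.

```python
import numpy as np

def ggw2(S0, S1):
    """Squared GGW distance (5) between N(.,S0) and N(.,S1); the means are
    irrelevant by translation invariance.  Eigenvalues are sorted in
    descending order and zero-padded when the dimensions differ."""
    d0 = np.sort(np.linalg.eigvalsh(S0))[::-1]
    d1 = np.sort(np.linalg.eigvalsh(S1))[::-1]
    n = max(len(d0), len(d1))
    d0 = np.pad(d0, (0, n - len(d0)))
    d1 = np.pad(d1, (0, n - len(d1)))
    return 4.0 * (d0.sum() - d1.sum()) ** 2 + 8.0 * np.sum((d0 - d1) ** 2)

S = np.array([[2.0, 0.0], [0.0, 0.5]])
theta = 1.2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(ggw2(S, R.T @ S @ R))          # 0: invariant under rotation
print(ggw2(S, np.array([[10.0]])))   # well-defined even when m != n
```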

II-B Optimal density steering with Gromov-Wasserstein terminal cost

Let n_x be the dimension of the state and n_u the dimension of the input, and consider the following discrete-time linear Gaussian system, where the system matrices may be time-varying (we write A_k, B_k when the time dependence matters and drop the subscript otherwise):

x_{k+1} = A x_k + B u_k + w_k, (6a)
x_0 \sim \mathcal{N}(0, \Sigma_0), (6b)
w_k \sim \mathcal{N}(0, W_k). (6c)

Here, the covariance matrix of the initial Gaussian distribution is \Sigma_0 \in \mathbb{S}_{n_x}^{++} and that of the noise is W_k \in \mathbb{S}_{n_x}^+. For the control input u_k, we use a stochastic linear control policy

u_k(x) \sim \mathcal{N}(K_k x, Q_k), (7)

where K_k is the feedback gain and Q_k \succeq O is the covariance of the Gaussian distribution. We consider the problem of minimizing the sum of the control cost and the Gromov-Wasserstein distance between the terminal distribution \rho_N = \mathcal{N}(\mu_N, \Sigma_N) (\Sigma_N \in \mathbb{S}_{n_x}^{++}) and the target distribution \rho_r = \mathcal{N}(0, \Sigma_r) (\Sigma_r \in \mathbb{S}_{n_x}^+). The problem is formulated as

\min_{K_k, Q_k} \quad J(K_k, Q_k), (8a)
J(K_k, Q_k) = \lambda \mathbb{E}\left[ \sum_{k=0}^{N-1} u_k^T R_k u_k \right] + GGW^2(\rho_N, \rho_r), (8b)

where R_k \in \mathbb{S}_{n_u}^{++} denotes the weight for the control cost. Under the control policy (7) applied to system (6a), the state x_N at the terminal time N is also Gaussian. Thus, substituting (5) and (7) into (8b) and dropping the constant term 8\|D_r\|_F^2, which does not affect the minimization, we obtain

J(K_k, Q_k) = \lambda \sum_{k=0}^{N-1} \operatorname{tr}\left( R_k \left( K_k \Sigma_k K_k^T + Q_k \right) \right) + 4\left( \operatorname{tr}(\Sigma_N) - \operatorname{tr}(\Sigma_r) \right)^2 + 8\left\| \Sigma_N \right\|_F^2 - 16 \operatorname{tr}(D_N D_r), (9)

where \Sigma_k is the covariance matrix of the state x_k, and D_N, D_r are the diagonal matrices of the eigenvalues of \Sigma_N, \Sigma_r arranged in descending order (note that \|\Sigma_N\|_F = \|D_N\|_F). The dynamics of \Sigma_k are given by

\Sigma_{k+1} = A \Sigma_k A^T + B K_k \Sigma_k A^T + A \Sigma_k K_k^T B^T + B K_k \Sigma_k K_k^T B^T + B Q_k B^T + W_k. (10)
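As a sanity check of (10), note that under policy (7) the closed loop reads x_{k+1} = (A + BK_k)x_k + Bv_k + w_k with v_k \sim \mathcal{N}(0, Q_k), whose covariance update must coincide with (10) term by term. A short NumPy verification on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
nx, nu = 3, 2
A = rng.standard_normal((nx, nx))
B = rng.standard_normal((nx, nu))
K = rng.standard_normal((nu, nx))
G = rng.standard_normal((nx, nx)); Sigma = G @ G.T + np.eye(nx)  # state covariance
H = rng.standard_normal((nu, nu)); Q = H @ H.T                   # policy noise covariance
W = 0.5 * np.eye(nx)                                             # process noise covariance

# Right-hand side of (10), term by term.
step10 = (A @ Sigma @ A.T + B @ K @ Sigma @ A.T + A @ Sigma @ K.T @ B.T
          + B @ K @ Sigma @ K.T @ B.T + B @ Q @ B.T + W)

# Closed-loop form: x_{k+1} = (A + BK) x_k + B v_k + w_k, v_k ~ N(0, Q).
Acl = A + B @ K
closed_loop = Acl @ Sigma @ Acl.T + B @ Q @ B.T + W

print(np.allclose(step10, closed_loop))   # the two expressions coincide
```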

Here, we introduce the variable transformations P_k := K_k \Sigma_k and M_k := K_k \Sigma_k K_k^T + Q_k = P_k \Sigma_k^{-1} P_k^T + Q_k, as in [16, 10]. To guarantee the invertibility of this transformation, we need Q_k = M_k - P_k \Sigma_k^{-1} P_k^T \succeq O, which, together with \Sigma_k \succ O, is equivalent (by the Schur complement) to the condition

\begin{bmatrix} M_k & P_k \\ P_k^T & \Sigma_k \end{bmatrix} \succeq O. (11)

This condition is added to the optimization problem to ensure that a feasible policy can be recovered from the solution. Finally, from (9), (10), and (11), the optimization problem to be solved is written as follows:

\min_{\Sigma_k, M_k, P_k} \quad J(\Sigma_N, M_k) (12a)
J(\Sigma_N, M_k) = \lambda \sum_{k=0}^{N-1} \operatorname{tr}(R_k M_k) + 4\left( \operatorname{tr}(\Sigma_N) - \operatorname{tr}(\Sigma_r) \right)^2 + 8\left\| \Sigma_N \right\|_F^2 - 16 \operatorname{tr}(D_N D_r) (12b)
\mathrm{s.t.} \quad \Sigma_{k+1} = A_k \Sigma_k A_k^T + A_k P_k^T B_k^T + B_k P_k A_k^T + B_k M_k B_k^T + W_k (12c)
\begin{bmatrix} M_k & P_k \\ P_k^T & \Sigma_k \end{bmatrix} \succeq O (12d)
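The variable transformation and the Schur-complement argument behind (11) can be verified numerically: with M_k = K_k \Sigma_k K_k^T + Q_k and P_k = K_k \Sigma_k, constraint (12c) reproduces (10), and the block matrix in (12d) is positive semidefinite whenever Q_k \succeq O. A NumPy sketch on random data:

```python
import numpy as np

rng = np.random.default_rng(1)
nx, nu = 3, 2
A = rng.standard_normal((nx, nx))
B = rng.standard_normal((nx, nu))
K = rng.standard_normal((nu, nx))
G = rng.standard_normal((nx, nx)); Sigma = G @ G.T + np.eye(nx)  # Sigma_k > 0
H = rng.standard_normal((nu, nu)); Q = H @ H.T                   # Q_k >= 0
W = 0.5 * np.eye(nx)

# Variable transformation: P_k = K_k Sigma_k, M_k = K_k Sigma_k K_k^T + Q_k.
P = K @ Sigma
M = K @ Sigma @ K.T + Q

# Constraint (12c) in the new variables vs. the dynamics (10) in the original ones.
lhs = A @ Sigma @ A.T + A @ P.T @ B.T + B @ P @ A.T + B @ M @ B.T + W
rhs = (A @ Sigma @ A.T + B @ K @ Sigma @ A.T + A @ Sigma @ K.T @ B.T
       + B @ K @ Sigma @ K.T @ B.T + B @ Q @ B.T + W)
print(np.allclose(lhs, rhs))

# Schur complement: since Sigma > 0, the block matrix (12d) is PSD
# exactly when Q = M - P Sigma^{-1} P^T >= 0.
block = np.block([[M, P], [P.T, Sigma]])
print(np.linalg.eigvalsh(block).min() >= -1e-9)
```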

Although we assumed a stochastic strategy as a control law in (7), the optimal solution turns out to be a deterministic strategy.

Theorem 1

Suppose \{A_k\}_{k=0}^{N-1} are invertible. Then, the optimal policy in problem (12) is deterministic; that is, the optimal solution satisfies Q_k = M_k - K_k \Sigma_k K_k^T = O.

Proof:

The proof follows the same arguments as in [10] and utilizes the Karush-Kuhn-Tucker (KKT) conditions. Let E_k denote the Lagrange multiplier for constraint (12c), and let F_k denote the Lagrange multiplier for constraint (12d), partitioned as

F_k = \begin{bmatrix} F_k^{00} & F_k^{01} \\ F_k^{10} & F_k^{11} \end{bmatrix}.

From the stationarity condition, we obtain

B_k^T E_k A_k + F_k^{01} = 0, (13)
R_k - B_k^T E_k B_k + F_k^{00} = 0. (14)

The complementary slackness condition yields

\begin{bmatrix} F_k^{00} & F_k^{01} \\ F_k^{10} & F_k^{11} \end{bmatrix} \begin{bmatrix} M_k & P_k \\ P_k^T & \Sigma_k \end{bmatrix} = O, (15)

implying

\begin{bmatrix} F_k^{00} & F_k^{01} \\ F_k^{10} & F_k^{11} \end{bmatrix} \begin{bmatrix} I & P_k \Sigma_k^{-1} \\ O & I \end{bmatrix} \begin{bmatrix} M_k - P_k \Sigma_k^{-1} P_k^T & O \\ O & \Sigma_k \end{bmatrix} = O

from the positive definiteness of \Sigma_k. Expanding the first block column and using the symmetry of F_k, we obtain

F_k^{00}\left( M_k - P_k \Sigma_k^{-1} P_k^T \right) = O, (16)
(F_k^{01})^T\left( M_k - P_k \Sigma_k^{-1} P_k^T \right) = O. (17)

Subsequently, by combining (13), (17), and the invertibility of A_k, we derive

E_k B_k \left( M_k - P_k \Sigma_k^{-1} P_k^T \right) = O. (18)

Finally, from (14), (16), and (18), we conclude that

(R_k - B_k^T E_k B_k)(M_k - P_k \Sigma_k^{-1} P_k^T) = R_k (M_k - P_k \Sigma_k^{-1} P_k^T) = O,

and due to the positive definiteness of R_k, we have M_k - P_k \Sigma_k^{-1} P_k^T = O. ∎

III Formulation as Difference of Convex Programming

Figure 1: Visualization of the DCA algorithm.

In this section, we show that problem (12) is a DC programming problem and solve it using the DCA, an optimization method for DC programming. The DC programming problem is an optimization problem whose objective function is a DC function, which is expressed as the difference between two convex functions. Since the DC programming problem is a non-convex optimization problem, finding a global optimum is generally challenging. However, several optimization methods that efficiently find solutions by exploiting the properties of DC functions have been proposed. These include global optimization techniques using branch and bound methods [17], and methods for finding sub-optimal solutions, such as the DCA [15] and the Concave-Convex Procedure (CCCP) [18].

The DCA iteratively constructs a convex upper bound of the objective function, minimizes this upper bound, and updates the bound using the minimizer from the previous iteration. Assume the objective function h is expressed as h(z) = f(z) - g(z) with convex functions f and g defined on a convex set \Omega. The DCA iterates the following steps until convergence:

  1. Construct the upper bound \hat{h}(z) of h(z) as

     \hat{h}(z) = f(z) - g(z_n) - s_n^T (z - z_n),

     where s_n \in \partial g(z_n).

  2. Set z_{n+1} = \mathop{\rm arg\,min}_{z \in \Omega} \hat{h}(z).

Because \hat{h}(z) is a convex function over a convex set, this convex subproblem can be minimized efficiently. It is known [15] that when the optimal value of the problem is finite and the sequences \{z_n\}, \{s_n\} are bounded, every accumulation point z^\infty of \{z_n\} is a critical point of f - g, i.e., 0 \in \partial(f - g)(z^\infty). Figure 1 visualizes the DCA.
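As a toy illustration of these two steps (not from the paper), take the scalar DC function h(z) = z^2 - |z| with f(z) = z^2 and g(z) = |z|; the convex subproblem then has the closed-form minimizer z_{n+1} = s_n / 2:

```python
import numpy as np

# h(z) = f(z) - g(z) with f(z) = z^2 and g(z) = |z|, both convex.
f = lambda z: z * z
g = lambda z: abs(z)

def dca(z0, iters=20):
    z = z0
    for _ in range(iters):
        s = float(np.sign(z))      # s_n in the subdifferential of g at z_n
        # Minimize the convex upper bound f(z) - g(z_n) - s_n (z - z_n):
        # d/dz [z^2 - s_n z] = 0  =>  z = s_n / 2.
        z = s / 2.0
    return z

z_star = dca(2.0)
print(z_star, f(z_star) - g(z_star))   # 0.5 -0.25: a critical point of h
```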

In the next proposition and theorem, we show that problem (12) is a DC programming problem.

Proposition 1 (Anstreicher and Wolkowicz[19])

Let A and B be n \times n symmetric matrices with eigenvalue decompositions A = V \Lambda V^T and B = W \Xi W^T, respectively, where the diagonal entries of \Lambda and \Xi are arranged in descending order. Then,

\max_{U \in O(n)} \operatorname{tr}(U A U^T B)

admits the optimal solution U^* = W V^T, with optimal value \operatorname{tr}(\Lambda \Xi).
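Proposition 1 is easy to probe numerically: for random symmetric matrices, no sampled orthogonal U should beat the claimed maximizer U^* = WV^T. A NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
B = rng.standard_normal((n, n)); B = (B + B.T) / 2

lamA, V = np.linalg.eigh(A)          # eigh returns eigenvalues in ascending order
lamB, W = np.linalg.eigh(B)
V = V[:, ::-1]; lamA = lamA[::-1]    # re-sort in descending order, as in Proposition 1
W = W[:, ::-1]; lamB = lamB[::-1]

best = float(np.sum(lamA * lamB))    # claimed optimal value tr(Lambda Xi)
U_star = W @ V.T                     # claimed maximizer

assert np.isclose(np.trace(U_star @ A @ U_star.T @ B), best)

# Random orthogonal matrices never beat U_star.
for _ in range(200):
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    assert np.trace(Q @ A @ Q.T @ B) <= best + 1e-9
print("Proposition 1 verified on a random instance")
```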

Theorem 2

J(\Sigma_N, M_k) in (12b) is a DC function.

Proof:

Since \lambda \sum_{k=0}^{N-1} \operatorname{tr}(R_k M_k) + 4(\operatorname{tr}(\Sigma_N) - \operatorname{tr}(\Sigma_r))^2 + 8\|\Sigma_N\|_F^2 is clearly a convex function, it suffices to show that

g(\Sigma_N) := \operatorname{tr}(D_N D_r) (19)

is a convex function. From Proposition 1, we have

\operatorname{tr}(D_N D_r) = \max_{U \in O(n)} \operatorname{tr}(U \Sigma_N U^T \Sigma_r).

Thus, \operatorname{tr}(D_N D_r) is the pointwise maximum of linear functions of \Sigma_N and is therefore convex in \Sigma_N. Precisely, for a scalar \alpha \in [0,1] and two positive definite matrices \Sigma and \Sigma', the convex combination \alpha\Sigma + (1-\alpha)\Sigma' satisfies

\max_{U \in O(n)} \operatorname{tr}\big(U(\alpha \Sigma + (1-\alpha)\Sigma')U^T \Sigma_r\big)
= \max_{U \in O(n)} \big[ \alpha \operatorname{tr}(U \Sigma U^T \Sigma_r) + (1-\alpha) \operatorname{tr}(U \Sigma' U^T \Sigma_r) \big]
\leq \alpha \max_{U \in O(n)} \operatorname{tr}(U \Sigma U^T \Sigma_r) + (1-\alpha) \max_{U' \in O(n)} \operatorname{tr}(U' \Sigma' U'^T \Sigma_r). ∎

Furthermore, in the next theorem, we derive a subgradient of the concave part of J(\Sigma_N, M_k) to construct the upper bound used in the DCA.

Theorem 3

Let \Sigma_N = V_N D_N V_N^T and \Sigma_r = V_r D_r V_r^T be eigenvalue decompositions with eigenvalues sorted in descending order. Then, V_N D_r V_N^T \in \mathbb{S}_{n_x}^+ is a subgradient of g(\Sigma_N) in (19).

Proof:

From Proposition 1, we have

U^* := \mathop{\rm arg\,max}_{U \in O(n)} \operatorname{tr}(U \Sigma_N U^T \Sigma_r) = V_r V_N^T.

Therefore, by Danskin's theorem [20], a subgradient is obtained by differentiating the function inside the max operation with respect to \Sigma_N and substituting U^*. Hence,

{U^*}^T \Sigma_r U^* = V_N V_r^T \Sigma_r V_r V_N^T = V_N D_r V_N^T \in \partial g(\Sigma_N). ∎
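The subgradient property can be checked numerically via the inequality g(X) \geq g(\Sigma_N) + \langle S, X - \Sigma_N \rangle with S = V_N D_r V_N^T (note that \operatorname{tr}(S \Sigma_N) = \operatorname{tr}(D_N D_r)). A NumPy sketch on random positive semidefinite matrices:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3

def g(Sigma, Sigma_r):
    """g(Sigma) = tr(D D_r): inner product of descending eigenvalues."""
    d = np.sort(np.linalg.eigvalsh(Sigma))[::-1]
    dr = np.sort(np.linalg.eigvalsh(Sigma_r))[::-1]
    return float(np.sum(d * dr))

Gr = rng.standard_normal((n, n)); Sigma_r = Gr @ Gr.T
Gn = rng.standard_normal((n, n)); Sigma_N = Gn @ Gn.T

# Subgradient from Theorem 3: S = V_N D_r V_N^T.
w, V = np.linalg.eigh(Sigma_N); V = V[:, ::-1]
dr = np.sort(np.linalg.eigvalsh(Sigma_r))[::-1]
S = V @ np.diag(dr) @ V.T

# Subgradient inequality g(X) >= g(Sigma_N) + <S, X - Sigma_N> at random PSD X.
for _ in range(100):
    Gx = rng.standard_normal((n, n)); X = Gx @ Gx.T
    assert g(X, Sigma_r) >= g(Sigma_N, Sigma_r) + np.trace(S @ (X - Sigma_N)) - 1e-9
print("subgradient inequality verified")
```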

Therefore, the convex subproblem in DCA is formulated as

\min_{\Sigma_k, M_k, P_k} \quad \lambda \sum_{k=0}^{N-1} \operatorname{tr}(R_k M_k) + 4\left( \operatorname{tr}(\Sigma_N) - \operatorname{tr}(\Sigma_r) \right)^2 + 8\left\| \Sigma_N \right\|_F^2 - 16 \operatorname{tr}\big(\Sigma_N V_N^{(n)} D_r V_N^{(n)T}\big) (20a)
\mathrm{s.t.} \quad \Sigma_{k+1} = A_k \Sigma_k A_k^T + A_k P_k^T B_k^T + B_k P_k A_k^T + B_k M_k B_k^T + W_k (20b)
\begin{bmatrix} M_k & P_k \\ P_k^T & \Sigma_k \end{bmatrix} \succeq O (20c)

where V_N^{(n)} is the orthogonal matrix obtained from the eigenvalue decomposition of the optimal \Sigma_N in the n-th DCA iteration. The term \operatorname{tr}(\Sigma_N V_N^{(n)} D_r V_N^{(n)T}) is the linear lower bound of the convex function g(\Sigma_N) built from the subgradient obtained in Theorem 3. The subproblem is a semidefinite programming problem (SDP), which can be solved efficiently. We use the solution of each iteration to update V_N; by repeating this process, the iterates progressively approach a sub-optimal solution of the original problem (12).
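To see the DCA mechanics in isolation, consider the degenerate case with the dynamics and the control cost stripped out, i.e., minimizing only the GGW terms of (12b) over \Sigma \succeq O (an illustrative simplification of ours, not the paper's problem). The convexified subproblem then has a closed-form minimizer, and the iteration lands on a rotated copy of \Sigma_r:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2
Sigma_r = np.diag([2.0, 0.5])                                   # target shape
G = rng.standard_normal((n, n)); Sigma = G @ G.T + np.eye(n)    # initial iterate

dr = np.sort(np.linalg.eigvalsh(Sigma_r))[::-1]
c = dr.sum()

def objective(Sigma):
    """GGW^2 between N(0, Sigma) and N(0, Sigma_r), via (5)."""
    d = np.sort(np.linalg.eigvalsh(Sigma))[::-1]
    return 4.0 * (d.sum() - c) ** 2 + 8.0 * np.sum((d - dr) ** 2)

for _ in range(5):
    w, V = np.linalg.eigh(Sigma); V = V[:, ::-1]
    S = V @ np.diag(dr) @ V.T      # subgradient of tr(D_N D_r), Theorem 3
    # Convex subproblem: min 4 (tr Sigma - c)^2 + 8 ||Sigma||_F^2 - 16 tr(Sigma S).
    # Setting the gradient 8 (tr Sigma - c) I + 16 Sigma - 16 S to zero and
    # taking traces (tr S = c) gives the closed-form minimizer Sigma = S.
    Sigma = S

print(objective(Sigma))   # 0: the iterate is a rotated copy of Sigma_r
```

Note that the final iterate shares the spectrum of \Sigma_r but keeps the eigenbasis of the initial guess, which mirrors the rotational degree of freedom exploited in Section IV.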

In Theorem 1, we showed that the optimal policy for Problem (12) is deterministic. The following theorem shows that the proposed algorithm generates a sequence of deterministic control policies:

Theorem 4

When \{A_k\}_{k=0}^{N-1} are invertible, the optimal policy of the subproblem (20) is also deterministic.

Proof:

The KKT conditions used in the proof of Theorem 1 also hold for the subproblem (20). ∎

IV Numerical Experiments

In this section, we perform numerical optimization for problem (12) using DCA. We set the parameters in this experiment as

A_k = \begin{bmatrix} 1.0 & 0.1 \\ -0.3 & 1.0 \end{bmatrix}, \quad B_k = \begin{bmatrix} 0.7 \\ 0.4 \end{bmatrix}, \quad \Sigma_0 = \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}, \quad W_k = 0.5 I_2, \quad R_k = 1.0, \quad N = 10.

Figure 2 shows the time evolution of state covariance of the uncontrolled system. For the implementation of the convex subproblem in DCA, we used the MOSEK solver[21] and the CVXPY modeler[22].

Figure 2: Snapshots of the covariance matrices of the uncontrolled system. The colored lines indicate the 2\sigma range of \Sigma_k, while the black line represents \Sigma_r in (22).

IV-1 Line alignment

First, we consider the case where the desired density is the Gaussian \rho_r = \mathcal{N}(0, 10) on \mathbb{R}, not on \mathbb{R}^2. The problem seeks the optimal policy that aligns the terminal distribution along one line. Note that W(\rho_N, \rho_r) in (3) is not well-defined because \mathcal{X} \neq \mathcal{Y}. (One might embed \rho_r into \mathbb{R}^2, e.g., by (21), and consider the Wasserstein distance; however, there remains a rotational degree of freedom that affects the resulting distance. Our formulation optimizes this rotation in the sense of the required control energy; see Fig. 3.) In contrast, the GW distance GW(\rho_N, \rho_r) in (2) is well-defined and, thanks to (5), equivalent to taking

\Sigma_r = \begin{bmatrix} 10 & 0 \\ 0 & 0 \end{bmatrix}. (21)

Figure 3 presents the trajectories of one hundred samples of the controlled process when the target distribution is degenerate. The distribution of states indeed stretches vertically to achieve the one-line alignment.

Figure 3: One hundred sample paths of the system controlled by the optimal policy for \lambda = 1. The blue circles denote the 2\sigma ranges of \Sigma_0 and \Sigma_N, and the black line denotes the degenerate target distribution \Sigma_r in (21).

IV-2 Comparison with the Wasserstein formulation

Let us consider

\Sigma_r = \begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix}, (22)

for which

GGW^2(\rho_N, \rho_r) = 6711.44 (23)

for the uncontrolled system. Figure 4 shows the snapshots of state covariance under the optimal control input obtained by DCA. It can be observed that the shape of the terminal distribution approaches that of the target distribution as λ\lambda decreases. As shown below, the terminal distribution is the one requiring the least energy among the rotated distributions of the target due to the rotational invariance of the GW distance.

Figure 5 shows the relationship between the optimized control cost term and the GW cost term in Eq. (8b) for each \lambda. As \lambda increases, control effort is penalized more heavily, so the applied control energy decreases while the GW cost rises; conversely, as \lambda decreases, more control energy is expended and the GW cost diminishes, becoming almost zero for small \lambda. It is noteworthy that, in comparison to the uncontrolled system in (23), our algorithm achieves a significant reduction in the GW cost.

Finally, we clarify the advantage of our approach over the Wasserstein terminal cost problem [10]. In Fig. 4(a), the obtained terminal distribution is \rho_N \approx \mathcal{N}(0, \hat{\Sigma}_r(\theta_{\rm GW})) with \theta_{\rm GW} = 1.20 [rad], where \hat{\Sigma}_r(\theta) is obtained by rotating \Sigma_r by an angle \theta, i.e.,

\hat{\Sigma}_r(\theta) := R(\theta)^T \Sigma_r R(\theta), \quad R(\theta) := \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.

It is shown in [10] that we can solve

\min_{K_k, Q_k} \lambda \mathbb{E}\left[ \sum_{k=0}^{N-1} u_k^T R_k u_k \right] + W^2\big(\rho_N, \mathcal{N}(0, \hat{\Sigma}_r(\theta))\big) (24)

by an SDP. We then solved this problem for a sufficiently small \lambda (i.e., a large terminal cost). The control energy required by the resulting optimal input (i.e., the first term of (24) without \lambda) is denoted by W_{\rm opt}(\theta) and shown in Fig. 6. It is noteworthy that this function exhibits non-convexity. We can also observe that

\theta_{\rm GW} \approx \theta^* := \mathop{\rm arg\,min}_{\theta} W_{\rm opt}(\theta),

which implies that the rotation angle obtained by the GW terminal cost problem minimizes the control energy needed to realize the required shape (specified by \Sigma_r). From a computational cost point of view, finding \theta^* via the Wasserstein terminal cost approach requires solving an optimization problem to evaluate W_{\rm opt}(\theta) for each candidate \theta, whereas our GW terminal cost framework requires solving only a single optimization problem. Moreover, our approach remains computationally tractable in high-dimensional settings, where the Wasserstein terminal cost approach becomes intractable due to the exponential growth of the search space of rotation matrices.

Figure 4: Snapshots of the covariance matrices under the optimized policy for (a) \lambda = 1, (b) \lambda = 100, and (c) \lambda = 10000. The colored lines indicate the 2\sigma range of \Sigma_k, while the black line represents \Sigma_r in (22).
Figure 5: Relationship between the optimized control cost term and the GW cost term in Eq. (8b) for each \lambda.
Figure 6: The required control energy W_{\rm opt}(\theta) of the optimal input for the Wasserstein terminal cost problem (24). The green line and the red dot represent \theta_{\rm GW} and \theta^*, respectively.

V Conclusion

In this study, we addressed the optimal density control problem with the Gromov-Wasserstein distance as the terminal cost. We showed that the problem is a DC programming problem and proposed an optimization method based on the DC algorithm. Numerical experiments confirmed that the state distribution reaches the terminal distribution that can be realized with the minimum control energy among those having the specified shape.

Future work includes applying the proposed GW framework to transport between spaces equipped with different Riemannian metric structures or between point clouds. Model predictive formation control based on a fast algorithm for optimal transport [23] is also a promising direction [24]. The convergence and computational complexity of the proposed DC algorithm should also be investigated.

References

  • [1] Y. Chen, T. T. Georgiou, and M. Pavon, “Optimal steering of a linear stochastic system to a final probability distribution, part I,” IEEE Transactions on Automatic Control, vol. 61, no. 5, pp. 1158–1169, 2016.
  • [2] B. D. O. Anderson, “The inverse problem of stationary covariance generation,” Journal of Statistical Physics, vol. 1, no. 1, pp. 133–147, 1969.
  • [3] A. F. Hotz and R. E. Skelton, “A covariance control theory,” in 1985 24th IEEE Conference on Decision and Control, 1985, pp. 552–557.
  • [4] F. Liu, G. Rapakoulias, and P. Tsiotras, “Optimal covariance steering for discrete-time linear stochastic systems,” arXiv preprint arXiv:2211.00618, 2022.
  • [5] G. Rapakoulias and P. Tsiotras, “Discrete-time optimal covariance steering via semidefinite programming,” arXiv preprint arXiv:2302.14296, 2023.
  • [6] K. Ito and K. Kashima, “Maximum entropy density control of discrete-time linear systems with quadratic cost,” arXiv preprint arXiv:2309.10662, 2023.
  • [7] ——, “Maximum entropy optimal density control of discrete-time linear systems and Schrödinger bridges,” IEEE Transactions on Automatic Control, Early Access, 2024.
  • [8] A. Halder and E. D. Wendel, “Finite horizon linear quadratic Gaussian density regulator with Wasserstein terminal cost,” in 2016 American Control Conference (ACC), 2016, pp. 7249–7254.
  • [9] I. M. Balci and E. Bakolas, “Covariance steering of discrete-time stochastic linear systems based on Wasserstein distance terminal cost,” IEEE Control Systems Letters, vol. 5, no. 6, pp. 2000–2005, 2021.
  • [10] ——, “Exact SDP formulation for discrete-time covariance steering with Wasserstein terminal cost,” arXiv preprint arXiv:2205.10740, 2022.
  • [11] V. Krishnan and S. Martínez, “Distributed optimal transport for the deployment of swarms,” in 2018 IEEE Conference on Decision and Control (CDC), 2018, pp. 4583–4588.
  • [12] D. V. Dimarogonas and K. H. Johansson, “On the stability of distance-based formation control,” in 2008 47th IEEE Conference on Decision and Control, 2008, pp. 1200–1205.
  • [13] F. Mémoli, “Gromov-Wasserstein distances and the metric approach to object matching,” Foundations of Computational Mathematics, vol. 11, pp. 417–487, 2011.
  • [14] J. Delon, A. Desolneux, and A. Salmona, “Gromov-Wasserstein distances between Gaussian distributions,” Journal of Applied Probability, vol. 59, no. 4, pp. 1178–1198, 2022.
  • [15] P. D. Tao and L. H. An, “Convex analysis approach to DC programming: Theory, algorithms and applications,” Acta Mathematica Vietnamica, vol. 22, no. 1, pp. 289–355, 1997.
  • [16] Y. Chen, T. T. Georgiou, and M. Pavon, “Optimal steering of a linear stochastic system to a final probability distribution, part II,” IEEE Transactions on Automatic Control, vol. 61, no. 5, pp. 1170–1180, 2016.
  • [17] R. Horst and N. V. Thoai, “DC programming: Overview,” Journal of Optimization Theory and Applications, vol. 103, pp. 1–43, 1999.
  • [18] A. L. Yuille and A. Rangarajan, “The concave-convex procedure (CCCP),” in Advances in Neural Information Processing Systems, vol. 14.   MIT Press, 2001.
  • [19] K. Anstreicher and H. Wolkowicz, “On Lagrangian relaxation of quadratic matrix constraints,” SIAM Journal on Matrix Analysis and Applications, vol. 22, no. 1, pp. 41–55, 2000.
  • [20] D. P. Bertsekas, Nonlinear Programming: 2nd Edition.   Athena Scientific, 1999.
  • [21] MOSEK ApS, MOSEK Optimizer API for Python. Version 10.1, 2023.
  • [22] S. Diamond and S. Boyd, “CVXPY: A Python-embedded modeling language for convex optimization,” Journal of Machine Learning Research, vol. 17, no. 83, pp. 1–5, 2016.
  • [23] J. Solomon, G. Peyré, V. G. Kim, and S. Sra, “Entropic metric alignment for correspondence problems,” ACM Transactions on Graphics, vol. 35, no. 4, 2016.
  • [24] K. Ito and K. Kashima, “Entropic model predictive optimal transport over dynamical systems,” Automatica, vol. 152, p. 110980, 2023.