Minimum energy density steering of linear systems with Gromov-Wasserstein terminal cost
Abstract
In this paper, we formulate and solve a novel optimal density control problem with a Gromov-Wasserstein (GW) terminal cost in discrete-time linear Gaussian systems. Unlike the Wasserstein or Kullback-Leibler distances employed in existing works, the GW distance quantifies the difference in the shapes of distributions, which is invariant under translation and rotation. Consequently, our formulation allows us to find small-energy inputs that achieve a desired shape of the terminal distribution, which has practical applications, e.g., in robotic swarms. We demonstrate that the problem can be reduced to a Difference of Convex (DC) programming problem, which is efficiently solvable through the DC algorithm. Through numerical experiments, we confirm that the state distribution reaches the terminal distribution that can be realized with the minimum control energy among those having the specified shape.
I Introduction
Optimal density control is defined as the problem of steering the probability distribution of the state variables of a dynamical system to a desired distribution. Promising applications of optimal density control include systems in which it is important to manage errors in the state, such as quality control and aircraft control, as well as quantum systems in which the distribution of the state itself is the object of control [1].
The problem addressed in this paper is a variant of the (finite-time) covariance steering problem for discrete-time linear Gaussian systems. Within the long history of this line of research [2, 3], the most related recent works are as follows: hard constraint formulations, as seen in [4, 5, 6, 7], where the terminal state distribution is enforced as a constraint; and soft constraint formulations, as seen in [8, 9, 10], where the Wasserstein distance between the terminal distribution and the target distribution is incorporated as a cost. In particular, Balci and Bakolas [10] presented an optimal distributional control problem for discrete-time linear Gaussian systems using the Wasserstein distance as the terminal cost, formulated it as a semidefinite program (SDP), a form of convex programming, and derived globally optimal control policies.
To motivate the present work, let us consider the state distribution as an ensemble of particles or a multi-robot swarm [11]. In such applications, a particular shape of the formation must be achieved, but its location and orientation are often irrelevant. For example, the agents may be required to align in a single row in a two-dimensional region (see Fig. 3 below), or only the configuration may be specified in terms of inter-agent distances [12]. The aforementioned formulations can address the realization of a configuration with a fixed orientation but cannot address optimization with respect to rotation. To tackle this issue, we propose a novel density control problem incorporating the Gromov-Wasserstein distance (GW distance) as the terminal cost [13]. The GW distance is a distance between probability distributions that measures the closeness of their shapes. By integrating the GW distance between the state and target distributions into the terminal cost, we can formulate the problem of controlling the shape of the state distribution. This problem can be viewed as a simultaneous optimization of the dynamical steering and the rotation of the target shape, which clearly contrasts with the existing formulations.
In this study, we focus on scenarios where the initial and target distributions are Gaussian and seek the optimal control policy among linear feedback control laws. While computing the GW distance between arbitrary distributions is challenging, it has recently been shown that the Gaussian Gromov-Wasserstein (GGW) distance, a relaxation of the GW distance for normal distributions, can be calculated easily [14]. We show that the optimal density control problem with GW terminal cost can be formulated as a difference of convex (DC) programming problem. We solve the problem by the DC algorithm (DCA) [15], a technique for solving DC programming problems through iterative convex relaxation. Remarkably, the convexified problem can be transformed into an SDP, which can be efficiently solved using standard convex programming solvers.
The rest of the paper is organized as follows. In Section II, we introduce the concept of the GW distance and present the optimal density steering problem with the GW distance as the terminal cost. Section III discusses the formulation of the problem as a DC programming problem, highlighting the objective function's nature as a difference of convex functions and deriving the convexified subproblem used in the DC algorithm. The numerical simulations are presented in Section IV. Finally, we conclude our paper in Section V.
Notation. Let $\mathbb{S}^n_+$ and $\mathbb{S}^n_{++}$ denote the sets of $n$-dimensional positive semidefinite matrices and positive definite matrices, respectively. Let $O(n)$ denote the set of $n$-dimensional orthogonal matrices. For matrices, $\|\cdot\|_F$ denotes the Frobenius norm. For a convex function $f$, $\partial f(x)$ denotes the set of subgradients of $f$ at $x$. Let $\mathcal{P}(\mathbb{R}^n)$ denote the set of all probability distributions over $\mathbb{R}^n$. Let $N(\mu, \Sigma)$ denote the multivariate normal distribution with mean $\mu$ and covariance $\Sigma$. Let $\mathcal{N}^n$ denote the set of all $n$-dimensional multivariate normal distributions.
II Problem Setting
II-A Gromov-Wasserstein distance
The optimal transport distance is generally defined as the minimized cost of transporting one probability distribution to another. The GW distance is a distance between probability distributions and is a variant of the optimal transport distance, similar to the Wasserstein distance. Given two metric spaces $X$ and $Y$, the set of transport plans between probability distributions $\mu \in \mathcal{P}(X)$ and $\nu \in \mathcal{P}(Y)$ is defined by
$\Pi(\mu, \nu) := \left\{ \pi \in \mathcal{P}(X \times Y) \mid \pi(A \times Y) = \mu(A),\ \pi(X \times B) = \nu(B) \ \text{for all measurable } A \subseteq X,\ B \subseteq Y \right\}. \qquad (1)$
Each element $\pi \in \Pi(\mu, \nu)$ represents how the mass at $x \in X$ is transported to $Y$, with the marginal conditions ensuring that the transported mass indeed realizes $\mu$ and $\nu$. The GW distance is defined by
$GW(\mu, \nu)^2 := \min_{\pi \in \Pi(\mu, \nu)} \iint \left( \|x - x'\|_X^2 - \|y - y'\|_Y^2 \right)^2 d\pi(x, y)\, d\pi(x', y'), \qquad (2)$
where $\|\cdot\|_X$ and $\|\cdot\|_Y$ represent the norms in the spaces $X$ and $Y$, respectively. The GW distance is small when points that are close (resp. far apart) before transportation remain close (resp. far apart) after transportation. Conversely, the GW distance increases when points that were initially close are moved far apart after transportation. Therefore, this definition quantifies the difference in shape between two probability distributions. For comparison, recall that the Wasserstein distance is defined as
$W(\mu, \nu)^2 := \min_{\pi \in \Pi(\mu, \nu)} \int_{X \times X} d(x, y)^2 \, d\pi(x, y), \qquad (3)$
where $\mu, \nu \in \mathcal{P}(X)$ and $d$ is a suitable distance on $X$. While the Wasserstein distance is sensitive to the absolute positions and orientations of the distributions, the GW distance is invariant under isometric transformations such as translations and rotations.
The GW distance involves a non-convex quadratic program over the transport $\pi$, making it challenging to compute the GW distance between arbitrary probability distributions. Recently, it has been shown that the Gaussian Gromov-Wasserstein (GGW) distance, where the transport is constrained to be a Gaussian distribution, can be explicitly expressed in terms of the parameters of the normal distributions [14]. Specifically, the GGW distance between Gaussian distributions $\mu = N(u, \Sigma_\mu) \in \mathcal{N}^m$ and $\nu = N(v, \Sigma_\nu) \in \mathcal{N}^n$ is defined by
$GGW(\mu, \nu)^2 := \min_{\pi \in \Pi(\mu, \nu) \cap \mathcal{N}^{m+n}} \iint \left( \|x - x'\|^2 - \|y - y'\|^2 \right)^2 d\pi(x, y)\, d\pi(x', y'), \qquad (4)$
where $\Pi(\mu, \nu) \cap \mathcal{N}^{m+n}$ represents the restriction of the transport to the $(m+n)$-dimensional Gaussian distributions. For $m \geq n$, it holds that
$GGW(\mu, \nu)^2 = 4 \left( \operatorname{tr} \Sigma_\mu - \operatorname{tr} \Sigma_\nu \right)^2 + 8 \left\| D_\mu - \tilde{D}_\nu \right\|_F^2, \qquad (5)$
where $D_\mu$ and $D_\nu$ are the diagonal matrices with the eigenvalues of $\Sigma_\mu$ and $\Sigma_\nu$ sorted in descending order, and $\tilde{D}_\nu$ denotes $D_\nu$ whose missing elements are filled with zeros when $m > n$.
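For concreteness, the closed form (5) is straightforward to implement. The following NumPy sketch (ours, not the authors' code; the function name `ggw_squared` and the test values are hypothetical) also illustrates the rotation invariance that motivates this work.

```python
# A minimal sketch of the GGW closed form (5) as reconstructed above:
# eigenvalues are sorted in descending order and, when dimensions differ,
# the shorter spectrum is padded with zeros.
import numpy as np

def ggw_squared(cov_mu, cov_nu):
    """Squared GGW distance between N(., cov_mu) and N(., cov_nu), per (5)."""
    d_mu = np.sort(np.linalg.eigvalsh(cov_mu))[::-1]  # descending eigenvalues
    d_nu = np.sort(np.linalg.eigvalsh(cov_nu))[::-1]
    m = max(d_mu.size, d_nu.size)
    d_mu = np.pad(d_mu, (0, m - d_mu.size))           # fill missing with zeros
    d_nu = np.pad(d_nu, (0, m - d_nu.size))
    return 4 * (d_mu.sum() - d_nu.sum()) ** 2 + 8 * np.sum((d_mu - d_nu) ** 2)

# Rotation invariance: rotating a covariance leaves the distance unchanged.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
S = np.diag([3.0, 0.5])
assert np.isclose(ggw_squared(S, R @ S @ R.T), 0.0)
```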
II-B Optimal density steering with Gromov-Wasserstein terminal cost
Let $n$ be the dimension of the state space and $m$ the dimension of the input, and consider the following discrete-time linear Gaussian system:
$x_{k+1} = A_k x_k + B_k u_k + w_k, \quad k = 0, 1, \ldots, N-1, \qquad (6a)$
$x_0 \sim N(0, \Sigma_0), \qquad (6b)$
$w_k \sim N(0, W), \ \text{i.i.d.} \qquad (6c)$
Here, the covariance matrix of the initial Gaussian distribution is $\Sigma_0 \in \mathbb{S}^n_{++}$ and that of the noise $w_k$ is $W \in \mathbb{S}^n_{++}$. For the control input $u_k$, we use a stochastic linear control policy of the form
$u_k = K_k x_k + v_k, \quad v_k \sim N(0, V_k), \qquad (7)$
where $K_k$ is the feedback gain and $V_k \in \mathbb{S}^m_+$ is the covariance of the injected Gaussian noise $v_k$. We consider the problem of minimizing the sum of the control cost and the Gromov-Wasserstein distance between the terminal distribution $\mu_N$ of $x_N$ and the target distribution $\mu_{\mathrm{des}} = N(0, \Sigma_{\mathrm{des}})$. The objective function is represented as
$\underset{\{K_k, V_k\}_{k=0}^{N-1}}{\text{minimize}} \quad J, \qquad (8a)$
$J := \mathbb{E}\left[ \sum_{k=0}^{N-1} u_k^\top R_k u_k \right] + \lambda\, GW(\mu_N, \mu_{\mathrm{des}})^2, \qquad (8b)$
where $R_k \in \mathbb{S}^m_{++}$ denotes the weight for the control cost and $\lambda > 0$ that for the terminal cost. Using the control policy (7) in the system (6a), the probability distribution of the state at the terminal time is also Gaussian. Thus, by substituting equations (5) and (7) into equation (8b), we obtain
$J = \sum_{k=0}^{N-1} \operatorname{tr}\left( R_k \left( K_k \Sigma_k K_k^\top + V_k \right) \right) + \lambda \left( 4 \left( \operatorname{tr} \Sigma_N - \operatorname{tr} \Sigma_{\mathrm{des}} \right)^2 + 8 \left\| D_N - D_{\mathrm{des}} \right\|_F^2 \right), \qquad (9)$
where $\Sigma_k$ is the covariance matrix of the state $x_k$, and $D_N$, $D_{\mathrm{des}}$ are the diagonal matrices with the eigenvalues of $\Sigma_N$, $\Sigma_{\mathrm{des}}$ arranged in descending order. The dynamics of $\Sigma_k$ is given by
$\Sigma_{k+1} = (A_k + B_k K_k) \Sigma_k (A_k + B_k K_k)^\top + B_k V_k B_k^\top + W. \qquad (10)$
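For illustration, the recursion (10) can be propagated forward directly. The following sketch (hypothetical names; time-invariant matrices for brevity) returns the terminal covariance $\Sigma_N$ for a given policy.

```python
# A small sketch of the covariance recursion (10):
# Sigma_{k+1} = (A + B K_k) Sigma_k (A + B K_k)^T + B V_k B^T + W.
import numpy as np

def propagate_covariance(Sigma0, A, B, K_list, V_list, W):
    Sigma = Sigma0
    for K, V in zip(K_list, V_list):
        Acl = A + B @ K                   # closed-loop dynamics matrix
        Sigma = Acl @ Sigma @ Acl.T + B @ V @ B.T + W
    return Sigma                          # covariance of the terminal state x_N

# Example: N = 10 steps with zero feedback and no injected noise.
n = 2
Sigma_N = propagate_covariance(np.eye(n), np.eye(n), np.eye(n),
                               [np.zeros((n, n))] * 10,
                               [np.zeros((n, n))] * 10, 0.01 * np.eye(n))
```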
Here, we introduce the variable transformations $U_k := K_k \Sigma_k$, $k = 0, \ldots, N-1$, as in [16, 10]. To guarantee the invertibility of the variable transformation, we need $\Sigma_k \succ 0$ and $V_k \succeq 0$; substituting $K_k = U_k \Sigma_k^{-1}$ into (10) and taking the Schur complement, the following condition must be satisfied:
$\begin{bmatrix} \Sigma_{k+1} - A_k \Sigma_k A_k^\top - B_k U_k A_k^\top - A_k U_k^\top B_k^\top - B_k V_k B_k^\top - W & B_k U_k \\ U_k^\top B_k^\top & \Sigma_k \end{bmatrix} \succeq 0, \quad k = 0, \ldots, N-1. \qquad (11)$
This condition is added to the optimization problem to ensure the feasibility of the solution. Finally, from (9), (10), and (11), we can write the optimization problem to be solved as follows:
$\underset{\{\Sigma_k, U_k, V_k\}}{\text{minimize}} \quad J_{\mathrm{d}}, \qquad (12a)$
$J_{\mathrm{d}} := \sum_{k=0}^{N-1} \operatorname{tr}\left( R_k \left( U_k \Sigma_k^{-1} U_k^\top + V_k \right) \right) + \lambda \left( 4 (\operatorname{tr} \Sigma_N - \operatorname{tr} \Sigma_{\mathrm{des}})^2 + 8 \| D_N - D_{\mathrm{des}} \|_F^2 \right), \qquad (12b)$
$\text{subject to the LMI (11) for } k = 0, \ldots, N-1, \qquad (12c)$
$V_k \succeq 0, \quad k = 0, \ldots, N-1. \qquad (12d)$
Although we assumed a stochastic policy as the control law in (7), the optimal solution turns out to be deterministic.
Theorem 1
Suppose $B_k$, $k = 0, \ldots, N-1$, are invertible. Then, the optimal policy in problem (12) is deterministic; that is, the optimal solution satisfies $V_k = 0$ for all $k$.
Proof:
The proof is provided using the same arguments as in [10]. We utilize the Karush-Kuhn-Tucker (KKT) conditions [20]. Let $\Lambda_k \in \mathbb{S}^{2n}_+$ denote the Lagrange multiplier for constraint (12c), and let $M_k \in \mathbb{S}^m_+$ denote the Lagrange multiplier for constraint (12d), where $\Lambda_k$ is partitioned conformably with the blocks of (11) as
$\Lambda_k = \begin{bmatrix} \Lambda_k^{11} & \Lambda_k^{12} \\ (\Lambda_k^{12})^\top & \Lambda_k^{22} \end{bmatrix}.$
From the stationarity conditions with respect to $U_k$ and $V_k$, we obtain
$R_k U_k \Sigma_k^{-1} + B_k^\top \left( \Lambda_k^{11} A_k - \Lambda_k^{12} \right) = 0, \qquad (13)$
$R_k + B_k^\top \Lambda_k^{11} B_k - M_k = 0. \qquad (14)$
The complementary slackness condition for (12d) yields
$\operatorname{tr}(M_k V_k) = 0, \qquad (15)$
implying $M_k V_k = 0$ from the positive semidefiniteness of $M_k$ and $V_k$. Thus, we obtain
$M_k = R_k + B_k^\top \Lambda_k^{11} B_k, \qquad (16)$
$\Lambda_k^{11} \succeq 0, \qquad (17)$
where (16) rearranges (14) and (17) holds because $\Lambda_k^{11}$ is a principal submatrix of $\Lambda_k \succeq 0$. Subsequently, by combining (16), (17), and the positive definiteness of $R_k$, we derive
$M_k \succ 0. \qquad (18)$
Finally, from (15), (16), and (18), we conclude that $V_k = M_k^{-1} (M_k V_k) = 0$, and due to the positive definiteness of $M_k$, the optimal policy is deterministic. ∎
III Formulation as Difference of Convex Programming
[Figure 1: Visualization of the DCA iterations.]
In this section, we show that problem (12) is a DC programming problem and solve it using the DCA, an optimization method for DC programming. A DC programming problem is an optimization problem whose objective function is a DC function, i.e., a function expressed as the difference of two convex functions. Since a DC programming problem is a non-convex optimization problem, finding a global optimum is generally challenging. However, several optimization methods that efficiently find solutions by exploiting the structure of DC functions have been proposed. These include global optimization techniques using branch-and-bound methods [17], as well as methods for finding sub-optimal solutions, such as the DCA [15] and the Concave-Convex Procedure (CCCP) [18].
The DCA iteratively constructs a convex upper bound of the objective function, minimizes this upper bound, and then updates the bound using the minimizer from the previous iteration. Assume the objective function to be optimized is expressed as $f(x) = g(x) - h(x)$ with convex functions $g$ and $h$ defined on a convex set $C$. Given the current iterate $x^{(t)}$, the DCA repeats the following steps until convergence:
1. Construct the upper bound of $f$ as
$\bar{f}^{(t)}(x) := g(x) - h(x^{(t)}) - \langle y^{(t)}, x - x^{(t)} \rangle,$
where $y^{(t)} \in \partial h(x^{(t)})$.
2. Set $x^{(t+1)} \in \operatorname{argmin}_{x \in C} \bar{f}^{(t)}(x)$.
Because $\bar{f}^{(t)}$ is a convex function over the convex set $C$, this convex subproblem can be minimized efficiently. It is known [15] that when the optimal value of the problem is finite and the sequences $\{x^{(t)}\}$ and $\{y^{(t)}\}$ are bounded, any accumulation point $x^*$ of $\{x^{(t)}\}$ is a critical point of $f$, i.e., $\partial g(x^*) \cap \partial h(x^*) \neq \emptyset$. Figure 1 visualizes one iteration of the DCA.
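To make the two steps concrete, here is a toy DCA run on a scalar example (purely illustrative, unrelated to problem (12)): minimizing $f(x) = x^2 - |x|$, whose minimizers $x = \pm 1/2$ are reached after a single iteration.

```python
# A toy DCA run: f(x) = g(x) - h(x) with g(x) = x**2 and h(x) = |x|.
# Step 1 linearizes h at x_t via a subgradient y_t of h; step 2 minimizes the
# convex upper bound g(x) - y_t * x, here available in closed form.
import numpy as np

def dca(x0, iters=20):
    x = x0
    for _ in range(iters):
        y = np.sign(x)      # subgradient of h(x) = |x| (any value in [-1, 1] at 0)
        x = y / 2.0         # argmin_x of x**2 - y * x
    return x

print(dca(0.3))  # 0.5, a critical point of f; f(0.5) = -0.25 is the minimum
```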
In the next proposition and theorem, we show that problem (12) is a DC programming problem.
Proposition 1 (Anstreicher and Wolkowicz [19])
Let $A$ and $B$ be $n$-dimensional symmetric matrices with eigenvalue decompositions $A = P_A D_A P_A^\top$ and $B = P_B D_B P_B^\top$, respectively. Assume the eigenvalues in $D_A$ and $D_B$ are arranged in descending order. Then,
$\max_{U \in O(n)} \operatorname{tr}\left( A U B U^\top \right)$
has an optimal solution $U^* = P_A P_B^\top$, and the optimal value is $\operatorname{tr}(D_A D_B)$.
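This claim is easy to sanity-check numerically. The sketch below (ours) verifies both the claimed optimal value and the claimed maximizer against one thousand random orthogonal matrices.

```python
# A numerical check of Proposition 1: over orthogonal U, tr(A U B U^T) never
# exceeds tr(D_A D_B), with equality at U = P_A P_B^T when both eigenvalue
# lists are sorted in descending order.
import numpy as np
rng = np.random.default_rng(0)

n = 4
A = rng.standard_normal((n, n)); A = A + A.T    # random symmetric matrices
B = rng.standard_normal((n, n)); B = B + B.T
eA, PA = np.linalg.eigh(A)                      # eigh returns ascending order
eB, PB = np.linalg.eigh(B)
PA, PB = PA[:, ::-1], PB[:, ::-1]               # reorder to descending
DA, DB = np.diag(eA[::-1]), np.diag(eB[::-1])

best = np.trace(DA @ DB)                        # claimed optimal value
U_star = PA @ PB.T                              # claimed optimal solution
assert np.isclose(np.trace(A @ U_star @ B @ U_star.T), best)
for _ in range(1000):
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal Q
    assert np.trace(A @ Q @ B @ Q.T) <= best + 1e-9
```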
Theorem 2
$J_{\mathrm{d}}$ in (12b) is a DC function.
Proof:
The control cost term in (12b) is a sum of matrix fractional functions and a linear term, hence clearly convex. Moreover, expanding
$8 \| D_N - D_{\mathrm{des}} \|_F^2 = 8 \| \Sigma_N \|_F^2 + 8 \| D_{\mathrm{des}} \|_F^2 - 16 \operatorname{tr}(D_N D_{\mathrm{des}})$
(where we used $\|D_N\|_F = \|\Sigma_N\|_F$), the terminal cost terms other than $-16 \operatorname{tr}(D_N D_{\mathrm{des}})$ are convex as well. Hence it suffices to show that
$h(\Sigma_N) := \operatorname{tr}(D_N D_{\mathrm{des}}) \qquad (19)$
is a convex function. From Proposition 1, we have
$h(\Sigma_N) = \max_{U \in O(n)} \operatorname{tr}\left( \Sigma_N U D_{\mathrm{des}} U^\top \right).$
Thus, $h$ is the pointwise maximum of functions linear in $\Sigma_N$, making it a convex function with respect to $\Sigma_N$. Precisely, for a scalar $t \in [0, 1]$ and two positive definite matrices $\Sigma$ and $\Sigma'$, the convex combination satisfies
$h(t \Sigma + (1-t) \Sigma') \leq t\, h(\Sigma) + (1-t)\, h(\Sigma').$ ∎
Furthermore, in the next theorem, we derive a subgradient of the concave part of $J_{\mathrm{d}}$ in order to construct the upper bound used in the DCA.
Theorem 3
Assume the eigenvalue decompositions $\Sigma_N = P_N D_N P_N^\top$ and $\Sigma_{\mathrm{des}} = P_{\mathrm{des}} D_{\mathrm{des}} P_{\mathrm{des}}^\top$, with the eigenvalues in $D_N$ and $D_{\mathrm{des}}$ arranged in descending order. Then, $P_N D_{\mathrm{des}} P_N^\top$ is a subgradient of $h$ in (19) at $\Sigma_N$.
Proof: By Proposition 1, $h(\Sigma) = \max_{U \in O(n)} \operatorname{tr}(\Sigma U D_{\mathrm{des}} U^\top)$, and at $\Sigma = \Sigma_N$ the maximum is attained by $U^* = P_N$. Hence, for any $\Sigma \in \mathbb{S}^n_+$,
$h(\Sigma) \geq \operatorname{tr}\left( \Sigma P_N D_{\mathrm{des}} P_N^\top \right) = h(\Sigma_N) + \operatorname{tr}\left( P_N D_{\mathrm{des}} P_N^\top (\Sigma - \Sigma_N) \right),$
which is exactly the subgradient inequality. ∎
Therefore, the convex subproblem in DCA is formulated as
$\underset{\{\Sigma_k, U_k, V_k\}}{\text{minimize}} \quad \sum_{k=0}^{N-1} \operatorname{tr}\left( R_k \left( U_k \Sigma_k^{-1} U_k^\top + V_k \right) \right) + \lambda \left( 4 (\operatorname{tr} \Sigma_N - \operatorname{tr} \Sigma_{\mathrm{des}})^2 + 8 \| \Sigma_N \|_F^2 + 8 \| D_{\mathrm{des}} \|_F^2 - 16 \operatorname{tr}\left( P^{(t)} D_{\mathrm{des}} (P^{(t)})^\top \Sigma_N \right) \right) \qquad (20a)$
$\text{subject to} \quad (12c), \qquad (20b)$
$(12d), \qquad (20c)$
where $P^{(t)}$ is the orthogonal matrix obtained by the eigenvalue decomposition of the optimal $\Sigma_N$ in the $t$-th iteration of the DCA. The term $\operatorname{tr}(P^{(t)} D_{\mathrm{des}} (P^{(t)})^\top \Sigma_N)$ represents the linear lower bound of the convex function (19) constructed from the subgradient obtained in Theorem 3. The subproblem is a semidefinite programming problem (SDP), which can be efficiently solved. We use the solution from each optimization step to update $P^{(t)}$; by iteratively applying the optimization process, the solution progressively approaches a sub-optimal solution of the original problem (12), as sketched below.
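As a rough illustration of how the subproblem (20) can be prototyped, the following condensed CVXPY sketch uses the notation reconstructed above; the time-invariant $A$, $B$, the choice $R_k = I$, and all parameter values are our assumptions, not the authors' settings. An outer DCA loop would re-solve this problem repeatedly, updating `P_prev` with the eigenvectors of the optimal `Sigma[N]`.

```python
# A condensed sketch of the convexified subproblem (20) (illustrative only).
# The concave part -16*lam*tr(D_N D_des) of (12b) is replaced by its
# linearization -16*lam*tr(G Sigma_N) with G = P_prev D_des P_prev^T.
import cvxpy as cp
import numpy as np

n, N, lam = 2, 10, 5.0
A, B, W = np.eye(n), np.eye(n), 0.01 * np.eye(n)
Sigma0 = np.eye(n)
d_des = np.array([3.0, 0.5])                 # target eigenvalues, descending
P_prev = np.eye(n)                           # eigenvectors of previous Sigma_N
G = P_prev @ np.diag(d_des) @ P_prev.T       # subgradient from Theorem 3

Sigma = [cp.Constant(Sigma0)] + [cp.Variable((n, n), PSD=True) for _ in range(N)]
U = [cp.Variable((n, n)) for _ in range(N)]
V = [cp.Variable((n, n), PSD=True) for _ in range(N)]   # (12d): V_k >= 0

# Control cost tr(U_k Sigma_k^{-1} U_k^T) + tr(V_k), with R_k = I.
cost = sum(cp.matrix_frac(U[k].T, Sigma[k]) + cp.trace(V[k]) for k in range(N))
# Linearized terminal cost; constants are omitted (they do not affect argmin).
cost += lam * (4 * cp.square(cp.trace(Sigma[N]) - d_des.sum())
               + 8 * cp.sum_squares(Sigma[N])
               - 16 * cp.trace(G @ Sigma[N]))

constraints = []
for k in range(N):
    top = (Sigma[k + 1] - A @ Sigma[k] @ A.T - B @ U[k] @ A.T
           - A @ U[k].T @ B.T - B @ V[k] @ B.T - W)
    M = cp.bmat([[top, B @ U[k]], [(B @ U[k]).T, Sigma[k]]])  # LMI (11)/(12c)
    constraints.append(0.5 * (M + M.T) >> 0)  # symmetrize explicitly

cp.Problem(cp.Minimize(cost), constraints).solve()
print(Sigma[N].value)  # eigendecompose to update P_prev in the next iteration
```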
In Theorem 1, we showed that the optimal policy for Problem (12) is deterministic. The following theorem shows that the proposed algorithm generates a sequence of deterministic control policies:
Theorem 4
When $B_k$, $k = 0, \ldots, N-1$, are invertible, the optimal policy obtained from the subproblem (20) is also deterministic.
IV Numerical Experiments
In this section, we perform numerical optimization for problem (12) using the DCA. We consider a two-dimensional system ($n = m = 2$) and fix the system matrices $A_k$, $B_k$, the covariances $\Sigma_0$, $W$, $\Sigma_{\mathrm{des}}$, the horizon $N$, and the cost weights $R_k$, $\lambda$ throughout this section. Figure 2 shows the time evolution of the state covariance of the uncontrolled system. For the implementation of the convex subproblem in the DCA, we used the MOSEK solver [21] and the CVXPY modeler [22].
[Figure 2: Time evolution of the state covariance of the uncontrolled system.]
IV-1 Line alignment
First, we consider the case where the desired density is the Gaussian $\mu_{\mathrm{des}} = N(0, D_{\mathrm{des}})$ with $D_{\mathrm{des}} \in \mathbb{S}^1_{++}$, which is supported not on $\mathbb{R}^2$ but on $\mathbb{R}$. The problem seeks the optimal policy that aligns the terminal distribution along a single line. Note that $W(\mu_N, \mu_{\mathrm{des}})$ in (3) does not make sense¹ because $\mu_N \in \mathcal{P}(\mathbb{R}^2)$ while $\mu_{\mathrm{des}} \in \mathcal{P}(\mathbb{R})$. In contrast, the GW distance in (2) is well-defined and, thanks to (5), equivalent to taking the zero-padded target
$\tilde{\mu}_{\mathrm{des}} = N\left( 0, \begin{bmatrix} D_{\mathrm{des}} & 0 \\ 0 & 0 \end{bmatrix} \right). \qquad (21)$

¹ One may think we can embed it onto $\mathbb{R}^2$ (e.g., by (21)) and consider the Wasserstein distance. However, there remains a rotational degree of freedom, which affects the resulting distance. Our formulation can be interpreted as optimizing this rotation in the sense of the required control energy; see Fig. 3.
Figure 3 presents the trajectories of one hundred samples from the controlled process when the target distribution is the degenerate distribution (21). It can be seen that the distribution of the states indeed stretches vertically to achieve a one-line alignment.
[Figure 3: Trajectories of one hundred samples from the controlled process for the degenerate target distribution.]
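Numerically, (21) amounts to zero-padding the target spectrum in the closed form (5). The small check below (ours; the terminal covariance value is hypothetical) illustrates how the GW cost to the one-dimensional target remains well defined.

```python
# The GW cost to the 1-D target is obtained by zero-padding its spectrum (21).
import numpy as np

Sigma_N = np.diag([0.05, 2.0])   # hypothetical near-degenerate terminal covariance
d_des = np.array([2.0])          # one-dimensional target spectrum D_des
d_N = np.sort(np.linalg.eigvalsh(Sigma_N))[::-1]
d_pad = np.pad(d_des, (0, Sigma_N.shape[0] - d_des.size))   # embed into R^2
gw2 = 4 * (d_N.sum() - d_pad.sum()) ** 2 + 8 * np.sum((d_N - d_pad) ** 2)
print(gw2)  # small residual: the terminal shape is already nearly a line
```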
IV-2 Comparison with the Wasserstein formulation
Let us consider the target distribution
$\mu_{\mathrm{des}} = N(0, \Sigma_{\mathrm{des}}), \qquad (22)$
for which the GW cost attained without control is
$GW(\mu_N, \mu_{\mathrm{des}})^2 \qquad (23)$
for the uncontrolled system. Figure 4 shows the snapshots of the state covariance under the optimal control input obtained by the DCA. It can be observed that the shape of the terminal distribution approaches that of the target distribution as $\lambda$ increases. As shown below, the terminal distribution is the one requiring the least energy among the rotated versions of the target, due to the rotational invariance of the GW distance.
Figure 5 shows the relationship between the optimized control cost term and the GW cost term in Eq. (8b) for each $\lambda$. As the value of $\lambda$ increases, the control cost rises, while the GW cost decreases. Conversely, as the value of $\lambda$ decreases, the control cost diminishes, and the GW cost increases. Also, as $\lambda$ becomes larger, the GW cost gets close to zero. It is noteworthy that, in comparison to the uncontrolled system in (23), our algorithm achieves a significant reduction in the GW cost.
Finally, we clarify the advantage of our approach over the Wasserstein terminal cost problem [10]. In Fig. 4(a), the obtained terminal distribution is $N(0, \Sigma_N^\star)$, where $\Sigma_N^\star$ is obtained by rotating $\Sigma_{\mathrm{des}}$ by an angle $\theta^\star$, i.e., $\Sigma_N^\star = R(\theta^\star) \Sigma_{\mathrm{des}} R(\theta^\star)^\top$ with the rotation matrix
$R(\theta) := \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$
It is shown in [10] that, for each fixed angle $\theta$, we can solve
$\underset{\{K_k, V_k\}}{\text{minimize}} \quad \mathbb{E}\left[ \sum_{k=0}^{N-1} u_k^\top R_k u_k \right] + \frac{1}{\varepsilon}\, W\left( \mu_N,\ N\left( 0,\ R(\theta) \Sigma_{\mathrm{des}} R(\theta)^\top \right) \right)^2 \qquad (24)$
by an SDP. Then, we solved this problem for a sufficiently small $\varepsilon$ (i.e., a large terminal cost). The required control energy for the obtained optimal control input (i.e., the first term in (24)) is denoted by $E(\theta)$, which is shown in Fig. 6. It is noteworthy that the function $E(\theta)$ exhibits non-convexity. We can also observe
$E(\theta^\star) = \min_\theta E(\theta),$
which implies that the rotation angle obtained by the GW terminal cost problem minimizes the resulting control energy needed to realize the required shape (specified by $\Sigma_{\mathrm{des}}$). From a computational cost point of view, finding $\theta^\star$ using the Wasserstein terminal cost approach requires performing an optimization to compute $E(\theta)$ for each $\theta$, whereas our GW terminal cost framework only necessitates solving a single optimization problem. Moreover, our approach remains computationally tractable even in high-dimensional settings, where the Wasserstein terminal cost approach becomes computationally intractable due to the exponential growth of the search space of the rotation matrix.
[Figure 4: Snapshots of the state covariance under the optimal control input obtained by the DCA.]
[Figure 5: Optimized control cost term versus GW cost term in (8b) for each $\lambda$.]
[Figure 6: Required control energy $E(\theta)$ for the Wasserstein terminal cost problem (24).]
V Conclusion
In this study, we addressed the optimal density control problem with the Gromov-Wasserstein distance as the terminal cost. We showed that the problem is a DC programming problem and proposed an optimization method based on the DC algorithm. Numerical experiments confirmed that the state distribution reaches the terminal distribution that can be realized with the minimum control energy among those having the specified shape.
Future work includes the application of the proposed GW framework to transport between spaces equipped with different Riemannian metric structures, or to point clouds. Model predictive formation control based on a fast algorithm for optimal transport [23] is also a promising direction [24]. The convergence and computational complexity of the proposed DC algorithm should also be investigated.
References
- [1] Y. Chen, T. T. Georgiou, and M. Pavon, “Optimal steering of a linear stochastic system to a final probability distribution, part I,” IEEE Transactions on Automatic Control, vol. 61, no. 5, pp. 1158–1169, 2016.
- [2] B. D. O. Anderson, “The inverse problem of stationary covariance generation,” Journal of Statistical Physics, vol. 1, no. 1, pp. 133–147, 1969.
- [3] A. F. Hotz and R. E. Skelton, “A covariance control theory,” in 1985 24th IEEE Conference on Decision and Control, 1985, pp. 552–557.
- [4] F. Liu, G. Rapakoulias, and P. Tsiotras, “Optimal covariance steering for discrete-time linear stochastic systems,” arXiv preprint arXiv:2211.00618, 2022.
- [5] G. Rapakoulias and P. Tsiotras, “Discrete-time optimal covariance steering via semidefinite programming,” arXiv preprint arXiv:2302.14296, 2023.
- [6] K. Ito and K. Kashima, “Maximum entropy density control of discrete-time linear systems with quadratic cost,” arXiv preprint arXiv:2309.10662, 2023.
- [7] ——, “Maximum entropy optimal density control of discrete-time linear systems and Schrödinger bridges,” IEEE Transactions on Automatic Control, Early Access, 2024.
- [8] A. Halder and E. D. Wendel, “Finite horizon linear quadratic Gaussian density regulator with Wasserstein terminal cost,” in 2016 American Control Conference (ACC), 2016, pp. 7249–7254.
- [9] I. M. Balci and E. Bakolas, “Covariance steering of discrete-time stochastic linear systems based on Wasserstein distance terminal cost,” IEEE Control Systems Letters, vol. 5, no. 6, pp. 2000–2005, 2021.
- [10] ——, “Exact SDP formulation for discrete-time covariance steering with Wasserstein terminal cost,” arXiv preprint arXiv:2205.10740, 2022.
- [11] V. Krishnan and S. Martínez, “Distributed optimal transport for the deployment of swarms,” in 2018 IEEE Conference on Decision and Control (CDC), 2018, pp. 4583–4588.
- [12] D. V. Dimarogonas and K. H. Johansson, “On the stability of distance-based formation control,” in 2008 47th IEEE Conference on Decision and Control, 2008, pp. 1200–1205.
- [13] F. Mémoli, “Gromov-Wasserstein distances and the metric approach to object matching,” Foundations of Computational Mathematics, vol. 11, pp. 417–487, 2011.
- [14] J. Delon, A. Desolneux, and A. Salmona, “Gromov-Wasserstein distances between Gaussian distributions,” Journal of Applied Probability, vol. 59, no. 4, pp. 1178–1198, 2022.
- [15] P. D. Tao and L. H. An, “Convex analysis approach to DC programming: Theory, algorithms and applications,” Acta Mathematica Vietnamica, vol. 22, no. 1, pp. 289–355, 1997.
- [16] Y. Chen, T. T. Georgiou, and M. Pavon, “Optimal steering of a linear stochastic system to a final probability distribution, part II,” IEEE Transactions on Automatic Control, vol. 61, no. 5, pp. 1170–1180, 2016.
- [17] R. Horst and N. V. Thoai, “DC programming: Overview,” Journal of Optimization Theory and Applications, vol. 103, pp. 1–43, 1999.
- [18] A. L. Yuille and A. Rangarajan, “The concave-convex procedure (CCCP),” in Advances in Neural Information Processing Systems, vol. 14. MIT Press, 2001.
- [19] K. Anstreicher and H. Wolkowicz, “On Lagrangian relaxation of quadratic matrix constraints,” SIAM Journal on Matrix Analysis and Applications, vol. 22, no. 1, pp. 41–55, 2000.
- [20] D. P. Bertsekas, Nonlinear Programming: 2nd Edition. Athena Scientific, 1999.
- [21] MOSEK ApS, MOSEK Optimizer API for Python. Version 10.1, 2023.
- [22] S. Diamond and S. Boyd, “CVXPY: A Python-embedded modeling language for convex optimization,” Journal of Machine Learning Research, vol. 17, no. 83, pp. 1–5, 2016.
- [23] J. Solomon, G. Peyré, V. G. Kim, and S. Sra, “Entropic metric alignment for correspondence problems,” ACM Transactions on Graphics, vol. 35, no. 4, 2016.
- [24] K. Ito and K. Kashima, “Entropic model predictive optimal transport over dynamical systems,” Automatica, vol. 152, p. 110980, 2023.