An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem
Abstract
In this paper, an off-policy reinforcement learning algorithm is designed to solve the continuous-time LQR problem using only input-state data measured from the system. Different from other algorithms in the literature, we propose the use of a specific persistently exciting input as the exploration signal during the data collection step. We then show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration. Convergence of the algorithm to the optimal control input is also proven. Moreover, we formulate the policy evaluation step as the solution of a Sylvester-transpose equation, which increases the efficiency of its solution. Finally, a method to determine a stabilizing policy to initialize the algorithm using only measured data is proposed.
I INTRODUCTION
Reinforcement learning (RL) is a set of iterative algorithms that allow a system to learn its optimal behavior as it interacts with its environment [1, 2]. In the context of linear optimal control, RL has been used in the last few decades to solve the linear quadratic regulator (LQR) problem in continuous-time [3, 4, 5, 6, 7, 8] and in discrete time [9, 10, 11, 12]. For applications of RL procedures to nonlinear systems and other extensions, the reader is referred to the surveys [13, 14, 15] and the references therein.
In the continuous-time linear time-invariant (CT-LTI) case, several RL algorithms with attractive properties have been designed. Although the first proposed algorithms required at least partial knowledge of the system model (e.g., [3]), completely data-based methods are now well known [4, 5, 6, 7]. These data-based algorithms replace the need for model knowledge by measuring persistently excited data directly from the system. Most of these data-based methods are on-policy algorithms, meaning that they require the application (or simulation) of an exciting input to the system at every iteration so that a new set of data can be collected. In contrast, the authors in [8] proposed a data-based off-policy RL algorithm. This method has the advantage that data needs to be collected from the system only once; every iteration of the algorithm is then performed using the same batch of measurements.
The method in [8], as well as most on-policy methods, is formulated as the problem of determining the values of certain unknown matrices from a set of equations derived from the Bellman equation. Taking advantage of the properties of the Kronecker product, this problem is then expressed as a set of linear equations that can be easily solved. However, the Kronecker product formulation generates matrices of large size, and the resulting computational burden grows rapidly with the system dimension.
Another important issue in the existing learning-based control literature is the selection of a proper persistently exciting (PE) input. In most of the above literature, heuristic approaches for persistence of excitation are employed, often designing exciting inputs by adding sinusoidal, exponential and/or random signals [14]. A different approach for persistence of excitation was studied in [16], where conditions for the design of a discrete-time PE input are formally established. It is shown in [16] that their definition of persistence of excitation provides data measurements that are so rich in information that every possible trajectory of a controllable discrete-time linear system can be expressed in terms of such data. This result is now known as Willems’ lemma, and has been successfully used in recent years in data-based analysis, estimation and control of discrete-time systems (see, e.g., the survey [17] and the references therein). In [6], it was proposed to use a PE signal as defined in [16] to excite a continuous-time system during a Q-learning procedure, which guarantees solvability of their policy evaluation step. However, the method in [6] is an on-policy algorithm and the authors require persistence of excitation of a signal composed of both the input and the state of the system. This contrasts with our objective of considering a PE signal in terms of the input only. Moreover, in [6] a high order of persistence of excitation is needed.
The contributions of this paper are as follows. We propose a novel data-based off-policy RL algorithm to solve the LQR problem for continuous-time systems. As in [8], we perform the policy evaluation and policy improvement steps simultaneously. Different from the existing algorithms, we formulate a Sylvester-transpose equation that can be efficiently solved using known methods [18, 19, 20]. This avoids the use of the Kronecker product and the ensuing large matrices in our computations. Moreover, we use the results in [21], where a continuous-time version of Willems’ lemma was proposed. This allows us to design a PE input that guarantees the solvability of the Sylvester-transpose equation in a data-based fashion. In our formulation, persistence of excitation depends only on the input of the system, and we require the use of a PE input of lower order compared to [6]. Finally, we propose a method to determine the required initial stabilizing policy for the proposed algorithm using only measured data. Different from [7], this method does not require the solution of linear matrix inequalities (LMIs).
In the following, Section II introduces the preliminary results that are used throughout the paper. The development of the proposed efficient RL algorithm and its theoretical analysis are presented in Section III. Section IV analyzes the computational efficiency of the proposed algorithm and presents a procedure to compute the initial stabilizing gain. In Section V, we illustrate the theoretical results with numerical examples, and Section VI concludes the paper.
II PRELIMINARIES
In this section, we present existing results from the literature that are relevant for the remainder of this paper.
II-A Matrix definitions for continuous-time data
Consider the integer and the positive scalar . Let , with , denote a continuous-time signal of length . Using the trajectory , we define the following matrix
(1) |
for . Notice that (1) is a time-varying matrix defined on the interval .
Now, consider the following CT-LTI system
$\dot{x}(t) = A x(t) + B u(t)$,    (2)
where $x(t) \in \mathbb{R}^{n}$ and $u(t) \in \mathbb{R}^{m}$ are the state and input vectors of the system, respectively. The pair $(A,B)$ is assumed to be controllable throughout the paper.
Suppose that the input signal is applied to (2), and the resulting state trajectory is collected. From (2) and the definition in (1), we can write
Since it is unusual to have the state derivative available as a measurement, integrate the expression above to obtain
For convenience of notation, define the matrices
(3) |
Notice that the matrix (and similarly ) only requires the computation of integrals of the form , . This is simpler than the integrals computed in the existing RL literature [8, 6, 7].
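A minimal numerical sketch of computing such windowed integrals from sampled trajectory data is given below. The trapezoidal rule, the sampling step dt, the window length T and all function and variable names are our own illustrative choices, not the paper's notation; it is only meant to show that the required quantities are simple quadratures of the measured signals.

```python
import numpy as np

def windowed_integrals(z, dt, window_len, starts):
    """Approximate integrals of a sampled trajectory z (shape: dim x samples)
    over windows [t_k, t_k + window_len], using the trapezoidal rule.
    Returns one column per window start index in `starts`."""
    w = int(round(window_len / dt))  # samples per window
    cols = [np.trapz(z[:, k:k + w + 1], dx=dt, axis=1) for k in starts]
    return np.column_stack(cols)

# Example: integrate a (synthetic) sampled state trajectory over shifted windows of length T
dt, T = 0.01, 0.5
t = np.arange(0.0, 5.0 + dt, dt)
x = np.vstack([np.sin(t), np.cos(2 * t)])          # stand-in for measured state data
starts = range(0, len(t) - int(T / dt) - 1, int(T / dt))
Sigma = windowed_integrals(x, dt, T, starts)        # one integral column per window
print(Sigma.shape)
```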
By definition, the following expression holds
(4) |
II-B Persistence of excitation for discrete-time systems
Define the integer constants . The Hankel matrix of depth of a discrete-time sequence , , is defined as
In [16], the following definition of a PE input for discrete-time systems is made.
Definition 1
The discrete sequence , , is said to be persistently exciting of order if its Hankel matrix of depth has full row rank, i.e.,
(5) |
It is important to highlight the fact that Definition 1 provides a condition that enables a straightforward design of a PE input and that is easy to verify for any discrete sequence.
Remark 1
A necessary condition for (5) to hold is that . This provides a minimum length for a PE input sequence.
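For illustration, the following sketch builds the Hankel matrix of depth L of a discrete vector-valued sequence and checks the full-row-rank condition (5). The function names and the random test sequence are our own; a random sequence of sufficient length is generically persistently exciting.

```python
import numpy as np

def hankel_matrix(u, L):
    """Hankel matrix of depth L of a sequence u[0], ..., u[N-1], stored as columns of u."""
    u = np.atleast_2d(u)                 # shape (m, N), one column per sample
    m, N = u.shape
    cols = N - L + 1
    return np.vstack([u[:, i:i + cols] for i in range(L)])

def is_persistently_exciting(u, L):
    """Check the full-row-rank condition (5) for order L."""
    H = hankel_matrix(u, L)
    return np.linalg.matrix_rank(H) == H.shape[0]

m, L = 1, 4
u = np.random.randn(m, (m + 1) * L + 5)  # length satisfies the necessary condition of Remark 1
print(is_persistently_exciting(u, L))
```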
II-C Persistence of excitation for continuous-time systems
It is shown in [21] that a piecewise constant input designed by exploiting Definition 1 is persistently exciting for the continuous-time system (2). This class of inputs is formally described in the following definition.
Definition 2 (Piecewise constant PE input)
Consider a time interval such that
(6) |
where and are any two eigenvalues of matrix in (2), and is the imaginary part of a complex number. A piecewise constant persistently exciting (PCPE) input of order for continuous-time systems is defined as for all , , where is a sequence of constant vectors that is persistently exciting of order in the sense of Definition 1.
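A minimal sketch of constructing such a PCPE input from a discrete PE sequence is shown below, assuming the inter-sample time T has been chosen to satisfy condition (6); all names are our own.

```python
import numpy as np

def pcpe_input(mu_seq, T):
    """Piecewise constant input u(t) = mu_k for t in [kT, (k+1)T), built from a
    discrete sequence {mu_k} (columns of mu_seq) that is PE as in Definition 1.
    T is assumed to satisfy condition (6)."""
    def u(t):
        k = int(np.floor(t / T))
        return mu_seq[:, min(k, mu_seq.shape[1] - 1)]
    return u

# Hold a random (generically PE) sequence over intervals of length T
m, n_steps, T = 1, 20, 0.5
mu_seq = np.random.randn(m, n_steps)
u = pcpe_input(mu_seq, T)
print(u(0.0), u(0.3), u(0.7))   # constant on [0, T), switches value at t = T
```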
Remark 2
When a PCPE input is applied to system (2), the obtained input-state data set satisfies an important rank condition, as shown below.
Lemma 1 ([21])
II-D The LQR problem and Kleinman’s algorithm
For a CT-LTI system (2), the infinite-horizon LQR problem concerns determining the control input that minimizes a cost function of the form
$J = \int_{0}^{\infty} \left( x(t)^{\top} Q x(t) + u(t)^{\top} R u(t) \right) dt$,    (8)
where $Q = Q^{\top} \succeq 0$ and $R = R^{\top} \succ 0$. Throughout the paper, we assume that the pair $(A, Q^{1/2})$ is observable. This, together with the assumed controllability of $(A,B)$, implies that the optimal control input is given by $u^{*}(t) = -K^{*} x(t)$, where
$K^{*} = R^{-1} B^{\top} P^{*}$, and the matrix $P^{*} = (P^{*})^{\top} \succ 0$ solves the algebraic Riccati equation $A^{\top} P^{*} + P^{*} A + Q - P^{*} B R^{-1} B^{\top} P^{*} = 0$.
In [22], Kleinman proposed a model-based iterative algorithm to solve the LQR problem. This algorithm starts by selecting an initial stabilizing matrix $K_0$, i.e., a matrix such that $A - BK_0$ is Hurwitz stable. At every iteration $i = 0, 1, 2, \ldots$, the Lyapunov equation
$(A - BK_i)^{\top} P_i + P_i (A - BK_i) + Q + K_i^{\top} R K_i = 0$    (9)
is solved for $P_i = P_i^{\top} \succ 0$. Then, a new feedback matrix is defined as
$K_{i+1} = R^{-1} B^{\top} P_i.$    (10)
The algorithm iterates the equations (9) and (10) until convergence. With the main drawback of being a model-based method, Kleinman’s algorithm otherwise possesses highly attractive features. Namely, at each iteration the matrix $K_{i+1}$ is stabilizing, the algorithm converges such that $\lim_{i \to \infty} P_i = P^{*}$ and $\lim_{i \to \infty} K_{i+1} = K^{*}$,
and convergence occurs at a quadratic rate [22].
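For reference, a minimal sketch of the model-based iteration (9)-(10) using SciPy's Lyapunov and Riccati solvers is given below; the small stable example system is our own, chosen so that $K_0 = 0$ is stabilizing. The final check against the direct Riccati solution illustrates the convergence property stated above.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

def kleinman(A, B, Q, R, K0, iters=10):
    """Model-based Kleinman iteration (9)-(10), starting from a stabilizing K0."""
    K = K0
    for _ in range(iters):
        Acl = A - B @ K
        # Lyapunov equation (9): Acl' P + P Acl + Q + K' R K = 0
        P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
        # Policy improvement (10): K <- R^{-1} B' P
        K = np.linalg.solve(R, B.T @ P)
    return P, K

A = np.array([[-1.0, 1.0], [0.0, -2.0]])   # stable, so K0 = 0 is stabilizing
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
P, K = kleinman(A, B, Q, R, np.zeros((1, 2)))
print(np.allclose(P, solve_continuous_are(A, B, Q, R), atol=1e-8))
```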
The following section presents the main developments of this paper.
III AN EFFICIENT DATA-BASED ALGORITHM FOR THE CT LQR PROBLEM
In this section, we present an efficient data-based off-policy RL algorithm to determine the optimal controller that minimizes (8). We show that the proposed procedure is equivalent to Kleinman’s algorithm (9)-(10), and therefore preserves all of its theoretical properties. For the clarity of exposition, we introduce first a model-based algorithm that is then used as the basis of our data-based method.
III-A A model-based algorithm
Combining (9) and (10), we readily obtain the following expressions
and . Therefore, the matrix equation
(11) |
holds, where is an identity matrix and represents a matrix of zeros with appropriate dimensions.
Denoting the fixed matrices as
(15)
(18)
and the unknown matrix as
(19) |
we can write (11) in the compact form
(20) |
The matrix consists of the unknown matrices in Kleinman’s algorithm, and . Our aim is to design a method in which solving a matrix equation of the form (20) for corresponds to solving both (9) and (10) simultaneously. However, it can be noted that (20), as formulated, in general does not have a unique solution . To address this issue, first express the unknown submatrices of as
(21) |
with and . In the following lemma, we show that there exists only one matrix that solves (20) such that the submatrix is symmetric.
Lemma 2
Proof:
Considering the partition in (21), notice that (20) holds for any matrix such that
and
From the second equation it is clear that . Substituting this and the fact that in the first equation, we get
(22) |
Since is stabilizing, we use Lyapunov arguments to conclude that (and therefore also ) is unique. ∎
Lemma 2 implies that constraining the solution of (20) to include a symmetric submatrix leads to the desired solution (19). The following lemma shows that we achieve this by properly modifying in (18).
Lemma 3
Proof:
First, define the matrix
Using this definition, it is straightforward to express (23) in terms of the matrix in (18) as
Notice that the left-hand side of this expression is symmetric, and therefore so must be . Now, is symmetric if and only if , that is, . This implies both that the solution of (23) also solves (20) and, by Lemma 2, that this solution is unique. ∎
Remark 4
Using this result, we formulate Algorithm 1 below. As in any policy iteration procedure, Algorithm 1 is initialized with a stabilizing matrix . Using this matrix (as well as model knowledge), (23) is solved for . Then, partitioning as in (21), a new feedback matrix is obtained as .
Using the results obtained so far, we conclude that Algorithm 1 is equivalent to Kleinman’s algorithm in the sense that, starting from the same initial matrix , they provide the same updated policies at every iteration. This implies that Algorithm 1 preserves all the properties of Kleinman’s algorithm. In the following, we use this result to design a data-based algorithm.
III-B The data-based algorithm
To avoid the need for model knowledge in Algorithm 1, we collect persistently excited data from the system (2) as described in Section II-C. Using this data, we define the constant matrices , and as in (3).
Lemma 1 showed that the collected data set satisfies the rank condition (7). In the following lemma, we extend this result to the matrices and .
Lemma 4
Proof:
Notice that, since the applied input is piecewise constant, an expression for the resulting state of (2) is
for and . Thus, we can write
Notice that is nonsingular since the condition (6) holds (the fact that is nonsingular follows from the fact that corresponds to a non-pathological sampling time [23]). Moreover, by Lemma 1 the second matrix on the right-hand side has full row rank, completing the proof. ∎
Define . Since has full row rank by Lemma 4, we can select linearly independent columns from it. Let represent the th column of , and let be a set of indices such that
(26) |
is a nonsingular matrix. Then, is a solution of (23) if and only if it is a solution of
(27) |
From the definitions in (18) and (24), and using the expression (4), we have the following
and , where the subscript indicates a matrix constructed from the corresponding original matrix using the columns specified by the set . Substituting in (27), we obtain
(28) |
where
(29)
Now, (28) is a data-based equation that does not require any knowledge about the system model. Algorithm 2 uses this expression to solve the LQR problem. For convenience, for Algorithm 2 we define
(30) |
Algorithm 2: Data-based RL algorithm
The following theorem states the main properties of this algorithm.
Theorem 1
Proof:
The proof is obtained by showing that Algorithm 2 is equivalent to Kleinman’s algorithm at every iteration. First, notice that by Lemma 4, the matrix has full row rank and, therefore, a nonsingular matrix can always be constructed. This means that (31) is equivalent to (23). Now, noting that is stabilizing, use an induction argument to assume that is stabilizing. Lemma 3 shows the existence and uniqueness of from (23). Moreover, the expression (22) in the proof of Lemma 2 shows that , where is the solution of the Lyapunov equation (9). Also in the proof of Lemma 2 it was shown that , which now corresponds to Kleinman’s updated gain (10). Therefore, Algorithm 2 is equivalent to Kleinman’s algorithm and shares all of its properties [22]. ∎
Algorithm 2 is a purely data-based, off-policy method to solve the continuous-time LQR problem. Using Definition 2, we are able to guarantee the existence of a solution of (31) at every iteration for data trajectories of fixed length. This contrasts with methods in the literature that must keep collecting data until a certain data matrix attains full rank (see, e.g., [7, 8]). Moreover, we avoid the use of the Kronecker product and its resulting large matrices in Algorithm 2. As stated in Remark 4, methods to efficiently solve a Sylvester-transpose equation as in (31) are well known.
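As an illustration of such solvers, the sketch below applies a CGLS-type iteration, following the least-squares approach of [18], [19], to a generic consistent generalized Sylvester-transpose equation of the form A X B + C Xᵀ D = E, without forming any Kronecker products. The coefficient matrices appearing in (31) are instead built from the measured data as described above; the random instance and all names here are our own.

```python
import numpy as np

def cgls_sylvester_transpose(A, B, C, D, E, tol=1e-12, max_iter=500):
    """Minimal CGLS-type iteration for the generalized Sylvester-transpose
    equation A X B + C X^T D = E, treated as a least-squares problem."""
    op = lambda X: A @ X @ B + C @ X.T @ D            # linear operator L(X)
    adj = lambda Y: A.T @ Y @ B.T + D @ Y.T @ C       # its adjoint L*(Y)
    X = np.zeros((A.shape[1], B.shape[0]))
    r = E - op(X)
    s = adj(r)
    p, gamma = s.copy(), np.sum(s * s)
    for _ in range(max_iter):
        q = op(p)
        alpha = gamma / np.sum(q * q)
        X += alpha * p
        r -= alpha * q
        s = adj(r)
        gamma_new = np.sum(s * s)
        if gamma_new < tol:
            break
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return X

# Consistency check on a small random consistent instance
rng = np.random.default_rng(0)
n = 5
A, B, C, D = (rng.standard_normal((n, n)) for _ in range(4))
X_true = rng.standard_normal((n, n))
E = A @ X_true @ B + C @ X_true.T @ D
X = cgls_sylvester_transpose(A, B, C, D, E)
print(np.linalg.norm(A @ X @ B + C @ X.T @ D - E))    # residual near zero
```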
Remark 5
Step 4 of Algorithm 2 requires selecting linearly independent columns of . This step is performed for the benefit of efficiency, as it decreases the size of the matrices in (31). However, since has full row rank, skipping this step in Algorithm 2 and using the complete data matrices instead does not affect the result of each iteration.
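For completeness, the column-selection step can be implemented with a rank-revealing (pivoted) QR factorization; a minimal sketch with a placeholder data matrix (here called Theta, a name of our own) follows.

```python
import numpy as np
from scipy.linalg import qr

def independent_columns(Theta, rtol=1e-10):
    """Return indices of linearly independent columns of a full-row-rank matrix,
    selected via a pivoted QR factorization."""
    _, R, piv = qr(Theta, pivoting=True)
    r = int(np.sum(np.abs(np.diag(R)) > rtol * np.abs(R[0, 0])))
    return np.sort(piv[:r])

# Example: pick columns of a wide full-row-rank data matrix
rng = np.random.default_rng(0)
Theta = rng.standard_normal((4, 12))
idx = independent_columns(Theta)
print(idx, np.linalg.matrix_rank(Theta[:, idx]))   # 4 independent columns selected
```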
IV PRACTICAL CONSIDERATIONS
IV-A Efficiency analysis of Algorithm 2
In this subsection, we analyze the theoretical computational complexity of Algorithm 2. Moreover, we compare this complexity with that of the algorithm proposed in [8]. This is because [8] is also an off-policy data-based method that shares many of the characteristics of Algorithm 2.
The most expensive steps in Algorithm 2 are obtaining the solution of (31) and selecting linearly independent vectors from . Methods to solve the Sylvester-transpose equation (31) with a complexity of are known [19]. The selection of linearly independent vectors can be performed using a simple procedure like Gaussian elimination to transform the matrix of interest into row echelon form. This method has a complexity of operations [24]. This step, however, only needs to be performed once in Algorithm 2 (in Step 4). Thus, we conclude that Algorithm 2 requires once and then in each iteration floating point operations.
The algorithm in [8] was also shown to be equivalent to Kleinman’s algorithm at every iteration. However, their method uses a Kronecker product formulation that yields matrices of large dimensions. Let be the number of data samples used in [8]. Then, the most expensive step at each iteration of their algorithm is the product of a matrix with dimensions times its transpose. This product, and hence each iteration of the algorithm, requires floating point operations [25]. Clearly, as the dimension of the system increases, the difference in performance between the two algorithms becomes more significant. Moreover, we note from [8] that the amount of collected data must satisfy for the algorithm to yield a unique solution at every iteration. Compare this with the bound in Algorithm 2. In Section V, we test this theoretical comparison using numerical examples.
IV-B An initial stabilizing policy
In [26, Remark 2], a procedure to design a stabilizing controller for continuous-time systems using only measured data was described. This method is based on the solution of a linear matrix inequality (LMI). The authors in [7] proposed to use a similar LMI-based procedure to determine the initial stabilizing gain for a Q-learning algorithm. Since one of the goals in this paper is computational efficiency, we would like to avoid the computationally expensive step of solving an LMI. In this subsection, we present an alternative method to determine the initial stabilizing matrix for Algorithm 2. The following development closely follows a procedure proposed in [27, Section IV] for discrete-time systems.
Let be the Moore-Penrose pseudoinverse of the matrix in (3). Since has full row rank (see Lemma 4), is a right inverse of . Furthermore, let be a basis for the null space of , such that for any matrix of appropriate dimensions. Using the matrices , and from (3), we propose to compute the initial stabilizing gain for Algorithm 2 as
(32) |
where is a matrix to be determined.
From (4) and (32), notice that
Therefore, by designing the poles of the matrix , we also set the poles of . Since is controllable, and hence the poles of can be assigned arbitrarily, the poles of can also be placed arbitrarily by a suitable choice of . Moreover, since , and are matrices obtained from data, we can operate with them without any need of model knowledge. This procedure is summarized in the following theorem. Its proof is straightforward given the procedure described in this subsection and is hence omitted.
Theorem 2
Let the matrices , and be defined as in (3) using data collected from (2) during the application of a PCPE input of order . Define as the Moore-Penrose pseudoinverse of and as a basis for the null space of . Moreover, define the virtual system matrices and . Using pole-placement methods, determine a matrix such that is Hurwitz. Then, the matrix defined by (32) is stabilizing for system (2).
Remark 6
Notice that the matrices and in Theorem 2 do not correspond to the actual system matrices and . In fact, and in general do not have the same dimensions. No model identification is performed in the proposed procedure.
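The computational ingredients of Theorem 2 (Moore-Penrose pseudoinverse, null-space basis and pole placement) are all standard; a minimal sketch with placeholder data is given below. The names Gamma, A_v and B_v are ours, and the random matrices are stand-ins only; in practice these quantities are built from the measured data and then combined into the initial gain according to (32).

```python
import numpy as np
from scipy.linalg import null_space
from scipy.signal import place_poles

rng = np.random.default_rng(1)
Gamma = rng.standard_normal((3, 8))        # stand-in for a full-row-rank data matrix as in (3)

Gamma_pinv = np.linalg.pinv(Gamma)         # right inverse: Gamma @ Gamma_pinv = I
N_basis = null_space(Gamma)                # basis of the null space: Gamma @ N_basis = 0

A_v = rng.standard_normal((3, 3))          # stand-ins for the data-based "virtual" pair
B_v = rng.standard_normal((3, 2))

# Pole placement for the virtual pair: gain G rendering the virtual closed loop Hurwitz
G = place_poles(A_v, B_v, [-1.0, -2.0, -3.0]).gain_matrix
print(np.linalg.eigvals(A_v - B_v @ G))    # eigenvalues placed at the desired stable locations

# Gamma_pinv, N_basis and G are then assembled into the initial gain K0 as in (32)
```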
V NUMERICAL EXPERIMENTS
In this section, we compare in simulation the efficiency of the proposed Algorithm 2 with that of the algorithm presented in [8]. As described above, these algorithms have the same characteristics: they are data-based off-policy methods that are equivalent to Kleinman’s algorithm at every iteration.
To compare the efficiency of both algorithms, several simulations are performed for different, randomly generated linear systems (2). In particular, 100 different linear systems are generated using the command rss in Matlab, and both algorithms are applied to each of them. The system dimensions considered for each set of 100 experiments are and . In every case, we consider single input systems (), and we define the cost function (8) with and .
Each implementation of Algorithm 2 had the following characteristics. A PCPE input as in Definition 2 was used to collect data from the system. A sample of data was collected every time units. We considered a time interval of , and we collected data for a total of time units, with . The method described in [18] was used to solve the Sylvester-transpose equation (31) at every iteration.
For the implementation of the Kronecker product-based method in [8], we followed the same simulation characteristics described in the simulation section of that paper. The only exception is in the amount of data collected, which was reduced for small system dimensions in order to make a fairer comparison.
Finally, notice that the command rss in Matlab yields stable systems. Thus, an initial stabilizing matrix of was used for all experiments and both algorithms. The simulations were performed using Matlab R2020b on an Intel i7-10875H (2.30 GHz) with 16 GB of memory.
The results of our simulations are displayed in Table I. In this table, we refer to Algorithm 2, which is based on the solution of a Sylvester-transpose equation, as ‘SYL’. The algorithm in [8] that is based on the use of the Kronecker product is denoted as ‘KRO’. To compare the computational efficiency of the methods, we present the average time that it takes the algorithms to complete 10 iterations. Due to their quadratic rate of convergence, 10 iterations yield a very accurate result of the optimal control gain for both algorithms. In the table we can observe a confirmation of our theoretical analysis regarding the improved performance of Algorithm 2.
TABLE I: Average time (sec) required by the Sylvester-transpose-based method (SYL) and the Kronecker-product-based method (KRO) to complete 10 iterations, for different system dimensions.
During the execution of these experiments, we noted some issues in the performance of both methods when applied to systems of large dimensions. First, Algorithm 2 requires the application of a solver from the literature to solve (31). We found that, if the data matrix in Algorithm 2 has a large condition number, the solvers considered often failed to provide the correct result. To address this problem, methods to construct a matrix with a low condition number from a larger matrix could be considered. Regarding the algorithm in [8], determining a proper input that satisfies the required persistence of excitation condition for the collected data (see the discussion in the Introduction) becomes ever more difficult as the dimension of the system increases. In this case, it is unclear how to address this issue.
VI CONCLUSIONS
In this paper, a computationally efficient algorithm was proposed to solve the continuous-time LQR problem. The proposed algorithm is equivalent to Kleinman’s method, does not require any knowledge of the system model, and requires collecting data from the system only once. We presented a persistently exciting input that guarantees that the matrix equation (31) in our algorithm has a unique solution at every iteration. Finally, we presented a method to determine an initial stabilizing feedback matrix that uses only measured data and does not require solving LMIs. Simulation results show that our algorithm significantly outperforms an algorithm with similar properties from the literature.
References
- [1] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. Cambridge, MA: MIT Press, 2018.
- [2] D. P. Bertsekas, Dynamic Programming and Optimal Control, 3rd ed. Belmont, MA: Athena Scientific, 2005, vol. I.
- [3] D. Vrabie, O. Pastravanu, M. Abu-Khalaf, and F. L. Lewis, “Adaptive optimal control for continuous-time linear systems based on policy iteration,” Automatica, vol. 45, pp. 477–484, 2009.
- [4] K. G. Vamvoudakis, “Q-learning for continuous-time linear systems: A model-free infinite horizon optimal control approach,” Systems & Control Letters, vol. 100, pp. 14–20, 2017.
- [5] H. Mohammadi, M. Soltanolkotabi, and M. R. Jovanovic, “Random search for learning the linear quadratic regulator,” in 2020 American Control Conference (ACC), 2020, pp. 4798–4803.
- [6] J. Y. Lee, J. B. Park, and Y. H. Choi, “Integral Q-learning and explorized policy iteration for adaptive optimal control of continuous-time linear systems,” Automatica, vol. 48, no. 11, pp. 2850–2859, 2012.
- [7] C. Possieri and M. Sassano, “Q-learning for continuous-time linear systems: A data-driven implementation of the Kleinman algorithm,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 52, no. 10, pp. 6487–6497, 2022.
- [8] Y. Jiang and Z.-P. Jiang, “Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics,” Automatica, vol. 48, pp. 2699–2704, 2012.
- [9] S. J. Bradtke, B. E. Ydstie, and A. G. Barto, “Adaptive linear quadratic control using policy iteration,” in Proceedings of 1994 American Control Conference, vol. 3, 1994, pp. 3475–3479.
- [10] M. Fazel, R. Ge, S. M. Kakade, and M. Mesbahi, “Global convergence of policy gradient methods for the linear quadratic regulator,” in Proceedings of the 35th International Conference on Machine Learning, 2018, pp. 1–10.
- [11] V. G. Lopez, M. Alsalti, and M. A. Müller, “Efficient off-policy Q-learning for data-based discrete-time LQR problems,” IEEE Transactions on Automatic Control, pp. 1–12, 2023.
- [12] Y. Yang, B. Kiumarsi, H. Modares, and C. Xu, “Model-free λ-policy iteration for discrete-time linear quadratic regulation,” IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 2, pp. 635–649, 2023.
- [13] B. Kiumarsi, K. G. Vamvoudakis, H. Modares, and F. L. Lewis, “Optimal and autonomous control using reinforcement learning: A survey,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 6, pp. 2042–2062, 2018.
- [14] Z.-P. Jiang, T. Bian, and W. Gao, “Learning-based control: A tutorial and some recent results,” Foundations and Trends in Systems and Control, vol. 8, no. 3, pp. 176–284, 2020. [Online]. Available: http://dx.doi.org/10.1561/2600000023
- [15] D. Liu, S. Xue, B. Zhao, B. Luo, and Q. Wei, “Adaptive dynamic programming for control: A survey and recent advances,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 51, no. 1, pp. 142–160, 2021.
- [16] J. C. Willems, P. Rapisarda, I. Markovsky, and B. L. De Moor, “A note on persistency of excitation,” Systems & Control Letters, vol. 54, no. 4, pp. 325–329, 2005.
- [17] I. Markovsky and F. Dörfler, “Behavioral systems theory in data-driven analysis, signal processing, and control,” Annual Reviews in Control, vol. 52, pp. 42–64, 2021.
- [18] M. Hajarian, “Extending the CGLS algorithm for least squares solutions of the generalized Sylvester-transpose matrix equations,” Journal of the Franklin Institute, vol. 353, no. 5, pp. 1168–1185, 2016, special Issue on Matrix Equations with Application to Control Theory.
- [19] K. Tansri, S. Choomklang, and P. Chansangiam, “Conjugate gradient algorithm for consistent generalized Sylvester-transpose matrix equations,” AIMS Mathematics, vol. 7, no. 4, pp. 5386–5407, 2022.
- [20] C. Song, G. Chen, and L. Zhao, “Iterative solutions to coupled Sylvester-transpose matrix equations,” Applied Mathematical Modelling, vol. 35, no. 10, pp. 4675–4683, 2011.
- [21] V. G. Lopez and M. A. Müller, “On a continuous-time version of Willems’ lemma,” in 2022 IEEE 61st Conference on Decision and Control (CDC), 2022, pp. 2759–2764.
- [22] D. Kleinman, “On an iterative technique for Riccati equation computations,” IEEE Transactions on Automatic Control, vol. 13, no. 1, pp. 114–115, 1968.
- [23] C.-T. Chen, Linear System Theory and Design, 3rd ed. New York, USA: Oxford University Press, 1999.
- [24] G. Strang, Linear Algebra and its Applications, 4th ed. Belmont, USA: Thomson Brooks/Cole, 2006.
- [25] R. Hunger, “Floating point operations in matrix-vector calculus,” Tech. Rep., 2007. [Online]. Available: https://mediatum.ub.tum.de/doc/625604/625604
- [26] C. De Persis and P. Tesi, “Formulas for data-driven control: Stabilization, optimality, and robustness,” IEEE Transactions on Automatic Control, vol. 65, no. 3, pp. 909–924, 2020.
- [27] H. J. van Waarde, J. Eising, H. L. Trentelman, and M. K. Camlibel, “Data informativity: A new perspective on data-driven analysis and control,” IEEE Transactions on Automatic Control, vol. 65, no. 11, pp. 4753–4768, 2020.