
Does Optimal Control Always Benefit from Better Prediction? An Analysis Framework for Predictive Optimal Control

Xiangrui Zeng zeng@hust.edu.cn Cheng Yin yinchenghust@hust.edu.cn Zhouping Yin yinzhp@hust.edu.cn State Key Laboratory of Intelligent Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China The China-EU Institute for Clean and Renewable Energy, Huazhong University of Science and Technology, Wuhan, Hubei, China
Abstract

The “prediction + optimal control” scheme has shown good performance in many applications in automotive, traffic, robot, and building control. In practice, the prediction results are simply considered correct in the optimal control design process. However, in reality, these predictions may never be perfect. Under a conventional stochastic optimal control formulation, it is difficult to answer questions like “what if the predictions are wrong?” This paper presents an analysis framework for predictive optimal control where the subjective belief about the future is no longer considered perfect. A novel concept called the hidden prediction state is proposed to establish connections among the predictors, the subjective beliefs, the control policies, and the objective control performance. Based on this framework, the predictor evaluation problem is analyzed. Three commonly-used predictor evaluation measures, namely the mean squared error, the regret, and the log-likelihood, are considered. It is shown that neither the mean squared error nor the log-likelihood can guarantee a monotonic relationship between the predictor error and the optimal control cost. To guarantee control cost improvement, the predictor should be evaluated together with the control performance, e.g., using the optimal control cost or the regret as the evaluation measure. Numerical examples and examples from automotive applications with real-world driving data are provided to illustrate the ideas and the results.

keywords:
Model predictive control, optimal control, data-based control
journal: ISA Transactions

1 Introduction

In many automotive control [1][2], traffic control [3], robot control [4] and building control [5] applications, optimal control decisions need to be made in the presence of an uncertain future. This uncertain future is usually caused by complicated human behaviors or highly complex environment systems, and it can have a relatively large impact on the control system performance [6]. The control policies in these applications need to be adjusted according to different potential future scenarios [7]. A common way to handle this is to use a predictor to forecast the future, and then apply the optimal controller with respect to this forecasted future [8]. The “prediction + optimal control” scheme has shown good performance in practice. There have also been theoretical results showing that predictions for certain control problems are beneficial [9]. In this paper, we refer to this type of control method as predictive optimal control.

Predictive optimal control is closely related to model predictive control (MPC). MPC was originally used to handle constraints and achieve recursive feasibility [10]. In some MPC applications, the prediction stage can forecast certain future external signal values that impact the system dynamics [11]. The word external here means that this signal is neither a state nor an output of the system to be controlled. In this paper, we call this external signal the generalized disturbance. For our problems of interest, we use the phrase predictive optimal control instead of MPC, mainly because we want to emphasize that a future generalized disturbance has to be forecasted, and the major goal of the control is to minimize a certain cost. Meanwhile, in this predictive optimal control framework, the prediction horizon and the prediction update frequency are flexible. This still fits the general MPC framework, but it may be different from the commonly-used receding-horizon MPC.

In many predictive optimal control applications, the future to be predicted has intrinsic uncertainties [12]. The to-be-predicted generalized disturbance may be future human maneuvers or ambient factors such as the temperature [13][14]. The future values of these signals are uncertain at the time of the forecast. In practice, the prediction results may be in either stochastic or deterministic forms [15]. Many applications simply use deterministic predictions, as it is easier to compute the corresponding optimal solution [16]. With more data and higher computing power, probabilistic forecasting, also called stochastic prediction, is drawing more and more attention from the control community. Probabilistic forecasting has been used in applications such as weather forecasting [17]. Different measures for evaluating stochastic predictions have been developed [18][19]. There are techniques in Bayesian decision theory [20][21] and stochastic MPC [22][23] that utilize a stochastic prediction for decision-making and control.

Since predictive optimal control has shown good performance, naturally, we are interested in investigating how we should design and improve predictors. One important question is whether we can decouple the predictor design from the control system design. In practice, we sometimes put a lot of effort into improving the predictor, hoping this leads to better predictive optimal control performance. In this paper, the control performance is measured by the cost function defined in the optimal control problem. When we try to improve the control by improving the prediction, the underlying assumption is that there is a monotonic relationship (like the one shown in Fig. 1(a)) between the predictor error and the optimal control cost. However, in the literature, this assumption is rarely verified. Actually, as we will show in this paper, the relationship between the predictor and the optimal control performance is complicated. Our theoretical analysis and numerical examples show that the relationship can be like Fig. 1(b), which means that as the predictor improves, the optimal control cost may get worse.


Figure 1: We sometimes assume a monotonic relationship like (a) between the predictor error and the optimal control cost. However, it is possible that the relationship is like (b), which means when the predictor improves, the control cost may get worse.

The motivation of this paper is to provide tools to build connections between the predictor and the optimal control performance. Prediction of a complicated process involving human behaviors or complex systems may never be perfect. We need to be able to analyze the predictive optimal control system while admitting that the prediction may be wrong. To thoroughly analyze this problem, we need to build an analysis framework describing the predictive optimal control process.

Building a rigorous analysis framework for predictive optimal control is a nontrivial task for the following three reasons. First, while it is easy to tell if a deterministic prediction is wrong, how we should handle the stochastic prediction case and define a truth is not obvious, especially in scenarios where we cannot collect data repeatedly (e.g., it is almost impossible to re-create a driving scene with exactly the same environment and the same surrounding vehicles and drivers in the same status). Second, the “prediction + optimal control” scheme usually runs in a dynamic way with receding horizons, which means new predictions may override previous ones after an update, and the control policies may be updated accordingly. This adds to the complexity. Third, predictive optimal control involves a subjective predictor and an objective optimal control cost, which brings difficulties in notation. The complexities in describing the prediction performance and the control performance are also somewhat related to the Bayesian vs. frequentist discrepancy in statistics. The predictor usually relies on some a priori assumptions, and the predicted probability distribution may be a subjective belief, which can be considered a Bayesian approach. However, the control performance to be optimized is the objective long-run cost, which fits the frequentist point of view. In order to analyze predictive optimal control problems, we need to consider perspectives from both the Bayesians and the frequentists simultaneously in one framework.

The tricky relationship between prediction performance and optimization performance has been noticed by some researchers, and it has been analyzed from the data's point of view. In “machine-learning-based prediction + optimization” problems, there is a growing interest in decision-focused learning, which uses loss functions measuring the optimization results to train the upstream machine learning prediction model [24][25][26]. For most dynamic-system predictive optimal control applications, however, the predictor design is still separated from the optimal control process. Meanwhile, there are many efforts in robust MPC focusing on generating good control policies despite imperfect predictions [22][27]. However, formulations that can describe the impact of wrong predictions on dynamic systems have not yet been reported in the literature.

In stochastic predictive optimal control, we need to deal with two types of descriptions of the future generalized disturbance: a predicted subjective probability distribution (which we call the belief), and a “true” probability distribution (which will be precisely defined later). In this paper, we focus on two questions:

  1. Q1 (What): What is the proper framework that describes the relationships among the predicted probability distribution, the (to-be-defined) “true” probability distribution, the control performance, and other elements in predictive optimal control?

  2. Q2 (How): With limited data and a generally-unknown “true” probability distribution, how should we evaluate the predictors that generate the subjective probability distributions?

We believe Q1 has been answered in this paper, and a general answer to Q2 is provided. More investigations of specific types of predictive optimal control will be needed for a comprehensive answer to Q2 in future studies.

In this paper, we present a framework describing the relationships among the elements of predictive optimal control, which can be used to analyze the impact of wrong predictions. We incorporate a stochastic environment model and define a new concept called the hidden prediction state to connect the subjective belief and the objective truth. Both the single-observation fixed-end-horizon case and the updating-observation receding-horizon case are considered. We then use this framework to study the predictor evaluation problem in the practical case of limited data availability. Three commonly-used predictor evaluation measures, namely the mean squared error, the regret, and the log-likelihood, are discussed. We show that a better predictor with respect to the mean squared error or the log-likelihood may actually lead to worse control performance, and this may happen even if the predictor is arbitrarily close to the global optimum. Evaluating the predictor along with the control performance, such as using the control cost or the regret measure, can avoid this. The results are illustrated with numerical examples and simulation examples from automotive applications.

The paper is structured as follows. Section 2 provides the general problem formulation of predictive optimal control. Section 3 presents an analysis framework for predictive optimal control with an environment model. Section 4 discusses predictor evaluation measures. In Section 5, we use the proposed framework to analyze the relationship between the predictor performance and the control performance. Section 6 provides examples to illustrate the ideas and the results. Section 7 concludes this paper.

2 The Predictive Optimal Control Problem

2.1 The Optimal Control Problem

We consider the following discrete-time dynamic system

x_{k+1} = f(x_k, w_k, u_k),   (1)

where $x\in\mathbb{R}^{d_n}$, $w\in\mathbb{R}^{d_l}$, and $u\in\mathbb{R}^{d_m}$. The state $x_k$ and the generalized disturbance $w_k$ can be directly measured at step $k$. $w$ is usually considered as the output of a complex system, which is called the environment. $u_k$ is the control input.

This dynamic system represents the physical system to be controlled, such as a vehicle, a robot, a machine, or a building. The measured generalized disturbance $w$ may be a human input, such as the pedal position and steering of the vehicle, or an ambient factor, such as surrounding traffic behavior or the temperature. We assume that $w$ has a relatively large impact on the dynamics and the cost. Therefore, when we design the control $u$, we want to consider the impact of $w$.

We consider a finite-horizon problem of $N$ steps. The cost $J$ over this finite horizon is the sum of the running costs from step 0 to step $N-1$ and the terminal cost,

J(x_0, w_0, u_0, \ldots, x_{N-1}, w_{N-1}, u_{N-1}, x_N) = \sum_{k=0}^{N-1} l_k(x_k, w_k, u_k) + l_N(x_N).   (2)

The goal is to find a policy $u=\pi(\cdot)$ that minimizes the cost $J$.

If the complete disturbance sequence $\bar{w}=[w_0, w_1, \ldots, w_{N-1}]$ (the brackets here denote an ordered sequence) is known at step 0, the problem can be solved as a deterministic optimal control problem. Many tools, such as dynamic programming and Pontryagin's minimum principle, can be used to solve it.

However, in practice we usually do not know $\bar{w}$ in advance. In our problem, $w_k$ is unknown before step $k$. If we do not know the future values of $w$, we can no longer simply apply deterministic optimal control. Instead, we can consider the sequence $\bar{w}$ as a stochastic signal. It can be represented by a random matrix of dimension $d_l\times N$, or equivalently, a random vector of dimension $d_l N$.

The control objective is to find a feedback policy $u=\pi(\cdot)$ that uses available information to minimize the cost, or more rigorously, a certain expectation of the cost. In the context of optimal control in this paper, better control performance means a smaller expected cost.
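
To make these objects concrete, the following minimal sketch shows how the cost (2) of a given feedback policy is evaluated along one realized disturbance sequence. The particular dynamics $f$, running cost, terminal cost, and policy below are made-up placeholders, not the paper's.

```python
# A minimal sketch: evaluate the finite-horizon cost (2) for a given policy
# along one realized disturbance sequence. All maps below are illustrative.
def rollout_cost(f, l, l_N, policy, x0, w_seq):
    """Simulate x_{k+1} = f(x_k, w_k, u_k) and accumulate the cost J."""
    x, J = x0, 0.0
    for k, w in enumerate(w_seq):
        u = policy(k, x, w)      # feedback using measured x_k and w_k
        J += l(k, x, w, u)       # running cost l_k(x_k, w_k, u_k)
        x = f(x, w, u)           # state update (1)
    return J + l_N(x)            # terminal cost l_N(x_N)

# Example: scalar dynamics x_{k+1} = x_k + w_k + u_k with quadratic costs.
f = lambda x, w, u: x + w + u
l = lambda k, x, w, u: 0.1 * u**2
l_N = lambda x: x**2
pi = lambda k, x, w: -0.5 * (x + w)   # an arbitrary illustrative policy
print(rollout_cost(f, l, l_N, pi, 0.0, [0.0, 1.0, -0.5]))
```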

2.2 Components of Predictive Optimal Control


Figure 2: A typical predictive optimal control structure. A predictor generates a belief of the future based on observations of the environment. The optimal controller decides the control input based on the belief.

To handle the future uncertainty of the disturbance sequence $\bar{w}$, it is common to use a predictor to forecast it. We consider the general case where the forecast result is a discrete or continuous probability distribution of $\bar{w}$. If one specific disturbance sequence is forecasted instead of a probability distribution, we may consider it as a distribution with a one or near-one probability at this specific sequence, and zero or near-zero probability at all other sequences. We call this probability distribution our belief.

Definition 1 (Belief).

A belief is a subjective probability distribution of the future disturbance sequence $\bar{w}$.

We use $\bar{W}_b$ to denote this probability distribution of $\bar{w}$. $\bar{W}_b\in\bm{\bar{W}}$, where $\bm{\bar{W}}$ is the set of all possible probability distributions of $\bar{w}$. We write $\bar{w}\sim\bar{W}_b$, which means $\bar{w}$ follows the distribution $\bar{W}_b$. When there is no ambiguity, we do not distinguish between the disturbance sequence probability distribution $\bar{W}_b$ and its data representation, which may be a high-dimensional vector representing a probability mass function, or a vector of parameters for a probability density function.

To obtain a belief $\bar{W}_b$, we need to observe the environment for the necessary information. We assume that $o\in\mathbb{O}$ is the observation from the environment. The observation $o$ may be in the form of sensor readings, images, videos, or data received via communication, and their histories. We can define the concept of a predictor as follows.

Definition 2 (Predictor).

A predictor $\mathcal{P}:\mathbb{O}\rightarrow\bm{\bar{W}}$ is a mapping from an observation to a belief.

With the belief generated by the predictor, we can compute the optimal control. As we believe our belief, the expected cost to be minimized can be defined as

\mathbb{E}\, J(x_0, w_0, u_0, \ldots, x_{N-1}, w_{N-1}, u_{N-1}, x_N),   (3)

where our belief tells us that $\bar{w}\sim\bar{W}_b$ and $\bar{W}_b=\mathcal{P}(o)$. We use $\pi_{\bar{W}_b}(\cdot)$ to denote the optimal policy that minimizes the above expected cost when $\bar{w}\sim\bar{W}_b$. This optimal control problem is well-defined, though computing the exact optimal policy $\pi_{\bar{W}_b}(\cdot)$ may be challenging if $\bar{W}_b$ is complicated. In practice, an approximate solution is usually applied [28].

The predictive optimal control structure shown in Fig. 2 has been used in many control applications: first, use a predictor to process the observation $o$ and obtain a belief $\bar{W}_b$; then compute (or approximately compute) the corresponding optimal control policy $u=\pi_{\bar{W}_b}(\cdot)$; finally, apply the control to the dynamic system in (1). The conventional predictive optimal control formulation in practice just stops here, and it cannot provide further insights about the “prediction + optimal control” scheme. It assumes that $\bar{W}_b$ is correct, so we use it in our optimal control design. However, since our belief $\bar{W}_b$ is estimated, in most cases it is actually different from the truth.
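
As a rough illustration, the sketch below wires the Fig. 2 pipeline together for the case of a discrete belief and a finite set of candidate inputs. The names `predictor`, `u_candidates`, and `cost_given_u0` are hypothetical placeholders, and the cost of all remaining steps is assumed to be folded into `cost_given_u0`.

```python
# A schematic of the Fig. 2 pipeline: observation -> belief -> believed-
# optimal input. The belief is a discrete distribution over candidate
# disturbance sequences; all component names are placeholders.
def believed_expected_cost(u0, belief, cost_given_u0):
    # belief: list of (probability, w_sequence) pairs, a discrete W_b
    return sum(p * cost_given_u0(u0, w_seq) for p, w_seq in belief)

def predictive_control_step(o, predictor, u_candidates, cost_given_u0):
    belief = predictor(o)                  # W_b = P(o)
    return min(u_candidates,               # minimize the believed cost (4)
               key=lambda u0: believed_expected_cost(u0, belief, cost_given_u0))
```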

2.3 What If the Beliefs Are Wrong

In real-world applications, we usually know that our belief is imperfect. This brings up a lot of interesting questions. In the rest of this paper, we will focus on the case where we know our belief is not perfect, and we would like to find out how it impacts the control performance and how we can improve it.

The true probability distribution of the future disturbance sequence is denoted by $\bar{W}_t$. We will define this true probability distribution in Section 3.2. For now, let us assume there exists a well-defined one.

If we are not sure whether our belief is true, we immediately have a problem: since we do not know the true distribution $\bar{W}_t$, we cannot evaluate the cost expectation in (3) in the sense that $\bar{w}\sim\bar{W}_t$. Based on our imperfect belief $\bar{W}_b$, we can only obtain an optimized policy with respect to this imperfect belief. As $\bar{W}_b$ is only a subjective probability distribution, not directly associated with any actual random vector defined in our formulation so far, we use the notation $\hat{\mathbb{E}}_{\bar{w}\sim\bar{W}_b}*(\bar{w})$ to denote the believed expectation of a function $*(\bar{w})$ when the random vector $\bar{w}$ follows the believed distribution $\bar{W}_b$. We will stop using the ambiguous expectation notation in (3) for subjective beliefs. Instead, the believed expected cost to be minimized is now defined as

\hat{\mathbb{E}}_{\bar{w}\sim\bar{W}_b}\, J(x_0, w_0, u_0, \ldots, x_{N-1}, w_{N-1}, u_{N-1}, x_N).   (4)

Under this notation, the true expectation can be re-written as

\hat{\mathbb{E}}_{\bar{w}\sim\bar{W}_t}\, J(x_0, w_0, u_0, \ldots, x_{N-1}, w_{N-1}, u_{N-1}, x_N).   (5)

Our optimal policy $\pi_{\bar{W}_b}(\cdot)$, obtained using the imperfect belief $\bar{W}_b$, minimizes (4), not (5). It initially seems that there is not much we can do if our best estimate is $\bar{W}_b$. However, a complete analysis framework will give us insights into the relationship between the predictor and the control performance, and thus guide our predictor design.
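
The gap between (4) and (5) is easy to exhibit numerically. In the toy sketch below (all numbers made up), the same fixed policy is scored under the belief and under the truth; the two expectations differ whenever $\bar{W}_b\neq\bar{W}_t$ and the cost treats the sequences asymmetrically.

```python
# Believed expectation (4) vs. true expectation (5) of the same cost, for
# discrete distributions over a finite set of disturbance sequences.
def expected_cost(dist, cost_of_seq):
    # dist: {w_sequence (tuple): probability}
    return sum(p * cost_of_seq(w) for w, p in dist.items())

cost = lambda w: (w[1] - 0.5) ** 2            # stand-in for J under a fixed policy
W_b = {(0.0, 1.0): 0.5, (0.0, -1.0): 0.5}     # subjective belief W_b
W_t = {(0.0, 1.0): 0.8, (0.0, -1.0): 0.2}     # "true" distribution W_t
print(expected_cost(W_b, cost), expected_cost(W_t, cost))   # 1.25 vs. 0.65
```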

3 An Analysis Framework

In this section, we present an analysis framework to consider the impact of imperfect predictions. We will first introduce the environment model with the hidden prediction state, then discuss what a true probability distribution is, and finally integrate recurrent prediction schemes into the presented model.

3.1 The Hidden Prediction State

We assume that the environment that generates the disturbance $w$ is a dynamic system which can be described by the following equations,

z_{k+1} = f_E(z_k, r_k)   (6)
o_k = h_{Eo}(z_k, r_k)
w_k = h_{Ew}(z_k, r_k),

where $z_k$ is the state of the environment system; $r_0$, $r_1$, …, $r_{N-1}$ are independent and identically distributed (i.i.d.) random disturbances; $o_k\in\mathbb{O}$ is the measured output, which is also called the observation of the environment; and $w_k$ is the output of the environment system that impacts our system to be controlled. Since we assume that the value of $w_k$ is measured, $o_k$ contains all the information of $w_k$. From the point of view of the system to be controlled, $w_k$ is the generalized disturbance. Since the environment is usually very complex, such as human beings or the weather system, the dimensions of $z$ and $r$ can be very high and the exact formulas in (6) may never be known.


Figure 3: The hidden prediction state $s$ in the environment model. $s$ contains all factors from the environment that impact the predictive optimal control process. It completely determines the realized disturbance sequence $\bar{w}$ and the observation sequence $\bar{o}$.

Let us define the hidden prediction state

s = \begin{bmatrix} z_0^T & r_0^T & r_1^T & \ldots & r_{N-1}^T \end{bmatrix},   (7)

as shown in Fig. 3. According to the environment model in (6), the generalized disturbance sequence $\bar{w}$ is completely determined by the hidden prediction state, which is a random vector, $s\sim S$. Therefore, we write $\bar{w}(s)$, as $\bar{w}$ is uniquely determined when $s$ is given. The observations $o$ can also be completely determined by $s$. Therefore, we write

o_k(s) = h_{Eo}(z_k, r_k),   (8)

and use the notations

o(s) = o_0(s) = h_{Eo}(z_0, r_0)   (9)
\bar{o}(s) = [o_0(s), o_1(s), \ldots, o_{N-1}(s)],

for short.

In general, $s$ cannot be directly observed at step 0 for two reasons. First, the environment state $z_0$ and the environment disturbance $r_0$ may not be estimable by only measuring $o_0$. Second, the i.i.d. random disturbances $r_1$, $r_2$, …, $r_{N-1}$ at future steps cannot be known at step 0.

Since there may be multiple ways of building the environment model, the selection of the hidden prediction state $s$ is not unique. In predictive optimal control applications, it is not necessary to formulate the environment and define the hidden prediction state at all. However, the environment model in (6) and the definition of the hidden prediction state $s$ in (7) are the core of the analysis framework, and they can help us analyze the problems that arise when the predictions are not perfect.
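
For intuition, here is a minimal made-up instance of (6) and (7): a scalar linear environment, used only to show how $s=[z_0, r_0, \ldots, r_{N-1}]$ fully determines $\bar{w}(s)$ and $\bar{o}(s)$. The particular maps are invented for illustration.

```python
# A toy environment model (6); f_E, h_Eo, h_Ew are illustrative choices.
import numpy as np

def rollout_environment(z0, r_seq):
    """Return (w_seq, o_seq), both fully determined by s = (z0, r_seq)."""
    z, w_seq, o_seq = z0, [], []
    for r in r_seq:
        o_seq.append(z + 0.1 * r)    # o_k = h_Eo(z_k, r_k)
        w_seq.append(0.5 * z + r)    # w_k = h_Ew(z_k, r_k)
        z = 0.9 * z + r              # z_{k+1} = f_E(z_k, r_k)
    return w_seq, o_seq

rng = np.random.default_rng(0)
s = (1.0, rng.standard_normal(5))    # hidden prediction state (z_0, r_0..r_4)
w_bar, o_bar = rollout_environment(*s)
```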

3.2 The True Probability Distribution

In the previous section, we assumed that there is a true probability distribution $\bar{W}_t$. Now let us define it with the environment model.

We consider the belief obtained at step 0. With the environment in (6), our belief $\bar{W}_b$ is determined by applying the predictor $\mathcal{P}$ to the observation $o(s)$, therefore we can write

\bar{W}_b = \mathcal{P}(o) = \mathcal{P}(o(s)).   (10)

When there is no ambiguity, we may also write $\mathcal{P}(s)$ instead of $\mathcal{P}(o(s))$ for conciseness. We investigate multiple potential ways to define the true probability distribution.

3.2.1 The A Posteriori Truth

One may argue that we will be able to know the realized disturbance sequence $\bar{w}$ at step $N$, so the true probability of the realized disturbance sequence is one, and the probabilities of all other disturbance sequences are zero. So for the belief $\bar{W}_b=\mathcal{P}(o(s))$, $\bar{w}(s)$ is the a posteriori truth.

3.2.2 The A Priori Truth

One may argue that there are intrinsic uncertainties in $s$, as the future values of the i.i.d. environment disturbances $[r_1, r_2, \ldots, r_{N-1}]$ can never be known at step 0. In a prediction, the true probability distribution should be determined by what has happened by the time of the prediction, not by the future. This probability distribution should be determined at step 0. Therefore, for the belief $\bar{W}_b=\mathcal{P}(o(s))$, the truth is the following conditional probability distribution

\bar{w}(S)\,\big|\,S_{z_0}=s_{z_0},\, S_{r_0}=s_{r_0},   (11)

where $s_{z_0}$ and $s_{r_0}$ are the $z_0$ and $r_0$ components of $s$, respectively. We call this a priori true probability distribution $\bar{W}_{tn}(s_{z_0}, s_{r_0})$, or $\bar{W}_{tn}(s)$ for short.

3.2.3 The Observable Truth

One may also argue that since our only observation of the environment is $o$, the best estimate of the distribution should be limited not just by the time of the prediction, but also by the information we have. Therefore, for the belief $\bar{W}_b=\mathcal{P}(o(s))$, the truth is the following conditional probability distribution

\bar{w}(S)\,\big|\,o(S)=o(s).   (12)

We call this observable (not related to observability in control theory) true probability distribution $\bar{W}_{to}(o(s))$, or $\bar{W}_{to}(s)$ for short.

All three of the above arguments about the true probability distribution make sense. As $\bar{W}_{to}(o(s))$ is the observable probability distribution that could be learned given enough data, we consider it the benchmarking truth in this paper. Making $\bar{W}_b=\mathcal{P}(o)$ close to $\bar{W}_{to}(o)$ is an intuitive way of improving the prediction, but how to define the difference between $\bar{W}_b$ and $\bar{W}_{to}(o)$ is an interesting question, which will be discussed in Section 4.
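
When the environment can be sampled and the observation takes finitely many values (or is binned), $\bar{W}_{to}(o)$ can in principle be estimated by conditioning Monte Carlo samples on the observation, as in the hedged sketch below; the sampling functions are assumptions supplied by the user.

```python
# Estimate the observable truth W_to(o): sample hidden prediction states s,
# keep those whose observation equals the queried o, and take the empirical
# distribution of the realized w-bar(s) (assumed hashable, e.g. a tuple).
from collections import Counter

def estimate_W_to(sample_s, o_of_s, w_bar_of_s, o_query, n=100_000):
    matches = [w_bar_of_s(s) for s in (sample_s() for _ in range(n))
               if o_of_s(s) == o_query]
    assert matches, "no sampled s produced the queried observation"
    counts = Counter(matches)
    return {w: c / len(matches) for w, c in counts.items()}
```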

3.3 Recurrent Predictions

In the predictive optimal control implementation under the structure in Fig. 2, the prediction runs recurrently at every control step. Depending on whether new observations are used after the initial step and on how long the prediction horizon is, we consider three typical types of recurrent predictions.

3.3.1 Type I: No Subsequent Observations After Step 0


Figure 4: Three typical types of recurrent predictions. (a) Type I: the only observation is obtained at step 0. (b) Type II: a new observation is obtained at every step. (c) Type III: the prediction covers a fixed-length receding horizon with new observations at every step.


Figure 5: Factors determining the actual cost in predictive optimal control, when the predictor is given and the policy is optimal with respect to the belief. (a) In Type I, the actual cost is uniquely determined by $s$ through two parallel paths: the controller path (the observation-belief-policy path) and the physics path (the disturbance path). (b) The paths in Type II are similar to those in Type I. (c) In Type III, the actual cost is determined by $s$ and the artificial terminal cost in the controller path.

We first consider the case where all the information we obtain from the environment is $o_0$ and $w_0, w_1, \ldots, w_N$, which means that there are no subsequent observations after step 0. This type is commonly used when obtaining the observation or computing the forecast is expensive, e.g., when the observation is obtained in a pay-per-use way, or when the initial forecasting computation takes a long time. At step 0, given a predictor $\mathcal{P}$, our belief $\bar{W}_b=\mathcal{P}(o(s))$ is determined by the observation $o(s)$. We denote this belief at step 0 as $\bar{W}_{b,0}$. At step 1, even though no new observation $o_1$ is available, $w_1$ is still measured, so the forecasted belief can be updated according to the value of $w_1$. This update is essentially computing a conditional probability given the original belief $\bar{W}_{b,0}$ and $w_1$. In some contexts, this is also called a filtration process. The filtration is repeated at every step. We denote the updated belief at step $k$ as $\bar{W}_{b,k}$. The filtration update is

\bar{W}_{b,k} = f_{W_I}(\bar{W}_{b,k-1}, w_k).   (13)

Each new belief contains the information of the probability distribution of the disturbance sequence from the current step to the terminal step, as shown in Fig. 4(a).
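
For a discrete belief, the filtration update (13) is just Bayes' rule with a degenerate likelihood: sequences inconsistent with the measured $w_k$ are dropped and the remaining probabilities are renormalized, as in this minimal sketch.

```python
# Filtration update (13) for a discrete belief over disturbance sequences.
def filtration_update(belief, k, w_k):
    # belief: {w_sequence (tuple): probability}
    kept = {w: p for w, p in belief.items() if w[k] == w_k}
    total = sum(kept.values())
    assert total > 0, "measured w_k has zero believed probability"
    return {w: p / total for w, p in kept.items()}

W_b0 = {(0, 1): 0.3, (0, -1): 0.3, (1, 1): 0.4}
print(filtration_update(W_b0, 0, 0))   # {(0, 1): 0.5, (0, -1): 0.5}
```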

Given a belief $\bar{W}_b$, the optimal policy $\pi_{\bar{W}_b}(\cdot)$ minimizes the expected cost defined in (4). The policy $\pi_{\bar{W}_b}(\cdot)$ can be obtained as a feedback control using all available information at each step. All available information includes the history of the measured state $x$ and the measured generalized disturbance $w$. Since (1) tells us that $x$ is Markovian when $w$ and $u$ are given, we can use the current $x_k$ in the feedback instead of the whole history of $x$. In general, the generalized disturbance $w$ is not Markovian. Actually, it can be considered the output of a partially observable Markov decision process. Therefore, we need to include the history of $w$ in the control feedback. However, when the belief update is just a filtration, even if $w_k$ is not Markovian, $\bar{W}_{b,k}$ is Markovian. Therefore, we can use $\bar{W}_{b,k}$ in the feedback instead of the history of $w$ together with the belief $\bar{W}_b$. The optimal policy $\pi_{\bar{W}_b}(\cdot)$ can be computed recursively using dynamic programming. This feedback policy is of the form

u_k = \pi_{\bar{W}_b}(k, x_k, [w_0, w_1, \ldots, w_k]) = \pi_{\bar{W}_{b,k}}(k, x_k).   (14)

$\pi_{\bar{W}_b}(\cdot)$ is completely determined by the initial belief $\bar{W}_b$, which means it is determined by the hidden prediction state $s$ along with the predictor $\mathcal{P}$.

We use $J^{\pi_{\bar{W}_b}}_{\bar{w}(s)}$ to denote the cost of system (1) with a fixed initial state when the disturbance sequence is $\bar{w}(s)$ and the policy is the optimal policy $\pi_{\bar{W}_b}(\cdot)$ obtained with the belief $\bar{W}_b$ by minimizing (4). As shown in Fig. 5(a), given a predictor $\mathcal{P}$, this actual cost is completely determined by $s$.

We assume that any disturbance sequence with a non-zero probability in $\bar{W}_{tn}(s)$ has a non-zero probability in $\bar{W}_b$. With this assumption, the policy $\pi_{\bar{W}_b}(\cdot)$ obtained using $\bar{W}_b$ can be applied when the true disturbance sequence distribution is $\bar{W}_{tn}(s)$. This is why we may want to consider a deterministic prediction as a distribution with a near-one (instead of one) probability at the specific sequence, and near-zero (instead of zero) probability at all other sequences.

3.3.2 Type II: A New Observation at Every Step

In Type II, a new observation is obtained at every step. Therefore, the belief $\bar{W}_{b,k}$ is updated according to the new observation $o_k$ at every step. Each belief is still about the disturbance sequence from the current step to the terminal step, as shown in Fig. 4(b). The belief is updated at every step using the new observation and the predictor,

\bar{W}_{b,k} = \mathcal{P}_k(o_k(s)),   (15)

where the predictor $\mathcal{P}_k$ may depend on $k$. We denote the sequence of beliefs $[\bar{W}_{b,0}, \bar{W}_{b,1}, \ldots, \bar{W}_{b,N-1}]$ as $\bar{\bar{W}}_b$, and we write

\bar{\bar{W}}_b = \bar{\mathcal{P}}(\bar{o}(s)),   (16)

as these beliefs are determined by the predictors and the sequence of observations $\bar{o}$.

In predictive optimal control, the control policy at step $k$ is determined using the belief $\bar{W}_{b,k}$, just like at any step of Type I. So at every step, the feedback control policy is of the form

u_k = \pi_{\bar{W}_{b,k}}(k, x_k).   (17)

The policies at all steps form a sequence of policies $[\pi_{\bar{W}_{b,0}}(\cdot), \pi_{\bar{W}_{b,1}}(\cdot), \ldots, \pi_{\bar{W}_{b,N-1}}(\cdot)]$. Each policy $\pi_{\bar{W}_{b,k}}(\cdot)$ covers the time horizon from step $k$ to step $N-1$, with the prediction made at step $k$. Each of them depends on its observation $o_k(s)$, and thus ultimately depends on $s$. At step $k$, we apply the policy $\pi_{\bar{W}_{b,k}}(\cdot)$ and then discard it, since a new policy will be obtained at step $k+1$. Combining the first steps (the step $k$, for $k=0,1,\ldots,N-1$) of the elements of the policy sequence, we obtain one single policy, which can be denoted as $\pi_{[\bar{W}_{b,0},\bar{W}_{b,1},\ldots,\bar{W}_{b,N-1}]}(\cdot)$. We also write this policy as $\pi_{\bar{\bar{W}}_b}(\cdot)$, or equivalently, $\pi_{\bar{\mathcal{P}}(s)}(\cdot)$. The actual cost is $J^{\pi_{\bar{\bar{W}}_b}}_{\bar{w}(s)}$, which can also be written as $J^{\pi_{\bar{\mathcal{P}}(s)}}_{\bar{w}(s)}$. As shown in Fig. 5(b), this cost is determined uniquely by $s$ in a similar way as in Type I.

3.3.3 Type III: A New Observation at Every Step with Receding Horizon Prediction

In Type III, the prediction is made over a fixed-length time window in a receding-horizon way. This is commonly used in predictive optimal control applications. When the final step of the window is not $N$, an artificial terminal cost $V(\cdot)$ is usually designed for the optimal control problem. The cost minimized at step $k$ is

\hat{\mathbb{E}}_{\bar{w}\sim\bar{W}_{b,k}}\Big[\sum_{i=k}^{k+n-1} l_i(x_i, w_i, u_i) + V_{\bar{W}_{b,k}}(x_{k+n})\Big],   (18)

where $n$ is the window length. The value of the artificial terminal cost $V_{\bar{W}_{b,k}}(x_{k+n})$ will impact the control policy, and therefore it will impact the actual cost. In this case, given the predictors, the actual cost is determined by both $s$ and the design of $V(\cdot)$.

We may consider the artificial terminal cost $V(\cdot)$ as an estimate of the optimal cost-to-go, that is,

V_{\bar{W}_{b,k}}(x_{k+n}) \approx \min\,\hat{\mathbb{E}}_{\bar{w}\sim\bar{W}_{b,k}}\Big[\sum_{i=k+n}^{N-1} l_i(x_i, w_i, u_i) + l_N(x_N)\Big].   (19)

Then this type of problem can be considered an approximation of Type II. If the artificial terminal cost is exactly the optimal cost-to-go, the problem is the same as Type II.
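
Putting Type III together, the loop below sketches the receding-horizon implementation. Here `observe`, `predictor`, `solve_n_step` (an $n$-step stochastic optimal control solver returning an input sequence, with the artificial terminal cost built in), and `step` are all placeholders for application-specific components.

```python
# A schematic Type III loop: re-predict at every step, solve an n-step
# problem with the artificial terminal cost V as in (18), apply only the
# first input, and discard the rest.
def receding_horizon_control(x0, N, n, observe, predictor, solve_n_step, step):
    x = x0
    for k in range(N):
        o_k = observe(k)                      # new observation at step k
        W_bk = predictor(k, o_k)              # belief W_{b,k} = P_k(o_k)
        u_k = solve_n_step(k, x, W_bk, n)[0]  # keep only the first input
        x = step(x, k, u_k)                   # apply u_k to the real system
    return x
```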

4 Predictor Evaluation

In this section, we use the developed framework to discuss predictor evaluation in the context of predictive optimal control. We will focus on the Type I problem, which is the foundation of all predictive control problems.

4.1 Just Being Accurate Is Not Enough

An accurate predictor means that when the predictor says the probability distribution of $\bar{w}$ is $\bar{W}_b$, the distribution is indeed $\bar{W}_b$. However, just being accurate is not good enough for predictors. To show this, we will first define accuracy rigorously.

Definition 3 (Maximum Indistinguishable Observation Set).

Given a predictor $\mathcal{P}$ and a belief $\bar{W}_b$, the maximum indistinguishable observation set $O_M$ is the set of all observations based on which the prediction output is $\bar{W}_b$,

O_M = \{o \,|\, \mathcal{P}(o) = \bar{W}_b\}.   (20)
Definition 4 (Accurate).

A predictor $\mathcal{P}$ is accurate over an observation set $\mathbb{O}$ if the belief $\bar{W}_b=\mathcal{P}(o_i)$ generated by this predictor for any observation $o_i\in\mathbb{O}$ equals the probability distribution of $\bar{w}(S)$ conditioned on $o(S)\in O_M$, where $O_M$ is the maximum indistinguishable observation set of $\mathcal{P}$ and $\bar{W}_b$.

This definition means that if we collect all realized disturbance sequence data for any fixed belief from an accurate predictor, the collected data distribution will match this belief. Simply being accurate does not make a good predictor. For example, a blind predictor is a predictor that generates the same belief for all observations. An accurate blind predictor does not use any information from the observation $o$, yet it offers accurate beliefs. It is accurate, but not very informative.

4.2 The Goal Is $\bar{W}_{to}(o)$, But Getting There Is a Challenge

The goal of prediction is to obtain the true distribution $\bar{W}_{to}(o)$ for every observation $o$. To show this, we will prove that if our belief is the same as $\bar{W}_{to}(o)$, we obtain the optimal control performance, as long as we use no more information about $s$ than $o(s)$.

Given a predictor $\mathcal{P}$, the predictive optimal control cost expectation is $\mathbb{E}\,J^{\pi_{\mathcal{P}(S)}}_{\bar{w}(S)}$. By the law of total expectation and the definition of $\bar{W}_{to}(o(s))$,

\mathbb{E}\,J^{\pi_{\mathcal{P}(S)}}_{\bar{w}(S)} = \mathbb{E}\Big[\mathbb{E}\big[J^{\pi_{\mathcal{P}(S)}}_{\bar{w}(S)}\,\big|\,o(S)\big]\Big] = \mathbb{E}\Big[\hat{\mathbb{E}}_{\bar{w}\sim\bar{W}_{to}(o(S))} J^{\pi_{\mathcal{P}(S)}}_{\bar{w}}\Big].   (21)

By definition, $\pi_{\bar{W}_{to}(o)}$ is the optimal policy that minimizes $\hat{\mathbb{E}}_{\bar{w}\sim\bar{W}_{to}(o)} J^{(\cdot)}_{\bar{w}}$ for every $o$. Since we use no more information about $s$ than $o(s)$, given any $o$, this cost is the best we can achieve. Therefore, given any $o$, $\bar{W}_{to}(o)$ minimizes the predictive optimal control actual cost $\hat{\mathbb{E}}_{\bar{w}\sim\bar{W}_{to}(o)} J^{\pi_{(\cdot)}}_{\bar{w}}$, so it minimizes the predictive optimal control cost expectation.

The goal of our predictor output $\mathcal{P}(o)$ is $\bar{W}_{to}(o)$. However, the path to this goal is full of challenges. First, in practice, we may never be able to directly compare the predicted belief $\mathcal{P}(o)$ with our target $\bar{W}_{to}(o)$, as $\bar{W}_{to}(o)$ is unknown. Second, in general, there is no guarantee that predictors generating beliefs closer to $\bar{W}_{to}(o)$ lead to better control performance. Even locally around $\bar{W}_{to}(o)$, as long as $\mathcal{P}(o)\neq\bar{W}_{to}(o)$, we do not have such guarantees (details will be discussed in Section 5, and examples are provided in Section 6). Nevertheless, this ultimate goal still gives us an incentive to keep optimizing the predictor towards $\bar{W}_{to}(o)$.

4.3 Available Data for Evaluation

In most predictive optimal control applications, it is not possible to obtain enough data to estimate $\bar{W}_{to}(o)$ for every $o$. In practice, the data samples are in the form of $\{o, \bar{w}\}$ pairs. For a few specific $o^i$, we may have many $\{o^i, \bar{w}\}$ pair samples, such that $\bar{W}_{to}(o^i)$ can be directly estimated. However, for most $o^i$, we usually do not have enough $\{o^i, \bar{w}\}$ samples to estimate $\bar{W}_{to}(o^i)$. Consider, for example, human-driven vehicle speed prediction, where the to-be-predicted $w$ is the vehicle speed, and the observation $o$ is the driving scenario data, including the vehicle status, driver status, road conditions, traffic conditions, etc. We can collect many samples in the form of $\{o, \bar{w}\}$ pairs under many driving scenarios. But for a specific driving scenario, it is very difficult to obtain repeated data showing how a driver behaves differently each time, as the scenario keeps changing. In practical situations, we usually make some assumptions on the predictor and parameterize it so that it can be trained using $\{o, \bar{w}\}$ pairs. We also need to evaluate the predictor performance using many $\{o, \bar{w}\}$ pairs, without assuming that we have repeated data for every specific $o^i$ to learn $\bar{W}_{to}(o^i)$.

4.4 Predictor Evaluation Using Available Data

With one $\{o, \bar{w}\}$ pair and a given predictor $\mathcal{P}$, we can compute a belief $\bar{W}_b=\mathcal{P}(o)$. Using a belief $\bar{W}_b$ and the corresponding realized disturbance sequence $\bar{w}$, we can define some one-time prediction performance measures. With multiple $\{o, \bar{w}\}$ pairs, the predictor $\mathcal{P}$'s performance can be evaluated by aggregating these one-time prediction performance measures.

4.4.1 One-Time Prediction Performance Measures

We consider two types of measures: error-based measures and probability-based measures. Given a realized disturbance sequence $\bar{w}\in\bm{\bar{w}}$ and a belief $\bar{W}_b\in\bm{\bar{W}}$, a one-time prediction performance measure is of the form $m:\bm{\bar{w}}\times\bm{\bar{W}}\rightarrow\mathbb{R}$. If $m$ satisfies

m(\bar{w}, \bar{W}_b) \geq 0, \text{ for all } \bar{w} \text{ and } \bar{W}_b,   (22)

and

m(\bar{w}, \bar{W}_b) = 0 \text{ if } \hat{\mathbb{P}}_{\bar{W}_b}(\bar{w}) = 1,   (23)

where $\hat{\mathbb{P}}_{\bar{W}_b}(\bar{w})$ denotes the probability of $\bar{w}$ under the belief $\bar{W}_b$, we say $m$ is an error-based measure. Furthermore, when $m(\bar{w}, \bar{W}_b)=0$ only if $\hat{\mathbb{P}}_{\bar{W}_b}(\bar{w})=1$, the error measure is called a strict one-time prediction error measure. With a slight abuse of notation, we write $\bar{W}_b=\bar{w}$ if $\hat{\mathbb{P}}_{\bar{W}_b}(\bar{w})=1$. Then $\bar{W}_b=\bar{w}$ minimizes any error-based measure $m(\bar{w},\cdot)$ by definition.

Here are two examples of error-based measures: the expected mean squared error (MSE) and the regret. The commonly-used MSE,

M(\bar{w}, \bar{W}_b) = \hat{\mathbb{E}}_{\bar{w}_b\sim\bar{W}_b} \lVert \bar{w}-\bar{w}_b \rVert_2^2,   (24)

is a strict one-time prediction error measure. The expectation here is the subjective expectation with respect to the belief $\bar{W}_b$. The regret

R(\bar{w}, \bar{W}_b) = J^{\pi_{\bar{W}_b}}_{\bar{w}} - J^{\pi_{\bar{w}}}_{\bar{w}},   (25)

is a non-strict one-time prediction error measure. It is defined as the difference between the optimal cost under the predicted belief and the optimal cost under the a posteriori disturbance sequence. Computing the regret is usually much more difficult than computing the MSE, as it involves solving a stochastic optimal control problem. It essentially evaluates the control performance of a specific dynamic system with this prediction, instead of evaluating the prediction alone.

Besides the error measures, we may also use a probability-based measure such as the log-likelihood,

P(\bar{w}, \bar{W}_b) = \begin{cases} -\log\hat{\mathbb{P}}_{\bar{W}_b}(\bar{w}), & \text{if } \bar{W}_b \text{ is discrete}, \\ -\log\hat{\mathbb{F}}_{\bar{W}_b}(\bar{w}), & \text{if } \bar{W}_b \text{ is continuous}, \end{cases}   (26)

where $\hat{\mathbb{P}}_{\bar{W}_b}(\bar{w})$ is the believed probability of the realized disturbance sequence, and $\hat{\mathbb{F}}_{\bar{W}_b}(\bar{w})$ is the believed probability density at the realized disturbance sequence. We use the log-likelihood because we will sum up or average these one-time prediction performances to evaluate the predictor performance. There is a minus sign because we want to keep this probability-based measure consistent with the error-based measures, which are to be minimized.
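
For discrete beliefs over a finite set of candidate sequences, the three one-time measures can be sketched as below. `optimal_cost(belief, w_real)` is a placeholder for $J^{\pi_{\bar{W}_b}}_{\bar{w}}$, i.e., a solver that optimizes the policy under the given belief and evaluates it on the realized sequence; the regret therefore inherits the computational burden noted above.

```python
# One-time measures for a discrete belief {w_sequence (tuple): probability}:
# the MSE (24), the log-likelihood (26), and the regret (25).
import math
import numpy as np

def mse(w_real, belief):                       # eq. (24)
    return sum(p * float(np.sum((np.array(w_real) - np.array(w_b)) ** 2))
               for w_b, p in belief.items())

def neg_log_likelihood(w_real, belief):        # eq. (26), discrete case
    p = belief.get(tuple(w_real), 0.0)
    return -math.log(p) if p > 0 else math.inf

def regret(w_real, belief, optimal_cost):      # eq. (25)
    dirac = {tuple(w_real): 1.0}               # a posteriori "belief"
    return optimal_cost(belief, w_real) - optimal_cost(dirac, w_real)
```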

4.4.2 Predictor Performance Measures

If we compute the average of multiple one-time prediction performance measures from one predictor’s result, we obtain a predictor performance measure, whose expectation is

\mathcal{E}_m(\mathcal{P}) = \mathbb{E}\, m[\bar{w}(S), \mathcal{P}(o(S))] = \mathbb{E}\Big[\mathbb{E}\big[m[\bar{w}(S), \mathcal{P}(o(S))]\,\big|\,o(S)\big]\Big].   (27)

Based on our definition of the observable true distribution $\bar{W}_{to}(o)$, for a given, fixed observation $o$, the expectation of the one-time prediction performance measure of a belief $\bar{W}_b$ can be computed using $\bar{W}_{to}(o)$,

\mathbb{E}\big[m[\bar{w}(S), \bar{W}_b]\,\big|\,o(S)=o\big] = \hat{\mathbb{E}}_{\bar{w}\sim\bar{W}_{to}(o)}\, m(\bar{w}, \bar{W}_b).   (28)

The right-hand side of (28) can be computed using just the two probability distributions $\bar{W}_{to}(o)$ and $\bar{W}_b$. Therefore, we define

E_m(\bar{W}_{to}(o), \bar{W}_b) = \hat{\mathbb{E}}_{\bar{w}\sim\bar{W}_{to}(o)}\, m(\bar{w}, \bar{W}_b).   (29)

According to (27), (28) and (29), given a one-time prediction performance measure mm, the expected predictor performance measure can also be written as

\mathcal{E}_m(\mathcal{P}) = \mathbb{E}\, E_m[\bar{W}_{to}(o(S)), \mathcal{P}(o(S))].   (30)
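
In practice, (27) is estimated by the empirical average over the recorded $\{o,\bar{w}\}$ pairs, as the short sketch below shows; this is how the measures are evaluated in the examples of Section 6.

```python
# Empirical estimate of the predictor performance measure (27).
def predictor_measure(pairs, predictor, m):
    """pairs: iterable of (o, w_bar) samples; m: a one-time measure."""
    values = [m(w_bar, predictor(o)) for o, w_bar in pairs]
    return sum(values) / len(values)
```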

5 Predictor Performance vs. Control Performance

We are interested in the following two properties of predictor measures: (1) best-P-lowest-C: whether the best-performing predictor always leads to the lowest predictive optimal control cost; (2) better-P-lower-C: whether better-performing predictors always lead to lower predictive optimal control costs. The performance and cost here mean the expectations of the performance measure and the cost. Since the predictor performance measures are of the form (30), we analyze these properties by investigating different predictor performance measures under each given observation $o$.

5.1 MSE: A Poor Measure

If we choose the expected MSE as the one-time prediction performance measure, then

E_M(\bar{W}_{to}, \bar{W}_b) = \hat{\mathbb{E}}_{\bar{w}\sim\bar{W}_{to}}\, M(\bar{w}, \bar{W}_b) = \hat{\mathbb{E}}_{\bar{w}\sim\bar{W}_{to}}\Big[\hat{\mathbb{E}}_{\bar{w}_b\sim\bar{W}_b} \lVert \bar{w}-\bar{w}_b \rVert_2^2\Big].   (31)

In this case, $\bar{W}_b=\bar{W}_{to}$ may not even locally minimize $E_M(\bar{W}_{to},\cdot)$. This means that when using the MSE, even with a large amount of data, we will miss the predictor's ultimate goal $\bar{W}_{to}$. When there is no constraint, the best $\bar{W}_b$ may be a deterministic prediction, which is inferior to the best stochastic predictor in terms of the control performance. With the proposed framework, examples can easily be constructed to show that the MSE-based measure is neither a best-P-lowest-C nor a better-P-lower-C measure (see Section 6).

5.2 Regret: A Good But Computationally Expensive Measure

If the regret-based predictor performance measure is used, then

E_R(\bar{W}_{to}(o), \bar{W}_b) = \hat{\mathbb{E}}_{\bar{w}\sim\bar{W}_{to}(o)}\, R(\bar{w}, \bar{W}_b) = \hat{\mathbb{E}}_{\bar{w}\sim\bar{W}_{to}(o)}\big(J^{\pi_{\bar{W}_b}}_{\bar{w}} - J^{\pi_{\bar{w}}}_{\bar{w}}\big) = \hat{\mathbb{E}}_{\bar{w}\sim\bar{W}_{to}(o)} J^{\pi_{\bar{W}_b}}_{\bar{w}} - \hat{\mathbb{E}}_{\bar{w}\sim\bar{W}_{to}(o)} J^{\pi_{\bar{w}}}_{\bar{w}}.   (32)

The second term $\hat{\mathbb{E}}_{\bar{w}\sim\bar{W}_{to}} J^{\pi_{\bar{w}}}_{\bar{w}}$ is the a posteriori optimal cost, which is a constant given $\bar{W}_{to}$. The first term is the same as the expected cost. The regret is essentially the control cost with a constant offset. $\bar{W}_{to}$ globally minimizes $E_R(\bar{W}_{to}(o),\cdot)$. Furthermore, decreasing $E_R(\bar{W}_{to}(o),\bar{W}_b)$ will lead to a decrease in the expected cost. Evaluating predictors using regret-based error measures essentially evaluates the control performance after connecting the prediction and the optimal control process. Actually, we can just compute the first term $\hat{\mathbb{E}}_{\bar{w}\sim\bar{W}_{to}(o)} J^{\pi_{\bar{W}_b}}_{\bar{w}}$, which is the control cost under the predictor, to evaluate the predictor. The regret is both a best-P-lowest-C and a better-P-lower-C measure.

5.3 Log-Likelihood: A Probably-Fine Measure

If we use the probability-based log-likelihood measure $P$, then in the discrete case,

E_P(\bar{W}_{to}, \bar{W}_b) = \hat{\mathbb{E}}_{\bar{w}\sim\bar{W}_{to}}\, P(\bar{w}, \bar{W}_b) = -\hat{\mathbb{E}}_{\bar{w}\sim\bar{W}_{to}}\log\hat{\mathbb{P}}_{\bar{W}_b}(\bar{w}) = -\sum_{\bar{w}} \mathbb{P}(\bar{w}|o)\log\hat{\mathbb{P}}_{\bar{W}_b}(\bar{w}).   (33)

This expectation involves two probability distributions: the subjective belief and the observable true probability distribution. It is the cross-entropy between $\bar{W}_{to}$ and $\bar{W}_b$, which is different from the entropy $-\sum_{\bar{w}} \hat{\mathbb{P}}_{\bar{W}_b}(\bar{w})\log\hat{\mathbb{P}}_{\bar{W}_b}(\bar{w})$ describing the property of a single probability distribution. The best predictor satisfies $\hat{\mathbb{P}}_{\bar{W}_b}(\bar{w}) = \hat{\mathbb{P}}_{\bar{W}_{to}}(\bar{w})$ for all $\bar{w}$, due to convexity, which means this measure is best-P-lowest-C. However, there is no guarantee that it is better-P-lower-C.

5.4 Summaries

A summary of the three predictor measures is provided in Table 1. Neither the MSE nor the log-likelihood measure has the better-P-lower-C property. For a general predictive control problem, there is no guarantee that predictors with better MSE or log-likelihood lead to better control performance. This implies that the predictor design cannot simply be decoupled from the downstream optimal control problem, and the predictor needs to be evaluated along with the control system performance, e.g., using the control cost or the regret as the predictor performance measure.

Table 1: Comparing Predictor Measures

Predictor Measure   Best-P-lowest-C   Better-P-lower-C   Computation
MSE                 No                No                 Low
Regret              Yes               Yes                High
Log-likelihood      Yes               No                 Low

6 Illustrative Examples

6.1 A Simple Linear System Example

We provide a numerical example of a Type I problem where the predictor measure gets better while the control cost gets worse, and illustrate the differences among the three predictor measures shown in Table 1. Consider the simple linear system,

x_{k+1} = x_k + w_k + u_k, \quad k = 0, 1,   (34)

where $-1\leq u\leq 1$ and $x_0=w_0=0$, with a quadratic cost function $J=x_2^2$. There is no observation information other than $w$. The input $u_1$ will be determined after $w_1$ is measured. So the key to this predictive optimal control problem is to forecast the value of $w_1$ and determine $u_0$ accordingly.

Assume that $w_1\in\{-3,2\}$. Therefore, the observable true probability distribution of $w_1$ is of the form

\mathbb{P}(w_1=-3) = p, \quad \mathbb{P}(w_1=2) = 1-p.   (35)

In addition, we assume that $0\leq p\leq\frac{2}{3}$. In this example, the hidden prediction state $s$ can be considered as $w_1$. If we know the value of $p$ and can use it to design the optimal control accordingly, the ideal optimal policy is

u_1^* = \begin{cases} 1, & \text{if } w_1=-3, \\ -1, & \text{if } w_1=2, \end{cases} \qquad u_0^* = 3p-1,   (36)

and the ideal optimal cost expectation is,

\mathbb{E}\, J^{\pi_{\bar{W}_{to}(o(S))}}_{\bar{w}(S)} = -9p^2 + 9p.   (37)

A predictor gives a prediction of $p$ as $p_b$, with $0\leq p_b\leq\frac{2}{3}$. $p_b$ generates a belief of $w_1$. The predictive optimal solution has the same form as (36), with $p$ replaced by $p_b$. With this belief, the predictive optimal control cost expectation is

\mathbb{E}\, J^{\pi_{\mathcal{P}(o(S))}}_{\bar{w}(S)} = 9p_b^2 - 18pp_b + 9p.   (38)

Obviously, $p_b=p$ minimizes this cost expectation.

When using the MSE, the regret, and the log-likelihood measure, the predictor performance measures are

E_M(\bar{W}_{to}, \bar{w}_b) = -50pp_b + 25p_b + 25p,   (39)
E_R(\bar{W}_{to}, \bar{w}_b) = 9p_b^2 - 18pp_b + 8p,   (40)
E_P(\bar{W}_{to}, \bar{w}_b) = -p\log p_b - (1-p)\log(1-p_b),   (41)

respectively. For the regret and the log-likelihood, $p_b=p$ minimizes the measure, as both are best-P-lowest-C measures. Furthermore, for the regret, $E_R(\bar{W}_{to},\bar{w}_b)$ is just the cost expectation $\mathbb{E}\,J^{\pi_{\mathcal{P}(o(S))}}_{\bar{w}(S)}$ with a constant offset. However, for the MSE, depending on the sign of $1-2p$, $p_b$ attains its minimal MSE value at $0$ or $\frac{2}{3}$, which is different from the value $p$ that is optimal in terms of the control cost.

We use $p=0.3$ to illustrate the better-P-lower-C properties of the measures. The trends of the predictor measures and the cost expectation as the belief $p_b$ changes are shown in Fig. 6. For the MSE and the log-likelihood, it is possible to improve the predictor measure while making the optimal control cost worse.
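
The trends in Fig. 6 can be reproduced directly from the closed-form expressions, e.g., with the short script below, which sweeps $p_b$ and evaluates the cost (38) and the three measures (39)-(41).

```python
# Sweep the believed probability p_b for p = 0.3 and evaluate the cost (38)
# and the three measures (39)-(41).
import numpy as np

p = 0.3
p_b = np.linspace(0.01, 2/3, 200)             # avoid log(0) at p_b = 0
cost = 9*p_b**2 - 18*p*p_b + 9*p              # eq. (38)
E_M = -50*p*p_b + 25*p_b + 25*p               # eq. (39)
E_R = 9*p_b**2 - 18*p*p_b + 8*p               # eq. (40)
E_P = -p*np.log(p_b) - (1-p)*np.log(1-p_b)    # eq. (41)

# E_R equals the cost minus the constant p; E_M is linear in p_b and, since
# 1 - 2p > 0 here, is minimized at p_b = 0, not at the cost-optimal p_b = p.
for name, v in [("cost", cost), ("E_M", E_M), ("E_R", E_R), ("E_P", E_P)]:
    print(name, "minimized at p_b =", round(float(p_b[np.argmin(v)]), 3))
```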


Figure 6: The change of the cost and the predictor measures as the belief changes when $p=0.3$. In (a), all the predictor measures are scaled to $[0,1]$. The cost and the scaled $E_R$ overlap, as they have exactly the same shape. (b), (c) and (d) each show the relationship between a predictor measure and the control cost. In (b) and (d), the arrows indicate cases where the predictor measure improves while the cost gets worse. Such a pair exists even if the predictor is arbitrarily close to the global optimum.

6.2 Automotive Examples with Real-World Driving Data

We use a hybrid electric vehicle example in the form of a Type III problem to demonstrate the relationship between predictors and control performance in the real world. Given a pre-defined driving cycle as a time-velocity table, optimal control can be used to determine the most efficient energy management strategy for a hybrid electric vehicle. With uncertain future driving cycles, predictive optimal control can be used, and the future vehicle velocities are forecasted. We use real-world driving data from the Next Generation Simulation (NGSIM) program [29]. We consider the energy management strategy for the simple hybrid electric vehicle powertrain shown in [30]. The hybrid electric vehicle powertrain model is as follows,

x_{SOC,k+1} = x_{SOC,k} - K\eta_b\eta_m\omega_k T_{m,k},   (42)

where $\eta_b=\eta_b(x_{SOC,k},\omega_k,T_{m,k})$ and $\eta_m=\eta_m(\omega_k,T_{m,k})$; $x_{SOC}$ is the battery state of charge (SOC); $K$ is a constant; $\eta_b$ represents the battery efficiency or its inverse, depending on the sign of the motor power; $\eta_m$ represents the motor efficiency or its inverse; $\omega$ is the motor speed; and $T_m$ is the motor torque. We assume the following static relationship is known,

T_d = f_s(v, a),   (43)

where $T_d$ is the total torque demand on the powertrain, $v$ is the vehicle velocity, and $a$ is the vehicle acceleration. The control input is the motor torque $T_m$. Once $T_m$ is determined, the engine torque $T_e$ is determined as

T_e = T_d - T_m.   (44)

The cost function is defined as

J = \alpha_1(x_{soc,N} - x_{soc,0}) + \sum_{k=0}^{N}\big[\alpha_2 J_{FC,k}(\omega_k, T_{e,k}) + \alpha_3(x_{soc,k} - x_{target})^2\big],   (45)

where $\alpha_1$, $\alpha_2$ and $\alpha_3$ are weights, $J_{FC}$ is the fuel consumption determined by $\omega$ and $T_e$, and $x_{target}$ is the target battery SOC to be maintained. This cost function accounts for the electric energy consumed, the fuel consumed, and the battery SOC deviation from the target value during the $N$ steps.

In this problem, $w=[v,a]^T$ (or equivalently, $w=[\omega,T_d]^T$) is the generalized disturbance. The observation $o$ is just $w$ and its history. A predictor tries to forecast the future $w$, that is, the future velocity and acceleration. The future velocity and acceleration depend on various factors, including the driving style and the road and traffic conditions. In the proposed analysis framework, these factors are considered in the environment dynamics equations in (6). The hidden prediction state $s$ associated with this environment dynamic system may be a high-dimensional vector. But the exact formulas of the environment and the hidden prediction state are not needed for applying predictive optimal control.

Four types of deterministic predictors and two types of stochastic predictors are designed.

(D1) Constant velocity predictor.

a_{k+1} = 0.   (46)

(D2) Linear decay acceleration predictor.

a_{k+1} = a_k - \frac{a_0}{\gamma},   (47)

where $\gamma$ is a constant parameter.

(D3) Exponential decay acceleration predictor.

a_{k+1} = \lambda a_k,   (48)

where $\lambda\in(0,1)$ is a constant parameter.

(D4) Deterministic Long Short-Term Memory (LSTM) predictor,

\bar{v}^+ = f_{LSTM}(\bar{v}^-),   (49)

where $\bar{v}^+=[v_{k+1},v_{k+2},v_{k+3},v_{k+4},v_{k+5}]^T$, $\bar{v}^-=[v_{k-5},v_{k-4},v_{k-3},v_{k-2},v_{k-1},v_k]^T$, and $f_{LSTM}(\cdot)$ is an LSTM network. The network is composed of an LSTM layer and a fully connected layer, where the LSTM layer is $6\times 128$ and the fully connected layer is $128\times 5$. The LSTM network is trained using about six thousand vehicle trajectories from the NGSIM data.
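
A plausible PyTorch sketch of this architecture is shown below. The paper's description leaves some details open (e.g., how the 6 past velocities are fed to the LSTM), so the per-time-step scalar input and the use of the last hidden state are assumptions.

```python
# A hedged sketch of the D4 predictor: an LSTM (hidden size 128) over the 6
# past velocities, then a 128x5 fully connected layer for 5 future velocities.
import torch
import torch.nn as nn

class VelocityLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=128, batch_first=True)
        self.fc = nn.Linear(128, 5)

    def forward(self, v_past):                  # v_past: (batch, 6)
        h, _ = self.lstm(v_past.unsqueeze(-1))  # (batch, 6, 128)
        return self.fc(h[:, -1, :])             # (batch, 5) forecast

model = VelocityLSTM()
v_future = model(torch.randn(4, 6))             # 5-step velocity forecast
```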

(S1) Zero-mean stochastic acceleration predictor.

a_{k+1} \sim \mathcal{N}(0, \sigma^2),   (50)

where $\sigma$ is a constant and $\mathcal{N}$ denotes the normal distribution.

(S2) Stochastic linear decay acceleration predictor.

\mu_{k+1} = a_0 - (k+1)\frac{a_0}{\gamma}, \qquad a_{k+1} \sim \mathcal{N}(\mu_{k+1}, \sigma^2),   (51)

where $\sigma$ and $\gamma$ are constants.

In simulation, 22 predictors are created based on the above six types. Three different linear decay acceleration predictors, D2-a, D2-b and D2-c, are used, with $\gamma=3,4$, and $5$, respectively. Three different exponential decay acceleration predictors, D3-a, D3-b and D3-c, are used, with $\lambda=e^{-1},e^{-2}$, and $e^{-3}$, respectively. Seven different zero-mean stochastic acceleration predictors, S1-a, S1-b, …, S1-g, are used, with $\sigma=0.1,0.2,0.4,0.6,0.8,1.0$, and $1.2$, respectively. Seven different stochastic linear decay acceleration predictors, S2-a, S2-b, …, S2-g, are used, with $\sigma=0.1,0.2,0.4,0.6,0.8,1.0$, and $1.2$, respectively. For all S2 predictors, $\gamma$ is set to 5 to match the best D2 predictor.


Figure 7: The driving data used in simulation

We run the simulation over 112 seconds of driving data from NGSIM, as shown in Fig. 7. The predictive optimal control is applied in a receding-horizon fashion, as in the Type III problem. The energy management strategy runs at 1 Hz, and the prediction horizon is 5 steps.

We use dynamic programming to approximately solve the optimal control problems after each prediction. For stochastic predictors, the continuous probability distributions are discretized first. Each normally-distributed random variable is approximated by a discrete random variable with 5 possible values before dynamic programming is applied.
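
The paper does not specify the discretization scheme; one common choice, sketched below under that assumption, uses five equal-probability bins of the normal distribution and represents each bin by its conditional mean.

```python
# One way to discretize N(mu, sigma^2) into n equal-probability points
# (an assumption; the paper's exact scheme is not stated).
import numpy as np
from scipy import stats

def discretize_normal(mu, sigma, n=5):
    edges = stats.norm.ppf(np.linspace(0.0, 1.0, n + 1), mu, sigma)
    a = (edges[:-1] - mu) / sigma              # standardized lower bin edges
    b = (edges[1:] - mu) / sigma               # standardized upper bin edges
    # conditional mean of each bin: mu + sigma*(phi(a)-phi(b))/(1/n)
    points = mu + sigma * (stats.norm.pdf(a) - stats.norm.pdf(b)) * n
    return points, np.full(n, 1.0 / n)

pts, probs = discretize_normal(0.0, 0.4)       # e.g., one step of predictor S1-c
```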


Figure 8: Automotive optimal control simulation results of prediction errors and control performance. (a) shows the MSE-based predictor measure and control cost of all deterministic predictors. (b) shows the MSE-based predictor measure and control cost of all stochastic predictors. (c) shows the log-likelihood-based predictor measure and control cost of all S1 stochastic predictors. (d) shows the log-likelihood-based predictor measure and control cost of all S2 stochastic predictors.

The simulation results are shown in Fig. 8. In this Type III problem, the MSE and the log-likelihood are defined as averages over multiple predictions. All four plots in Fig. 8 show patterns similar to Fig. 1(b). Therefore, when using the MSE or the log-likelihood as the predictor error measure, a better predictor does not necessarily mean better optimal control performance. Comparing the deterministic predictors with their corresponding stochastic predictors (D1 vs. S1, and D2 vs. S2), it can be seen that in general the MSE-based predictor measures are larger for stochastic predictors. However, the stochastic predictors may lead to lower control costs. This implies that we should be cautious when directly comparing stochastic predictors with deterministic predictors. It again suggests that predictors should not be evaluated alone, but together with the optimal control task performance.

7 Conclusions

In this paper, an analysis framework for predictive optimal control is presented. An environment model which generates the to-be-predicted signal is included in the framework. The truth of the to-be-predicted signal is properly defined with a hidden prediction state describing the current and future uncertainties in the environment. We use the proposed analysis framework to rethink the predictor evaluation problem. It is shown that improving the predictor using a general performance measure may not guarantee an improvement in control performance. It is suggested that for a general predictive control problem, the predictor should be evaluated along with the control system performance.

Acknowledgement

This work was supported by the National Natural Science Foundation of China under Grant 52188102 and Grant 52272416.

References