Adaptive Real-Time Grid Operation via Online Feedback Optimization with Sensitivity Estimation

Miguel Picallo1, Lukas Ortmann1, Saverio Bolognani, Florian Dörfler
Automatic Control Laboratory, ETH Zurich, 8092 Zurich, Switzerland
{miguelp,ortmannl,bsaverio,dorfler}@ethz.ch

Abstract

In this paper we propose an approach based on an Online Feedback Optimization (OFO) controller with grid input-output sensitivity estimation for real-time grid operation, e.g., at subsecond time scales. The OFO controller uses grid measurements as feedback to update the value of the controllable elements in the grid, and track the solution of a time-varying AC Optimal Power Flow (AC-OPF). Instead of relying on a full grid model, e.g., grid admittance matrix, OFO only requires the steady-state sensitivity relating a change in the controllable inputs, e.g., power injections set-points, to a change in the measured outputs, e.g., voltage magnitudes. Since an inaccurate sensitivity may lead to a model-mismatch and jeopardize the performance, we propose a recursive least-squares estimation that enables OFO to learn the sensitivity from measurements during real-time operation, turning OFO into a model-free approach. We analytically certify the convergence of the proposed OFO with sensitivity estimation, and validate its performance on a simulation using the IEEE 123-bus test feeder, and comparing it against a state-of-the-art OFO with constant sensitivity.

Index Terms:

Online Feedback Optimization, Real-time AC Optimal Power Flow, Recursive Estimation, Voltage Regulation


1 These two authors contributed equally.
Funding by the Swiss Federal Office of Energy through the projects “ReMaP” (SI/501810-01) and “UNICORN” (SI/501708), the Swiss National Science Foundation through the NCCR Automation, and by the ETH Foundation is gratefully acknowledged.

I Introduction

The increasing amount of controllable, yet sometimes unpredictable, power resources in electrical grids, e.g., renewable generation, electric vehicles, and flexible loads, leads to new challenges and opportunities in the operation of power systems. On the one hand, these new controllable elements make it possible to minimize the grid operational cost and promote a transition to a more sustainable power system. On the other hand, given the volatility and unpredictability of these resources, fast control decisions are required to avoid constraint violations, e.g., overvoltages. This is especially relevant in distribution grids, where many of these resources are deployed. However, measurement scarcity and poor grid models challenge grid operation at such low voltage levels.

One way to leverage the controllability of these resources and to optimize the grid operation is by solving an AC Optimal Power Flow (AC-OPF) [1], an optimization problem to determine the set-points of controllable resources that minimize the operational cost and enforce grid safety requirements, e.g., voltage limits, line thermal limits, etc. Unfortunately, standard AC-OPF requires a) full grid observability, e.g., measurements of all active and reactive power injections and consumptions, and b) an accurate nonlinear grid model, e.g., its admittance matrix [1]. Yet, learning the model may require an extensive deployment of measurements across the network [2, 3], usually not available or affordable on the distribution system level. Furthermore, the volatility of renewable energy sources and household loads requires high sampling and control-loop rates to satisfy the grid constraints. Yet, solving a computationally expensive AC-OPF may pose a limit on these rates.

Online Feedback Optimization (OFO) [4, 5, 6] is a novel, computationally efficient approach that tracks the solutions of an AC-OPF problem under time-varying conditions using subsecond control-loop rates. OFO is based on a controller that uses grid measurements as feedback to iteratively steer the controllable input set-points towards the AC-OPF solutions, and has already been successfully tested in both simulations and experimental settings [7]. Furthermore, OFO requires neither full grid observability [8] nor an accurate nonlinear grid model. It only needs measurements of the outputs that need to be controlled, and the input-output sensitivity that maps a change in the input to a change in the output. This sensitivity is essentially a derivative of the power flow equations at the operating point [9], and thus depends on the grid state and exogenous disturbances, e.g., loads. Hence, constructing an accurate sensitivity requires the grid model and full measurements of the grid to evaluate it. To avoid these requirements, some OFO approaches use a constant approximate linear model, and thus a constant approximate sensitivity [6, 8, 7]. Even though OFO is robust against small approximation errors in this sensitivity [7], an inaccurate sensitivity introduces a model-mismatch that may degrade performance [10]. Therefore, some model-free approaches try to operate the system optimally without requiring a model or sensitivity. First, reinforcement learning makes it possible to disregard the model and instead take decisions based solely on measurements [11]. However, reinforcement learning has limited theoretical guarantees, and may not be able to enforce the grid safety constraints during its learning phase. Second, data-driven control [12, 13, 14] based on Willems' fundamental lemma [15] makes it possible to compute the sensitivity after gathering sufficient data. Yet, these approaches estimate a constant linear model, and thus may fail to adapt to different operating points. Finally, zeroth-order gradient-free methods such as [16] operate the system while continuously estimating and updating the sensitivity. However, [16] requires a sufficient time-scale separation between the sensitivity estimation procedure and the feedback optimization, which may lower the convergence rate of the entire approach if the measurement sample rate is restricted due to communication limits.

Therefore, in this paper, in a similar spirit to the extremum seeking approach [16], we propose a model-free OFO approach that sequentially estimates a time-varying sensitivity while operating the grid, bypassing the need to know the whole grid model accurately and to have full grid observability. Our contributions are as follows: First, we design a sensitivity learning approach via recursive least squares [17, 18]. We use as measurements the change in the outputs caused by a change of the controllable inputs. Second, we combine this sensitivity estimation with a persistently exciting OFO that gathers enough information about the sensitivity while driving the control inputs towards the AC-OPF solutions. Third, we certify the convergence of both the estimated sensitivity and the control input towards the true sensitivity and the time-varying solution of the AC-OPF, respectively. Fourth and finally, we simulate the proposed OFO controller with sensitivity estimation on the 3-phase, unbalanced IEEE 123-bus test feeder [19] using real consumption data, and show its superior performance over a state-of-the-art OFO with a constant sensitivity approximation.

The paper is structured as follows: Section II presents some preliminaries on grid models, AC-OPF and OFO. Section III explains our proposed OFO with sensitivity estimation approach, and provides theoretical convergence guarantees. Section IV shows the simulation on a test feeder. Finally, Section V concludes and discusses further work.

II Preliminaries: Grid Model, AC-OPF and OFO

II-A Grid Model

For each bus $i$ of an $n$-bus power system we define the voltage magnitude as $v_{i}\in\mathbb{R}$, and the active and reactive power as $p_{i}\in\mathbb{R}$ and $q_{i}\in\mathbb{R}$, respectively. We obtain the vectors $v$, $p$, and $q$ of dimension $n$ by stacking the individual bus quantities, i.e., $v=[v_{1},\dots,v_{n}]^{T}$. We define the control input vector $u\in\mathbb{R}^{n_{u}}$ consisting of all the controllable resources (e.g., active and reactive generation and flexible loads in $p$ and $q$, slack bus voltage magnitude $v_{1}$ through tap changers); the output vector $y$ (e.g., voltage magnitude elements in $v$) with all the quantities that we measure and want to control through the inputs; and the disturbance vector $d$ with all uncontrollable power injections (e.g., conventional consumption loads in $p$ and $q$). The grid admittance matrix and the power flow equations define an input-output map $\mathdutchcal{h}(\cdot)$ that characterizes the output $y$ as a non-linear function of $u$ and $d$:

y=\mathdutchcal{h}(u,d).

(1)

The input-output map $\mathdutchcal{h}(\cdot)$ is typically not available in closed form, since in general it is not possible to derive an analytical expression of $v$ (in $y$) as a function of $p$ and $q$ (in $u$ and $d$) using the power flow equations [1]. Yet, the local existence of a continuously differentiable map $\mathdutchcal{h}(\cdot)$ can be guaranteed by the implicit function theorem [20].

II-B AC Optimal Power Flow for Grid Operation

The operation of a power grid consists of deciding the input $u_{t}$ at each time instant $t$. An AC-OPF formulates this decision process as an optimization problem:

\displaystyle\begin{split}u^{*}_{t},y_{t}^{*}=&\arg\min_{u\in\mathcal{U}_{t},y}f(u)+g(y)\\ &\text{ s.t. }y=\mathdutchcal{h}(u,d_{t}),\end{split}

(2)

where $f(u)$ is the operational cost on the input $u$; $g(y)$ is a penalty function to enforce some grid specification on the output $y$, e.g., voltage limits; $\mathcal{U}_{t}$ is the time-varying set of admissible inputs that defines the operational constraints on $u_{t}$, e.g., power limits $\mathcal{U}_{t}=\{u\,|\,\underline{u}_{t}\leq u\leq\overline{u}_{t}\}$; and $d_{t}$ is the disturbance value at time $t$, e.g., uncontrollable loads or non-dispatchable generation. The nonlinear input-output model (1) in (2) relates the outputs to the chosen input.

Optimal real-time decision making consists of first taking measurements $d_{t}$ ; then, solving the AC-OPF problem (2), and finally applying the solution $u^{*}_{t}$ to the system. Then, this is repeated at the next time step $t+1$ .

II-C Linear Power Flow Approximation

Solving AC-OPF problems (2) to determine the set-points of power resources is a compelling and valuable tool for grid operators, but it comes with some drawbacks: First, the full nonlinear model of the grid $\mathdutchcal{h}(u,d)$ is needed. Second, solving the AC-OPF (2) can be computationally expensive, which may jeopardize its use for real-time grid operation. This can be circumvented by linearizing the map $\mathdutchcal{h}(\cdot)$ in (1) at an operating point [21, 22, 1], e.g., the zero-injection point $(u_{\text{op}},d_{\text{op}})=(0,0)$ , to obtain the approximation

\displaystyle\begin{split}y=H_{0}u+D_{0}d+y_{0},\end{split}

(3)

where $y_{0}$ is an offset representing the output value when $u=d=0$ , e.g., $1$ p.u. for all voltage magnitudes. The matrices $H_{0}=\nabla_{u}\mathdutchcal{h}(u,d)|_{(u_{\text{op}},d_{\text{op}})}$ and $D_{0}=\nabla_{d}\mathdutchcal{h}(u,d)|_{(u_{\text{op}},d_{\text{op}})}$ are evaluated at the operating point, and represent the sensitivities of the output with respect to changes in the input $u$ and disturbance $d$ , respectively. This linear approximation (3) can substitute the nonlinear map $\mathdutchcal{h}(\cdot)$ in the AC-OPF (2) to get

\displaystyle\begin{split}\min_{u\in\mathcal{U}_{t}}f(u)+g(H_{0}u+D_{0}d_{t}+y_{0}).\end{split}

(4)
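The linearization (3) can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical smooth map `toy_map` that merely stands in for $\mathdutchcal{h}(u,d)$ (it is not a power flow model), and using central finite differences at the zero-injection point to obtain $H_{0}$, $D_{0}$ and $y_{0}$. All function names and numbers are illustrative assumptions.

```python
import numpy as np

def toy_map(u, d):
    """Hypothetical smooth input-output map y = h(u, d); not a power flow model."""
    x = u + d  # net injection per bus (assumption for illustration)
    return 1.0 + 0.05 * x - 0.02 * x**2

def jacobian_fd(fun, x0, eps=1e-6):
    """Central finite-difference Jacobian of fun evaluated at x0."""
    x0 = np.asarray(x0, dtype=float)
    cols = []
    for i in range(x0.size):
        step = np.zeros_like(x0)
        step[i] = eps
        cols.append((fun(x0 + step) - fun(x0 - step)) / (2.0 * eps))
    return np.column_stack(cols)

# Linearize at the zero-injection operating point (u_op, d_op) = (0, 0).
u_op, d_op = np.zeros(2), np.zeros(2)
H0 = jacobian_fd(lambda u: toy_map(u, d_op), u_op)  # sensitivity w.r.t. u
D0 = jacobian_fd(lambda d: toy_map(u_op, d), d_op)  # sensitivity w.r.t. d
y0 = toy_map(u_op, d_op)                            # offset at u = d = 0

def linear_model(u, d):
    """Linear approximation (3): y = H0 u + D0 d + y0."""
    return H0 @ u + D0 @ d + y0

# Away from the operating point the linear model deviates from the toy map.
u, d = np.array([0.5, -0.2]), np.array([0.1, 0.3])
mismatch = np.linalg.norm(toy_map(u, d) - linear_model(u, d))
```

The nonzero `mismatch` is the kind of model error that a purely feedforward use of (4) cannot correct.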

II-D Online Feedback Optimization (OFO)

Solving the AC-OPF with linear power flow approximation (4) is computationally efficient and could be employed in real-time operation. However, this approach does not take advantage of output measurements $y_{t}$ , since it only feeds $d_{t}$ through the inaccurate linear model (3). Hence, such a feedforward approach introduces a model-mismatch that can cause a performance degradation, and even lead to constraint violations, e.g., under and overvoltages.

Instead, OFO is a novel approach [5, 6, 4] that uses $y_{t}$ as feedback to achieve a safer grid operation and track the solution of the AC-OPF (2) under time-varying conditions. For that, OFO turns a standard optimization algorithm, in our case projected gradient descent [23], into a feedback controller that takes the grid output measurements $y_{t}$, instead of computing the output $y_{t}$ via the grid model (1) or the linearized one (3). Projected gradient descent consists of a gradient step and a projection: First, we compute the gradient of the cost function in (4):

\displaystyle\nabla_{u}\big{(}f(u)+g(y)\big{)}\overset{(3)}{=}\nabla_{u}f(u)+H_{0}^{T}\nabla_{y}g(y).

To minimize the operational cost, the current input $u_{t}$ is pushed along the direction of the negative gradient with a step size $\alpha$ , and then it is projected onto the feasible space $\mathcal{U}_{t}$ to enforce the operational constraints on the input, i.e.,

\displaystyle\begin{split}u_{t+1}=\Pi_{\mathcal{U}_{t}}\big{[}u_{t}-\alpha\big{(}\nabla_{u}f(u_{t})+H_{0}^{T}\nabla_{y}g(y_{t})\big{)}\big{]},\end{split}

(5)

where $\Pi_{\mathcal{U}}\big{[}u]=\arg\min_{z\in\mathcal{U}}\lVert u-z\rVert_{2}^{2}$ is the projection of $u$ onto $\mathcal{U}$ , which is typically easy to evaluate for power grid operation [6], especially if $\mathcal{U}_{t}=\{u\,|\,\underline{u}_{t}\leq u\leq\overline{u}_{t}\}$ is a box constraint.
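As an illustration of the update (5) with a box constraint, the sketch below implements one projected gradient step in NumPy. The function names, the cost gradients and all numerical values are hypothetical choices made for this example; only the structure of the update follows (5).

```python
import numpy as np

def project_box(u, lower, upper):
    """Euclidean projection onto the box {u | lower <= u <= upper}."""
    return np.clip(u, lower, upper)

def ofo_step(u, y_meas, H, alpha, grad_f, grad_g, lower, upper):
    """One OFO iteration (5): gradient step using the measured output, then projection."""
    gradient = grad_f(u) + H.T @ grad_g(y_meas)
    return project_box(u - alpha * gradient, lower, upper)

# Hypothetical example: two inputs, one measured voltage.
u_ref = np.array([1.0, 1.0])
grad_f = lambda u: u - u_ref                          # f(u) = 0.5 ||u - u_ref||^2
grad_g = lambda y: 100.0 * np.maximum(y - 1.06, 0.0)  # penalty on overvoltage only
H0 = np.array([[0.05, 0.02]])                         # assumed constant sensitivity
lower, upper = np.zeros(2), np.ones(2)

# Overvoltage measured: the penalty gradient pushes the inputs down via H0^T.
u_next = ofo_step(np.array([0.9, 0.9]), np.array([1.07]), H0, 0.5,
                  grad_f, grad_g, lower, upper)

# No violation and a large step: the projection keeps the input inside the box.
u_clipped = ofo_step(np.array([0.9, 0.9]), np.array([1.00]), H0, 2.0,
                     grad_f, grad_g, lower, upper)
```

Note that the controller never evaluates the grid model: the output enters only through the measurement `y_meas`.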

III Online Feedback Optimization with Sensitivity Estimation

The OFO controllers are robust, i.e., preserve stability, against using a constant power flow sensitivity approximation $H_{0}$ instead of the actual one $\nabla_{u}\mathdutchcal{h}(u,d)$ [7, 10]. Unfortunately, even if the overall system is stable, a model mismatch between $H_{0}$ and $\nabla_{u}\mathdutchcal{h}(u,d)$ may lead to a difference between the solution $u_{t}^{*}$ of the AC-OPF problem and the values $u_{t}$ produced by the OFO controller (5) [10]. Therefore, we propose an approach to sequentially update the sensitivity $H_{0}$ into a good approximation of the true sensitivity $\nabla_{u}\mathdutchcal{h}(u,d)$, and thus avoid a potential performance degradation. For that, we will consider the sensitivity as a time-varying parameter $H_{t}=\nabla_{u}\mathdutchcal{h}(u_{t},d_{t})$, and propose a recursive least-squares approach to generate sensitivity estimates $\hat{H}_{t}$ using the measured variations of $u$ and $y$ over time, $\Delta u$ and $\Delta y$ respectively. Then, in every time step we feed this estimated sensitivity $\hat{H}_{t}$ to the OFO as in Figure 1.

Figure 1: Model-free grid operation via Online Feedback Optimization (OFO) with sensitivity estimation.

III-A Sensitivity Estimation

Due to the non-linearity of $\mathdutchcal{h}(u,d)$ , the true sensitivity $\nabla_{u}\mathdutchcal{h}(u,d)$ depends on the values of $u$ and $d$ . The temporal variation of the disturbance $d_{t}$ and the input $u_{t}$ , e.g., due to applying the OFO controller (5) in the input case, produces a time-varying sensitivity $H_{t}=\nabla_{u}\mathdutchcal{h}(u_{t},d_{t})$ . Instead of learning the dependency on $u$ and $d$ , we model a time-varying sensitivity $H_{t}$ with the following random process:

\displaystyle\begin{split}h_{t}=h_{t-1}+\omega_{p,t-1}\end{split}

(6)

where $h=\text{vec}(H)$ is the column-wise vector representation of the sensitivity matrix $H$, $\Delta u_{t-1}=u_{t}-u_{t-1}$ denotes a change of the input $u$, and $\omega_{p,t}\sim\mathcal{N}(0,\Sigma_{p,t})$ is a Gaussian process noise with covariance $\Sigma_{p,t}=\Sigma_{p_{1}}+\Sigma_{p_{2}}\lVert\Delta u_{t}\rVert_{2}^{2}$, that represents how the sensitivity changes over time. We make the part $\Sigma_{p_{2}}$ of the process noise covariance scale with $\lVert\Delta u_{t}\rVert_{2}^{2}$, since a large $\Delta u_{t}$ can trigger a larger change in the true sensitivity $\nabla_{u}\mathdutchcal{h}(u,d)$ that depends on $u$, and keep the part $\Sigma_{p_{1}}$ independent of $\Delta u_{t}$ to account for an uncontrolled random change $\Delta d_{t}=d_{t+1}-d_{t}$ that can affect the sensitivity as well.

Next, to derive a measurement equation for the sensitivity $H_{t}$ , consider the first-order Taylor approximation of $y_{t}$

\displaystyle\begin{split}\overbrace{\mathdutchcal{h}(u_{t},d_{t})}^{y_{t}}\approx&\overbrace{\mathdutchcal{h}(u_{t-1},d_{t-1})}^{y_{t-1}}+\overbrace{\nabla_{u}\mathdutchcal{h}(u_{t-1},d_{t-1})}^{H_{t-1}}\Delta u_{t-1}\\ &+\nabla_{d}\mathdutchcal{h}(u_{t-1},d_{t-1})\Delta d_{t-1}.\end{split}

(7)

At each time $t$ , we measure $y_{t}$ , and compute the variation $\Delta y_{t-1}=y_{t}-y_{t-1}$ . Based on the Taylor approximation (7), we treat this variation $\Delta y_{t-1}$ as a noisy linear measurement of $H_{t-1}$ through a measurement model that depends on $\Delta u_{t-1}$ :

\displaystyle\begin{split}\Delta y_{t-1}&=\underbrace{H_{t-1}\Delta u_{t-1}}_{=U_{\Delta,t-1}h_{t-1}}+\omega_{m,t-1}\end{split}

(8)

where $U_{\Delta,t}=\Delta u_{t}^{T}\otimes\mathbbm{1}$ , with the Kronecker product $\otimes$ , and $\omega_{m,t}\sim\mathcal{N}(0,\Sigma_{m,t})$ is a Gaussian measurement noise with covariance $\Sigma_{m,t}=\Sigma_{m_{1}}+\Sigma_{m_{2}}\lVert\Delta u_{t}\rVert_{2}^{2}+\Sigma_{m_{3}}\lVert\Delta u_{t}\rVert_{2}^{4}$ . Again, the part $\Sigma_{m_{1}}$ independent of $\Delta u_{t}$ in the measurement noise represents the effect of an uncontrolled random disturbance change $\Delta d_{t}$ , while the other parts $\Sigma_{m_{2}}$ and $\Sigma_{m_{3}}$ encapsulate the second-order error of the Taylor approximation (7).
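The identity $H\Delta u=U_{\Delta}h$ behind (8) can be checked numerically. The snippet below is a small sketch with randomly generated, purely illustrative data; the helper names `excitation_matrix` and `measurement_noise_cov` are assumptions introduced here. It also evaluates the $\Delta u$-dependent measurement noise covariance defined above.

```python
import numpy as np

def excitation_matrix(du, n_y):
    """U_Delta = du^T (Kronecker product) identity of size n_y."""
    return np.kron(du.reshape(1, -1), np.eye(n_y))

def measurement_noise_cov(du, S1, S2, S3):
    """Sigma_m = S1 + S2 ||du||^2 + S3 ||du||^4, as in the text below (8)."""
    n2 = float(du @ du)
    return S1 + S2 * n2 + S3 * n2**2

rng = np.random.default_rng(0)
n_y, n_u = 3, 2
H = rng.normal(size=(n_y, n_u))   # illustrative sensitivity matrix
du = rng.normal(size=n_u)         # illustrative input change

h = H.flatten(order="F")          # column-wise vectorization, h = vec(H)
U = excitation_matrix(du, n_y)    # shape (n_y, n_y * n_u)
lhs, rhs = U @ h, H @ du          # both equal the predicted output change

Sigma_m = measurement_noise_cov(np.array([0.1, 0.0]), 0.0 * np.eye(n_y),
                                0.0 * np.eye(n_y), np.eye(n_y))
```

The column-wise ordering (`order="F"`) matters: with row-wise flattening the Kronecker factors would have to be swapped.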

To update the sensitivity estimate $\hat{h}_{t}$ , we combine the information given by the previous sensitivity estimate $\hat{h}_{t-1}=\text{vec}(\hat{H}_{t-1})$ , and the measurements $\Delta y_{t-1}$ (8). We compute the new sensitivity estimate $\hat{h}_{t}$ through a Bayesian update represented in the following least-squares problem [17, 18]:

\displaystyle\begin{split}&\hat{h}_{t}=\arg\min_{\hat{h}}\lVert\hat{h}-\hat{h}_{t-1}\rVert_{{\Sigma_{t-1}^{-1}}}^{2}+\lVert\Delta y_{t-1}-U_{\Delta,t-1}\hat{h}\rVert_{{\Sigma_{m,t-1}^{-1}}}^{2},\end{split}

where $\Sigma_{t}$ is the covariance matrix representing the uncertainty of the sensitivity estimate $\hat{h}_{t}$ , and $\lVert x\rVert_{A}^{2}=x^{T}Ax$ is the norm of $x$ with respect to a positive definite matrix $A$ . The resulting recursive estimation can be expressed as a Kalman filter [24]:

\displaystyle\begin{split}\hat{h}_{t}=&\hat{h}_{t-1}+K_{t-1}(\Delta y_{t-1}-U_{\Delta,t-1}\hat{h}_{t-1})\\ \Sigma_{t}=&\big{(}\mathbbm{1}-K_{t-1}U_{\Delta,t-1}\big{)}\Sigma_{t-1}+\Sigma_{p,t-1},\end{split}

(9)

where $\mathbbm{1}$ is the identity matrix, and $K_{t}=\Sigma_{t}U_{\Delta,t}^{T}(\Sigma_{m,t}+U_{\Delta,t}\Sigma_{t}U_{\Delta,t}^{T})^{-1}$ is the Kalman gain, which is well defined for an invertible $\Sigma_{m,t}$ , see later Assumption 1.
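A minimal sketch of one recursion of (9) is given below, assuming NumPy and a symmetric covariance $\Sigma_{t-1}$. The function name `sensitivity_update`, the static linear test system and all numerical values are assumptions made for illustration; in particular, the demo uses noise-free measurements $\Delta y=H\Delta u$ of a constant sensitivity, which is far simpler than the grid setting.

```python
import numpy as np

def sensitivity_update(h_hat, Sigma, du, dy, Sigma_m, Sigma_p):
    """One step of the recursive estimation (9) for h = vec(H)."""
    n_y = dy.size
    U = np.kron(du.reshape(1, -1), np.eye(n_y))        # U_Delta = du^T kron I
    innovation_cov = Sigma_m + U @ Sigma @ U.T
    K = Sigma @ U.T @ np.linalg.inv(innovation_cov)    # Kalman gain
    h_new = h_hat + K @ (dy - U @ h_hat)               # correct with the residual
    Sigma_new = (np.eye(h_hat.size) - K @ U) @ Sigma + Sigma_p
    return h_new, Sigma_new

# Demo on a hypothetical static linear system with constant sensitivity H_true.
rng = np.random.default_rng(0)
n_y, n_u = 2, 3
H_true = rng.normal(size=(n_y, n_u))
h_hat = np.zeros(n_y * n_u)                 # uninformed initial estimate
Sigma = np.eye(n_y * n_u)                   # initial uncertainty
Sigma_m = 1e-6 * np.eye(n_y)                # assumed measurement noise covariance
Sigma_p = 1e-10 * np.eye(n_y * n_u)         # assumed (tiny) process noise covariance

for _ in range(50):
    du = 0.1 * rng.normal(size=n_u)         # random, hence exciting, input changes
    dy = H_true @ du                        # noise-free output change
    h_hat, Sigma = sensitivity_update(h_hat, Sigma, du, dy, Sigma_m, Sigma_p)

H_hat = h_hat.reshape((n_y, n_u), order="F")   # undo the column-wise vec
```

In this idealized setting the estimate approaches `H_true` after a handful of linearly independent input changes; the values of `Sigma_m` and `Sigma_p` control how quickly, in line with Remark 1.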

Remark 1

Note that for a diagonal measurement noise covariance $\Sigma_{m,t}=\sigma_{m,t}\mathbbm{1}$ , in the limit $\sigma_{m,t}\to\infty$ , the gain is $K_{t}=0$ , thus the sensitivity is not updated, and we keep the initial sensitivity, i.e., $\hat{h}_{t}=\hat{h}_{t-1}=\cdots=\hat{h}_{0}$ . Similarly, a large $\Sigma_{m,t}$ diminishes $K_{t}$ , and helps to tune how fast we want to learn or differ from the initial sensitivity. On the other hand, the process noise covariance $\Sigma_{p,t}$ represents our trust in our current model, and it also helps to tune the learning rate.

III-B Persistently Exciting OFO

To learn the time-varying sensitivity $H_{t}$, we need to capture enough information via the measurement equation (8), i.e., we need to use different $\Delta u$ to explore different reactions $\Delta y$ and infer different elements of $H_{t}$ from them. This can be formalized via the persistency of excitation condition [25]: $\Delta u_{t}$ is persistently exciting if there exists a time span $T>0$, such that for all $t>0$, the matrix formed by columns $\Delta u_{t+i}$ for $i\in\{0,\dots,T\}$ has full row rank, i.e., $\text{rank}([\Delta u_{t},\dots,\Delta u_{t+T}])=n_{u}$. To achieve persistency of excitation, we perturb the OFO step (5) with $\omega_{u,t}\in\mathbb{R}^{n_{u}}$, a bounded zero-mean white noise with independent and identically distributed elements with standard deviation $\sigma_{u}$, e.g., a truncated Gaussian distribution. As a result, we obtain the following persistently exciting OFO with estimated sensitivity $\hat{H}_{t}$:

\displaystyle\begin{split}u_{t+1}=\Pi_{\mathcal{U}_{t}}\big{[}u_{t}-\alpha\big{(}\nabla_{u}f(u_{t})+\hat{H}_{t}^{T}\nabla_{y}g(y_{t})\big{)}+\omega_{u,t}\big{]}\end{split}

(10)
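The excitation noise and the rank condition can be sketched as follows. This is an illustrative example only: the truncation bound of $3\sigma_{u}$, the resampling scheme and the helper names are assumptions, and the truncation makes the actual standard deviation slightly smaller than the nominal `sigma`.

```python
import numpy as np

def truncated_gaussian(rng, sigma, bound, size):
    """Zero-mean Gaussian samples, resampled until every entry satisfies |x| <= bound."""
    x = rng.normal(0.0, sigma, size)
    outside = np.abs(x) > bound
    while outside.any():
        x[outside] = rng.normal(0.0, sigma, int(outside.sum()))
        outside = np.abs(x) > bound
    return x

def is_persistently_exciting(dus, T):
    """Check rank([du_t, ..., du_{t+T}]) = n_u for every window in the record."""
    dus = np.asarray(dus, dtype=float)          # shape (num_steps, n_u)
    num_steps, n_u = dus.shape
    if num_steps < T + 1:
        raise ValueError("need at least T + 1 samples")
    for t in range(num_steps - T):
        window = dus[t:t + T + 1].T             # columns du_t, ..., du_{t+T}
        if np.linalg.matrix_rank(window) < n_u:
            return False
    return True

rng = np.random.default_rng(0)
sigma_u = 1e-4
noise = truncated_gaussian(rng, sigma_u, 3 * sigma_u, (20, 3))

noise_is_pe = is_persistently_exciting(noise, T=3)
# Identical increments span only one direction, so they are not exciting.
constant_is_pe = is_persistently_exciting(np.ones((10, 3)), T=3)
```

In closed loop the increments $\Delta u_{t}$ also contain the gradient step and the projection, so the check would be applied to the recorded $\Delta u_{t}$ rather than to the noise alone.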

The resulting interconnected OFO, sensitivity learning and power grid is represented in the block diagram in Figure 1. At each time $t$ , a complete loop of the online optimization with sensitivity estimation can be represented as:

Algorithm 1 Online Feedback Optimization (OFO) with sensitivity estimation (blue block in Figure 1)

1: Input: $y_{t}$ (measured from the grid)
2: Recover from previous step: $y_{t-1},u_{t-1},u_{t}$
3: Sensitivity update using (9):
$K_{t-1}=\Sigma_{t-1}U_{\Delta,t-1}^{T}(\Sigma_{m,t-1}+U_{\Delta,t-1}\Sigma_{t-1}U_{\Delta,t-1}^{T})^{-1}$
$\hat{h}_{t}=\hat{h}_{t-1}+K_{t-1}(\Delta y_{t-1}-U_{\Delta,t-1}\hat{h}_{t-1})$
$\Sigma_{t}=\big{(}\mathbbm{1}-K_{t-1}U_{\Delta,t-1}\big{)}\Sigma_{t-1}+\Sigma_{p,t-1}$
4: Sample the excitation noise $\omega_{u,t}\sim\mathcal{N}(0,\sigma_{u}^{2}\mathbbm{1})$
5: Input optimization using (10): $u_{t+1}=\Pi_{\mathcal{U}_{t}}\big{[}u_{t}-\alpha\big{(}\nabla_{u}f(u_{t})+\hat{H}_{t}^{T}\nabla_{y}g(y_{t})\big{)}+\omega_{u,t}\big{]}$
6: Output: $u_{t+1}$
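The steps of Algorithm 1 can be sketched as a closed loop in NumPy. This is an illustration under strong simplifying assumptions, not the simulation setup of this paper: the "grid" is a hypothetical linear map with constant sensitivity `H_true`, the disturbance is constant, the noise covariances are constant, the excitation noise is bounded by clipping, and the symmetrization of `Sigma` is a numerical safeguard added here.

```python
import numpy as np

rng = np.random.default_rng(1)
n_u, n_y = 2, 2
H_true = np.array([[0.05, 0.02], [0.02, 0.06]])   # hypothetical true sensitivity
y_base = np.ones(n_y)

def grid(u):
    """Toy linear 'grid' standing in for the real system (assumption)."""
    return y_base + H_true @ u

u_ref = np.ones(n_u)
lower, upper = np.zeros(n_u), 2.0 * np.ones(n_u)
y_max, rho, alpha, sigma_u = 1.05, 100.0, 0.2, 0.01
grad_f = lambda u: u - u_ref                          # f(u) = 0.5 ||u - u_ref||^2
grad_g = lambda y: rho * np.maximum(y - y_max, 0.0)   # upper-limit penalty only

H_hat = 0.5 * H_true                     # deliberately wrong initial sensitivity
h_hat = H_hat.flatten(order="F")
Sigma = np.eye(n_u * n_y)
Sigma_m = 1e-6 * np.eye(n_y)             # assumed constant noise covariances
Sigma_p = 1e-8 * np.eye(n_u * n_y)
initial_error = np.linalg.norm(H_hat - H_true)

u_prev = np.zeros(n_u)
y_prev = grid(u_prev)
u = np.clip(u_prev + 0.1, lower, upper)  # first move, so that du is nonzero
inputs = [u_prev, u]

for t in range(300):
    y = grid(u)                                      # 1: measure y_t
    du, dy = u - u_prev, y - y_prev                  # 2: recover previous step
    U = np.kron(du.reshape(1, -1), np.eye(n_y))      # 3: sensitivity update (9)
    K = Sigma @ U.T @ np.linalg.inv(Sigma_m + U @ Sigma @ U.T)
    h_hat = h_hat + K @ (dy - U @ h_hat)
    Sigma = (np.eye(h_hat.size) - K @ U) @ Sigma + Sigma_p
    Sigma = 0.5 * (Sigma + Sigma.T)                  # keep Sigma symmetric (safeguard)
    H_hat = h_hat.reshape((n_y, n_u), order="F")
    omega = np.clip(rng.normal(0.0, sigma_u, n_u),   # 4: bounded excitation noise
                    -3 * sigma_u, 3 * sigma_u)
    step = grad_f(u) + H_hat.T @ grad_g(y)           # 5: input optimization (10)
    u_next = np.clip(u - alpha * step + omega, lower, upper)
    u_prev, y_prev, u = u, y, u_next                 # 6: apply u_{t+1}
    inputs.append(u)

final_error = np.linalg.norm(H_hat - H_true)
```

In this toy setting the estimation error `final_error` ends up far below `initial_error`, and all inputs stay inside the box because of the projection. With a nonlinear grid and time-varying disturbances the estimate would instead track a moving sensitivity.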

Remark 2

The sensitivity learning approach (9) is independent of the method used to update the input $u$, since it only requires the increment $\Delta u$ and the measured $\Delta y$. Hence, it is not only compatible with the projected-gradient-based OFO in (10), but can also be combined with a linearly simplified AC-OPF such as (4), or with other OFO approaches, e.g., primal-dual methods [6, 7] or quadratic programming [26, 27], which may have other desirable properties, like strict constraint satisfaction or a faster convergence.

III-C Convergence Analysis

In this section we analyze the convergence of the estimated sensitivity $\hat{H}_{t}$ produced by the sensitivity learning (9), and the input $u_{t}$ produced by the OFO (10), towards the true sensitivity $H_{t}$ and the solution $u^{*}_{t}$ of the AC-OPF (2), respectively. We certify this convergence assuming that the true sensitivity $H_{t}$ behaves according to the simplified dynamic process (6) and satisfies the linear measurements equation (8); and that the projected gradient descent used in (10) is a strongly monotone and Lipschitz continuous operator:

Definition 1 (Monotone and Lipschitz operator)

An operator $F:\mathbb{R}^{n}\to\mathbb{R}^{n}$ is $\eta_{F}$ -strongly monotone if $(x_{1}-x_{2})^{T}(F(x_{1})-F(x_{2}))\geq\eta_{F}\lVert x_{1}-x_{2}\rVert_{2}^{2}$ for all $x_{1},x_{2}$ , and $L_{F}$ -Lipschitz continuous if $\lVert F(x_{1})-F(x_{2})\rVert_{2}\leq L_{F}\lVert x_{1}-x_{2}\rVert_{2}$ .

Assumption 1

The functions $f(\cdot)$ and $g(\cdot)$ in (2) are continuously differentiable. The sensitivity satisfies (6) and (8) with independent $\omega_{p,t}$ and $\omega_{m,t}$ . Furthermore, for all $t>0$ , $\Sigma_{p,t},\Sigma_{m,t}$ have a positive lower and upper bound, i.e., there exists $\gamma,\beta>0$ such that $\gamma\mathbbm{1}\preceq\Sigma_{p,t}\preceq\beta\mathbbm{1}$ , $\gamma\mathbbm{1}\preceq\Sigma_{m,t}\preceq\beta\mathbbm{1}$ ; there exists $L_{h}>0$ such that $\lVert\nabla_{y}g(\mathdutchcal{h}(u^{*}_{t},d_{t}))\rVert_{2}\leq L_{h}$ ; and the operator $F_{t}(\cdot)=\nabla_{u}f(\cdot)+H_{t}^{T}\nabla_{y}g(\mathdutchcal{h}(\cdot,d_{t}))$ in (10) is $\eta$ -strongly monotone and $L$ -Lipschitz continuous.

The continuous differentiability of $f(\cdot)$ and $g(\cdot)$ is common for typical cost functions in power systems, e.g., linear or quadratic $f(\cdot)$, and quadratic penalty functions like $g(\cdot)=\max(0,\cdot)^{2}$. For strongly convex and Lipschitz smooth cost functions $f(\cdot)$, the strong monotonicity and Lipschitz continuity of the gradient operator $F_{t}(\cdot)$ hold in certain regions around nominal operating points [10]. In particular, they would hold when using a standard linear approximation for the input-output map (1) [8]. Since $u$ and $d$ are restricted by the grid physical limits, e.g., power ratings, the upper bounds on $\lVert\Delta u_{t}\rVert_{2}$ and $\lVert\nabla_{y}g(\mathdutchcal{h}(u^{*}_{t},d_{t}))\rVert_{2}$ are justified, since $g(\cdot)$ is continuously differentiable on a compact set. The persistency of excitation ensures that $\lVert\Delta u_{t}\rVert_{2}>0$ with high probability. Then, $\Sigma_{p,t}=\Sigma_{p_{1}}+\Sigma_{p_{2}}\lVert\Delta u_{t}\rVert_{2}^{2}\succ 0,\Sigma_{m,t}=\Sigma_{m_{1}}+\Sigma_{m_{2}}\lVert\Delta u_{t}\rVert_{2}^{2}+\Sigma_{m_{3}}\lVert\Delta u_{t}\rVert_{2}^{4}\succ 0$ if at least one $\Sigma_{p_{i}}\succ 0$ and one $\Sigma_{m_{j}}\succ 0$ for some $i,j$. Finally, even though the true sensitivity is state dependent, i.e., $H_{t}=\nabla_{u}\mathdutchcal{h}(u_{t},d_{t})$, the process and measurement noises in (6) and (8) make it possible to overapproximate the actual behavior of the sensitivity via these simplifications. In conclusion, Assumption 1 is reasonable. Then, with a persistently exciting $\Delta u$ as in (10), we have the following convergence result:

Proposition 1

Under Assumption 1, and the persistently excited OFO updates (10), the sensitivity estimates (9) satisfy:

\displaystyle\begin{split}\text{Unbiased mean: }&\lVert\mathbb{E}[h_{t}-\hat{h}_{t}]\rVert_{2}^{2}\leq C_{h,1}e^{-C_{h,2}t}\overset{t\to\infty}{\to}0\\ \text{Bounded covariance: }&\mathbb{E}[\lVert h_{t}-\hat{h}_{t}\rVert_{2}^{2}]=\text{tr}(\Sigma_{t})\\ &\leq C_{h,3}+C_{h,4}e^{-C_{h,5}t}{\to}C_{h,3},\end{split}

(11)

where $\mathbb{E}[\cdot]$ denotes the expectation, $C_{h,i}>0$ are positive constants, and $\overset{t\to\infty}{\to}$ the limit as $t$ goes to infinity. Furthermore, if the step size in (10) satisfies $\alpha<\tfrac{2\eta}{L^{2}}$ , so that $\epsilon=\sqrt{1-2\eta\alpha+L^{2}\alpha^{2}}<1$ , then we have

\displaystyle\begin{split}&\mathbb{E}[\lVert u_{t}-u_{t}^{*}\rVert_{2}]\\ \leq&\tfrac{1}{1-\epsilon}\big{(}\sigma_{u}+\sup_{k<t}\mathbb{E}[\lVert\Delta u^{*}_{k}\rVert_{2}]+\sqrt{C_{h,3}}\alpha L_{h}\big{)}\\[-2.84544pt] &+\epsilon^{t}\mathbb{E}[\lVert u_{0}-u^{*}_{0}\rVert_{2}]+\alpha L_{h}t\sqrt{C_{h,4}}\max(\epsilon,e^{\frac{-C_{h,5}}{2}})^{t-1}\\[2.84544pt] \overset{t\to\infty}{\to}&\tfrac{1}{1-\epsilon}\big{(}\sigma_{u}+\sup_{k}\mathbb{E}[\lVert\Delta u^{*}_{k}\rVert_{2}]+\sqrt{C_{h,3}}\alpha L_{h}\big{)}.\end{split}

(12)

Proof:

See Appendix. ∎

Proposition 1 establishes first that the estimated sensitivity $\hat{h}_{t}$ converges in expectation to the true sensitivity $h_{t}$ with a bounded covariance. Additionally, the control input $u_{t}$ converges to the AC-OPF solution $u_{t}^{*}$ from (2) with a quantifiable tracking error determined by the bound $C_{h,3}$ of the sensitivity estimation covariance, the standard deviation $\sigma_{u}$ of the persistency of excitation noise $\omega_{u}$, and the temporal variation of the AC-OPF solution $\sup_{k}\mathbb{E}[\lVert\Delta u^{*}_{k}\rVert_{2}]$, where $\Delta u^{*}_{t}$ can also be bounded by the temporal variation of $d_{t}$ and $\mathcal{U}_{t}$ in the AC-OPF (2) [28].
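To make the role of the step size concrete, the following sketch evaluates the contraction factor $\epsilon$ and the asymptotic bound in (12) for given constants. The function names and the numerical values in the example are hypothetical; in practice constants such as $\eta$, $L$, $L_{h}$ and $C_{h,3}$ are usually not known exactly.

```python
import math

def contraction_factor(alpha, eta, L):
    """epsilon = sqrt(1 - 2*eta*alpha + L^2*alpha^2), requiring 0 < alpha < 2*eta/L^2."""
    if not 0.0 < eta <= L:
        raise ValueError("need 0 < eta <= L for a strongly monotone, Lipschitz operator")
    if not 0.0 < alpha < 2.0 * eta / L**2:
        raise ValueError("step size must satisfy 0 < alpha < 2*eta/L^2")
    return math.sqrt(1.0 - 2.0 * eta * alpha + (L * alpha) ** 2)

def asymptotic_tracking_bound(alpha, eta, L, sigma_u, sup_delta_u_star, C_h3, L_h):
    """Limit of the right-hand side of (12) as t goes to infinity."""
    eps = contraction_factor(alpha, eta, L)
    return (sigma_u + sup_delta_u_star + math.sqrt(C_h3) * alpha * L_h) / (1.0 - eps)

# Hypothetical constants, chosen only to produce round numbers.
eps = contraction_factor(alpha=0.5, eta=1.0, L=1.0)
bound = asymptotic_tracking_bound(alpha=0.5, eta=1.0, L=1.0, sigma_u=1e-4,
                                  sup_delta_u_star=0.0, C_h3=0.04, L_h=1.0)
```

The bound shows the trade-off discussed above: a larger $\sigma_{u}$ improves excitation but enters the tracking error directly, scaled by $1/(1-\epsilon)$.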

IV Test Case

In this section we validate the proposed OFO with sensitivity estimation. We simulate a benchmark distribution grid under time-varying conditions during a 1-hour simulation with 1-second resolution, i.e., the control loop runs once per second. In particular, we show its superior performance against an OFO approach with a constant sensitivity. First we explain the simulation setup, and then we comment on the results obtained.

IV-A Simulation Setup

Figure 2: IEEE 123-bus test feeder [19]. Distributed generation: yellow diamond = solar, grey parallelogram = wind. Lines with perturbed electrical parameters: blue square-dotted.

• Distribution grid: We use the 3-phase, unbalanced IEEE 123-bus test feeder [19] in Figure 2.

• Disturbance $d$: We consider uncontrollable active and reactive loads in our disturbance vector $d$. To generate these load profiles we use $1$-second resolution data of the ECO data set [29], then aggregate households and rescale them to the base loads of the 123-bus feeder. This gives us values of $d_{t}$ for every second of the 1-hour simulation.

• Controllable input set-points $u$: We add two solar PV systems and two wind turbines to the grid as in [8], see Figure 2. They can inject active power, and inject and absorb reactive power on all three phases, which gives us 24 control inputs. We consider bus 150 in Figure 2 as the slack bus, with a controllable voltage magnitude through, e.g., a tap changer, which makes in total $n_{u}=25$. The solar and wind generation profiles are generated based on a $1$-minute solar irradiation profile [30] and a $2$-minute wind speed profile [31]. Generation is assumed constant between samples. We use these profiles to set the time-varying upper limit of the feasible set $\overline{u}_{t}$, set the lower limit of active generation to $\underline{u}_{t}=0$, and define $\mathcal{U}_{t}=\{u\,|\,\underline{u}_{t}\leq u\leq\overline{u}_{t}\}$.

• Output $y$: We consider as output $y$ the voltage magnitudes of all phases at all buses except the slack bus, given that it is a control input.

• AC-OPF cost function in (2): We use a quadratic cost that penalizes deviating from a reference: $f(u)=\frac{1}{2}\lVert u-u_{\text{ref}}\rVert_{2}^{2}$. The reference $u_{\text{ref}}$ for the voltage magnitude at the slack bus is $1$ p.u. The reference for the controllable generation is the maximum installed power to promote using as much renewable energy as possible. The reference for reactive power is $0$. Note that the cost function is continuously differentiable, and has a strongly monotone and Lipschitz continuous gradient as required in Assumption 1. We consider the voltage limits $[0.94\,\text{p.u.},1.06\,\text{p.u.}]$ for all nodes as in [5, 8], and use the penalty function $g(y)=\frac{\rho}{2}\max\big{(}\left[\begin{smallmatrix}\mathbbm{1}\\ -\mathbbm{1}\end{smallmatrix}\right]y+\left[\begin{smallmatrix}-1.06\\ 0.94\end{smallmatrix}\right],0\big{)}^{2}$, with a sufficiently large penalization parameter $\rho=100$ to discourage violations. Again, this function is continuously differentiable, and has a monotone and Lipschitz continuous gradient.

• Sensitivity process and measurement noises in (6) and (8): Under fast sampling rates $\Delta d_{t}$ may be negligible, especially when compared to $\Delta u_{t}$. Hence, for the simulation we set $\Sigma_{p_{1}},\Sigma_{m_{1}},\Sigma_{m_{2}}$ to 0, and keep $\Sigma_{p_{2}},\Sigma_{m_{3}}\succ 0$. This ensures that $\Sigma_{p,t},\Sigma_{m,t}\succ 0$ for all $t$, as required by Assumption 1.

• Persistency of excitation: We use a symmetric truncated Gaussian distribution with $\sigma_{u}=0.0001$ p.u. to introduce a small persistency of excitation noise $\omega_{u,t}$ that facilitates our sensitivity learning, but avoids introducing a large deviation in the input convergence, see (12).

• Initializing sensitivity and linear model (3): We use the zero-injection operating point $u_{\text{op}}=0,d_{\text{op}}=0$ to initialize the sensitivity estimation, i.e., $\hat{H}_{0}=H_{0}=\nabla_{u}\mathdutchcal{h}(u,d)|_{(0,0)}$, see (3). In the first simulation (1: true admittance) we use the true admittance to compute $H_{0}$, in the second (2: perturbed admittance) we use a perturbed admittance matrix, where we have introduced an error of up to $20\%$ in the admittance of the lines indicated in Figure 2.
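The cost and penalty functions of this setup can be written down directly. The sketch below assumes that the square in $g(y)$ denotes the sum of squared entries of the elementwise maximum, which is an interpretation of the notation above; the function names are illustrative.

```python
import numpy as np

V_MIN, V_MAX, RHO = 0.94, 1.06, 100.0   # voltage limits in p.u. and penalty weight

def f(u, u_ref):
    """Quadratic cost penalizing deviations from the reference set-points."""
    return 0.5 * float(np.sum((u - u_ref) ** 2))

def grad_f(u, u_ref):
    return u - u_ref

def g(y):
    """Quadratic penalty on voltage limit violations (sum of squares assumed)."""
    over = np.maximum(y - V_MAX, 0.0)    # overvoltage amounts
    under = np.maximum(V_MIN - y, 0.0)   # undervoltage amounts
    return 0.5 * RHO * float(np.sum(over**2) + np.sum(under**2))

def grad_g(y):
    """Gradient of g: positive for overvoltage, negative for undervoltage."""
    return RHO * (np.maximum(y - V_MAX, 0.0) - np.maximum(V_MIN - y, 0.0))

# Three voltages: inside the band, 0.01 p.u. above, 0.01 p.u. below.
y = np.array([1.00, 1.07, 0.93])
penalty = g(y)
penalty_gradient = grad_g(y)
```

Inside the band both the penalty and its gradient vanish, so the controller only reacts to voltages once a limit is violated.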

IV-B Results

We analyze the simulation performance of OFO with sensitivity learning (9) and (10), and compare it against an OFO with constant sensitivity (5). We validate both results in Proposition 1: First, the estimated sensitivity $\hat{H}_{t}$ converges to the real time-varying sensitivity $H_{t}$ . Second, the input $u_{t}$ converges to the AC-OPF solution $u^{*}_{t}$ (2).
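To make the two compared controllers concrete, the sketch below shows one possible implementation of a single control step and of a recursive sensitivity update. It is a hedged illustration, not the authors' code: the feasible set $\mathcal{U}_{t}$ is assumed to be a box so that the projection reduces to clipping, the estimator is a generic Kalman-filter-style recursive least-squares update on $h=\mathrm{vec}(H)$ with a random-walk model and the linear measurement $\Delta y = (\Delta u^{T}\otimes\mathbbm{1})h$, and the function names `ofo_step` and `estimate_sensitivity` are hypothetical. It need not coincide term by term with (9) and (10).

```python
import numpy as np


def ofo_step(u, y, H_hat, u_ref, u_min, u_max, alpha, grad_g, noise):
    """One projected-gradient OFO step with sensitivity H_hat.

    Assumes f(u) = 0.5 * ||u - u_ref||^2 and a box feasible set.
    """
    grad = (u - u_ref) + H_hat.T @ grad_g(y)
    return np.clip(u - alpha * grad + noise, u_min, u_max)


def estimate_sensitivity(H_hat, P, du, dy, Sigma_p, Sigma_m):
    """Kalman-style update of vec(H) from one (du, dy) pair.

    Sigma_p: process-noise covariance of vec(H) (random-walk model).
    Sigma_m: measurement-noise covariance of dy.
    """
    n_y, n_u = H_hat.shape
    # Column-major vec: vec(H @ du) = (du^T kron I) @ vec(H)
    U = np.kron(du.reshape(1, -1), np.eye(n_y))
    h = H_hat.reshape(-1, order="F")
    P_pred = P + Sigma_p                      # prediction step
    S = U @ P_pred @ U.T + Sigma_m            # innovation covariance
    K = np.linalg.solve(S, U @ P_pred).T      # gain P_pred U^T S^{-1}
    h_new = h + K @ (dy - U @ h)              # correction step
    P_new = (np.eye(h.size) - K @ U) @ P_pred
    return h_new.reshape(n_y, n_u, order="F"), P_new
```

Running `ofo_step` with a fixed `H_hat` corresponds to the constant-sensitivity baseline, whereas calling `estimate_sensitivity` on each new pair $(\Delta u_{t},\Delta y_{t})$ before the step corresponds to the learning variant.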

IV-B1 True admittance

First we perform a simulation where we use the true admittance to derive the initial sensitivity $H_{0}$ in the linear power flow approximation (3). Figure 3 shows the norm of the AC-OPF solution $u^{*}_{t}$ of (2) that we calculate with the correct non-linear model $\mathdutchcal{h}(\cdot)$ and the disturbances $d_{t}$ . This optimal input is time-varying due to the changing solar radiation and wind speed in the limits $\overline{u}_{t}$ , and the temporal variation of the loads in $d_{t}$ . Figure 3 shows how the OFO control input $u_{t}$ converges towards the optimal input $u^{*}_{t}$ using different sensitivities: The inputs $u_{H}$ produced by the OFO controller (5) with the exact sensitivity $H_{t}=\nabla\mathdutchcal{h}(u_{t},d_{t})$ succeed in tracking the AC-OPF solution $u^{*}$ , with relatively small differences caused by the time-varying disturbances $d_{t}$ and/or available energy $\overline{u}_{t}$ . However, when using the constant sensitivity $H_{0}$ in (5), there is a large difference between the generated control input $u_{H_{0}}$ and the optimal one $u^{*}$ . This gap is closed when using the OFO with sensitivity estimation (10), i.e., $u_{\hat{H}}$ is able to converge to the AC-OPF solution $u^{*}$ of (2) with a small tracking error, as predicted by Proposition 1.

Figure 4 shows the relative error $\frac{\lVert\Delta y-H\Delta u\rVert_{2}}{\lVert\Delta y\rVert_{2}}$ of the measurement equation (8). This helps to understand why OFO with sensitivity learning (9) performs better than with a constant sensitivity $H_{0}$ : The linearization error with the estimated sensitivity $\hat{H}_{t}$ becomes lower than the one with $H_{0}$ . This means that the learned sensitivity becomes a more accurate linear approximation than (3), which explains the lower optimization error observed in Figure 3. Even though the error $\frac{\lVert\Delta y-H\Delta u\rVert_{2}}{\lVert\Delta y\rVert_{2}}$ does not converge to $0$ when using $\hat{H}$ , the sensitivity estimation approach (9) learns enough to drive the control set-points to the optimum, see Figure 3, which is our ultimate objective.
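The relative error metric discussed above is straightforward to evaluate from logged data. The following minimal sketch assumes that `du` and `dy` are the input and output increments between two consecutive samples; the function name and the convention of returning NaN when $\Delta y=0$ are illustrative choices, not taken from the paper.

```python
import numpy as np


def relative_linearization_error(H, du, dy):
    """Return ||dy - H du||_2 / ||dy||_2 for a candidate sensitivity H."""
    denom = np.linalg.norm(dy)
    if denom == 0.0:
        return float("nan")  # ratio is undefined without an output change
    return float(np.linalg.norm(dy - H @ du) / denom)
```

Evaluating this function with $H_{0}$ and with $\hat{H}_{t}$ on the same pair $(\Delta u_{t},\Delta y_{t})$ gives the kind of comparison described for Figure 4.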

Finally, Figure 5 shows that the inputs $u_{\hat{H}}$ produced by the OFO with sensitivity estimation (10) result in far fewer voltage violations than $u_{H_{0}}$ from the OFO with constant sensitivity (5). In fact, the number of voltage violations of $u_{\hat{H}}$ gets close to that of the OFO with true sensitivity $u_{H}$ . Hence, the OFO with sensitivity estimation not only reduces the distance to the AC-OPF solution, see Figure 3, but also achieves better voltage regulation.

IV-B2 Perturbed admittance

In Figure 6 we show a simulation for which we perturb the admittance of the lines indicated in Figure 2 with an error of up to $20\%$ . We observe that the OFO with sensitivity learning $u_{\hat{H}}$ (10) is still able to track the AC-OPF solution $u^{*}$ of (2) in time-varying conditions. The OFO $u_{H_{0}}$ with a fixed sensitivity (5) and the same step size as $u_{\hat{H}}$ diverges, since it tries to regulate the voltage with a wrong sensitivity that is too far from the actual one. Convergence is recovered with a lower step size in $u_{H_{0},\text{slow}}$ , but it still performs poorly at tracking the AC-OPF solution. This experiment allows us to conclude that the OFO with sensitivity estimation (10) is a model-free approach that does not require an accurate model, but learns it online.

V Conclusion and Outlook

Standard Online Feedback Optimization (OFO) typically uses an approximate input-output sensitivity, which may lower its performance. Alternatively, one can compute the actual sensitivity, but that requires an accurate grid model and full grid observability, which are usually not available. In this work we have proposed a recursive estimation approach that provides OFO with a tool to learn the model sensitivity without extensive measurements, and thus improves its performance and turns OFO into a model-free approach. We have provided convergence guarantees when approximating the time-varying sensitivity behavior by a random process with linear measurements. We have established that even under time-varying conditions the estimated sensitivity and the control input converge to a neighborhood of the true sensitivity and the solution of the AC-OPF, respectively. Finally, we have validated with simulations using the IEEE 123-bus test feeder that our proposed OFO controller with sensitivity estimation performs successfully even though the actual sensitivity is state-dependent, i.e., it is able to track a time-varying optimal input while satisfying the grid specifications. In short, the proposed OFO controller with sensitivity estimation can be used as a model-free plug-and-play controller for real-time power grid operation that enables safe and optimal control.

An interesting future addition would be to investigate a more suitable way to design the persistency of excitation, possibly linked to the optimization problem, so that it explores specific directions of interest. Additionally, it would be interesting to observe how the proposed sensitivity estimation approach performs under a sudden change of topology caused by, e.g., a line fault or a network split, and under communication and measurement problems, e.g., delays, lost packets, or recurrent outliers due to, for example, sensor miscalibration.

References

[1] D. K. Molzahn and I. A. Hiskens, “A Survey of Relaxations and Approximations of the Power Flow Equations,” Foundations and Trends in Electric Energy Systems, vol. 4, no. 1-2, pp. 1–221, February 2019.
[2] S. Bolognani, N. Bof, D. Michelotti, R. Muraro, and L. Schenato, “Identification of power distribution network topology via voltage correlation analysis,” in 52nd Conf. on Decision and Control, 2013, pp. 1659–1664.
[3] K. Moffat, M. Bariya, and A. Von Meier, “Unsupervised impedance and topology estimation of distribution networks—limitations and tools,” IEEE Trans. Smart Grid, vol. 11, no. 1, pp. 846–856, 2020.
[4] D. K. Molzahn, F. Dörfler, H. Sandberg, S. H. Low, S. Chakrabarti, R. Baldick, and J. Lavaei, “A survey of distributed optimization and control algorithms for electric power systems,” IEEE Trans. Smart Grid, vol. 8, no. 6, pp. 2941–2962, Nov. 2017.
[5] A. Hauswirth, A. Zanardi, S. Bolognani, F. Dörfler, and G. Hug, “Online optimization in closed loop on the power flow manifold,” in 2017 IEEE Manchester PowerTech. IEEE, 2017, pp. 1–6.
[6] E. Dall’Anese and A. Simonetto, “Optimal power flow pursuit,” IEEE Trans. Smart Grid, vol. 9, no. 2, pp. 942–952, Mar. 2018.
[7] L. Ortmann, A. Hauswirth, I. Caduff, F. Dörfler, and S. Bolognani, “Experimental validation of feedback optimization in power distribution grids,” Electric Power Systems Research, vol. 189, p. 106782, 2020.
[8] M. Picallo, S. Bolognani, and F. Dörfler, “Closing the loop: Dynamic state estimation and feedback optimization of power grids,” Electric Power Systems Research, vol. 189, p. 106753, 2020.
[9] S. Bolognani and F. Dörfler, “Fast power system analysis via implicit linearization of the power flow manifold,” in 53rd Annual Allerton Conf. on Communication, Control, and Computing. IEEE, 2015, pp. 402–409.
[10] M. Colombino, J. W. Simpson-Porco, and A. Bernstein, “Towards robustness guarantees for feedback-based optimization,” in 2019 IEEE 58th Conf. on Decision and Control. IEEE, 2019, pp. 6207–6214.
[11] X. Chen, G. Qu, Y. Tang, S. Low, and N. Li, “Reinforcement learning for decision-making and control in power systems: Tutorial, review, and vision,” arXiv preprint arXiv:2102.01168, 2021.
[12] J. Coulson, J. Lygeros, and F. Dörfler, “Data-enabled predictive control: In the shallows of the deepc,” in 2019 18th European Control Conference (ECC). IEEE, 2019, pp. 307–312.
[13] C. Mugnier, K. Christakou, J. Jaton, M. De Vivo, M. Carpita, and M. Paolone, “Model-less/measurement-based computation of voltage sensitivities in unbalanced electrical distribution networks,” in 2016 Power Systems Computation Conference (PSCC). IEEE, 2016, pp. 1–7.
[14] G. Bianchin, M. Vaquero, J. Cortes, and E. Dall’Anese, “Data-driven synthesis of optimization-based controllers for regulation of unknown linear systems,” Dec. 2021.
[15] J. C. Willems, P. Rapisarda, I. Markovsky, and B. L. De Moor, “A note on persistency of excitation,” Systems & Control Letters, vol. 54, no. 4, pp. 325–329, 2005.
[16] X. Chen, J. I. Poveda, and N. Li, “Model-free optimal voltage control via continuous-time zeroth-order methods,” Dec. 2021.
[17] L. Ljung, System identification: theory for the user. PTR Prentice Hall, Upper Saddle River, NJ, 1999.
[18] R. Isermann and M. Münchhof, Identification of dynamic systems: an introduction with applications. Springer Science & Business Media, 2010.
[19] W. Kersting, “Radial distribution test feeders,” IEEE Trans. Power Syst., vol. 6, no. 3, pp. 975–985, 1991.
[20] S. G. Krantz and H. R. Parks, The implicit function theorem: history, theory, and applications. Springer Science & Business Media, 2012.
[21] S. H. Low, “Convex relaxation of optimal power flow—part i: Formulations and equivalence,” IEEE Transactions on Control of Network Systems, vol. 1, no. 1, pp. 15–27, 2014.
[22] S. Bolognani and S. Zampieri, “On the existence and linear approximation of the power flow solution in power distribution networks,” IEEE Transactions on Power Systems, vol. 31, no. 1, pp. 163–172, 2015.
[23] D. P. Bertsekas, “Nonlinear programming,” Journal of the Operational Research Society, vol. 48, no. 3, pp. 334–334, 1997.
[24] A. H. Jazwinski, Stochastic processes and filtering theory, ser. Mathematics in Science and Engineering, vol. 64, 1970.
[25] E. Bai and S. Sastry, “Persistency of excitation, sufficient richness and parameter convergence in discrete time adaptive control,” Systems & Control Letters, vol. 6, no. 3, pp. 153–163, 1985.
[26] V. Häberle, A. Hauswirth, L. Ortmann, S. Bolognani, and F. Dörfler, “Non-convex feedback optimization with input and output constraints,” IEEE Control Systems Letters, vol. 5, no. 1, pp. 343–348, 2020.
[27] M. Picallo, D. Liao-McPherson, S. Bolognani, and F. Dörfler, “Cross-layer design for real-time grid operation: Estimation, optimization and power flow,” arXiv preprint arXiv:2109.13842, 2021.
[28] I. Subotic, A. Hauswirth, and F. Dörfler, “Quantitative sensitivity bounds for nonlinear programming and time-varying optimization,” IEEE Transactions on Automatic Control, 2021.
[29] C. Beckel, W. Kleiminger, R. Cicchetti, T. Staake, and S. Santini, “The ECO data set and the performance of non-intrusive load monitoring algorithms,” in Proc. 1st ACM Conf. on Embedded Systems for Energy-Efficient Buildings, Nov. 2014.
[30] HelioClim-3, “HelioClim-3 Database of Solar Irradiance,” http://www.soda-pro.com/web-services/radiation/helioclim-3-archives-for-free, accessed: 2017-12-01.
[31] MERRA-2, “The Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) Web service,” http://www.soda-pro.com/web-services/meteo-data/merra, accessed: 2017-12-01.
[32] T.-J. Tarn and Y. Rasis, “Observers for nonlinear stochastic systems,” IEEE Trans. Autom. Control, vol. 21, no. 4, pp. 441–448, Aug. 1976.

Appendix: Proof of Proposition 1

Consider the information matrix $W_{I}=\sum_{k=t}^{t+T}U_{\Delta,k}^{T}{\Sigma_{m,k}^{-1}}U_{\Delta,k}\allowbreak=\sum_{k=t}^{t+T}(\Delta u_{k}\Delta u_{k}^{T})\otimes{\Sigma_{m,k}^{-1}}$ . Since $\gamma\mathbbm{1}\preceq\Sigma_{m,t}\preceq\beta\mathbbm{1}$ for all $t$ , we have $\frac{1}{\beta}\mathbbm{1}\preceq\Sigma_{m,t}^{-1}\preceq\frac{1}{\gamma}\mathbbm{1}$ , and $(\sum_{k=t}^{t+T}\Delta u_{k}\Delta u_{k}^{T})\otimes\frac{1}{\beta}\mathbbm{1}\preceq W_{I}\preceq(\sum_{k=t}^{t+T}\Delta u_{k}\Delta u_{k}^{T})\otimes\frac{1}{\gamma}\mathbbm{1}$ . Since $\Delta u$ is persistently exciting, there exist a sufficiently large $T$ and constants $\gamma_{2},\beta_{2}>0$ such that $\gamma_{2}\mathbbm{1}\preceq\sum_{k=t}^{t+T}\Delta u_{k}\Delta u_{k}^{T}\preceq\beta_{2}\mathbbm{1}$ , and thus $\frac{\gamma_{2}}{\beta}\mathbbm{1}\preceq W_{I}\preceq\frac{\beta_{2}}{\gamma}\mathbbm{1}$ . Hence, the matrix pair $(\mathbbm{1},U_{\Delta,t})$ from the dynamic system (6) and (8) is uniformly completely observable, and, additionally, uniformly completely controllable given $\Sigma_{p,t}\succ 0$ [24, Ch. 7]. As a result, the sensitivity estimate converges exponentially in expectation, and its error is exponentially bounded in mean square [24, 32], i.e., there exist positive constants $C_{h,i}>0$ satisfying (11).

Then, under Assumption 1 we have

\begin{equation*}
\begin{split}
&\lVert u_{t+1}-u_{t+1}^{*}\rVert_{2}\leq\lVert u_{t+1}-u^{*}_{t}\rVert_{2}+\lVert\Delta u^{*}_{t}\rVert_{2}\\
\overset{\eqref{eq:ofope}}{\leq}&\lVert\Pi_{\mathcal{U}_{t}}\big[u_{t}-\alpha\big(\nabla_{u}f(u_{t})+\hat{H}_{t}^{T}\nabla_{y}g(y_{t})\big)+\omega_{u,t}\big]\\
&-\Pi_{\mathcal{U}_{t}}\big[u^{*}_{t}-\alpha F_{t}(u^{*}_{t})\big]\rVert_{2}+\lVert\Delta u^{*}_{t}\rVert_{2}\\
\leq&\lVert\big(u_{t}-\alpha\big(\nabla_{u}f(u_{t})+\hat{H}_{t}^{T}\nabla_{y}g(y_{t})\big)+\omega_{u,t}\big)\pm\alpha H_{t}^{T}\nabla_{y}g(y_{t})\\
&-\big(u^{*}_{t}-\alpha F_{t}(u^{*}_{t})\big)\rVert_{2}+\lVert\Delta u^{*}_{t}\rVert_{2}\\
\leq&\lVert\big(u_{t}-\alpha F_{t}(u_{t})\big)-\big(u^{*}_{t}-\alpha F_{t}(u^{*}_{t})\big)\rVert_{2}+\lVert\omega_{u,t}\rVert_{2}\\
&+\alpha L_{h}\lVert h_{t}-\hat{h}_{t}\rVert_{2}+\lVert\Delta u^{*}_{t}\rVert_{2}\\
\leq&\epsilon\lVert u_{t}-u^{*}_{t}\rVert_{2}+\lVert\omega_{u,t}\rVert_{2}+\alpha L_{h}\lVert h_{t}-\hat{h}_{t}\rVert_{2}+\lVert\Delta u^{*}_{t}\rVert_{2},
\end{split}
\end{equation*}

where in the second inequality we use that $u^{*}_{t}$ satisfies $u^{*}_{t}=\Pi_{\mathcal{U}_{t}}\big[u^{*}_{t}-\alpha F_{t}(u_{t}^{*})\big]$ , i.e., due to optimality $u^{*}_{t}$ is a fixed point of the operator (10) with $\omega_{u,t}=0$ and the true sensitivity $H_{t}$ instead of the estimated one $\hat{H}_{t}$ . In the third inequality we use that the projection $\Pi_{\mathcal{U}_{t}}$ is non-expansive. In the last inequality, where $\epsilon^{2}=1-2\eta\alpha+L^{2}\alpha^{2}$ , we use that the operator $F_{t}(\cdot)$ is $\eta$ -strongly monotone and $L$ -Lipschitz continuous. Hence, in expectation we have

\begin{equation*}
\begin{split}
&\mathbb{E}[\lVert u_{t+1}-u_{t+1}^{*}\rVert_{2}]\\
\leq&\epsilon\mathbb{E}[\lVert u_{t}-u^{*}_{t}\rVert_{2}]+\sigma_{u}+\mathbb{E}[\lVert\Delta u^{*}_{t}\rVert_{2}]+\alpha L_{h}\mathbb{E}[\lVert h_{t}-\hat{h}_{t}\rVert_{2}]\\
\leq&\epsilon^{t+1}\mathbb{E}[\lVert u_{0}-u^{*}_{0}\rVert_{2}]+\tfrac{1}{1-\epsilon}\big(\sigma_{u}+\sup_{k\leq t}\mathbb{E}[\lVert\Delta u^{*}_{k}\rVert_{2}]\big)\\
&+\alpha L_{h}\sum_{k=0}^{t}\epsilon^{t-k}\mathbb{E}[\lVert h_{k}-\hat{h}_{k}\rVert_{2}]\\
\overset{\eqref{eq:convhproof}}{\leq}&\epsilon^{t+1}\mathbb{E}[\lVert u_{0}-u^{*}_{0}\rVert_{2}]\\
&+\tfrac{1}{1-\epsilon}\big(\sigma_{u}+\sup_{k\leq t}\mathbb{E}[\lVert\Delta u^{*}_{k}\rVert_{2}]+\sqrt{C_{h,3}}\alpha L_{h}\big)\\
&+\alpha L_{h}(t+1)\sqrt{C_{h,4}}\max(\epsilon,e^{\frac{-C_{h,5}}{2}})^{t}\\
\overset{t\to\infty}{\to}&\tfrac{1}{1-\epsilon}\big(\sigma_{u}+\sup_{k}\mathbb{E}[\lVert\Delta u^{*}_{k}\rVert_{2}]+\sqrt{C_{h,3}}\alpha L_{h}\big),
\end{split}
\end{equation*}

where in the second inequality we apply the first one recursively. In the second and third inequalities we bound the geometric series $\sum_{k=0}^{t}\epsilon^{t-k}\leq\frac{1}{1-\epsilon}$ , and use that $\sqrt{\cdot}$ is subadditive.
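For completeness, two elementary steps used in the bounds above can be written out explicitly. The following is a sketch of the standard argument: the first part derives the contraction factor $\epsilon$ from the strong monotonicity and Lipschitz continuity of $F_{t}$, and the second shows why the last term of the bound vanishes. The shorthand $q$ and $m$ is introduced here for illustration only and does not appear elsewhere in the paper.

```latex
% Contraction factor: expand the square, then apply eta-strong monotonicity
% to the cross term and L-Lipschitz continuity to the quadratic term.
\begin{align*}
\lVert (u-\alpha F_{t}(u))-(u^{*}-\alpha F_{t}(u^{*}))\rVert_{2}^{2}
&=\lVert u-u^{*}\rVert_{2}^{2}
-2\alpha\big(F_{t}(u)-F_{t}(u^{*})\big)^{T}(u-u^{*})
+\alpha^{2}\lVert F_{t}(u)-F_{t}(u^{*})\rVert_{2}^{2}\\
&\leq(1-2\eta\alpha+L^{2}\alpha^{2})\lVert u-u^{*}\rVert_{2}^{2}
=\epsilon^{2}\lVert u-u^{*}\rVert_{2}^{2}.
\end{align*}
% Hence epsilon < 1 whenever 0 < alpha < 2 eta / L^2.

% Vanishing term: with q = e^{-C_{h,5}/2} and m = max(epsilon, q) < 1,
\begin{align*}
\sum_{k=0}^{t}\epsilon^{t-k}q^{k}
\leq\sum_{k=0}^{t}m^{t-k}m^{k}
=(t+1)\,m^{t}\xrightarrow{t\to\infty}0.
\end{align*}
```

The first bound also makes the step-size requirement explicit: a contraction is only obtained for sufficiently small $\alpha$, which is consistent with the observation in Section IV-B2 that a lower step size can recover convergence.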