Learning to Seek: Multi-Agent Online Source Seeking Against Non-Stochastic Disturbances
Abstract
This paper leverages emerging learning techniques to devise a multi-agent online source seeking algorithm for an unknown environment. Of particular significance in our problem setup are: i) the underlying environment is not only unknown but dynamically changing, and is also perturbed by two types of non-stochastic disturbances; and ii) a group of agents is deployed and expected to cooperatively seek as many sources as possible. Correspondingly, a new technique of discounted Kalman filtering is developed to tackle the non-stochastic disturbances, and a notion of confidence bound of a polytope nature is utilized to enable computation-efficient cooperation among the agents. Under standard assumptions on the unknown environment and the disturbances, our algorithm is shown to achieve sub-linear regrets under the two types of non-stochastic disturbances; both results are comparable to the state-of-the-art. Numerical examples on a real-world pollution monitoring application are provided to demonstrate the effectiveness of our algorithm.
I Introduction
The problem of online source seeking, in which one or multiple agents are deployed to adaptively localize the underlying sources in a possibly unknown and disturbed environment, has gained considerable attention recently among researchers in both the control and robotics communities [1, 2, 3, 4]. Two challenges are of particular significance in solving such a source seeking problem: i) how to obtain a reliable perception or estimation of the unknown environment via observations; and ii) how to integrate the environment estimation with task planning so that the agent(s) can seek sources in an online manner.
To tackle the above two challenges, a variety of methodologies have been investigated in the literature, among which the mainstream approaches are typically based on the estimation of environment gradients [5, 6, 7]. Since the sources are often associated with the maximum/minimum values of a function that characterizes the state of the environment, gradient-based approaches naturally steer the agents to search along the direction of the estimated gradients toward locations whose gradients are close to zero. An appealing feature of this method is that only local measurements are collected during the searching process, without knowledge of the agents’ global positions. However, a critical disadvantage is that the agents are easily trapped in local extrema when the environment cannot be modeled as an ideal convex/concave function.
To further address the above issue, recent methods building on certain learning techniques [8, 9] interleave the learning of an unknown environment with source seeking based on the learned environmental information. In particular, a novel algorithm termed AdaSearch is proposed in [8], which leverages the notions of upper and lower confidence bounds to guide the agent’s adaptive search for static sources. Our previous work [9] considers a more sophisticated searching scenario, in which i) the unknown environment follows certain linear dynamics, so the underlying sources are moving around; and ii) multiple agents are deployed simultaneously with the aim of cooperatively locating as many moving sources as possible. Indeed, one of the significant challenges in such a multi-agent source seeking setup is the combinatorial growth of the searching space as the number of agents increases. To deal with this challenge, we developed a novel notion of confidence bound, termed D-UCB, which constructs a polytope confidence set and helps decompose the searching space for each agent. As a consequence, our algorithm achieves linear complexity with respect to the number of agents, which enables computation-efficient cooperation among multiple agents.
Despite the remarkable feature of our D-UCB algorithm in reducing computational complexity, one critical drawback is its dependence on precise knowledge of the environment dynamics. However, considering that uncertainties and/or disturbances are almost ubiquitous in practice, an exact model of the environment is barely available in real-world applications. To account for disturbances in system dynamics, a set of classical approaches, such as the linear quadratic regulator, incorporates stochastic process noise which is usually assumed to be independent and identically (Gaussian) distributed, in most cases with zero mean. Recently, with the great advancement of learning theory applied to control problems, relevant works have turned to a new paradigm in which the stochastic disturbances are replaced by non-stochastic ones. It is well recognized that, in most problems, the non-stochastic setup is more challenging than the stochastic one, since standard statistical properties of the disturbances are no longer available. On the other hand, it is also more general, since non-stochastic disturbances can not only characterize the modeling deviation of the environment, but can also be interpreted as signals arbitrarily injected by an underlying adversary. As such, we consider in this paper the multi-agent online source seeking problem in the non-stochastic setup, where the environment is perturbed by two types of non-stochastic disturbances. Our objective is to endow the D-UCB algorithm with the capability of dealing with the non-stochastic disturbances while still enjoying low computational complexity and a guaranteed source seeking performance.
I-A Related Works
As mentioned earlier, the predominant approaches to the source seeking problem, including the well-known technique of extremum seeking control [10, 11], often build on the estimation of environment gradients. These approaches can indeed be viewed as variants of first-order optimization algorithms, which drive the agent to search for local extreme values. In particular, by modeling the unknown environment as a time-invariant and concave real-valued function, the authors in [12] designed a distributed source seeking control law for a group of cooperative agents. In addition, a diffusion process is considered in [13] to investigate the scenario of dynamical environments. The source seeking problem is also studied in [14, 15] by forcing the agents to follow a specific circular formation. Further, stochastic gradient based methods are proposed in [16, 17] for the case where the gradient estimation is subject to environment and/or measurement noises. We should note that, also inherited from first-order optimization algorithms, the above gradient based methods are very likely to get stuck at local extrema when the considered environment is non-convex/non-concave. Furthermore, the gradient estimation is sensitive to measurement/environment noise, and thus additional statistical properties of the noise, such as a known distribution with zero mean, need to be imposed as assumptions in the problem setup.
While it is unknown how to deal with noises lacking statistical properties in the context of source seeking, non-stochastic disturbances have received increasingly broad attention in the control community. Within the classical robust control framework, the non-stochastic disturbance is often treated by considering the worst-case performance; see, e.g., [18]. More recent works related to learning based control, however, mainly concern the development of adaptive approaches which aim at controlling a (typically linear) system with adversarial disturbances while optimizing a certain objective function of the system states and control inputs [19, 20, 21, 22, 23]. To measure the performance of adaptive controllers, the notion of regret is adopted; that is, the discrepancy between the gain/cost of the devised controller and that of the best one in hindsight. In particular, the authors in [19] devise the first -regret algorithm by assuming a convex cost function and known system dynamics. Afterwards, such a regret bound is improved to logarithmic in [20, 21] within the same problem setup. To further relax the requirement of known dynamics, the authors in [22] develop an algorithm which attains -regret, and such a bound is later improved to in [21, 23]. Though the above works have investigated the non-stochastic setting in learning based control quite thoroughly, we remark that our paper considers a different problem where some standard conditions in control, such as controllability and observability, can no longer simply be assumed. In fact, our problem is more closely related to a sequential decision process; that is, the agents make their source seeking decisions in sequence while interacting with their perception of the unknown environment.
This sequential feature also makes our setting closely related to the well-known problem of multi-armed bandits. Therefore, another rich line of relevant works is the series of bandit algorithms. More specifically, in the presence of non-stochastic disturbances, linear bandits are investigated in two settings: non-stationary environments and adversarial corruptions. While the former interprets the non-stochastic disturbance as a variation of the environment, the latter corresponds to corruptions injected by potential adversaries. Both cases are well studied in the literature, with algorithms guaranteeing sub-linear regrets. To deal with environmental non-stationarity, the WindowUCB algorithm is first proposed in [24] along with the technique of sliding-window least squares. It is shown that the algorithm achieves a regret of , where measures the level of non-stationarity. The same regret is proved for the weighted linear bandit algorithm proposed in [25], which leverages a weighted least-squares estimator. Further, a simple restart strategy is developed in [26], obtaining the same regret. It is in fact proved that the -regret is the optimal one achievable in the setting of non-stationary bandits. In terms of adversarial bandits, a robust algorithm is proposed in [27] which guarantees -regret, and is thus sub-linear only if the level of adversarial corruptions satisfies . More recently, such a regret has been improved to in [28, 29], which is also shown to be nearly optimal in the adversarial setting. It can be concluded from the above discussion that, once grows sub-linearly, the regrets in both cases are guaranteed to be sub-linear. These are also the state-of-the-art results that we aim to match with the algorithm developed in this work.
I-B Statement of Contributions
This paper proposes an online source seeking algorithmic framework using emerging learning techniques, which is capable of i) dealing with the unknown environment in the presence of non-stochastic disturbances; and ii) taking advantage of the cooperation within the multi-agent network. In terms of the non-stochastic disturbances, two specific types are considered: i) an external one which disturbs the measurable states of the environment; and ii) an internal one which truly evolves with the environment dynamics. To deal with both, a unified technique of discounted Kalman filtering is proposed to estimate the unknown environment states while mitigating the disturbances. Meanwhile, to enable cooperation among multiple agents and avoid combinatorial complexity, we leverage the polytope confidence set; as a result, the proposed algorithm is exceptionally computation-efficient in the multi-agent setting. It is shown by regret analysis that our algorithm attains sub-linear regrets against both types of non-stochastic disturbances. The two obtained regrets are both comparable to the state-of-the-art in the studies of non-stationary and adversarial bandit algorithms. Finally, all theoretical findings are validated by simulation examples on a real-world pollution monitoring application.
II Problem Statement
II-A Unknown Environment with Non-Stochastic Disturbances
Consider an obstacle-free environment which is assumed to be bounded and discretized by a finite set of points, where each represents the corresponding position. Suppose that the unknown state of the environment at each discrete time is described by a real-valued function which maps the positional information to a positive quantity indicating the environmental value of interest. Let denote the total number of points, i.e., , and for simplicity, denote the vector which stacks all individual . Further, to characterize the dynamics of the changing environment, we consider that the evolution of the state is governed by the following nominal linear time-varying (LTV) model
(1)
where the state transition matrix is assumed to be known a priori. For the considered source seeking problem to be well-defined, we need the state to be neither explosive nor vanishing to zero, which is ensured by the following assumption.
Assumption 1
For the LTV dynamics (1), there exists a pair of uniform lower and upper bounds such that, for ,
(2)
where represents the identity matrix and the state propagation matrix is defined as . (By convention, we let when .)
We should note that Assumption 1 not only helps confine the behavior of the environment states, but also implies the invertibility of the state transition matrices ’s, which aids the subsequent regret analysis of our algorithm. In fact, such an assumption is not unusual in the study of system control and estimation problems; see, e.g., [30, 31, 32, 33].
Now, to further impose the underlying disturbances on the environment model, let us consider the following two types of non-stochastic disturbances on top of the nominal dynamics:
(3a)
(3b)
Note that in both types, denotes the disturbed state. While the first type of disturbance can be interpreted as an external one, since in (3a) still evolves according to the nominal dynamics (1) and the disturbance only affects the state for one step, the second type can be viewed as an internal one, since the disturbance is intrinsically imposed on the dynamics and accumulates during the evolution of . In fact, both types of disturbances find a wide range of real-world applications. For instance, in the pollution monitoring scenario investigated in our simulations, the external disturbance could correspond to certain unrelated emitters which do not change the locations of the sources of interest but interfere with the perceptible environment states, while the internal one might result from environmental conditions, such as wind, which truly affect the diffusion of pollutants and thus change their positions. This example also suggests that the localization of sources should be treated differently in the two cases; more details are given in Section II-B. In addition, we note that the internal disturbance can also be used to capture, to some extent, the unmodeled dynamics of the unknown environment. Regardless of which type of disturbance is involved, however, only the disturbed state is measurable by the agents which are later employed to operate in the environment.
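To make the distinction between the two disturbance types concrete, the following sketch simulates both on a toy linear model. All names (`A`, `external`, `internal`) and the identity transition matrices are illustrative assumptions, since the paper's notation is not reproduced here; the sketch only captures that a type-I disturbance perturbs the measurable state for a single step, while a type-II disturbance enters the dynamics and accumulates.

```python
import numpy as np

n, T = 4, 50                               # grid size and horizon (illustrative)
A = [np.eye(n) for _ in range(T)]          # nominal transition matrices (assumed)

def external(x0, d):
    """Type I: the nominal state evolves undisturbed; d[t] only perturbs
    what is measurable at step t and does not propagate."""
    xs, x = [], x0
    for t in range(T):
        xs.append(x + d[t])                # disturbed, measurable state
        x = A[t] @ x                       # nominal evolution, d[t] discarded
    return xs

def internal(x0, d):
    """Type II: the disturbance enters the dynamics and accumulates."""
    xs, x = [], x0
    for t in range(T):
        xs.append(x)
        x = A[t] @ x + d[t]                # d[t] persists in all later states
    return xs
```

With a single impulse disturbance at the first step, the external state recovers immediately while the internal state carries the impulse forever, mirroring the one-step versus accumulated effect described above.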
As remarked earlier, the disturbances of both types are supposed to be non-stochastic, i.e., no statistical property of any form is assumed regarding . Instead, to characterize the long-term effect of both disturbances, we impose the following assumption.
Assumption 2
There exists a positive sequence such that, for ,
(4)
Remark 1
The sequence in Assumption 2 is not necessarily required to be bounded by a constant in our work. In fact, we consider the problem under the condition that increases at a sub-linear rate, and aim to provide a performance guarantee for our algorithm in terms of its dependence on . Such sub-linear growth often implies that either the total number of occurrences of the disturbance increases sub-linearly, or the effect of the disturbance vanishes to zero over the time-steps . While the former is often referred to as an abruptly-changing disturbance, the latter is regarded as a slowly-varying one. In addition, in the context of learning theory in adversarial/non-stationary settings, such a sequence is also viewed as the attack budget of an adversary; see, e.g., [34, 28].
II-B Multi-Agent Source Seeking
To locate the potential sources, which usually correspond to the extreme values of the unknown environment state, we deploy a network of agents and expect each of them to seek its best positions at each time by solving the following maximization problem,
(5)
Notice that the summation in the objective function is taken over the union of positions ’s; therefore, all agents will naturally tend to occupy as many distinct positions as possible in order to maximize . In addition, it is now clear why Assumption 1 is needed: the maximization in (5) is otherwise not well-defined if the environment state explodes or vanishes to zero. Further, we should also note that an inherent difference arises in the state counted in the objective function under the two types of disturbances. More precisely, for the first type of disturbance, i.e., the external one, the positions of sources should be reflected by the undisturbed , though only the information of the disturbed is measurable by the agents. On the contrary, for the second type, i.e., the internal disturbance, the disturbed should be taken into account in (5), since it evolves with the environment dynamics and changes the positions of the sources. On this account, we emphasize that while the maximization problem (5) is precisely the one the agents would like to solve under the external disturbance, for the internal one the objective function should be amended as
(6)
Given the above difference in the objective functions, the main challenges in solving the maximization problems are also distinguishable in principle. While the former requires extracting the true information hidden in when only is accessible, the latter requires identifying and compensating for the unmodeled disturbance . Despite this difference, we develop in this paper a unified algorithmic framework for both cases, enabling the agents to track the dynamical sources in an online manner. We remark that this is also one of the main contributions of our work.
Another common technical issue, regardless of the type of disturbance involved, is the estimation of the environment. To this end, we leverage the following linear stochastic measurement model,
(7)
where is the -th agent’s measurement at time-step ; denotes the measurement matrix depending on the agent’s position ; and is the measurement noise, which is assumed to be independent and identically distributed (i.i.d.) Gaussian with zero mean and variance . We note that the measurement matrix is not specified in (7); in fact, it can be defined in various ways based on the agent’s position. Nevertheless, we assume that each has the following basic form,
(8)
where denotes the unit vector, i.e., the -th column of the identity matrix, and is the set of positions covered by the agent’s sensing area at time-step . It is natural to assume that the position where the agent is currently located falls within its sensing area, i.e., .
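As a concrete illustration of the basic form (8), the sketch below builds a measurement matrix whose rows are one-hot unit vectors indexed by the positions in the agent's sensing set. The function name and the sorted-index convention are assumptions for illustration only.

```python
import numpy as np

def measurement_matrix(sensing_set, n):
    """Stack a unit (one-hot) row for every grid index p covered by the
    agent's sensing area, in the spirit of (8). Applying this matrix to
    the state vector extracts exactly the sensed entries."""
    C = np.zeros((len(sensing_set), n))
    for row, p in enumerate(sorted(sensing_set)):
        C[row, p] = 1.0
    return C
```

For example, an agent whose sensing area covers grid cells {2, 3, 4} on a 6-cell grid obtains measurements of exactly those three state entries (plus noise).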
III Development of the Online Algorithm
In this section, we develop our online source seeking algorithm, which relies on two central ingredients: 1) a discounted Kalman filter, which is capable of providing an estimate of the unknown environment while dealing with the two types of non-stochastic disturbances in a unified framework; and 2) a D-UCB approach, which helps determine the agents’ seeking positions sequentially in a computation-efficient manner.
III-A Estimation of the Environment States with Disturbances
According to the measurement model (7) introduced in the previous section, let us first express it in a compact form which accounts for all agents in the network. For this purpose, we stack all the measurements ’s and the noises ’s as the concatenated vectors and with , e.g., . Likewise, we define the concatenated measurement matrix by stacking all local ’s. Consequently, the measurement model in compact form can be written as
(9)
Note that, in the notation , we have absorbed for simplicity the dependency on the agents’ positions ’s into the index . In addition, by our assumption on the measurement noise, the concatenated noise is also i.i.d. Gaussian with zero mean and variance
(10)
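The stacking described above can be sketched as follows. For simplicity, a single common noise variance `sigma2` is assumed for all agents, which makes the concatenated covariance (10) a scaled identity; the function and variable names are illustrative assumptions.

```python
import numpy as np

def stack_measurements(C_list, y_list, sigma2):
    """Concatenate per-agent measurement matrices and measurements into a
    compact model in the spirit of (9). Since the agents' noises are
    mutually independent, the stacked noise covariance, cf. (10), is
    (block) diagonal; with a common variance it is a scaled identity."""
    C = np.vstack(C_list)            # concatenated measurement matrix
    y = np.concatenate(y_list)       # concatenated measurement vector
    Sigma = sigma2 * np.eye(C.shape[0])
    return C, y, Sigma
```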
Equipped with the agents’ measurement model in its compact form (9), we are now ready to present the technique of discounted Kalman filtering. As in the standard Kalman filter, we use a mean and covariance to recursively generate estimates of the unknown environment. A primary difference, however, is that two positive sequences of weights and are imposed in the filtering process with the aim of mitigating the effect of the disturbances present in the environment. Keeping this in mind, the discounted Kalman filter performs the following recursions,
(11a)
(11b)
(11c)
(11d)
(11e)
Notice that and here denote intermediate results of the recursions; is an auxiliary matrix initialized by ; and the variables and , which can be readily acquired by consensus schemes, e.g., [35], incorporate the latest measurements into the update of the estimates. Next, to better show how the imposed weights help deal with the non-stochastic disturbances, we present in the subsequent lemma another expression of the discounted Kalman filter (11).
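Since the exact recursions (11) are specific to the paper's notation, the following is only a generic discounted Kalman-style sketch of the underlying idea: a discount factor inflates the prior covariance at every prediction step, geometrically down-weighting old measurements, while a per-step weight scales the trust placed in the current measurement. All parameter names here are assumptions, not the paper's symbols.

```python
import numpy as np

def discounted_kf_step(xhat, P, A, C, y, sigma2, beta=0.95, w=1.0):
    """One step of a discounted Kalman-style recursion (illustrative sketch,
    not the exact recursion (11)). Dividing the predicted covariance by
    beta < 1 discounts the influence of past data; w rescales the effective
    measurement noise, weighting the current measurement."""
    # Predict through the nominal dynamics, with discounting.
    xpred = A @ xhat
    Ppred = (A @ P @ A.T) / beta
    # Measurement update with effective noise variance sigma2 / w.
    R = (sigma2 / w) * np.eye(C.shape[0])
    S = C @ Ppred @ C.T + R
    K = Ppred @ C.T @ np.linalg.solve(S, np.eye(S.shape[0]))
    xnew = xpred + K @ (y - C @ xpred)
    Pnew = (np.eye(P.shape[0]) - K @ C) @ Ppred
    return xnew, Pnew
```

Iterating this step on noisy full-state observations of a constant state drives the estimate toward the truth while the covariance settles at a bounded positive value, reflecting the permanent forgetting induced by the discount.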
Lemma 1
Suppose that the state estimates and are generated by (11) with the initialization and ; then at each iteration , it is equivalent to have
(12a)
(12b)
where the matrix is defined as
(13)
Proof:
See Appendix I. ∎
Remark 2
According to the form (12) of the discounted Kalman filter, it can be observed that the sequence serves to adjust the weights on the measurements obtained during the process. Considering that the cumulative quantity of the disturbance is upper bounded by the sequence (see Assumption 2), this implies that, in general, the influence of the disturbances vanishes over time if increases sub-linearly. In this case, a significant disturbance which took place at an early stage can be expected to be gradually mitigated by the discounted Kalman filtering. Further, unlike the weight , which is only applied to the measurements locally, the other sequence of weights is applied globally to adjust the covariance , so that it can compensate for the effect of internal disturbances more directly.
III-B Multi-Agent Online Source Seeking via D-UCB
Based on and , we now introduce the key notion of D-UCB, which is defined as follows,
(14)
Note that the operator maps the square roots of the diagonal elements of the matrix to a vector, and the sequence , depending on a predefined confidence level , will be specified in the next section. With the aid of the defined D-UCB, the agents’ seeking positions can be updated in an online manner by solving the following maximization problem:
(15)
Here, stacks the decided seeking positions ’s for all agents, and likewise, represents the component of the vector which corresponds to the position . The complete multi-agent online source seeking scheme under the disturbed environment is outlined in Algorithm 1.
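The computational advantage of the polytope confidence set can be sketched as follows: because (14) yields one scalar upper bound per grid point, maximizing the sum over the union of the agents' positions in (15) reduces to picking the top distinct UCB values, avoiding a combinatorial search over joint assignments. This is a simplified sketch assuming agents may occupy any distinct grid points; motion constraints and the exact form of (14) are abstracted away, and all names are illustrative.

```python
import numpy as np

def ducb_positions(mu, P, beta_t, n_agents):
    """Select seeking positions by maximizing the sum of element-wise
    D-UCB values over distinct grid points, in the spirit of (15).
    Complexity is O(K log K) in the number of grid points K, linear in
    the number of agents, rather than combinatorial."""
    ucb = mu + beta_t * np.sqrt(np.diag(P))      # element-wise UCB, cf. (14)
    return np.argsort(ucb)[::-1][:n_agents]     # top-n_agents distinct points
```

Because the bound is element-wise, each agent's choice decouples from the others once the top indices are known, which is the decomposition property the polytope set provides.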
IV Regret Analysis
In this section, we provide a theoretical performance guarantee for our algorithm via the notion of regret. More specifically, we perform the regret analysis for both cases, subject to the two types of non-stochastic disturbances, respectively. By showing sub-linear cumulative regrets for both cases, it is ensured that the agents are capable of tracking the dynamical sources in an unknown and disturbed environment.
IV-A On the Disturbance of Type I
As noted in the previous discussion, for the first type of disturbance, the objective function in (5) takes into account the undisturbed state . Therefore, we introduce the notion of regret for the first case as follows,
(16)
where denotes the optimal solution to problem (5) and corresponds to the decision generated by our source seeking algorithm. Here, we aim to show that the cumulative regret, i.e., , increases sub-linearly with respect to the number of time-steps ; namely, the regret converges to zero on average. To this end, let us first show the following result, which formalizes that the D-UCB indeed provides a valid upper bound for the unknown state .
Proposition 1
Under Assumptions 1 and 2, let and be generated by the discounted Kalman filter (11) with and . Suppose that the initialization satisfies and likewise the noise variance satisfies ; then it holds that,
(17)
where is defined element-wise, the probability is taken over the random noises, and the sequence in the D-UCB is chosen to satisfy
(18)
where is defined in Assumption 2, and .
Proof:
See Appendix II-A. ∎
It can be concluded from Proposition 1 that the D-UCB is guaranteed to be an upper bound for with probability at least . In fact, considering that the disturbance of type I does not really evolve with the environment dynamics, the weight is set to zero during the whole process. Further, to extract the true information, we set the weight adaptively according to the current estimate of the environment. Since the estimate covariance generally decreases as more measurements are absorbed during the filtering process, the sequence will increase with an upper bound of . With the help of Proposition 1, we are now ready to present the regret analysis for our algorithm.
Theorem 1
Proof:
See Appendix II-B. ∎
IV-B On the Disturbance of Type II
Similar to the previous analysis, to provide the performance guarantee for our algorithm in this part, we also rely on the notion of regret. However, considering the different features of the disturbance of type II (see Section II-B), the definition of regret should be amended accordingly,
(20)
where the objective function is defined in (6). Likewise, we aim to show a sub-linear cumulative regret, i.e., that increases sub-linearly with respect to .
Since the disturbance of type II enters the environment dynamics, the current state inherently accumulates all disturbances prior to time . As a result, Assumptions 1 and 2 do not necessarily imply that is upper bounded if the sequence is allowed to grow indefinitely. Thus, to ensure the well-definedness of our problem, we need an additional assumption.
Assumption 3
There exists a uniform upper bound such that .
Now, we follow a path similar to the previous analysis to show the sub-linear growth of the regret . Note that, due to the long-term effect of the second type of disturbance on the state , one cannot expect the D-UCB to serve as an upper bound for . To deal with this issue, we construct an auxiliary variable , i.e.,
(21)
which helps build a connection between and the state as shown in the following propositions.
Proposition 2
Proof:
See Appendix II-C. ∎
Proposition 2 proves that the D-UCB provides a valid upper bound for the constructed variable if is chosen appropriately. To further connect with the true state , it can be shown that the discrepancy between and is bounded by a term related to the disturbances . For the sake of presentation, such a result is deferred, and we directly state the sub-linear regret of our algorithm in the following theorem. The bound on appears as an intermediate step in the proof of the theorem in the Appendix.
Theorem 2
Proof:
See Appendix II-D. ∎
Remark 3
Note that, in Proposition 2 and Theorem 2, the two weights and are specified as where . This means that they increase exponentially with respect to the time-step . Therefore, numerical overflow may arise in the discounted Kalman filtering (11) when is large. To deal with this issue, we note that the discounted Kalman filter, when and are chosen as above, can be implemented equivalently by the following recursions,
(25a)
(25b)
(25c)
(25d)
where is defined the same as before. It should also be noted that, in (25), the covariance is slightly different from the one in (11), in the sense that . This needs to be taken into account in Algorithm 1 when generating the D-UCB by using .
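The rescaling idea behind (25) can be illustrated on a scalar least-squares toy problem: weighting the measurement at step t by an exponentially growing factor γ^(-t) overflows for long horizons, but dividing the running quantities by the common growth factor at every step yields an equivalent forgetting-factor recursion whose iterates stay bounded. This is a hedged sketch of the principle under assumed names, not the paper's exact recursion (25).

```python
import numpy as np

def rls_growing_weights(ys, cs, gamma):
    """Weighted least squares with weights gamma**(-t): the weights grow
    exponentially, so this form is prone to overflow for long horizons."""
    V, b = 1e-6, 0.0                   # small regularizer for invertibility
    for t, (c, y) in enumerate(zip(cs, ys)):
        w = gamma ** (-t)
        V += w * c * c
        b += w * c * y
    return b / V

def rls_forgetting(ys, cs, gamma):
    """Equivalent forgetting-factor recursion: rescaling V and b by gamma
    at every step keeps all quantities bounded while preserving the same
    ratio b / V (up to the negligible regularizer)."""
    V, b = 1e-6, 0.0
    for c, y in zip(cs, ys):
        V = gamma * V + c * c
        b = gamma * b + c * y
    return b / V
```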
IV-C Further Discussions
Before the end of this section, a few more remarks should be added on the obtained results of the above regret analysis.
First, to tackle the two types of non-stochastic disturbances, it can be seen from the propositions that the sequences of weights and are determined differently. More specifically, for the external disturbance, which only affects the measured state and does not evolve with the nominal dynamics, the sequence is chosen to increase at the same rate as . This is because the disturbance in this case only comes into play when the state is measured, and thus the weight is adjusted according to the measurement information in and the current progress on . Since the covariance , which essentially quantifies the uncertainty of our estimate, decreases as more measurements are absorbed, the weight increases during the process, meaning that measurements received later are trusted more. For the internal disturbance, the sequence is also chosen to increase, but at a fixed exponential rate of . Another primary difference is that, while is set to zero in the former case, here we let increase at the same exponential rate of . The reason for this difference can be explained as follows. Since the internal disturbance accumulates during the whole process regardless of the measurements, an additional weight needs to be incorporated to deal with it globally, and therefore the increasing is introduced to decrease the covariance accordingly. Note that this does not mean the uncertainty of our estimate is decreased artificially, since in the D-UCB the sequence is also increased by an extra term related to , adjusting our construction of the confidence bound.
Second, it can be concluded from the two theorems that once the disturbance bound increases sub-linearly, the regrets of our algorithm in both cases also grow at a sub-linear rate, meaning that the agents are able to track the moving sources dynamically in the disturbed environment. More precisely, while the regret for the first case increases at the rate of , the rate is for the second case. Note that both rates match the state-of-the-art results in the study of bandit algorithms in non-stationary and adversarial settings. Therefore, we can conclude that our developments of the discounted Kalman filter and D-UCB do not degrade the convergence performance of the algorithm. However, in terms of the scale of the problem, i.e., the size of the searching environment, the complexity of our algorithm grows at the rate of , compared to in the literature. This is mainly because the ellipsoid confidence sets in classical UCB-based methods are replaced by the polytope one in our algorithm. Despite this fact, we argue that such an increase in complexity is reasonable, since far more computation is saved by avoiding the combinatorial problems at each step.
V Simulation
In this section, numerical examples are provided to validate the effectiveness of our multi-agent source seeking algorithm. We consider a pollution monitoring application in which three mobile robots are deployed in a pollution diffusion field with the aim of localizing as many leaking sources as possible. The dynamics of the pollution field is governed by a convection-diffusion equation. More details of the simulation settings can be found in [9], including the linearization of the partial differential equation, the robots’ measurement models and communication topology, the specification of the pollution field, etc. A key difference here, however, is that the non-stochastic disturbances are assumed to be present after the linearization of the dynamics. More concretely, the linearized model of the pollution field is represented by
(26)
where denotes the discretized states of the field, is the state transition matrix, and represents the non-stochastic disturbance.
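As a rough sketch of how such a linear model can arise (the 1-D stencil and all parameter values below are illustrative assumptions, not the settings of [9]), an explicit finite-difference discretization of a convection-diffusion equation yields a linear transition matrix for the update x_{t+1} = A x_t, with the disturbance added separately.

```python
import numpy as np

def convection_diffusion_A(n, D=0.1, v=0.05, dt=0.1, dx=1.0):
    """Explicit finite-difference transition matrix for a 1-D
    convection-diffusion equation du/dt = D*u_xx - v*u_x on an n-cell
    lattice, giving the linear update x_{t+1} = A @ x_t."""
    r = D * dt / dx**2          # diffusion number
    c = v * dt / (2 * dx)       # convection term (central difference)
    A = np.eye(n) * (1 - 2 * r)
    for i in range(n - 1):
        A[i, i + 1] = r - c     # inflow from the right neighbor
        A[i + 1, i] = r + c     # inflow from the left neighbor
    return A
```

For the chosen step sizes the scheme is stable, and interior rows sum to one, reflecting mass conservation away from the boundary.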
In particular, we consider in this simulation that the pollution field is modeled by a lattice with . Each mobile robot is capable of sensing a circular area of radius during the searching process. The sensing noise is assumed to be i.i.d. Gaussian with zero mean and covariance . In terms of the disturbance, we consider two different scenarios: i) a slowly-varying disturbance which occurs externally; and ii) an abruptly-changing disturbance which occurs internally. For the slowly-varying disturbance of type I, it is assumed that when and when , where is randomly generated. For the abruptly-changing disturbance of type II, we consider that two more leaking sources are randomly injected into the field during the periods of and . That is, for and for , where are randomly generated, and otherwise.
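Because the exact activation windows and magnitudes are specified in the paper's notation (omitted above), the following is only a schematic way to generate the two disturbance types; every window, scale, and seed below is a placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 100, 25                      # horizon and lattice size (illustrative)

# Type I: slowly-varying external disturbance, active on a sub-interval.
d_ext = np.zeros((T, n))
base = rng.normal(scale=0.1, size=n)
for t in range(20, 80):             # active window (placeholder bounds)
    d_ext[t] = base * np.sin(0.05 * t)   # slow sinusoidal modulation

# Type II: abrupt internal disturbance, two sources injected mid-run.
d_int = np.zeros((T, n))
src = rng.choice(n, size=2, replace=False)
d_int[40:60, src[0]] = 1.0          # first injected source
d_int[50:70, src[1]] = 1.0          # second injected source
```

The external disturbance perturbs every affected cell smoothly over time, while the internal one switches on abruptly at randomly chosen cells, mirroring the two scenarios described above.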


To illustrate the performance of our algorithm in seeking the dynamical pollution sources under the two types of disturbances, we show the cumulative regrets produced by Algorithm 1, respectively. The obtained numerical results are shown in Fig. 1, in which each curve corresponds to independent trials. It can be observed from the figures that our algorithm produces a smaller cumulative regret than the standard D-UCB algorithm. We can thus conclude that, while the standard D-UCB algorithm fails to localize the sources when disturbances are present in the field, our algorithm manages to complete the task in both the external- and internal-disturbance scenarios. More specifically, with respect to the internal abruptly-changing disturbance, we also compare the performance of our algorithm under different choices of the parameter . Note that by setting , our algorithm reduces to the standard D-UCB algorithm. It can be observed that, after the disturbances are injected, our algorithm soon adapts to the disturbed pollution field and then tracks the newly-added sources accordingly; the standard D-UCB algorithm fails to do so. In addition, it can also be seen from Fig. 1(b) that a smaller results in a shorter adaptation period. This is mainly because the agents tend to perform more exploration when a small is chosen. As a consequence of the classical exploration-exploitation dilemma, however, a disadvantage of a smaller is that the cumulative regret grows more rapidly after the sources are localized.
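The cumulative-regret curves in Fig. 1 are simply running sums of per-step optimality gaps; a minimal sketch of this bookkeeping, independent of any specific algorithm, is:

```python
import numpy as np

def cumulative_regret(optimal_rewards, obtained_rewards):
    """Cumulative regret curve: running sum of the per-step gap between
    the best achievable reward and the reward actually collected."""
    gap = np.asarray(optimal_rewards) - np.asarray(obtained_rewards)
    return np.cumsum(gap)
```

A sub-linear regret then corresponds to a curve that flattens out over time, which is what distinguishes our algorithm's curves from the standard D-UCB ones in Fig. 1.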
VI Conclusion
In this paper, a learning-based algorithm is developed to solve the problem of multi-agent online source seeking in an environment disturbed by non-stochastic perturbations. Building on the technique of discounted Kalman filtering as well as the notion of D-UCB proposed in our previous work, our algorithm enables computation-efficient cooperation across the multi-agent network and is robust against the non-stochastic perturbations (also interpreted as adversarial disturbances in the context of multi-armed bandits). It is shown that a sub-linear cumulative regret is achieved by our algorithm, which is comparable to the state-of-the-art. Numerical results on a real-world pollution monitoring application are finally provided to support our theoretical findings.
Appendix I: proof of Lemma 1
Let us prove Lemma 1 by mathematical induction. First, it is straightforward to confirm that, given the initialization , , and , the recursions (11) and (12) produce identical and . Next, assuming that (12) generates the same results as (11) up to time-step , it suffices to prove the consistency at time-step .
In fact, based on the recursion of in (11), we have
(27)
where the second equality comes from our assumption of in the form of (12a) and the last equality is due to the definition of in (13). Similarly, based on the recursion of in (11), we have
(28)
where the first equality comes from (27), which has just been proved; the second and third equalities are due to (11); the second-to-last equality follows from the form of (12a); and the last one is due to our assumption of in the form of (12b).
Appendix II: proofs of Main Theorems
We note that the proofs in this section are mainly inspired by [25] and [29], which performed the regret analysis in the context of stochastic linear bandits under non-stationary and adversarial environments, respectively. The contributions of our proofs are: i) the integration of linear dynamics and Kalman filtering into the algorithmic framework; and ii) the adaptation of the new notion of D-UCB into the regret analysis.
To facilitate the subsequent proofs, let us start by introducing some useful vector norms. First, associated with the diagonal matrix of an arbitrary positive definite matrix , i.e., , we define the -based vector norm as
(29)
where . Further, let us define the -based norm with respect to the matrix as
(30)
Note that the above norm is well-defined since the positive definiteness of ensures that . Similarly, we define the -based norm as
(31)
With the vector norms introduced above, it can be immediately verified that and are dual norms, where denotes the inverse of the matrix . In addition, the following lemma provides connections among all the defined norms.
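The duality claim can be checked numerically. The sketch below verifies the standard weighted Cauchy–Schwarz inequality |x^T y| ≤ ||x||_M ||y||_{M^{-1}} for a random positive definite M; the specific matrices here are illustrative, not the paper's covariance matrices.

```python
import numpy as np

def weighted_norm(x, M):
    """Vector norm induced by a positive definite matrix M."""
    return float(np.sqrt(x @ M @ x))

# Numerical check that the M-norm and the inverse-M-norm are dual:
# |x^T y| <= ||x||_M * ||y||_{M^{-1}} for any positive definite M.
rng = np.random.default_rng(1)
B = rng.normal(size=(4, 4))
M = B @ B.T + 4 * np.eye(4)         # random positive definite matrix
x, y = rng.normal(size=4), rng.normal(size=4)
lhs = abs(x @ y)
rhs = weighted_norm(x, M) * weighted_norm(y, np.linalg.inv(M))
```

The inequality follows by applying Cauchy–Schwarz to the pair M^{1/2}x and M^{-1/2}y, which is exactly the duality used throughout the proofs.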
Lemma 2
For an arbitrary positive definite matrix , it holds that: 1) ; 2) ; and 3) .
Proof:
While inequalities 1) and 2) can be straightforwardly confirmed by the definitions and by the inequality of arithmetic and geometric means, respectively, part 3) is proved as follows.
(32)
Note that the first inequality is due to the positive definiteness of , i.e., . Hence, the proof is completed. ∎
VI-A Proof of Proposition 1
To prove the inequality (17) in Proposition 1, with the help of the vector norms defined above, it suffices to show
(33)
Note that the inequality in (33) is stronger than the one in (17), in the sense that the state is both upper and lower bounded. Though the lower bound is not reflected in the development of our algorithm, it facilitates the proof of the sub-linear regret. In addition, since the weight is specified as in this part, the generation of the state estimates changes accordingly, simplifying the matrix to
(34)
According to the nature of the first type of disturbance, the disturbed state can be expressed as and thus the measurement is . Then, by Lemma 1 and the definitions of and , the state estimate satisfies
(35)
Therefore, it holds for that
(36)
where follows from the Cauchy–Schwarz and triangle inequalities; is due to the recursion of in the form of (12a); and is based on Lemma 2-3).
Now, by Lemma 2-1), it follows that
(37)
where the last inequality follows from (36) with . Next, to prove the inequality (33), we upper bound the three terms on the right-hand side of (37) in the following three lemmas, respectively.
Lemma 3
Under the conditions in Proposition 1, there exists a constant such that,
(38)
Proof:
By the definition of the matrix , it is straightforward to see that , and therefore,
(39)
where the last inequality is due to the assumption . Thus, the proof is completed. ∎
Lemma 4
Proof:
Lemma 5
Under the conditions in Proposition 1, there exists a constant such that the following inequality holds with probability at least ,
(42)
Proof:
This proof is based on existing results on self-normalized martingales; see, e.g., [25]. For notational simplicity, let us define
(43)
Then, according to the result on self-normalized martingales, it holds with probability at least that,
(44)
where . Note that there is a slight difference between and ; we show that there exists a constant such that . In fact, it holds that
(45)
Note that the first inequality is due to and the assumption . Therefore, the previous statement can be immediately verified by letting , which implies that . Together with the inequality (44), it follows that
(46)
Moreover, based on the inequality of arithmetic and geometric means and the definition of , it holds that
(47)
where the trace of the matrix further satisfies
(48)
Note that is due to the assumption , where denotes the unit vector; follows from the special form of the measurement matrix , i.e., each row has only one element equal to one and all others equal to zero; and is based on Assumption 1. In addition, given that the initialization has , it follows that and . As a result, we have
(49)
Based on the inequality (44), the proof is completed. ∎
VI-B Proof of Theorem 1
To facilitate the following proof, let us first introduce a new mapping which translates the positional information into a -dimensional action vector , i.e.,
(51)
where each corresponds to the index of the position in the environment and denotes the unit vector. Now, by the definitions of and , it can be immediately verified that the vectors and must have elements equal to one and all others equal to zero. Further, we denote the set of all possibilities of these vectors by
(52)
For simplicity, we abbreviate the above and as and in the sequel. Based on the definition of as well as the introduced notations, the regret can be expressed as
(53)
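The one-hot action encoding just introduced can be sketched directly; this is a generic illustration with placeholder names, not the paper's exact mapping.

```python
import numpy as np

def positions_to_action(positions, n):
    """Map a list of distinct agent position indices in an n-cell
    environment to a single action vector: the sum of the corresponding
    unit vectors, i.e., a 0/1 vector with one entry per occupied cell."""
    a = np.zeros(n)
    for p in positions:
        a[p] = 1.0
    return a
```

The resulting vector has exactly as many ones as there are agents, matching the structure of the action set defined above.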
To proceed, we show the following lemma which provides an upper bound for the regret at each time-step .
Lemma 6
Proof:
The above Lemma 6 shows that the regret can be upper bounded by . To investigate the key term , we next present in the following lemma an intermediate result which can be used to bound .
Lemma 7
Under the conditions in Proposition 1, it holds,
(56)
Proof:
Recalling the recursion (34) of , the matrix can also be generated as follows,
(57)
For simplicity, let us further denote a new matrix by . Now, considering the determinants of ’s, it holds that
(58)
where denotes the -th eigenvalue of the matrix in , and is due to the inequality of arithmetic and geometric means. Based on the cyclic property of the matrix trace and the recursion of in (12a), it follows that
(59)
Therefore, (58) can be continued as
(60)
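The arithmetic-geometric-mean step used in this determinant bound can be sanity-checked numerically: for any positive definite matrix, the determinant (product of eigenvalues) is at most the n-th power of the average eigenvalue, i.e., det(M) ≤ (tr(M)/n)^n. The matrix below is a random illustrative example.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.normal(size=(4, 4))
M = B @ B.T + np.eye(4)              # random positive definite matrix
n = M.shape[0]
det_M = np.linalg.det(M)
amgm_bound = (np.trace(M) / n) ** n  # AM-GM applied to the eigenvalues
```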
Next, in order to bound by using the above Lemma 7, we build the connection between and as follows.
Lemma 8
Proof:
Statement 1): Due to the specific forms of the covariance matrix and the measurement matrix , it can be confirmed that must be diagonal and can be expressed as
(65)
where denotes the -th agent’s sensing area at time ; see the definition in (8). Let us introduce a binary variable ; let if the position indexed by is in the sensing area , and otherwise. As a direct result, it holds that
(66)
where denotes the -th diagonal entry of the matrix . Now, letting be the index of agent ’s position, one has that , and therefore,
(67)
where the first equality is due to the definition of .
Statement 2): Based on the equality (66) and the fact that is a binary variable, it follows that
(68)
where the last inequality is due to the definition of the matrix norm , i.e., equals the largest eigenvalue of the matrix .
On the other hand, one can also have that
(69)
Therefore, the proof is completed. ∎
With the help of the above lemmas, we are in a position to prove the theorem. By the definition of the regret in (53), it is easy to see that the regret has a uniform upper bound, i.e., . Based on the above Lemma 6, we have
(70)
where we denote . According to the definition (18) of the sequence , it follows that . Therefore, the cumulative regret satisfies
(71)
Note that and represent indicator functions, and the last equality is due to the fact that . Now, let us investigate the two terms in (71) separately. For the first term, by Lemmas 7 and 8, it follows that
(72)
where is due to the inequality of arithmetic and geometric means and the fact that ; is based on Lemma 8-1); and is according to Lemma 7. For the second term, it holds that
(73)
where is due to the fact that by Lemma 8; is according to the choice of the weights, i.e., given that ; is based on Lemma 8-2); in , we let ; and is according to Lemma 7.
VI-C Proof of Proposition 2
This proof can be completed by following similar steps as those for Proposition 1, except for the main differences in the state dynamics (due to the different type of disturbance) and in the specification of the weights and .
Taking the dynamics (3b) into account and following the same steps as before, it can be shown (details omitted) that
(75)
where is defined as in (21). While the first term on the right-hand side can be bounded exactly as in the previous Lemma 3, the last two require specific attention to obtain the upper bounds.
First, we show in the following lemma that the second term can indeed be bounded by the weight .
Lemma 9
Under the conditions in Proposition 2, there exists a constant , such that
(76)
Proof:
Next, the third term can be handled by applying the self-normalized martingale result as in Lemma 5. Nevertheless, to accommodate the change of the matrix , we modify the definition of accordingly,
(78)
where is defined the same as before; see equation (43). As a result, we have that, with probability at least ,
(79)
Since the sequence is increasing in , it can be proved, following the same steps as before, that , and furthermore,
(80)
where .
VI-D Proof of Theorem 2
By applying the notions defined in the proof of Theorem 1, one has that
(82)
where is due to the definitions of and as well as the fact that ; follows from Hölder’s inequality; and comes from the inequality (81) proved in the proof of Proposition 2. It can be seen from the above result that, due to the involvement of all prior disturbances ’s in the state , the regret can no longer be bounded by the action term alone; the extra term, related to the discrepancy between and , must also be considered. Hence, we next provide an upper bound for with respect to the disturbances ’s.
Lemma 10
Proof:
Recalling the definition (21) of , it follows that
(84)
We next upper bound, in order, the three terms on the right-hand side of (84).
Term I: Since , it holds that
(85)
Note that, in (85), the two matrices and satisfy
(86)
and
(87)
As a result of in Assumption 3, we have
(88)
where .
Term II: Following the same approach as for the first term, it can be shown that
(89)
where .
In terms of appearing in the regret bound (82), we apply the same analysis as in the proof of Theorem 1 (see Lemma 7) and obtain the following result.
Lemma 11
Under the conditions in Proposition 2, it holds,
(93)
Proof:
This proof can be finished by following the same steps as in the proof of Lemma 7. However, two differences should be noted, which result from the distinct definition of .
First, under the conditions in this lemma, the recursion of follows
(94)
Despite the difference compared to (57), since
(95)
the (in)equalities in (58) remain valid, and so does the subsequent deduction. Finally, based on the definition of in (13), the final bound in (93) is obtained by
(96)
∎
With the help of the above lemmas, we are ready to prove the sub-linear regret stated in Theorem 2. Notice that a uniform upper bound still exists for the regret . Therefore, it follows from (82) that
(97)
where we let in the last inequality. Therefore, it holds that
(98)
where is due to the Cauchy–Schwarz inequality and Lemma 8-1); is by Lemmas 10 and 11; and follows from the specification of with .
Now, provided that (see the condition of Theorem 2) and letting , it can be confirmed that , and therefore,
(99)
Further, considering that , it holds that , and consequently,
(100)
According to the definitions of and , it holds that
(101)
As a result, one has
(102)
References
- [1] J. Poveda, M. Benosman, A. Teel, and R. Sanfelice. Robust coordinated hybrid source seeking with obstacle avoidance in multi-vehicle autonomous systems. IEEE Transactions on Automatic Control, 2021.
- [2] B. Angélico, L. Chamon, S. Paternain, A. Ribeiro, and G. Pappas. Source seeking in unknown environments with convex obstacles. In Proceedings of 2021 American Control Conference, pages 5055–5061. IEEE, 2021.
- [3] T. Li, B. Jayawardhana, A. Kamat, and A. Kottapalli. Source-seeking control of unicycle robots with 3-D printed flexible piezoresistive sensors. IEEE Transactions on Robotics, 2021.
- [4] W. Liu, X. Huo, G. Duan, and K. Ma. Semi-global stability analysis of source seeking with dynamic sensor reading and a class of nonlinear maps. International Journal of Control, pages 1–10, 2020.
- [5] E. Ramirez-Llanos and S. Martinez. Stochastic source seeking for mobile robots in obstacle environments via the SPSA method. IEEE Transactions on Automatic Control, 64(4):1732–1739, 2018.
- [6] S. Azuma, M. Sakar, and G. Pappas. Stochastic source seeking by mobile robots. IEEE Transactions on Automatic Control, 57(9):2308–2321, 2012.
- [7] J. Habibi, H. Mahboubi, and A. Aghdam. A gradient-based coverage optimization strategy for mobile sensor networks. IEEE Transactions on Control of Network Systems, 4(3):477–488, 2016.
- [8] E. Rolf, D. Fridovich-Keil, M. Simchowitz, B. Recht, and C. Tomlin. A successive-elimination approach to adaptive robotic source seeking. IEEE Transactions on Robotics, 37(1):34–47, 2020.
- [9] B. Du, K. Qian, H. Iqbal, C. Claudel, and D. Sun. Multi-robot dynamical source seeking in unknown environments. In Proceedings of 2021 IEEE International Conference on Robotics and Automation, pages 9036–9042. IEEE, 2021.
- [10] J. Feiling, S. Koga, M. Krstić, and T. Oliveira. Gradient extremum seeking for static maps with actuation dynamics governed by diffusion PDEs. Automatica, 95:197–206, 2018.
- [11] S. Dougherty and M. Guay. An extremum-seeking controller for distributed optimization over sensor networks. IEEE Transactions on Automatic Control, 62(2):928–933, 2016.
- [12] S. Li, R. Kong, and Y. Guo. Cooperative distributed source seeking by multiple robots: Algorithms and experiments. IEEE/ASME Transactions on Mechatronics, 19(6):1810–1820, 2014.
- [13] R. Fabbiano, C. Canudas de Wit, and F. Garin. Source localization by gradient estimation based on Poisson integral. Automatica, 50(6):1715–1724, 2014.
- [14] L. Briñón-Arranz, L. Schenato, and A. Seuret. Distributed source seeking via a circular formation of agents under communication constraints. IEEE Transactions on Control of Network Systems, 3(2):104–115, 2015.
- [15] R. Fabbiano, F. Garin, and C. Canudas de Wit. Distributed source seeking without global position information. IEEE Transactions on Control of Network Systems, 5(1):228–238, 2016.
- [16] N. Atanasov, J. Le Ny, N. Michael, and G. Pappas. Stochastic source seeking in complex environments. In Proceedings of 2012 IEEE International Conference on Robotics and Automation, pages 3013–3018. IEEE, 2012.
- [17] N. Atanasov, J. Le Ny, and G. Pappas. Distributed algorithms for stochastic source seeking with mobile robot networks. Journal of Dynamic Systems, Measurement, and Control, 137(3), 2015.
- [18] K. Zhou and J. Doyle. Essentials of Robust Control, volume 104. Prentice Hall, Upper Saddle River, NJ, 1998.
- [19] N. Agarwal, B. Bullins, E. Hazan, S. Kakade, and K. Singh. Online control with adversarial disturbances. In Proceedings of 2019 International Conference on Machine Learning, pages 111–119. PMLR, 2019.
- [20] D. Foster and M. Simchowitz. Logarithmic regret for adversarial online control. In Proceedings of 2020 International Conference on Machine Learning, pages 3211–3221. PMLR, 2020.
- [21] M. Simchowitz, K. Singh, and E. Hazan. Improper learning for non-stochastic control. In Proceedings of 2020 Conference on Learning Theory, pages 3320–3436. PMLR, 2020.
- [22] E. Hazan, S. Kakade, and K. Singh. The nonstochastic control problem. In Proceedings of the 31st International Conference on Algorithmic Learning Theory, pages 408–421. PMLR, 2020.
- [23] M. Simchowitz. Making non-stochastic control (almost) as easy as stochastic. In Proceedings of the 34th International Conference on Neural Information Processing Systems, pages 18318–18329. PMLR, 2020.
- [24] W. Cheung, D. Simchi-Levi, and R. Zhu. Learning to optimize under non-stationarity. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, pages 1079–1087. PMLR, 2019.
- [25] Y. Russac, C. Vernade, and O. Cappé. Weighted linear bandits for non-stationary environments. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, pages 12040–12049, 2019.
- [26] P. Zhao, L. Zhang, Y. Jiang, and Z.-H. Zhou. A simple approach for non-stationary linear bandits. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, pages 746–755. PMLR, 2020.
- [27] Q. Ding, C.-J. Hsieh, and J. Sharpnack. Robust stochastic linear contextual bandits under adversarial attacks. In International Conference on Artificial Intelligence and Statistics, pages 7111–7123. PMLR, 2022.
- [28] I. Bogunovic, A. Losalka, A. Krause, and J. Scarlett. Stochastic linear bandits robust to adversarial attacks. In International Conference on Artificial Intelligence and Statistics, pages 991–999. PMLR, 2021.
- [29] J. He, D. Zhou, T. Zhang, and Q. Gu. Nearly optimal algorithms for linear contextual bandits with adversarial corruptions. In Advances in Neural Information Processing Systems, 2022.
- [30] W. Li, Z. Wang, D. Ho, and G. Wei. On boundedness of error covariances for Kalman consensus filtering problems. IEEE Transactions on Automatic Control, 65(6):2654–2661, 2019.
- [31] G. Battistelli and L. Chisci. Kullback–Leibler average, consensus on probability densities, and distributed state estimation with guaranteed stability. Automatica, 50(3):707–718, 2014.
- [32] G. Battistelli, L. Chisci, G. Mugnai, A. Farina, and A. Graziano. Consensus-based linear and nonlinear filtering. IEEE Transactions on Automatic Control, 60(5):1410–1415, 2014.
- [33] F. Cattivelli and A. Sayed. Diffusion strategies for distributed Kalman filtering and smoothing. IEEE Transactions on automatic control, 55(9):2069–2084, 2010.
- [34] L. Yang, M. H. Hajiesmaili, M. S. Talebi, J. C. S. Lui, and W. S. Wong. Adversarial bandits with corruptions: Regret lower bound and no-regret algorithm. In Advances in Neural Information Processing Systems, 2020.
- [35] R. Olfati-Saber and J. Shamma. Consensus filters for sensor networks and distributed sensor fusion. In Proceedings of the 44th IEEE Conference on Decision and Control, pages 6698–6703. IEEE, 2005.