
Stability of FFLS-based Diffusion Adaptive Filter Under Cooperative Excitation Condition

Die Gan, Siyu Xie, Zhixin Liu, Member, IEEE, and Jinhu Lü, Fellow, IEEE

Corresponding author: Zhixin Liu. This work was supported by the Natural Science Foundation of China under Grant T2293772, the National Key R&D Program of China under Grant 2018YFA0703800, the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant XDA27000000, and the National Science Foundation of Shandong Province under Grant ZR2020ZD26.

D. Gan is with the Zhongguancun Laboratory, Beijing, China (e-mail: gandie@amss.ac.cn). S. Y. Xie is with the School of Aeronautics and Astronautics, University of Electronic Science and Technology of China, Chengdu 611731, China (e-mail: syxie@uestc.edu.cn). Z. X. Liu is with the Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, and the School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China (e-mail: lzx@amss.ac.cn). J. H. Lü is with the School of Automation Science and Electrical Engineering, Beihang University, Beijing, China, and also with the Zhongguancun Laboratory, Beijing, China (e-mail: jhlu@iss.ac.cn).
Abstract

In this paper, we consider the distributed filtering problem over sensor networks, in which all sensors cooperatively track unknown time-varying parameters by using local information. A distributed forgetting factor least squares (FFLS) algorithm is proposed by minimizing a local cost function formulated as a linear combination of accumulated estimation errors. Stability analysis of the algorithm is provided under a cooperative excitation condition which contains spatial union information reflecting the cooperative effect of all sensors. Furthermore, we generalize the theoretical results to the case of Markovian switching directed graphs. The main difficulty of the theoretical analysis lies in analyzing the properties of products of non-independent and non-stationary random matrices. Techniques from stability theory, algebraic graph theory and Markov chain theory are employed to deal with this issue. Our theoretical results are obtained without relying on the independence or stationarity assumptions on regression vectors which are commonly used in the existing literature.

Index Terms

Distributed forgetting factor least squares, cooperative excitation condition, exponential stability, stochastic dynamic systems, Markovian switching topology

1 Introduction


Owing to their capability to process data collaboratively, wireless sensor networks (WSNs) have attracted increasing research attention in diverse areas, including consensus seeking [1][2], resource allocation [3][4], and formation control [5][6]. How to design distributed adaptive estimation and filtering algorithms to cooperatively estimate unknown parameters has become one of the most important research topics. Compared with centralized estimation algorithms, where a fusion center is needed to collect and process the information measured by all sensors, distributed algorithms estimate or track an unknown parameter process of interest cooperatively by using local noisy measurements. Distributed algorithms are therefore easier to implement, owing to their robustness to network link failures, their protection of privacy, and their reduced communication and computation costs.

Based on classical estimation algorithms and typical distributed strategies such as incremental, diffusion, and consensus strategies, a number of distributed adaptive estimation or filtering algorithms have been investigated (cf., [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]), e.g., the consensus-based least mean squares (LMS) algorithm, the diffusion Kalman filter (KF), the diffusion least squares (LS) algorithm, the incremental LMS algorithm, the combined diffusion-consensus stochastic gradient (SG) algorithm, and the diffusion forgetting factor least squares (FFLS) algorithm. Performance analysis of these distributed algorithms has also been studied under various information conditions. For deterministic signals or deterministic system matrices, Battistelli and Chisci in [7] established the mean-square boundedness of the state estimation error of a distributed Kalman filter under a collective observability condition. Chen et al. in [8] studied the convergence of a distributed adaptive identification algorithm under a cooperative persistent excitation (PE) condition. Javed et al. in [9] presented a stability analysis of the cooperative gradient algorithm for deterministic regression vectors satisfying a cooperative PE condition. Note that signals are often random, since they are generated by dynamic systems affected by noises. For the random regression vector case, Barani et al. in [10] studied the convergence of a distributed stochastic gradient descent algorithm with independent and identically distributed (i.i.d.) signals. Schizas et al. in [11] provided a stability analysis of a distributed LMS-type adaptive algorithm under strictly stationary and ergodic regression vectors. Zhang et al. in [12] studied the mean square performance of a diffusion FFLS algorithm with independent input signals. Takahashi et al. in [13] established a performance analysis of the diffusion LMS algorithm for i.i.d. regression vectors. Lei and Chen in [14] established the convergence analysis of a distributed stochastic approximation algorithm with ergodic system signals. Mateos and Giannakis in [15] presented the stability and performance analysis of a distributed FFLS algorithm under spatio-temporally white regression vectors.

We remark that most theoretical results in the above literature were established by requiring the regression vectors either to be deterministic and satisfy PE conditions, or to be random but satisfy independence, stationarity, or ergodicity conditions. In fact, the observed data are often random and can hardly satisfy such statistical assumptions, since they are generated by complex dynamic systems in which feedback loops inevitably exist (cf., [20]). The main difficulty in the performance analysis of distributed algorithms is to analyze the product of the random matrices involved in the estimation error equations. In order to relax the above stringent conditions on random regression vectors, some progress has been made on distributed adaptive estimation and filtering algorithms under undirected graphs. For estimating time-invariant parameters, the convergence analysis of a distributed SG algorithm and a distributed LS algorithm was provided in [21] and [22] under cooperative excitation conditions. For tracking a time-varying parameter, Xie and Guo in [16] and [23] proposed the weakest possible cooperative information conditions to guarantee the stability and performance of consensus-based and diffusion-based LMS algorithms. Compared with the LMS algorithm, the FFLS algorithm can generate more accurate estimates in the transient phase (see, e.g., [24]), but a stability analysis for the distributed FFLS algorithm is still lacking. In this paper, we focus on the design and stability analysis of a distributed FFLS algorithm without relying on independence, stationarity, or ergodicity assumptions on the regression vectors.

The information exchange between sensors is an important factor for the performance of distributed estimation algorithms, and previous studies often assume that the networks are undirected and time-invariant. In practice, communication links might be neither bidirectional nor time-invariant, due to the heterogeneity of sensors and to signal losses caused by temporary deterioration of the communication links. One approach is to model networks that randomly change over time as an i.i.d. process, see e.g., [25, 26]. However, losses of connection usually occur with correlations [27]. Another approach is to model the random switching process as a Markov chain whose states correspond to the possible communication topologies, see [27, 28, 29, 30] among many others. Some studies on distributed algorithms with deterministic or temporally independent measurement matrices under Markovian switching topologies are given in, e.g., [31, 32].

In this paper, we consider the distributed filtering problem over sensor networks where all sensors aim at collectively tracking an unknown, randomly time-varying parameter vector. Based on the fact that recent observations reflect parameter changes better than older data, we introduce a forgetting factor into the local accumulated cost function, formulated as a linear combination of local estimation errors between the observation signals and the prediction signals. By minimizing this local cost function, we propose a distributed FFLS algorithm based on the diffusion strategy over a fixed undirected graph. A stability analysis of the distributed FFLS algorithm is provided under a cooperative excitation condition. Moreover, we generalize the theoretical results to the case of Markovian switching directed sensor networks. The key difference from the fixed undirected graph case is that the adjacency matrix becomes an asymmetric random matrix. We employ Markov chain theory to deal with the coupled relationship between the random adjacency matrices and the random regression vectors. The main contributions of this paper can be summarized as follows:

  • In comparison with [16] and [21], the main difficulty is that the random matrices in the error equation of the diffusion FFLS algorithm are not symmetric and the adaptive gain is no longer a scalar. We establish the exponential stability of the homogeneous part of the estimation error equation and the bound of the tracking error by virtue of the specific structure of the proposed diffusion FFLS algorithm and stability theory of stochastic dynamic systems.

  • Different from the theoretical results on distributed FFLS algorithms in [12] and [15], where the regression vectors are required to satisfy independence or spatio-temporal uncorrelatedness assumptions, our theoretical analysis is carried out without such stringent conditions, which makes it applicable to stochastic feedback systems.

  • The cooperative excitation condition introduced in this paper is a temporal and spatial union information condition on the random regression vectors, which reveals the cooperative effect of multiple sensors in a certain sense, i.e., the whole sensor network can cooperatively accomplish the estimation task even when no individual sensor can, due to lack of necessary information.

The remainder of this paper is organized as follows. In Section 2, we give the problem formulation of this paper. Section 3 presents the distributed FFLS algorithm. The stability of the proposed algorithm under fixed undirected graph and Markovian switching directed graphs are given in Section 4 and Section 5, respectively. Finally, we conclude the paper with some remarks in Section 6.

2 Problem Formulation

2.1 Matrix theory

In this paper, we use \mathbb{R}^{m} to denote the set of m-dimensional real vectors, \mathbb{R}^{m\times n} to denote the set of real matrices with m rows and n columns, and \bm{I}_{m} to denote the m\times m identity matrix. For a matrix \bm{A}\in\mathbb{R}^{m\times n}, \|\bm{A}\| denotes its Euclidean norm, i.e., \|\bm{A}\|\triangleq(\lambda_{\max}(\bm{A}\bm{A}^{T}))^{\frac{1}{2}}, where the superscript T denotes the transpose operator and \lambda_{\max}(\cdot) denotes the largest eigenvalue of a matrix. Correspondingly, \lambda_{\min}(\cdot) and tr(\cdot) denote the smallest eigenvalue and the trace of a matrix, respectively. The notation {\rm col}(\cdot,\cdots,\cdot) denotes a vector stacked from the specified vectors, and {\rm diag}(\cdot,\cdots,\cdot) denotes a block matrix formed in a diagonal manner from the corresponding vectors or matrices.

For a matrix \bm{A}=[a_{ij}]\in\mathbb{R}^{m\times m} with nonnegative entries, if \sum_{j=1}^{m}a_{ij}=1 holds for all i=1,\cdots,m, then \bm{A} is called stochastic. The Kronecker product of two matrices \bm{A} and \bm{B} is denoted by \bm{A}\otimes\bm{B}. For two real symmetric matrices \bm{X}\in\mathbb{R}^{n\times n} and \bm{Y}\in\mathbb{R}^{n\times n}, \bm{X}\geq\bm{Y} (\bm{X}>\bm{Y}, \bm{X}\leq\bm{Y}, \bm{X}<\bm{Y}) means that \bm{X}-\bm{Y} is a positive semidefinite (positive definite, negative semidefinite, negative definite) matrix. For a matrix sequence \{\bm{A}_{t}\} and a positive scalar sequence \{a_{t}\}, the notation \bm{A}_{t}=O(a_{t}) means that there exists a positive constant C, independent of t and a_{t}, such that \|\bm{A}_{t}\|\leq Ca_{t} holds for all t\geq 0.

The matrix inversion formula is frequently used in this paper; we state it as follows.

Lemma 2.1 (Matrix inversion formula [33])

For any matrices \bm{A}, \bm{B}, \bm{C} and \bm{D} with suitable dimensions, the following formula

(\bm{A}+\bm{B}\bm{D}\bm{C})^{-1}=\bm{A}^{-1}-\bm{A}^{-1}\bm{B}(\bm{D}^{-1}+\bm{C}\bm{A}^{-1}\bm{B})^{-1}\bm{C}\bm{A}^{-1}

holds, provided that the relevant matrices are invertible.
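The formula is easy to sanity-check numerically. The following minimal sketch (with arbitrarily chosen, well-conditioned test matrices of our own) verifies that both sides agree:

```python
import numpy as np

# Numerical sanity check of the matrix inversion formula (Lemma 2.1).
rng = np.random.default_rng(0)
m, r = 4, 2
A = np.eye(m) + 0.1 * rng.standard_normal((m, m))   # invertible A
B = rng.standard_normal((m, r))
C = rng.standard_normal((r, m))
D = np.eye(r) + 0.1 * rng.standard_normal((r, r))   # invertible D

lhs = np.linalg.inv(A + B @ D @ C)
Ai = np.linalg.inv(A)
rhs = Ai - Ai @ B @ np.linalg.inv(np.linalg.inv(D) + C @ Ai @ B) @ C @ Ai
assert np.allclose(lhs, rhs)   # the two sides agree up to rounding error
```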

2.2 Graph theory

We use graphs to model the communication topology between sensors. A directed graph \mathcal{G}=(\mathcal{V},\mathcal{E},\mathcal{A}) is composed of a vertex set \mathcal{V}=\{1,2,3,\cdots,n\}, which stands for the set of sensors (i.e., nodes), an edge set \mathcal{E}\subset\mathcal{V}\times\mathcal{V}, and a weighted adjacency matrix \mathcal{A}=[a_{ij}]_{1\leq i,j\leq n}. A directed edge (i,j)\in\mathcal{E} means that the j-th sensor can receive data from the i-th sensor; sensors i and j are then called the parent and child sensors, respectively. The elements of the matrix \mathcal{A} satisfy a_{ij}>0 if (i,j)\in\mathcal{E} and a_{ij}=0 otherwise. The in-degree and out-degree of sensor i are defined by \deg_{in}(i)=\sum^{n}_{j=1}a_{ji} and \deg_{out}(i)=\sum^{n}_{j=1}a_{ij}, respectively. The digraph \mathcal{G} is called balanced if \deg_{in}(i)=\deg_{out}(i) for i=1,\cdots,n. Here, we assume that \mathcal{A} is a stochastic matrix. The neighbor set of sensor i is denoted by \mathcal{N}_{i}=\{j\in\mathcal{V}:(j,i)\in\mathcal{E}\}, and sensor i itself is included in this set. For a given positive integer k, the union of k digraphs \{\mathcal{G}_{j}=(\mathcal{V},\mathcal{E}_{j},\mathcal{A}_{j}),1\leq j\leq k\} with the same vertex set is defined as \cup^{k}_{j=1}\mathcal{G}_{j}=(\mathcal{V},\cup^{k}_{j=1}\mathcal{E}_{j},\frac{1}{k}\sum^{k}_{j=1}\mathcal{A}_{j}). A directed path from i_{1} to i_{l} consists of a sequence of sensors i_{1},i_{2},\cdots,i_{l} (l\geq 2) such that (i_{k},i_{k+1})\in\mathcal{E} for k=1,\cdots,l-1. The digraph \mathcal{G} is said to be strongly connected if for any sensor there exist directed paths from this sensor to all other sensors. For the graph \mathcal{G}=(\mathcal{V},\mathcal{E},\mathcal{A}), if a_{ij}=a_{ji} for all i,j\in\mathcal{V}, then it is called an undirected graph. The diameter D_{\mathcal{G}} of an undirected graph \mathcal{G} is defined as the maximum length of the shortest paths between any two sensors.

2.3 Observation model

Consider a network consisting of n sensors (labeled 1,\cdots,n) whose task is to estimate an unknown time-varying parameter \bm{\theta}_{t} by cooperating with each other. We assume that the measurement \{y_{t,i},\bm{\varphi}_{t,i}\} at sensor i obeys the following discrete-time stochastic regression model,

y_{t+1,i}=\bm{\varphi}_{t,i}^{T}\bm{\theta}_{t}+w_{t+1,i},   (1)

where y_{t,i} is the scalar output of sensor i at time t, \bm{\varphi}_{t,i}\in\mathbb{R}^{m} is a random regression vector, \{w_{t,i}\} is a noise process, and \bm{\theta}_{t} is the unknown m-dimensional time-varying parameter whose variation at time t is denoted by \Delta\bm{\theta}_{t}, i.e.,

\Delta\bm{\theta}_{t}\triangleq\bm{\theta}_{t+1}-\bm{\theta}_{t},~~t\geq 0.   (2)

Note that when \Delta\bm{\theta}_{t}\equiv 0, \bm{\theta}_{t} becomes a constant vector. For the special case where w_{t+1,i} is a moving average process and \bm{\varphi}_{t,i} consists of current and past input-output data, i.e.,

\bm{\varphi}_{t,i}^{T}=[y_{t,i},\cdots,y_{t-p,i},u_{t,i},\cdots,u_{t-q,i}]

with u_{t,i} being the input signal of sensor i at time t, the model (1) reduces to an ARMAX model with time-varying coefficients.
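For intuition, the following minimal sketch simulates one trajectory of model (1). The random-walk parameter and the Gaussian regressors and noises are illustrative assumptions only; they are not required by the analysis in this paper.

```python
import numpy as np

# Simulate model (1): y_{t+1,i} = phi_{t,i}^T theta_t + w_{t+1,i},
# with theta_t drifting as a small random walk (Delta theta_t in (2)).
rng = np.random.default_rng(1)
n, m, T = 5, 3, 200                      # sensors, parameter dimension, horizon
theta = rng.standard_normal(m)           # theta_0
phi = rng.standard_normal((T, n, m))     # regressors phi_{t,i} (illustrative choice)
y = np.zeros((T, n))
for t in range(T - 1):
    w = 0.1 * rng.standard_normal(n)                 # noises w_{t+1,i}
    y[t + 1] = phi[t] @ theta + w                    # observations (1)
    theta = theta + 0.01 * rng.standard_normal(m)    # theta_{t+1} = theta_t + Delta theta_t
```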

3 The distributed FFLS Algorithm

Tracking a time-varying signal is a fundamental problem in system identification and signal processing. The well-known recursive least squares estimator with a constant forgetting factor \alpha\in(0,1) is often used to track time-varying parameters; it is defined by

\bm{\hat{\theta}}_{t+1,i}\triangleq\arg\min_{\bm{\beta}}\sum^{t}_{k=0}\alpha^{t-k}(y_{k+1,i}-{\bm{\beta}}^{T}\bm{\varphi}_{k,i})^{2}.   (3)

With some simple manipulations using the matrix inversion formula, we can obtain the following recursive FFLS algorithm (Algorithm 1) for an individual sensor.

Algorithm 1 Standard non-cooperative FFLS algorithm

For any given sensor i\in\{1,\cdots,n\}, begin with an initial estimate \bm{\hat{\theta}}_{0,i}\in\mathbb{R}^{m} and an initial positive definite matrix \bm{P}_{0,i}\in\mathbb{R}^{m\times m}. The standard FFLS is recursively defined for t\geq 0 as follows,

\bm{\hat{\theta}}_{t+1,i}=\bm{\hat{\theta}}_{t,i}+\frac{\bm{P}_{t,i}\bm{\varphi}_{t,i}}{\alpha+\bm{\varphi}^{T}_{t,i}\bm{P}_{t,i}\bm{\varphi}_{t,i}}(y_{t+1,i}-\bm{\varphi}^{T}_{t,i}\bm{\hat{\theta}}_{t,i}),
\bm{P}_{t+1,i}=\frac{1}{\alpha}\left(\bm{P}_{t,i}-\frac{\bm{P}_{t,i}\bm{\varphi}_{t,i}\bm{\varphi}^{T}_{t,i}\bm{P}_{t,i}}{\alpha+\bm{\varphi}^{T}_{t,i}\bm{P}_{t,i}\bm{\varphi}_{t,i}}\right).
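These two recursions translate directly into code. The following is a minimal NumPy sketch; the function name and the choice \bm{P}_{0,i}=100\bm{I} are our own illustrative conventions (any positive definite initialization is admissible):

```python
import numpy as np

def ffls_step(theta, P, phi, y_next, alpha):
    """One recursion of the standard non-cooperative FFLS (Algorithm 1)."""
    denom = alpha + phi @ P @ phi            # alpha + phi^T P phi (a scalar)
    gain = P @ phi / denom                   # adaptive gain
    theta = theta + gain * (y_next - phi @ theta)
    P = (P - np.outer(P @ phi, phi @ P) / denom) / alpha
    return theta, P

# Illustrative initialization: theta_0 = 0, P_0 = 100 I.
m, alpha = 3, 0.95
theta_hat, P = np.zeros(m), 100.0 * np.eye(m)
```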

However, due to the limited sensing ability of each sensor, it is often the case that the measurements obtained by a single sensor reflect only partial information about the unknown parameter. In such a case, if only the local measurements of the sensor itself are utilized to perform the estimation task (see Algorithm 1), then at most part of the unknown parameter, rather than the whole vector, can be estimated. Thus, in this paper, we aim at designing a distributed adaptive estimation algorithm such that all sensors cooperatively track the unknown time-varying parameter \bm{\theta}_{t} by using the random regression vectors and observation signals from their neighbors. To simplify the analysis, in this section we use a fixed undirected graph \mathcal{G}=(\mathcal{V},\mathcal{E},\mathcal{A}) to model the communication topology of the n sensors.

We first introduce the following local cost function \sigma_{t+1,i}(\bm{\beta}) for each sensor i at time t\geq 0, recursively formulated as a linear combination of its neighbors' local estimation errors between the observation signals and the prediction signals,

\sigma_{t+1,i}(\bm{\beta})=\sum_{j\in\mathcal{N}_{i}}a_{ij}\Big{(}\alpha\sigma_{t,j}(\bm{\beta})+(y_{t+1,j}-{\bm{\beta}}^{T}\bm{\varphi}_{t,j})^{2}\Big{)},   (4)

with \sigma_{0,i}(\bm{\beta})=0. Set

\bm{\sigma}_{t}(\bm{\beta})={\rm col}\{\sigma_{t,1}(\bm{\beta}),\cdots,\sigma_{t,n}(\bm{\beta})\},
\bm{e}_{t+1}(\bm{\beta})={\rm col}\{(y_{t+1,1}-{\bm{\beta}}^{T}\bm{\varphi}_{t,1})^{2},\cdots,(y_{t+1,n}-{\bm{\beta}}^{T}\bm{\varphi}_{t,n})^{2}\}.

Hence by (4), we have

\bm{\sigma}_{t+1}(\bm{\beta}) = \alpha\mathcal{A}\bm{\sigma}_{t}(\bm{\beta})+\mathcal{A}\bm{e}_{t+1}(\bm{\beta})
= \alpha^{2}\mathcal{A}^{2}\bm{\sigma}_{t-1}(\bm{\beta})+\alpha\mathcal{A}^{2}\bm{e}_{t}(\bm{\beta})+\mathcal{A}\bm{e}_{t+1}(\bm{\beta})
= \cdots
= \alpha^{t+1}\mathcal{A}^{t+1}\bm{\sigma}_{0}(\bm{\beta})+\sum^{t}_{k=0}\alpha^{t-k}\mathcal{A}^{t+1-k}\bm{e}_{k+1}(\bm{\beta})
= \sum^{t}_{k=0}\alpha^{t-k}\mathcal{A}^{t+1-k}\bm{e}_{k+1}(\bm{\beta}),

which implies that

\sigma_{t+1,i}(\bm{\beta})=\sum^{n}_{j=1}\sum^{t}_{k=0}\alpha^{t-k}a^{(t+1-k)}_{ij}(y_{k+1,j}-{\bm{\beta}}^{T}\bm{\varphi}_{k,j})^{2},   (5)

where a^{(t+1-k)}_{ij} is the i-th row, j-th column entry of the matrix \mathcal{A}^{t+1-k}.
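The unrolling of (4) into (5) is easy to verify numerically; the following sketch checks the stacked recursion against the closed form for an arbitrary row-stochastic \mathcal{A} and arbitrary error vectors, with \bm{\sigma}_{0}=0:

```python
import numpy as np

# Check that iterating sigma_{t+1} = alpha*A*sigma_t + A*e_{t+1} (stacked
# form of (4), sigma_0 = 0) reproduces the closed form underlying (5).
rng = np.random.default_rng(3)
n, T, alpha = 4, 6, 0.95
A = rng.random((n, n)); A /= A.sum(axis=1, keepdims=True)   # row-stochastic A
e = rng.random((T + 1, n))                                  # e[k] stands for e_{k+1}(beta)
sigma = np.zeros(n)
for k in range(T + 1):
    sigma = alpha * A @ sigma + A @ e[k]
closed = sum(alpha ** (T - k) * np.linalg.matrix_power(A, T + 1 - k) @ e[k]
             for k in range(T + 1))
assert np.allclose(sigma, closed)
```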

By minimizing the local cost function \sigma_{t+1,i}(\bm{\beta}) in (5), we obtain the distributed FFLS estimate \bm{\hat{\theta}}_{t+1,i} of the unknown time-varying parameter for sensor i, i.e.,

\bm{\hat{\theta}}_{t+1,i} \triangleq \arg\min_{\bm{\beta}}\sigma_{t+1,i}(\bm{\beta})   (6)
= \left[\sum^{n}_{j=1}\sum^{t}_{k=0}\alpha^{t-k}a^{(t+1-k)}_{ij}\bm{\varphi}_{k,j}\bm{\varphi}^{T}_{k,j}\right]^{-1}\left(\sum^{n}_{j=1}\sum^{t}_{k=0}\alpha^{t-k}a^{(t+1-k)}_{ij}\bm{\varphi}_{k,j}y_{k+1,j}\right).

Denote \bm{P}_{t+1,i}=\left(\sum^{n}_{j=1}\sum^{t}_{k=0}\alpha^{t-k}a^{(t+1-k)}_{ij}\bm{\varphi}_{k,j}\bm{\varphi}^{T}_{k,j}\right)^{-1}. This can be written in the following recursive form,

\bm{P}^{-1}_{t+1,i}=\sum_{j\in\mathcal{N}_{i}}a_{ij}(\alpha{\bm{P}}^{-1}_{t,j}+\bm{\varphi}_{t,j}\bm{\varphi}^{T}_{t,j}).   (7)

By (6), we similarly have

\bm{\hat{\theta}}_{t+1,i}=\bm{P}_{t+1,i}\sum_{j\in\mathcal{N}_{i}}a_{ij}(\alpha\bm{P}^{-1}_{t,j}\bm{\hat{\theta}}_{t,j}+\bm{\varphi}_{t,j}y_{t+1,j}).   (8)

Note that in the above derivation we assumed that the matrix \sum^{n}_{j=1}\sum^{t}_{k=0}\alpha^{t-k}a^{(t+1-k)}_{ij}\bm{\varphi}_{k,j}\bm{\varphi}^{T}_{k,j} is invertible, which is usually not satisfied for small t. To solve this problem, we take the initial matrix \bm{P}_{0,i} to be positive definite. Then (7) can be modified into the following equation,

\bm{P}_{t+1,i}=\Bigg{(}\sum^{n}_{j=1}\sum^{t}_{k=0}\alpha^{t-k}a^{(t+1-k)}_{ij}\bm{\varphi}_{k,j}\bm{\varphi}^{T}_{k,j}+\sum^{n}_{j=1}\alpha^{t+1}a^{(t+1)}_{ij}\bm{P}^{-1}_{0,j}\Bigg{)}^{-1}.   (9)

The estimate given by (8) then differs slightly from (6), but this does not affect the analysis of the asymptotic properties of the estimates.

To design the distributed algorithm, we denote

\bm{\bar{P}}^{-1}_{t+1,i}=\alpha{\bm{P}}^{-1}_{t,i}+\bm{\varphi}_{t,i}\bm{\varphi}^{T}_{t,i}.   (10)

By Lemma 2.1, we have \bm{\bar{P}}_{t+1,i}=\frac{1}{\alpha}\left(\bm{P}_{t,i}-\frac{\bm{P}_{t,i}\bm{\varphi}_{t,i}\bm{\varphi}^{T}_{t,i}\bm{P}_{t,i}}{\alpha+\bm{\varphi}^{T}_{t,i}\bm{P}_{t,i}\bm{\varphi}_{t,i}}\right). Hence,

\bm{\bar{\theta}}_{t+1,i} \triangleq \bm{\bar{P}}_{t+1,i}(\alpha\bm{P}^{-1}_{t,i}\bm{\hat{\theta}}_{t,i}+\bm{\varphi}_{t,i}y_{t+1,i})
= \bm{\hat{\theta}}_{t,i}+\frac{\bm{P}_{t,i}\bm{\varphi}_{t,i}}{\alpha+\bm{\varphi}^{T}_{t,i}\bm{P}_{t,i}\bm{\varphi}_{t,i}}(y_{t+1,i}-\bm{\varphi}^{T}_{t,i}\bm{\hat{\theta}}_{t,i}).

Therefore, we get the following distributed FFLS algorithm of diffusion type, i.e., Algorithm 2.

Algorithm 2 Distributed FFLS algorithm

Input: \{\bm{\varphi}_{t,i},y_{t+1,i}\}^{n}_{i=1}, t=0,1,2,\cdots
Output: \{\bm{\hat{\theta}}_{t+1,i}\}^{n}_{i=1}, t=0,1,2,\cdots

Initialization: For each sensor i\in\{1,\cdots,n\}, begin with an initial vector \bm{\hat{\theta}}_{0,i} and an initial positive definite matrix \bm{P}_{0,i}>0.
for each time t=0,1,2,\cdots do
    for each sensor i=1,\cdots,n do
        Step 1. Adaptation (generate \bm{\bar{\theta}}_{t+1,i} and \bm{\bar{P}}_{t+1,i} based on \bm{\hat{\theta}}_{t,i}, \bm{P}_{t,i}, \bm{\varphi}_{t,i} and y_{t+1,i}):

\bm{\bar{\theta}}_{t+1,i}=\bm{\hat{\theta}}_{t,i}+\frac{\bm{P}_{t,i}\bm{\varphi}_{t,i}}{\alpha+\bm{\varphi}^{T}_{t,i}\bm{P}_{t,i}\bm{\varphi}_{t,i}}(y_{t+1,i}-\bm{\varphi}^{T}_{t,i}\bm{\hat{\theta}}_{t,i}),   (11)
\bm{\bar{P}}_{t+1,i}=\frac{1}{\alpha}\left(\bm{P}_{t,i}-\frac{\bm{P}_{t,i}\bm{\varphi}_{t,i}\bm{\varphi}^{T}_{t,i}\bm{P}_{t,i}}{\alpha+\bm{\varphi}^{T}_{t,i}\bm{P}_{t,i}\bm{\varphi}_{t,i}}\right),   (12)

        Step 2. Combination (generate \bm{P}^{-1}_{t+1,i} and \bm{\hat{\theta}}_{t+1,i} by a convex combination of \bm{\bar{P}}_{t+1,j} and \bm{\bar{\theta}}_{t+1,j}):

\bm{P}^{-1}_{t+1,i}=\sum_{j\in\mathcal{N}_{i}}a_{ij}\bm{\bar{P}}^{-1}_{t+1,j},   (13)
\bm{\hat{\theta}}_{t+1,i}=\bm{P}_{t+1,i}\sum_{j\in\mathcal{N}_{i}}a_{ij}\bm{\bar{P}}^{-1}_{t+1,j}\bm{\bar{\theta}}_{t+1,j}.   (14)
    end for
end for

Note that when \mathcal{A}=\bm{I}_{n}, the distributed FFLS algorithm degenerates to the classical FFLS (i.e., Algorithm 1), and when \alpha=1, it degenerates to the distributed LS in [22], which is used to estimate time-invariant parameters. The quantity 1-\alpha is usually referred to as the adaptation speed. Intuitively, when the parameter process \{\bm{\theta}_{t}\} is slowly time-varying, the adaptation should also be slow (i.e., \alpha should be large). The purpose of this paper is to establish the stability of the above diffusion FFLS-based adaptive filter without independence or stationarity assumptions on the random regression vectors \{\bm{\varphi}_{t,i}\}.
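For concreteness, one adapt-and-combine step of Algorithm 2 can be sketched in NumPy as follows. This is a minimal illustration rather than an optimized implementation: the function name is our own, and since a_{ij}=0 for j\notin\mathcal{N}_{i}, summing over all j realizes the neighborhood sums in (13)-(14).

```python
import numpy as np

def distributed_ffls_step(theta, P, phi, y_next, A, alpha):
    """One adapt-and-combine step of Algorithm 2 (sketch).

    theta: (n, m) local estimates; P: (n, m, m) matrices P_{t,i};
    phi: (n, m) regressors phi_{t,i}; y_next: (n,) observations y_{t+1,i};
    A: (n, n) stochastic adjacency matrix (a_ij = 0 for non-neighbors).
    """
    n, m = theta.shape
    bar_theta = np.empty_like(theta)
    bar_Pinv = np.empty_like(P)
    for i in range(n):                       # Step 1: adaptation (11)-(12)
        denom = alpha + phi[i] @ P[i] @ phi[i]
        gain = P[i] @ phi[i] / denom         # L_{t,i}
        bar_theta[i] = theta[i] + gain * (y_next[i] - phi[i] @ theta[i])
        # bar_P^{-1}_{t+1,i} = alpha P^{-1}_{t,i} + phi phi^T, cf. (10)
        bar_Pinv[i] = alpha * np.linalg.inv(P[i]) + np.outer(phi[i], phi[i])
    new_theta = np.empty_like(theta)
    new_P = np.empty_like(P)
    for i in range(n):                       # Step 2: combination (13)-(14)
        Pinv_i = np.einsum('j,jkl->kl', A[i], bar_Pinv)  # sum_j a_ij bar_P^{-1}_{t+1,j}
        new_P[i] = np.linalg.inv(Pinv_i)
        new_theta[i] = new_P[i] @ np.einsum('j,jkl,jl->k', A[i], bar_Pinv, bar_theta)
    return new_theta, new_P
```

In practice \bm{\bar{P}}_{t+1,i} would be propagated through the rank-one update (12) rather than by explicit inversion; the inverses above are kept only to mirror equations (10), (13) and (14).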

In order to analyze the distributed FFLS algorithm, we need to derive the estimation error equation. Denote \bm{\widetilde{\theta}}_{t,i}\triangleq\bm{\theta}_{t}-\bm{\hat{\theta}}_{t,i}; then from (13) and (14), we have

\bm{\widetilde{\theta}}_{t+1,i} = \bm{\theta}_{t+1}-\bm{P}_{t+1,i}\sum_{j\in\mathcal{N}_{i}}a_{ij}\bm{\bar{P}}^{-1}_{t+1,j}\bm{\bar{\theta}}_{t+1,j}
= \bm{P}_{t+1,i}\sum_{j\in\mathcal{N}_{i}}a_{ij}\bm{\bar{P}}^{-1}_{t+1,j}\bm{\theta}_{t+1}-\bm{P}_{t+1,i}\sum_{j\in\mathcal{N}_{i}}a_{ij}\bm{\bar{P}}^{-1}_{t+1,j}\bm{\bar{\theta}}_{t+1,j}
= \bm{P}_{t+1,i}\sum_{j\in\mathcal{N}_{i}}a_{ij}\bm{\bar{P}}^{-1}_{t+1,j}(\bm{\theta}_{t+1}-\bm{\bar{\theta}}_{t+1,j}).   (15)

By (1), (2), (11) and (12), we can obtain the following equation,

\bm{\theta}_{t+1}-\bm{\bar{\theta}}_{t+1,i}
= \bm{\theta}_{t}+\Delta\bm{\theta}_{t}-\bm{\hat{\theta}}_{t,i}-\frac{\bm{P}_{t,i}\bm{\varphi}_{t,i}}{\alpha+\bm{\varphi}^{T}_{t,i}\bm{P}_{t,i}\bm{\varphi}_{t,i}}(y_{t+1,i}-\bm{\varphi}^{T}_{t,i}\bm{\hat{\theta}}_{t,i})
= \Big{(}\bm{I}_{m}-\frac{\bm{P}_{t,i}\bm{\varphi}_{t,i}\bm{\varphi}^{T}_{t,i}}{\alpha+\bm{\varphi}^{T}_{t,i}\bm{P}_{t,i}\bm{\varphi}_{t,i}}\Big{)}\bm{\widetilde{\theta}}_{t,i}-\frac{\bm{P}_{t,i}\bm{\varphi}_{t,i}w_{t+1,i}}{\alpha+\bm{\varphi}^{T}_{t,i}\bm{P}_{t,i}\bm{\varphi}_{t,i}}+\Delta\bm{\theta}_{t}
= \alpha\bm{\bar{P}}_{t+1,i}\bm{P}^{-1}_{t,i}\bm{\widetilde{\theta}}_{t,i}-\frac{\bm{P}_{t,i}\bm{\varphi}_{t,i}w_{t+1,i}}{\alpha+\bm{\varphi}^{T}_{t,i}\bm{P}_{t,i}\bm{\varphi}_{t,i}}+\Delta\bm{\theta}_{t}.   (16)

For convenience of analysis, we introduce the following set of notations,

\bm{Y}_{t}={\rm col}\{y_{t,1},\cdots,y_{t,n}\},   (n\times 1)
\bm{\Phi}_{t}={\rm diag}\{\bm{\varphi}_{t,1},\cdots,\bm{\varphi}_{t,n}\},   (mn\times n)
\bm{W}_{t}={\rm col}\{w_{t,1},\cdots,w_{t,n}\},   (n\times 1)
\bm{P}_{t}={\rm diag}\{\bm{P}_{t,1},\cdots,\bm{P}_{t,n}\},   (mn\times mn)
\bm{\bar{P}}_{t}={\rm diag}\{\bm{\bar{P}}_{t,1},\cdots,\bm{\bar{P}}_{t,n}\},   (mn\times mn)
\bm{\Theta}_{t}={\rm col}\{\underbrace{\bm{\theta}_{t},\cdots,\bm{\theta}_{t}}_{n}\},   (mn\times 1)
\Delta\bm{\Theta}_{t}={\rm col}\{\underbrace{\Delta\bm{\theta}_{t},\cdots,\Delta\bm{\theta}_{t}}_{n}\},   (mn\times 1)
\bm{L}_{t}={\rm diag}\{\bm{L}_{t,1},\cdots,\bm{L}_{t,n}\},~{\rm where}~\bm{L}_{t,i}=\frac{\bm{P}_{t,i}\bm{\varphi}_{t,i}}{\alpha+\bm{\varphi}^{T}_{t,i}\bm{P}_{t,i}\bm{\varphi}_{t,i}},   (mn\times n)
\bm{\widetilde{\Theta}}_{t}={\rm col}\{\bm{\widetilde{\theta}}_{t,1},\cdots,\bm{\widetilde{\theta}}_{t,n}\},   (mn\times 1)
\mathscr{A}=\mathcal{A}\otimes\bm{I}_{m}.   (mn\times mn)

Hence by (15) and (16), we have the following equation about estimation error,

\bm{\widetilde{\Theta}}_{t+1}=\alpha\bm{P}_{t+1}\mathscr{A}\bm{P}^{-1}_{t}\bm{\widetilde{\Theta}}_{t}-\bm{P}_{t+1}\mathscr{A}\bm{\bar{P}}^{-1}_{t+1}(\bm{L}_{t}\bm{W}_{t+1}+\Delta\bm{\Theta}_{t}).   (17)

From (17), we see that the properties of the product of random matrices \prod_{t}\alpha\bm{P}_{t+1}\mathscr{A}\bm{P}^{-1}_{t} play an important role in the stability analysis of the homogeneous part of the error equation.
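Note that the successive factors telescope, since each \bm{P}^{-1}_{k}\bm{P}_{k} cancels: \prod^{t}_{k=j}\alpha\bm{P}_{k+1}\mathscr{A}\bm{P}^{-1}_{k}=\alpha^{t-j+1}\bm{P}_{t+1}\mathscr{A}^{t-j+1}\bm{P}^{-1}_{j}, an identity used repeatedly in Section 4. A quick numerical check of this identity (a sketch with arbitrary invertible matrices standing in for \bm{P}_{k}):

```python
import numpy as np

# Verify the telescoping of prod_{k=j}^{t} alpha P_{k+1} Ascr P_k^{-1}.
rng = np.random.default_rng(2)
n, m, alpha, j, t = 3, 2, 0.9, 0, 4
A = rng.random((n, n)); A /= A.sum(axis=1, keepdims=True)    # stochastic A
Ascr = np.kron(A, np.eye(m))                                 # A tensor I_m
Ps = [np.eye(n * m) + 0.1 * rng.standard_normal((n * m, n * m))
      for _ in range(t + 2)]                                 # arbitrary invertible P_k
prod = np.eye(n * m)
for k in range(j, t + 1):                                    # later factors multiply on the left
    prod = alpha * Ps[k + 1] @ Ascr @ np.linalg.inv(Ps[k]) @ prod
closed = (alpha ** (t - j + 1) * Ps[t + 1]
          @ np.linalg.matrix_power(Ascr, t - j + 1) @ np.linalg.inv(Ps[j]))
assert np.allclose(prod, closed)
```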

The analysis of products of random matrices is in general a difficult mathematical problem when the random matrices satisfy neither independence nor stationarity assumptions. Existing work on this problem focuses on either the symmetric random matrix case or the scalar gain case. For example, [21] and [16] investigated the convergence of a consensus-diffusion SG algorithm and the stability of a consensus normalized LMS algorithm, where the random matrices in the error equations are symmetric. Note that the random matrices \alpha\bm{P}_{t+1}\mathscr{A}\bm{P}^{-1}_{t} here are asymmetric. Although [23] studied the properties of the asymmetric random matrices in an LMS-based estimation error equation, the adaptive gain of the distributed LMS algorithm in [23] is a scalar, while the gain \frac{\bm{P}_{t,i}}{\alpha+\bm{\varphi}^{T}_{t,i}\bm{P}_{t,i}\bm{\varphi}_{t,i}} in (11) of this paper is a random matrix. Hence the methods used in the existing literature, including [16, 21, 23], are no longer applicable to our case. One of the main purposes of this paper is to overcome these difficulties by using both the specific structure of the diffusion FFLS algorithm and some results on FFLS in the single sensor case (see [34]).

4 Stability of distributed FFLS algorithm under fixed undirected graph

In this section, we establish the exponential stability of the homogeneous part of the error equation (17) and the tracking error bounds for the proposed distributed FFLS algorithm (Algorithm 2), without requiring statistical independence of the system signals. For this purpose, we introduce some definitions on the stability of random matrices (see [34]) and some assumptions on the graph and the random regression vectors.

4.1 Some definitions

Definition 4.1

A random matrix sequence \{\bm{A}_{t},t\geq 0\} defined on the basic probability space (\Omega,\mathscr{F},P) is called L_{p}-stable (p>0) if \sup_{t\geq 0}\mathbb{E}(\|\bm{A}_{t}\|^{p})<\infty, where \mathbb{E}(\cdot) denotes the mathematical expectation operator. We define \|\bm{A}_{t}\|_{L_{p}}\triangleq[\mathbb{E}(\|\bm{A}_{t}\|^{p})]^{\frac{1}{p}} as the L_{p}-norm of the random matrix \bm{A}_{t}.

Definition 4.2

A sequence of n\times n random matrices \bm{A}=\{\bm{A}_{t},t\geq 0\} is called L_{p}-exponentially stable (p\geq 0) with parameter \lambda\in[0,1) if it belongs to the following set

S_{p}(\lambda)=\Big\{\bm{A}:\Big\|\prod^{t}_{j=k+1}\bm{A}_{j}\Big\|_{L_{p}}\leq M\lambda^{t-k},~\forall t\geq k,~\forall k\geq 0,~{\rm for~some}~M>0\Big\}.   (18)

As demonstrated by Guo in [34], \{\bm{A}_{t},t\geq 0\}\in S_{p}(\lambda) is in some sense the necessary and sufficient condition for the stability of \{\bm{x}_{t}\} generated by \bm{x}_{t+1}=\bm{A}_{t}\bm{x}_{t}+\bm{\xi}_{t+1},~t\geq 0. Also, the stability analysis of a matrix sequence may be reduced to that of a certain class of scalar sequences, which can be further analyzed based on some excitation conditions on the regressors. To this end, we introduce the following subset of S_{1}(\lambda) for a scalar sequence a=(a_{t},t\geq 0):

S^{0}(\lambda)=\Big\{a:a_{t}\in[0,1),~\mathbb{E}\Big(\prod^{t}_{j=k+1}a_{j}\Big)\leq M\lambda^{t-k},~\forall t\geq k,~\forall k\geq 0,~{\rm for~some}~M>0\Big\}.

The set S^{0}(\lambda) will be used when we reduce the product of random matrices to the product of a scalar sequence.

Remark 4.1

It is clear that if there exists a constant a_{0}\in(0,1) such that a_{t}\leq a_{0} for all t, then \{a_{t}\}\in S^{0}(a_{0}). More properties of the set S^{0}(\lambda) can be found in [35].

4.2 Assumptions

Assumption 4.1

The undirected graph \mathcal{G} is connected.

Remark 4.2

For any k>1, we denote \mathcal{A}^{k}\triangleq(a_{ij}^{(k)}) with \mathcal{A} being the weighted adjacency matrix of the graph \mathcal{G}, i.e., a_{ij}^{(k)} is the i-th row, j-th column element of the matrix \mathcal{A}^{k}. Under Assumption 4.1, it is clear that \mathcal{A}^{k} is a positive matrix for k\geq D_{\mathcal{G}}, which means that a_{ij}^{(k)}>0 for all i and j (cf., [36]).

Assumption 4.2 (Cooperative Excitation Condition)

For the adapted sequences \{\bm{\varphi}_{t,i},\mathscr{F}_{t},t\geq 0\}, where \{\mathscr{F}_{t}\} is a sequence of non-decreasing \sigma-algebras, there exists an integer h>0 such that \{1-\lambda_{t}\}\in S^{0}(\lambda) for some \lambda\in(0,1), where \lambda_{t} is defined by

\lambda_{t}\triangleq\lambda_{\min}\left[\mathbb{E}\left(\frac{1}{n(1+h)}\sum^{n}_{i=1}\sum^{(t+1)h}_{k=th+1}\frac{\bm{\varphi}_{k,i}\bm{\varphi}^{T}_{k,i}}{1+\|\bm{\varphi}_{k,i}\|^{2}}\Big{|}\mathscr{F}_{th}\right)\right]

with \mathbb{E}(\cdot|\cdot) being the conditional mathematical expectation operator.

Remark 4.3

Assumption 4.2 is also used to guarantee the stability and performance of distributed LMS algorithms (see, e.g., [16, 23]). We give some intuitive explanations of the above cooperative excitation condition from the following two aspects.

(1) “Why excitation”. Consider an extreme case where all regression vectors \bm{\varphi}_{k,i} are equal to zero; then Assumption 4.2 cannot be satisfied. Moreover, from (1), we see that the unknown parameter \bm{\theta}_{t} cannot be estimated or tracked in this case, since the observations y_{t,i} do not contain any information about \bm{\theta}_{t}. In order to estimate \bm{\theta}_{t}, some nonzero information condition (named an excitation condition) should be imposed on the regression vectors \bm{\varphi}_{t,i}. In fact, Assumption 4.2 intuitively gives a lower bound (which may change over time) on the sequence \{\lambda_{t}\}. For example, if there exists a constant \lambda_{0}\in(0,1) such that \inf_{t}\lambda_{t}\geq\lambda_{0}, then by Remark 4.1, Assumption 4.2 is satisfied.

(2) “Why cooperative”. Compare Assumption 4.2 with the excitation condition for the FFLS algorithm in the single sensor case in [34], i.e., there exists a constant h>0 such that

\{1-\lambda'_{t},t\geq 0\}\in S^{0}(\lambda')   (19)

for some \lambda', where

\lambda'_{t}=\lambda_{\min}\left[\mathbb{E}\left(\frac{1}{1+h}\sum^{(t+1)h}_{k=th+1}\frac{\bm{\varphi}_{k,i}\bm{\varphi}^{T}_{k,i}}{1+\|\bm{\varphi}_{k,i}\|^{2}}\Big{|}\mathscr{F}_{th}\right)\right].

Assumption 4.2 contains not only temporal union information but also spatial union information of all the sensors, which means that Assumption 4.2 is much weaker than condition (19), since \lambda_{t}\geq\lambda'_{t} when n>1. Besides, Assumption 4.2 reduces to condition (19) when n=1. In fact, Assumption 4.2 reflects the cooperative effect of multiple sensors in the sense that the estimation task can still be fulfilled by the cooperation of multiple sensors even if none of them can fulfill it individually.
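To make the cooperative effect concrete, the following toy computation (our own illustration) takes n = m = 3 sensors where sensor i only ever measures the i-th coordinate, \bm{\varphi}_{t,i}\equiv e_{i}. Each individual sensor then fails the single-sensor condition (19), since its temporal average is rank one, while the spatio-temporal average in Assumption 4.2 is a positive multiple of the identity (the regressors here are deterministic, so the conditional expectations are trivial and \lambda_{t} is a positive constant, which suffices by Remark 4.1):

```python
import numpy as np

# Sensor i only measures coordinate i: phi_{t,i} = e_i for all t.
n = m = 3; h = 2
basis = np.eye(m)
single = np.zeros((m, m))    # one sensor's temporal average (rank one)
joint = np.zeros((m, m))     # temporal and spatial average over all sensors
for k in range(h):                                   # h consecutive time steps
    for i in range(n):
        R = np.outer(basis[i], basis[i]) / 2.0       # phi phi^T / (1 + |phi|^2)
        joint += R / (n * (1 + h))
        if i == 0:
            single += R / (1 + h)
print(np.linalg.eigvalsh(single).min())   # 0: sensor 1 alone is not excited
print(np.linalg.eigvalsh(joint).min())    # > 0: the network jointly is
```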

4.3 Main results

In order to establish the exponential stability of the product of the random matrices \alpha\bm{P}_{t+1}\mathscr{A}\bm{P}^{-1}_{t}, we first analyze the properties of the random matrix \bm{P}_{t} to obtain an upper bound.

Lemma 4.1

For \{\bm{P}_{t}\} generated by (12) and (13), under Assumptions 4.1-4.2, we have

T_{t+1}\leq\frac{1}{\alpha^{h^{\prime}}}(1-\beta_{t+1})(h^{\prime}-D_{\mathcal{G}})tr(\bm{P}_{th^{\prime}+1}),   (20)

where

T_{t}\triangleq\sum^{th^{\prime}}_{k=(t-1)h^{\prime}+D_{\mathcal{G}}+1}tr(\bm{P}_{k+1}),~~T_{0}=0,
\beta_{t+1}\triangleq\frac{a^{2}_{\min}\gamma_{t+1}}{n(h^{\prime}-D_{\mathcal{G}})\left(\alpha^{h^{\prime}}+\lambda_{\max}\left(\sum^{n}_{l=1}\bm{P}_{th^{\prime}+1,l}\right)\right)tr(\bm{P}_{th^{\prime}+1})},
\gamma_{t+1}\triangleq tr\left(\left(\sum^{n}_{l=1}\bm{P}_{th^{\prime}+1,l}\right)^{2}\sum^{(t+1)h^{\prime}}_{k=th^{\prime}+D_{\mathcal{G}}+1}\sum^{n}_{j=1}\frac{\bm{\varphi}_{k,j}\bm{\varphi}^{T}_{k,j}}{1+\|\bm{\varphi}_{k,j}\|^{2}}\right),
a_{\min}\triangleq\min\limits_{i,j\in\{1,\cdots,n\}}a^{(D_{\mathcal{G}})}_{ij}>0,
h^{\prime}\triangleq 2h+D_{\mathcal{G}},

and h is given by Assumption 4.2.

Proof 4.1.

Note that a^{(k)}_{ij} is the i-th row, j-th column element of the matrix \mathcal{A}^{k}, k\geq 1, where a^{(1)}_{ij}=a_{ij}. By (10), we have \bm{P}^{-1}_{k+1,i}\geq\sum^{n}_{j=1}a_{ij}\alpha\bm{P}^{-1}_{k,j}. Hence by the inequality

\Big{(}\sum^{n}_{j=1}a_{ij}{\bm{A}_{j}}\Big{)}^{-1}\leq\sum^{n}_{j=1}a_{ij}{\bm{A}^{-1}_{j}}   (21)

with \bm{A}_{j}>0, we obtain for any t\geq 0 and any k\in[th^{\prime}+D_{\mathcal{G}}+1,(t+1)h^{\prime}],

\bm{P}_{k,i} \leq \Big{(}\sum^{n}_{j=1}a_{ij}{\alpha\bm{P}^{-1}_{k-1,j}}\Big{)}^{-1} \leq \frac{1}{\alpha}\sum^{n}_{j=1}a_{ij}{\bm{P}_{k-1,j}}
\leq \frac{1}{\alpha}\sum^{n}_{j=1}a_{ij}\left(\frac{1}{\alpha}\sum^{n}_{l=1}a_{jl}\bm{P}_{k-2,l}\right)
= \frac{1}{\alpha^{2}}\sum^{n}_{j=1}a^{(2)}_{ij}\bm{P}_{k-2,j} \leq \cdots
\leq \frac{1}{\alpha^{k-th^{\prime}-1}}\sum^{n}_{j=1}a^{(k-th^{\prime}-1)}_{ij}\bm{P}_{th^{\prime}+1,j}
\leq \frac{1}{\alpha^{h^{\prime}-1}}\sum^{n}_{j=1}a^{(k-th^{\prime}-1)}_{ij}\bm{P}_{th^{\prime}+1,j}.   (22)

Denote \bm{Q}^{k,th^{\prime}}_{i}=\sum^{n}_{j=1}a^{(k-th^{\prime}-1)}_{ij}\bm{P}_{th^{\prime}+1,j}. Then by (10), (13), (21) and (22), we have for k\in[th^{\prime}+D_{\mathcal{G}}+1,(t+1)h^{\prime}],

\bm{P}_{k+1,i} = \left(\sum^{n}_{j=1}a_{ij}(\alpha{\bm{P}}^{-1}_{k,j}+\bm{\varphi}_{k,j}\bm{\varphi}^{T}_{k,j})\right)^{-1}
\leq \sum^{n}_{j=1}a_{ij}(\alpha{\bm{P}}^{-1}_{k,j}+\bm{\varphi}_{k,j}\bm{\varphi}^{T}_{k,j})^{-1}
\leq \sum^{n}_{j=1}a_{ij}\left(\alpha\left(\frac{1}{\alpha^{h^{\prime}-1}}\bm{Q}^{k,th^{\prime}}_{j}\right)^{-1}+\bm{\varphi}_{k,j}\bm{\varphi}^{T}_{k,j}\right)^{-1}.   (23)

By Lemma 2.1 and (23), it follows that

\bm{P}_{k+1,i} \leq \frac{1}{\alpha^{h^{\prime}}}\sum^{n}_{j=1}a_{ij}\Bigg{(}\bm{Q}^{k,th^{\prime}}_{j}-\frac{\bm{Q}^{k,th^{\prime}}_{j}\bm{\varphi}_{k,j}\bm{\varphi}^{T}_{k,j}\bm{Q}^{k,th^{\prime}}_{j}}{\alpha^{h^{\prime}}+\bm{\varphi}^{T}_{k,j}\bm{Q}^{k,th^{\prime}}_{j}\bm{\varphi}_{k,j}}\Bigg{)}
= \frac{1}{\alpha^{h^{\prime}}}\sum^{n}_{j=1}a^{(k-th^{\prime})}_{ij}\bm{P}_{th^{\prime}+1,j}-\frac{1}{\alpha^{h^{\prime}}}\sum^{n}_{j=1}a_{ij}\frac{\bm{Q}^{k,th^{\prime}}_{j}\bm{\varphi}_{k,j}\bm{\varphi}^{T}_{k,j}\bm{Q}^{k,th^{\prime}}_{j}}{\alpha^{h^{\prime}}+\bm{\varphi}^{T}_{k,j}\bm{Q}^{k,th^{\prime}}_{j}\bm{\varphi}_{k,j}}
\leq \frac{1}{\alpha^{h^{\prime}}}\sum^{n}_{j=1}a^{(k-th^{\prime})}_{ij}\bm{P}_{th^{\prime}+1,j}-\frac{1}{\alpha^{h^{\prime}}}\sum^{n}_{j=1}\frac{a_{ij}\bm{Q}^{k,th^{\prime}}_{j}\bm{\varphi}_{k,j}\bm{\varphi}^{T}_{k,j}\bm{Q}^{k,th^{\prime}}_{j}}{\alpha^{h^{\prime}}+\lambda_{\max}(\bm{Q}^{k,th^{\prime}}_{j})(1+\|\bm{\varphi}_{k,j}\|^{2})}.   (24)

Then by (24), we have

tr(\bm{P}_{k+1})=tr\Bigg{(}\sum^{n}_{i=1}\bm{P}_{k+1,i}\Bigg{)}
\leq \frac{1}{\alpha^{h^{\prime}}}tr\Bigg{(}\sum^{n}_{i=1}\sum^{n}_{j=1}a^{(k-th^{\prime})}_{ij}\bm{P}_{th^{\prime}+1,j}\Bigg{)}-\frac{1}{\alpha^{h^{\prime}}}tr\Bigg{(}\sum^{n}_{i=1}\sum^{n}_{j=1}a_{ij}\frac{\bm{Q}^{k,th^{\prime}}_{j}\bm{\varphi}_{k,j}\bm{\varphi}^{T}_{k,j}\bm{Q}^{k,th^{\prime}}_{j}}{\alpha^{h^{\prime}}+\lambda_{\max}(\bm{Q}^{k,th^{\prime}}_{j})(1+\|\bm{\varphi}_{k,j}\|^{2})}\Bigg{)}
= \frac{1}{\alpha^{h^{\prime}}}\Bigg{(}tr(\bm{P}_{th^{\prime}+1})-\sum^{n}_{j=1}\frac{tr\left(\bm{Q}^{k,th^{\prime}}_{j}\bm{\varphi}_{k,j}\bm{\varphi}^{T}_{k,j}\bm{Q}^{k,th^{\prime}}_{j}\right)}{\alpha^{h^{\prime}}+\lambda_{\max}(\bm{Q}^{k,th^{\prime}}_{j})(1+\|\bm{\varphi}_{k,j}\|^{2})}\Bigg{)}.

Combining this with the inequality \sum^{n}_{j=1}\frac{a_{j}}{b_{j}}\geq\frac{\sum^{n}_{j=1}a_{j}}{\sum^{n}_{j=1}b_{j}}, where a_{j}\geq 0 and b_{j}>0, we obtain

tr(\bm{P}_{k+1}) \leq \frac{1}{\alpha^{h^{\prime}}}\left(tr(\bm{P}_{th^{\prime}+1})-\frac{tr\left(\sum^{n}_{j=1}\left(\bm{Q}^{k,th^{\prime}}_{j}\right)^{2}\frac{\bm{\varphi}_{k,j}\bm{\varphi}^{T}_{k,j}}{1+\|\bm{\varphi}_{k,j}\|^{2}}\right)}{\sum^{n}_{j=1}\left(\alpha^{h^{\prime}}+\lambda_{\max}\left(\bm{Q}^{k,th^{\prime}}_{j}\right)\right)}\right).   (25)

By Remark 4.2, we know that a^{(k)}_{ij}\geq a_{\min} holds for all k\geq D_{\mathcal{G}}. Thus, by (25), we have for k\in[th^{\prime}+D_{\mathcal{G}}+1,(t+1)h^{\prime}],

tr(\bm{P}_{k+1}) \leq \frac{1}{\alpha^{h^{\prime}}}\Bigg{(}tr(\bm{P}_{th^{\prime}+1})-\frac{a^{2}_{\min}tr\left(\sum^{n}_{j=1}\left(\sum^{n}_{l=1}\bm{P}_{th^{\prime}+1,l}\right)^{2}\frac{\bm{\varphi}_{k,j}\bm{\varphi}^{T}_{k,j}}{1+\|\bm{\varphi}_{k,j}\|^{2}}\right)}{n\left(\alpha^{h^{\prime}}+\lambda_{\max}\left(\sum^{n}_{l=1}\bm{P}_{th^{\prime}+1,l}\right)\right)}\Bigg{)}.   (26)

Summing both sides of (26) from th^{\prime}+D_{\mathcal{G}}+1 to (t+1)h^{\prime} and using the definition of \beta_{t+1}, we have

T_{t+1} = \sum^{(t+1)h^{\prime}}_{k=th^{\prime}+D_{\mathcal{G}}+1}tr(\bm{P}_{k+1}) \leq \frac{1}{\alpha^{h^{\prime}}}(1-\beta_{t+1})(h^{\prime}-D_{\mathcal{G}})tr(\bm{P}_{th^{\prime}+1}).

This completes the proof of the lemma.

Before establishing the boundedness of the random matrix \bm{P}_{t}, we first introduce two lemmas from [34].

Lemma 4.2.

[34] Let \{1-\xi_{t}\}\in S^{0}(\lambda) and 0<\xi_{t}\leq\xi^{*}<1, where \xi^{*} is a positive constant. Then for any \varepsilon\in(0,1), \{1-\varepsilon\xi_{t}\}\in S^{0}(\lambda^{(1-\xi^{*})\varepsilon}).

Lemma 4.3.

[34] Let \{x_{t},\mathscr{F}_{t}\} be an adapted process satisfying

x_{t+1}\leq\xi_{t+1}x_{t}+\eta_{t+1},~~t\geq 0,~~\mathbb{E}x^{2}_{0}<\infty,

where \{\xi_{t},\mathscr{F}_{t}\} and \{\eta_{t},\mathscr{F}_{t}\} are two adapted nonnegative processes with the properties:

\xi_{t}\geq\varepsilon_{0}>0,~~\forall t,
\mathbb{E}(\eta^{2}_{t+1}|\mathscr{F}_{t})\leq N<\infty,~~\forall t,
\left\|\prod^{t}_{k=j}\mathbb{E}(\xi^{4}_{k+1}|\mathscr{F}_{k})\right\|\leq M\eta^{t-j+1},~~\forall t\geq j,~\forall j,

where \varepsilon_{0}, M, N and \eta\in(0,1) are constants. Then we have

(i)~\left\|\prod^{t}_{k=j}\xi_{k}\right\|_{L_{2}}\leq M^{\frac{1}{4}}\eta^{\frac{1}{4}(t-j+1)},~~\forall t\geq j,~\forall j;
(ii)~\sup_{t}\mathbb{E}(\|x_{t}\|)<\infty.

The following lemma establishes the boundedness of the random matrix sequence \{\bm{P}_{t}\}.

Lemma 4.4.

For \{\bm{P}_{t}\} generated by (12) and (13), under Assumptions 4.1-4.2, \bm{P}_{t} is L_{p}-stable for any p\geq 1, i.e.,

\sup_{t\geq 0}\mathbb{E}(\|\bm{P}_{t}\|^{p})<\infty,

provided that \lambda^{\frac{a^{2}_{\min}}{32pmh(4h+D_{\mathcal{G}}-1)}}<\alpha<1, where \lambda and h are given by Assumption 4.2, and m is the dimension of \bm{\varphi}_{t,i}.

Proof 4.5.

For any t\geq 0, there exists an integer z_{t}=\lfloor\frac{th^{\prime}+D_{\mathcal{G}}}{h}\rfloor+1 such that

(z_{t}-1)h\leq th^{\prime}+D_{\mathcal{G}}+1\leq z_{t}h+1.   (27)

By the definition of \beta_{t+1} in Lemma 4.1, it is clear that

\beta_{t+1} \geq \frac{a^{2}_{\min}tr\Big{(}\left(\sum^{n}_{l=1}\bm{P}_{th^{\prime}+1,l}\right)^{2}\sum^{(z_{t}+1)h}_{k=z_{t}h+1}\sum^{n}_{j=1}\frac{\bm{\varphi}_{k,j}\bm{\varphi}^{T}_{k,j}}{1+\|\bm{\varphi}_{k,j}\|^{2}}\Big{)}}{n(h^{\prime}-D_{\mathcal{G}})\left(\alpha^{h^{\prime}}+\lambda_{\max}\left(\sum^{n}_{l=1}\bm{P}_{th^{\prime}+1,l}\right)\right)tr(\bm{P}_{th^{\prime}+1})} \triangleq b_{t+1}.   (28)

Hence by Lemma 4.1 and (28), we obtain

T_{t+1}\leq\frac{1}{\alpha^{h^{\prime}}}(1-b_{t+1})(h^{\prime}-D_{\mathcal{G}})tr(\bm{P}_{th^{\prime}+1}).   (29)

By the inequality \bm{P}_{k,i}\leq\frac{1}{\alpha}\sum^{n}_{j=1}a_{ij}{\bm{P}_{k-1,j}} used in (22), it follows that

(h^{\prime}-D_{\mathcal{G}})tr(\bm{P}_{th^{\prime}+1}) = \sum^{th^{\prime}}_{k=(t-1)h^{\prime}+D_{\mathcal{G}}+1}tr(\bm{P}_{th^{\prime}+1})
= \sum^{th^{\prime}}_{k=(t-1)h^{\prime}+D_{\mathcal{G}}+1}\sum^{n}_{i=1}tr(\bm{P}_{th^{\prime}+1,i})
\leq \sum^{th^{\prime}}_{k=(t-1)h^{\prime}+D_{\mathcal{G}}+1}\sum^{n}_{i=1}tr\left(\frac{1}{\alpha^{th^{\prime}-k}}\sum^{n}_{j=1}a^{(th^{\prime}-k)}_{ij}\bm{P}_{k+1,j}\right)
\leq \frac{1}{\alpha^{h^{\prime}-D_{\mathcal{G}}-1}}\sum^{th^{\prime}}_{k=(t-1)h^{\prime}+D_{\mathcal{G}}+1}tr(\bm{P}_{k+1})=\frac{1}{\alpha^{h^{\prime}-D_{\mathcal{G}}-1}}T_{t}.

Hence by (29), we have

T_{t+1}\leq\frac{1}{\alpha^{2h^{\prime}-D_{\mathcal{G}}-1}}(1-b_{t+1})T_{t}.   (30)

For p\geq 1, denote

c_{t+1}=\frac{1}{\alpha^{p(2h^{\prime}-D_{\mathcal{G}}-1)}}\left(1-\frac{b_{t+1}}{2}\right)I_{\{tr(\bm{P}_{th^{\prime}+1})\geq 1\}},   (31)

where I_{\{\cdot\}} denotes the indicator function, whose value is 1 if the condition in the braces holds and 0 otherwise. Then by (29) and (30), we have

T^{p}_{t+1} \leq T^{p}_{t+1}\left(I_{\{tr(\bm{P}_{th^{\prime}+1})\geq 1\}}+I_{\{tr(\bm{P}_{th^{\prime}+1})\leq 1\}}\right)   (32)
\leq \frac{1}{\alpha^{p(2h^{\prime}-D_{\mathcal{G}}-1)}}(1-b_{t+1})^{p}T^{p}_{t}I_{\{tr(\bm{P}_{th^{\prime}+1})\geq 1\}}+T^{p}_{t+1}I_{\{tr(\bm{P}_{th^{\prime}+1})\leq 1\}}
\leq c_{t+1}T^{p}_{t}+\frac{1}{\alpha^{ph^{\prime}}}(h^{\prime}-D_{\mathcal{G}})^{p}.

Denote

\bm{H}_{z_{t}}=\mathbb{E}\left(\sum^{(z_{t}+1)h}_{k=z_{t}h+1}\sum^{n}_{j=1}\frac{\bm{\varphi}_{k,j}\bm{\varphi}^{T}_{k,j}}{1+\|\bm{\varphi}_{k,j}\|^{2}}\Bigg{|}\mathscr{F}_{z_{t}h}\right).

By the inequality

tr\left(\left(\sum^{n}_{l=1}\bm{P}_{th^{\prime}+1,l}\right)^{2}\right)\geq m^{-1}\left(tr\left(\sum^{n}_{l=1}\bm{P}_{th^{\prime}+1,l}\right)\right)^{2}

and \bm{P}_{th^{\prime}+1,l}\in\mathscr{F}_{th^{\prime}}\subset\mathscr{F}_{z_{t}h}, from the definition of b_{t+1} in (28) we can conclude the following inequality,

\mathbb{E}(b_{t+1}|\mathscr{F}_{z_{t}h}) = \frac{a^{2}_{\min}tr\left[\left(\sum^{n}_{l=1}\bm{P}_{th^{\prime}+1,l}\right)^{2}\bm{H}_{z_{t}}\right]}{n(h^{\prime}-D_{\mathcal{G}})\left(\alpha^{h^{\prime}}+\lambda_{\max}\left(\sum^{n}_{l=1}\bm{P}_{th^{\prime}+1,l}\right)\right)tr(\bm{P}_{th^{\prime}+1})}
\geq \frac{a^{2}_{\min}\left(tr(\bm{P}_{th^{\prime}+1})\right)^{2}\lambda_{\min}(\bm{H}_{z_{t}})}{mn(h^{\prime}-D_{\mathcal{G}})\left(\alpha^{h^{\prime}}+\lambda_{\max}\left(\sum^{n}_{l=1}\bm{P}_{th^{\prime}+1,l}\right)\right)tr(\bm{P}_{th^{\prime}+1})}
\geq \frac{a^{2}_{\min}tr(\bm{P}_{th^{\prime}+1})\lambda_{z_{t}}(1+h)}{m(h^{\prime}-D_{\mathcal{G}})\left(\alpha^{h^{\prime}}+\lambda_{\max}\left(\sum^{n}_{l=1}\bm{P}_{th^{\prime}+1,l}\right)\right)}
\geq \frac{a^{2}_{\min}tr(\bm{P}_{th^{\prime}+1})\lambda_{z_{t}}(1+h)}{m(h^{\prime}-D_{\mathcal{G}})\left(1+tr(\bm{P}_{th^{\prime}+1})\right)}
\geq \frac{a^{2}_{\min}\lambda_{z_{t}}(1+h)}{2m(h^{\prime}-D_{\mathcal{G}})}~~~~{\rm on}~~\{tr(\bm{P}_{th^{\prime}+1})\geq 1\}.   (33)

Hence by the definition of c_{t+1} in (31),

\mathbb{E}(c_{t+1}|\mathscr{F}_{z_{t}h}) \leq \frac{1}{\alpha^{p(2h^{\prime}-D_{\mathcal{G}}-1)}}\left(1-\frac{a^{2}_{\min}\lambda_{z_{t}}(1+h)}{4m(h^{\prime}-D_{\mathcal{G}})}\right)I_{\{tr(\bm{P}_{th^{\prime}+1})\geq 1\}}.   (34)

Denote

d_{t+1}=\begin{cases}c_{t+1},&tr(\bm{P}_{th^{\prime}+1})\geq 1;\\\frac{1}{\alpha^{p(2h^{\prime}-D_{\mathcal{G}}-1)}}\left(1-\frac{a^{2}_{\min}\lambda_{z_{t}}(1+h)}{4m(h^{\prime}-D_{\mathcal{G}})}\right),&{\rm otherwise.}\end{cases}

Then by (32) and (34), we have

T^{p}_{t+1}\leq d_{t+1}T^{p}_{t}+\frac{1}{\alpha^{ph^{\prime}}}(h^{\prime}-D_{\mathcal{G}})^{p}.   (35)

Since \lambda_{z_{t}}\leq\frac{h}{1+h} and b_{t+1}\leq\frac{a^{2}_{\min}h}{h^{\prime}-D_{\mathcal{G}}}, we know that d_{t+1}\geq\varepsilon_{0} for some positive constant \varepsilon_{0}. Denote \mathscr{B}_{t}\triangleq\mathscr{F}_{z_{t}h}; then by the definition of z_{t}, it is clear that z_{t+1}\geq z_{t}+2. Thus, we obtain that d_{t+1}\in\mathscr{F}_{(z_{t}+1)h}\subset\mathscr{B}_{t+1}. Similar to the analysis of (34), we have

\mathbb{E}(c^{4}_{t+1}|\mathscr{B}_{t})\leq\frac{1}{\alpha^{4p(2h^{\prime}-D_{\mathcal{G}}-1)}}\left(1-\frac{a^{2}_{\min}\lambda_{z_{t}}(1+h)}{4m(h^{\prime}-D_{\mathcal{G}})}\right).   (36)

Hence by the definition of d_{t+1}, it follows that

\Big\|\prod^{t}_{k=j}\mathbb{E}(d^{4}_{k+1}|\mathscr{B}_{k})\Big\|_{L_{1}} \leq \Big\|\prod^{t}_{k=j}\left(\frac{1}{\alpha^{4p(2h^{\prime}-D_{\mathcal{G}}-1)}}\left(1-\frac{a^{2}_{\min}\lambda_{z_{k}}(1+h)}{8mh}\right)\right)\Big\|_{L_{1}}.   (37)

By Assumption 4.2 and the fact that \lambda_{z_{k}}\leq\frac{h}{1+h}, applying Lemma 4.2 we obtain \{1-\frac{a^{2}_{\min}\lambda_{z_{k}}(1+h)}{8mh}\}\in S^{0}\big{(}\lambda^{\frac{a^{2}_{\min}}{8mh}}\big{)}. Hence by (37), we see that there exists a positive constant N such that

\Big\|\prod^{t}_{k=j}\mathbb{E}(d^{4}_{k+1}|\mathscr{B}_{k})\Big\|_{L_{1}}\leq N\lambda_{1}^{t-j+1},

where \lambda_{1}=\frac{1}{\alpha^{4p(2h^{\prime}-D_{\mathcal{G}}-1)}}\lambda^{\frac{a^{2}_{\min}}{8mh}}\in(0,1). Furthermore, by Lemma 4.3, we have \sup_{t}\mathbb{E}(T^{p}_{t})<\infty, which implies that \sup_{t\geq 0}\mathbb{E}(\|\bm{P}_{t}\|^{p})<\infty. This completes the proof.

We then establish the exponential stability of the homogeneous part of the error equation (17).

Theorem 4.6.

Consider the distributed FFLS algorithm in Algorithm 2. If the forgetting factor $\alpha$ satisfies $\lambda^{\frac{a^{2}_{\min}}{32pmh(4h+D_{\mathcal{G}}-1)}}<\alpha<1$ and $\sup_{t}\|\bm{\varphi}_{t,i}\|_{L_{6p}}<\infty$ for any $i\in\{1,\cdots,n\}$, then under Assumptions 4.1 and 4.2, for any $p\geq 1$, the sequence $\{\alpha\bm{P}_{t+1}\mathscr{A}\bm{P}^{-1}_{t}\}$ is $L_{p}$-exponentially stable.

Proof 4.7.

By (10) and (13), we have

\displaystyle\bm{P}^{-1}_{t+1,i}=\sum^{n}_{j=1}a_{ij}\left(\alpha\bm{P}^{-1}_{t,j}+\bm{\varphi}_{t,j}\bm{\varphi}^{T}_{t,j}\right).

Then we can obtain the following equation,

\displaystyle tr(\bm{P}^{-1}_{t+1})=tr\left(\sum^{n}_{i=1}\bm{P}^{-1}_{t+1,i}\right)=tr\left(\sum^{n}_{j=1}\left(\alpha\bm{P}^{-1}_{t,j}+\bm{\varphi}_{t,j}\bm{\varphi}^{T}_{t,j}\right)\right)=\alpha\,tr(\bm{P}^{-1}_{t})+\sum^{n}_{j=1}\|\bm{\varphi}_{t,j}\|^{2}.

By the Minkowski inequality, it follows that

\displaystyle\|tr(\bm{P}^{-1}_{t+1})\|_{L_{3p}}\leq\alpha\|tr(\bm{P}^{-1}_{t})\|_{L_{3p}}+O\left(\sum^{n}_{j=1}\|\bm{\varphi}_{t,j}\|^{2}_{L_{6p}}\right)
\leq\alpha^{t+1}\|tr(\bm{P}^{-1}_{0})\|_{L_{3p}}+O\left(\sum^{t}_{k=0}\alpha^{k}\right).

Hence we have

\displaystyle\sup_{t}\|\bm{P}^{-1}_{t+1}\|_{L_{3p}}<\infty. (38)

By Lemma 4.4, we derive that

\displaystyle\Big\|\prod^{t}_{k=j}\alpha\bm{P}_{k+1}\mathscr{A}\bm{P}^{-1}_{k}\Big\|_{L_{p}}=\mathbb{E}\left(\Big\|\prod^{t}_{k=j}\alpha\bm{P}_{k+1}\mathscr{A}\bm{P}^{-1}_{k}\Big\|^{p}\right)^{\frac{1}{p}}=\mathbb{E}\left(\big\|\alpha^{t-j+1}\bm{P}_{t+1}\mathscr{A}^{t-j+1}\bm{P}^{-1}_{j}\big\|^{p}\right)^{\frac{1}{p}}
\leq\alpha^{t-j+1}\|\bm{P}_{t+1}\|_{L_{2p}}\|\bm{P}^{-1}_{j}\|_{L_{2p}}=O(\alpha^{t-j+1}).

This completes the proof of the theorem.
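The first display in the proof is an exact trace recursion, $tr(\bm{P}^{-1}_{t+1})=\alpha\,tr(\bm{P}^{-1}_{t})+\sum^{n}_{j=1}\|\bm{\varphi}_{t,j}\|^{2}$, which relies only on the weight matrix having unit column sums. As a sanity check, the following minimal Python sketch verifies the identity numerically; the network size, regressor distribution, and the uniform doubly stochastic weights are illustrative assumptions, not taken from the paper.

```python
# Numeric check of the trace recursion used in the proof of Theorem 4.6:
#   tr(P^{-1}_{t+1}) = alpha * tr(P^{-1}_t) + sum_j ||phi_{t,j}||^2,
# where P^{-1}_{t+1,i} = sum_j a_{ij} (alpha P^{-1}_{t,j} + phi_{t,j} phi_{t,j}^T).
import numpy as np

rng = np.random.default_rng(1)
n, m, alpha = 5, 3, 0.95
A = np.full((n, n), 1.0 / n)            # illustrative doubly stochastic weights

P_inv = [np.eye(m) for _ in range(n)]   # P^{-1}_{0,i}
for t in range(10):
    phi = [rng.standard_normal(m) for _ in range(n)]
    new = [sum(A[i, j] * (alpha * P_inv[j] + np.outer(phi[j], phi[j]))
               for j in range(n)) for i in range(n)]
    lhs = sum(np.trace(M) for M in new)
    rhs = alpha * sum(np.trace(M) for M in P_inv) + sum(p @ p for p in phi)
    assert np.isclose(lhs, rhs)         # the identity holds exactly
    P_inv = new
```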

Based on Theorem 4.6, we further establish the tracking error bound of Algorithm 2 under some conditions on the noises and parameter variation.

Theorem 4.8.

Consider the model (1) and the error equation (17). Under the conditions of Theorem 4.6, if for some $p\geq 1$, $\sigma_{3p}\triangleq\sup_{t}(\|\bm{W}_{t}\|_{L_{3p}}+\|\Delta\bm{\Theta}_{t}\|_{L_{3p}})<\infty$, then there exists a constant $c$ such that

\displaystyle\limsup_{t\rightarrow\infty}\|\bm{\widetilde{\Theta}}_{t}\|_{L_{p}}\leq c\sigma_{3p}.
Proof 4.9.

For convenience of analysis, let the state transition matrix $\bm{\Psi}(t,k)$ be recursively defined by

\displaystyle\bm{\Psi}(t+1,k)=\alpha\bm{P}_{t+1}\mathscr{A}\bm{P}^{-1}_{t}\bm{\Psi}(t,k),\quad\bm{\Psi}(k,k)=\bm{I}_{mn}. (39)

It is clear that $\bm{\Psi}(t+1,k)=\alpha^{t-k+1}\bm{P}_{t+1}\mathscr{A}^{t-k+1}\bm{P}^{-1}_{k}$. From the definition of $\bm{L}_{t}$ and (10), we have $\bm{\bar{P}}^{-1}_{t+1}\bm{L}_{t}=\bm{\Phi}_{t}$. Then by (17), we have

\displaystyle\bm{\widetilde{\Theta}}_{t+1}=\alpha\bm{P}_{t+1}\mathscr{A}\bm{P}^{-1}_{t}\bm{\widetilde{\Theta}}_{t}-\bm{P}_{t+1}\mathscr{A}\left(\bm{\Phi}_{t}\bm{W}_{t+1}+\bm{\bar{P}}^{-1}_{t+1}\Delta\bm{\Theta}_{t}\right).

Hence by Hölder's inequality, we have

\displaystyle\|\bm{\widetilde{\Theta}}_{t+1}\|_{L_{p}}=\Big\|\bm{\Psi}(t+1,0)\bm{\widetilde{\Theta}}_{0}-\sum^{t}_{k=0}\bm{\Psi}(t+1,k+1)\bm{P}_{k+1}\mathscr{A}\left(\bm{\Phi}_{k}\bm{W}_{k+1}+\bm{\bar{P}}^{-1}_{k+1}\Delta\bm{\Theta}_{k}\right)\Big\|_{L_{p}}
\leq\|\alpha^{t+1}\bm{P}_{t+1}\mathscr{A}^{t+1}\bm{P}^{-1}_{0}\bm{\widetilde{\Theta}}_{0}\|_{L_{p}}+\Big\|\sum^{t}_{k=0}\alpha^{t-k}\bm{P}_{t+1}\mathscr{A}^{t-k+1}\left(\bm{\Phi}_{k}\bm{W}_{k+1}+\bm{\bar{P}}^{-1}_{k+1}\Delta\bm{\Theta}_{k}\right)\Big\|_{L_{p}}
\leq O(\alpha^{t+1}\|\bm{P}_{t+1}\|_{L_{2p}})+\sum^{t}_{k=0}\alpha^{t-k}\|\bm{P}_{t+1}\|_{L_{3p}}\|\bm{\Phi}_{k}\|_{L_{3p}}\|\bm{W}_{k+1}\|_{L_{3p}}+\sum^{t}_{k=0}\alpha^{t-k}\|\bm{P}_{t+1}\|_{L_{3p}}\|\bm{\bar{P}}^{-1}_{k+1}\|_{L_{3p}}\|\Delta\bm{\Theta}_{k}\|_{L_{3p}}.

Hence by Lemma 4.4 and (38), it follows that

\displaystyle\limsup_{t\rightarrow\infty}\|\bm{\widetilde{\Theta}}_{t}\|_{L_{p}}\leq c\sigma_{3p},

where $c$ is a positive constant depending on $\alpha$ and the moment bounds of $\{\bm{P}_{t}\}$, $\{\bm{\Phi}_{t}\}$ and $\{\bm{P}^{-1}_{t}\}$. This completes the proof.

Remark 4.10.

From the proofs of Theorems 4.6 and 4.8, we can see that if the forgetting factor is allowed to differ across sensors, i.e., $\alpha$ is replaced with $\alpha_{i}$ in Algorithm 2, then the results of Theorems 4.6 and 4.8 still hold, provided that the condition $\lambda^{\frac{a^{2}_{\min}}{32pmh(4h+D_{\mathcal{G}}-1)}}<\alpha$ is replaced with $\lambda^{\frac{a^{2}_{\min}}{32pmh(4h+D_{\mathcal{G}}-1)}}<\alpha_{\min}\triangleq\min\{\alpha_{1},\ldots,\alpha_{n}\}$.
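To illustrate Theorems 4.6 and 4.8 concretely, the following Python sketch simulates a diffusion FFLS tracker in the spirit of Algorithm 2 (the exact algorithm is not reproduced here): each sensor runs an information-form local update $\bm{\bar{P}}^{-1}_{t+1,i}=\alpha\bm{P}^{-1}_{t,i}+\bm{\varphi}_{t,i}\bm{\varphi}^{T}_{t,i}$ and the network then fuses neighbors' information pairs. The signal model, the weights, the noise levels, and the choice that each sensor excites only one coordinate (so no single sensor can track $\theta_{t}$ alone, while the network can) are illustrative assumptions.

```python
# A minimal simulation sketch of a diffusion forgetting-factor LS tracker,
# illustrating the bounded tracking error of Theorem 4.8 under cooperative
# (but individually deficient) excitation. Assumptions are illustrative.
import numpy as np

rng = np.random.default_rng(2)
n, m, alpha, T = 6, 2, 0.95, 500
A = np.full((n, n), 1.0 / n)                  # doubly stochastic weights

theta = np.array([1.0, -1.0])                 # unknown time-varying parameter
P_inv = [np.eye(m) for _ in range(n)]
est = [np.zeros(m) for _ in range(n)]

for t in range(T):
    theta = theta + 0.01 * rng.standard_normal(m)      # slow random drift
    bar_P_inv, bar_info = [], []
    for i in range(n):
        phi = np.zeros(m)
        phi[i % m] = 1.0                # sensor i excites one coordinate only
        y = phi @ theta + 0.1 * rng.standard_normal()  # noisy measurement
        # Local FFLS update in information form (in the spirit of (10), (13)).
        bar_P_inv.append(alpha * P_inv[i] + np.outer(phi, phi))
        bar_info.append(alpha * P_inv[i] @ est[i] + phi * y)
    # Combination step: fuse neighbors' information pairs.
    P_inv = [sum(A[i, j] * bar_P_inv[j] for j in range(n)) for i in range(n)]
    est = [np.linalg.solve(P_inv[i],
                           sum(A[i, j] * bar_info[j] for j in range(n)))
           for i in range(n)]

print(max(np.linalg.norm(e - theta) for e in est))     # small, bounded error
```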

5 Stability of distributed FFLS algorithm over unreliable directed networks

In Section 4, we studied the stability of the distributed FFLS algorithm under a fixed undirected graph. However, in practical engineering applications, the information exchange between sensors might not be bidirectional. Moreover, communication is often disturbed by uncertain random factors, such as distance, obstacles and interference, which may lead to the interruption or reconstruction of communication links. Thus, in this section, we model the communication links between sensors as randomly switching directed communication topologies $\mathcal{G}_{r(t)}=(\mathcal{V},\mathcal{E}_{r(t)},\mathcal{A}_{r(t)})$. The switching process is governed by a homogeneous Markov chain $r(t)$ whose states belong to a finite set $\mathbb{S}=\{1,2,\ldots,s\}$, and the corresponding set of communication topology graphs is denoted by $\mathcal{C}=\{\mathcal{G}_{1},\ldots,\mathcal{G}_{s}\}$. The communication graph switches exactly at the instants when the value of $r(t)$ changes. Accordingly, the adjacency matrix and the neighbor set of sensor $i$ are denoted by $\mathcal{A}_{r(t)}=[a_{ij,r(t)}]_{1\leq i,j\leq n}$ and $\mathcal{N}_{i,r(t)}$, respectively. For the distributed FFLS algorithm over Markovian switching directed topologies, we only modify Step 2 in Algorithm 2 as follows:

\displaystyle\bm{P}^{-1}_{t+1,i}=\sum_{j\in\mathcal{N}_{i,r(t)}}a_{ji,r(t)}\bm{\bar{P}}^{-1}_{t+1,j}, (40)
\displaystyle\bm{\hat{\theta}}_{t+1,i}=\bm{P}_{t+1,i}\sum_{j\in\mathcal{N}_{i,r(t)}}a_{ji,r(t)}\bm{\bar{P}}^{-1}_{t+1,j}\bm{\bar{\theta}}_{t+1,j}. (41)
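For concreteness, a minimal Python sketch of the modified combination step (40)-(41) over one time instant is given below. The chain's transition matrix, the per-state weight matrices, and the availability of the local quantities $\bm{\bar{P}}^{-1}_{t+1,j}$ and $\bm{\bar{\theta}}_{t+1,j}$ from Steps (11)-(12) are assumptions made for the example; note that the weights enter as $a_{ji,r(t)}$, i.e., transposed relative to the undirected case.

```python
# Sketch of the combination step (40)-(41) under a Markovian switching
# topology. The local time update producing bar_P_inv[j], bar_theta[j]
# is assumed to have run already; T and A are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, m, s = 4, 3, 2                        # sensors, parameter dim, chain states

A = [np.full((n, n), 1.0 / n) for _ in range(s)]   # balanced weights per state
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])               # irreducible, aperiodic chain

def switch(r):
    """Sample r(t+1) given r(t) = r from the transition matrix T."""
    return rng.choice(s, p=T[r])

def combine(bar_P_inv, bar_theta, r):
    """One application of (40)-(41): fuse in-neighbors' information pairs,
    with weight a_{ji,r(t)} on the edge from j to i."""
    P_inv = [sum(A[r][j, i] * bar_P_inv[j] for j in range(n))
             for i in range(n)]
    theta = [np.linalg.solve(P_inv[i],
                             sum(A[r][j, i] * bar_P_inv[j] @ bar_theta[j]
                                 for j in range(n)))
             for i in range(n)]
    return P_inv, theta

# Example: one step with dummy local quantities.
bP = [np.eye(m) for _ in range(n)]
bt = [np.ones(m) for _ in range(n)]
P_inv, theta = combine(bP, bt, switch(0))
```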

To analyze the stability of algorithm (11), (12), (40), (41), we introduce the following assumptions:

Assumption 5.1

All possible digraphs $\{\mathcal{G}_{1},\ldots,\mathcal{G}_{s}\}$ are balanced, and the union of all these digraphs is strongly connected.

Assumption 5.2

The Markov chain $\{r_{t},t\geq 0\}$ is irreducible and aperiodic with transition probability matrix $\bm{P}=[p_{ij}]_{1\leq i,j\leq s}$, where $p_{ij}=\Pr(r_{t+1}=j|r_{t}=i)$ and $\Pr(\cdot|\cdot)$ denotes the conditional probability.

According to Markov chain theory (cf. [37]), a discrete-time homogeneous Markov chain with finitely many states is ergodic if and only if it is irreducible and aperiodic. Hence Assumption 5.2 implies that the $l$-step transition matrix $\bm{P}^{l}$ converges, as $l\rightarrow\infty$, to a limit with identical rows.
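As a numerical illustration of this ergodicity property, the short sketch below raises an irreducible, aperiodic transition matrix (illustrative, not from the paper) to a high power and checks that the rows of $\bm{P}^{l}$ coincide up to numerical precision.

```python
import numpy as np

# An illustrative irreducible, aperiodic transition matrix.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4]])

Pl = np.linalg.matrix_power(P, 50)   # l-step transition matrix P^l
print(Pl)                            # all rows equal the stationary distribution
print(np.ptp(Pl, axis=0).max())      # maximal row spread is ~ 1e-16
```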

In the following, we analyze properties of strongly connected directed graphs. For convenience, we denote the $i$-th row, $j$-th column element of a matrix $\bm{A}$ by $\bm{A}(i,j)$.

Lemma 5.1.

Let $\mathcal{G}_{k}=(\mathcal{V},\mathcal{E}_{k},\mathcal{A}_{k})$, $1\leq k\leq n$, be $n$ strongly connected graphs with $\mathcal{V}=\{1,2,\cdots,n\}$. Then $\mathcal{A}_{1}\mathcal{A}_{2}\cdots\mathcal{A}_{n}$ is a positive matrix, i.e., every element of the matrix $\mathcal{A}_{1}\mathcal{A}_{2}\cdots\mathcal{A}_{n}$ is positive.

Proof 5.2.

It suffices to prove that the graph $\mathcal{G}^{n}_{1}$ corresponding to the matrix $\mathcal{A}_{1}\mathcal{A}_{2}\cdots\mathcal{A}_{n}$ is a complete graph. Denote the child node set of node $i$ in graph $\mathcal{G}_{k}$ by $\mathcal{O}_{k}(i)$, and the corresponding child node set of node $i$ in graph $\mathcal{G}^{n}_{1}$ by $\mathcal{O}^{n}_{1}(i)$. For any $i\in\mathcal{V}$ and $j\in\mathcal{O}_{1}(i)$, we have

\displaystyle(\mathcal{A}_{1}\mathcal{A}_{2})(i,j)=\sum^{n}_{k=1}\mathcal{A}_{1}(i,k)\mathcal{A}_{2}(k,j)\geq\mathcal{A}_{1}(i,j)\mathcal{A}_{2}(j,j)>0. (42)

Since $\mathcal{G}_{2}$ is strongly connected, if $\mathcal{O}_{1}(i)\neq\mathcal{V}$, then there exist two nodes $j_{1}\in\mathcal{V}\backslash\mathcal{O}_{1}(i)$ and $j_{2}\in\mathcal{O}_{1}(i)$ such that $(j_{2},j_{1})\in\mathcal{E}_{2}$; hence

\displaystyle(\mathcal{A}_{1}\mathcal{A}_{2})(i,j_{1})=\sum^{n}_{k=1}\mathcal{A}_{1}(i,k)\mathcal{A}_{2}(k,j_{1})\geq\mathcal{A}_{1}(i,j_{2})\mathcal{A}_{2}(j_{2},j_{1})>0. (43)

By (42) and (43), it is clear that $\{j_{1}\}\cup\mathcal{O}_{1}(i)\subset\mathcal{O}^{2}_{1}(i)$. Hence for any $j\in\{j_{1}\}\cup\mathcal{O}_{1}(i)$, we have

\displaystyle(\mathcal{A}_{1}\mathcal{A}_{2}\mathcal{A}_{3})(i,j)=\sum^{n}_{k=1}(\mathcal{A}_{1}\mathcal{A}_{2})(i,k)\mathcal{A}_{3}(k,j)\geq(\mathcal{A}_{1}\mathcal{A}_{2})(i,j)\mathcal{A}_{3}(j,j)>0. (44)

Since $\mathcal{G}_{3}$ is strongly connected, if $\{j_{1}\}\cup\mathcal{O}_{1}(i)\neq\mathcal{V}$, then there exist two nodes $j_{2}\in\mathcal{V}\backslash(\{j_{1}\}\cup\mathcal{O}_{1}(i))$ and $j_{3}\in\{j_{1}\}\cup\mathcal{O}_{1}(i)$ such that $(j_{3},j_{2})\in\mathcal{E}_{3}$; hence

\displaystyle(\mathcal{A}_{1}\mathcal{A}_{2}\mathcal{A}_{3})(i,j_{2})=\sum^{n}_{k=1}(\mathcal{A}_{1}\mathcal{A}_{2})(i,k)\mathcal{A}_{3}(k,j_{2})\geq(\mathcal{A}_{1}\mathcal{A}_{2})(i,j_{3})\mathcal{A}_{3}(j_{3},j_{2})>0. (45)

By (44) and (45), we can see that $\{j_{2}\}\cup\{j_{1}\}\cup\mathcal{O}_{1}(i)\subset\mathcal{O}^{3}_{1}(i)$. We repeat the above process until $\mathcal{O}^{n}_{1}(i)=\mathcal{V}$. The lemma then follows from the arbitrariness of the node $i$.
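As a quick numerical check of Lemma 5.1 (illustrative only, not part of the proof), the sketch below multiplies $n$ copies of the adjacency matrix of a strongly connected digraph with self-loops, mirroring the positive-diagonal structure used in (42)-(45), and verifies that the product is entrywise positive; the directed-ring topology is an assumption chosen for the example.

```python
import numpy as np

n = 4
S = np.roll(np.eye(n), 1, axis=1)    # directed ring 0 -> 1 -> ... -> n-1 -> 0
A = 0.5 * (np.eye(n) + S)            # self-loops keep every diagonal positive

prod = np.linalg.multi_dot([A] * n)  # plays the role of A_1 A_2 ... A_n
print((prod > 0).all())              # True: the product is a positive matrix
```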

Compared with the undirected graph case, the key difference is that the adjacency matrix in this section is asymmetric and random. Hence we need to deal with the coupling between the random adjacency matrices and the random regression vectors. By using the above lemma and Markov chain theory, we establish the stability of the algorithm (11), (12), (40), (41) under Markovian switching topologies.

Theorem 5.3.

Under Assumptions 4.2, 5.1 and 5.2, if $\sup_{t}\|\bm{\varphi}_{t,i}\|_{L_{6p}}<\infty$ for any $i\in\{1,\cdots,n\}$ and $\sigma_{3p}\triangleq\sup_{t}(\|\bm{W}_{t}\|_{L_{3p}}+\|\Delta\bm{\Theta}_{t}\|_{L_{3p}})<\infty$, then there exists a constant $c^{\prime}$ such that

\displaystyle\limsup_{t\rightarrow\infty}\|\bm{\widetilde{\Theta}}_{t}\|_{L_{p}}\leq c^{\prime}\sigma_{3p}.
Proof 5.4.

Following the line of the proof of Theorem 4.8 in Subsection 4.3, it can be seen that we only need to prove that (33) holds under the assumptions of the theorem. By Assumption 5.2, there exists a positive integer $q_{0}$ such that

\displaystyle\Pr(r(t+q_{0})=a|r(t)=b)>0 (46)

holds for all $t$ and all states $a,b\in\mathbb{S}$. Denote $\Pi^{t}_{k}=\mathscr{A}_{r(t)}\mathscr{A}_{r(t-1)}\cdots\mathscr{A}_{r(k)}$, and let $\Pi^{t}_{k}(i,j)$ denote the $i$-th row, $j$-th column element of $\Pi^{t}_{k}$. Following Lemmas 4.1 and 4.4, with a slight abuse of notation we set $h^{\prime}=2h+nsq_{0}$, $z_{t}=\lfloor\frac{th^{\prime}+nsq_{0}}{h}\rfloor+1$ and

\displaystyle b_{t+1}=\frac{tr\Big(\sum^{(z_{t}+1)h}_{k=z_{t}h+1}\sum^{n}_{j=1}\left(\sum^{n}_{l=1}\Pi^{k-1}_{th^{\prime}+1}(j,l)\bm{P}_{th^{\prime}+1,l}\right)^{2}\frac{\bm{\varphi}_{k,j}\bm{\varphi}^{T}_{k,j}}{1+\|\bm{\varphi}_{k,j}\|^{2}}\Big)}{n(h^{\prime}-nsq_{0})\left(\alpha^{h^{\prime}}+\lambda_{\max}\left(\sum^{n}_{l=1}\bm{P}_{th^{\prime}+1,l}\right)\right)tr(\bm{P}_{th^{\prime}+1})}.

In the following, we analyze the term $\mathbb{E}(b_{t+1}|\mathscr{F}_{z_{t}h})$. By (46), we can see that there exists a positive constant $p_{0}$ such that for all $t$,

\displaystyle\Pr\Big(r(t+nsq_{0})=s,\,r(t+(ns-1)q_{0})=s-1,\,\cdots,\,r(t+((n-1)s+1)q_{0})=1;\ \cdots;\ r(t+2sq_{0})=s,\,r(t+(2s-1)q_{0})=s-1,\,\cdots,\,r(t+(s+1)q_{0})=1;\ r(t+sq_{0})=s,\,r(t+(s-1)q_{0})=s-1,\,\cdots,\,r(t+q_{0})=1\,\Big|\,\mathscr{F}\Big)
=\sum_{a_{0}}\Pr\Big(r(t+nsq_{0})=s\,\Big|\,r(t+(ns-1)q_{0})=s-1\Big)\cdots\Pr\Big(r(t+((n-1)s+1)q_{0})=1\,\Big|\,r(t+(n-1)sq_{0})=s\Big)\cdots\Pr\Big(r(t+q_{0})=1\,\Big|\,r(t)=a_{0}\Big)\Pr\Big(r(t)=a_{0}\,\Big|\,\mathscr{F}\Big)
\geq p_{0}\sum_{a_{0}}\Pr\Big(r(t)=a_{0}\,\Big|\,\mathscr{F}\Big)=p_{0}>0 (47)

with $\mathscr{F}$ being a $\sigma$-algebra. By (47), we know that the Markov chain $\{r_{t},t\geq 0\}$ can visit all states in $\mathbb{S}$ $n$ times with positive probability during the time interval $[t+q_{0},t+nsq_{0}]$. Hence for $k\in[z_{t}h+1,(z_{t}+1)h]$, by Assumption 5.1 and Lemma 5.1, there exists a positive constant $\sigma>0$ such that the following inequality holds:

\displaystyle\mathbb{E}\left(\left(\sum^{n}_{l=1}\Pi^{k-1}_{th^{\prime}+1}(j,l)\bm{P}_{th^{\prime}+1,l}\right)^{2}\,\Bigg|\,\mathscr{F}_{k}\right)
=\mathbb{E}\Big(\sum_{u\in\mathcal{V}}\sum_{v\in\mathcal{V}}\Pi^{k-1}_{th^{\prime}+1}(j,u)\Pi^{k-1}_{th^{\prime}+1}(j,v)\bm{P}_{th^{\prime}+1,u}\bm{P}_{th^{\prime}+1,v}\,\Big|\,\mathscr{F}_{k}\Big)
=\sum_{u\in\mathcal{V}}\sum_{v\in\mathcal{V}}\mathbb{E}\Big(\Pi^{k-1}_{th^{\prime}+1}(j,u)\Pi^{k-1}_{th^{\prime}+1}(j,v)\,\Big|\,\mathscr{F}_{k}\Big)\bm{P}_{th^{\prime}+1,u}\bm{P}_{th^{\prime}+1,v}
\geq\sigma\sum_{u\in\mathcal{V}}\sum_{v\in\mathcal{V}}\bm{P}_{th^{\prime}+1,u}\bm{P}_{th^{\prime}+1,v}=\sigma\left(\sum^{n}_{l=1}\bm{P}_{th^{\prime}+1,l}\right)^{2}.

By $\mathscr{F}_{z_{t}h}\subset\mathscr{F}_{k}$ and $\bm{\varphi}_{k,j}\in\mathscr{F}_{k}$, we conclude that

\displaystyle\mathbb{E}\left(\left(\sum^{n}_{l=1}\Pi^{k-1}_{th^{\prime}+1}(j,l)\bm{P}_{th^{\prime}+1,l}\right)^{2}\frac{\bm{\varphi}_{k,j}\bm{\varphi}^{T}_{k,j}}{1+\|\bm{\varphi}_{k,j}\|^{2}}\,\Bigg|\,\mathscr{F}_{z_{t}h}\right)
=\mathbb{E}\Bigg(\mathbb{E}\left(\left(\sum^{n}_{l=1}\Pi^{k-1}_{th^{\prime}+1}(j,l)\bm{P}_{th^{\prime}+1,l}\right)^{2}\,\Bigg|\,\mathscr{F}_{k}\right)\frac{\bm{\varphi}_{k,j}\bm{\varphi}^{T}_{k,j}}{1+\|\bm{\varphi}_{k,j}\|^{2}}\,\Bigg|\,\mathscr{F}_{z_{t}h}\Bigg)
\geq\sigma\mathbb{E}\Bigg(\left(\sum^{n}_{l=1}\bm{P}_{th^{\prime}+1,l}\right)^{2}\frac{\bm{\varphi}_{k,j}\bm{\varphi}^{T}_{k,j}}{1+\|\bm{\varphi}_{k,j}\|^{2}}\,\Bigg|\,\mathscr{F}_{z_{t}h}\Bigg). (48)

From the above analysis, we can obtain the following inequality

\displaystyle\mathbb{E}(b_{t+1}|\mathscr{F}_{z_{t}h})\geq\frac{tr\Big(\sum^{(z_{t}+1)h}_{k=z_{t}h+1}\sum^{n}_{j=1}\sigma\mathbb{E}\Big(\left(\sum^{n}_{l=1}\bm{P}_{th^{\prime}+1,l}\right)^{2}\frac{\bm{\varphi}_{k,j}\bm{\varphi}^{T}_{k,j}}{1+\|\bm{\varphi}_{k,j}\|^{2}}\,\Big|\,\mathscr{F}_{z_{t}h}\Big)\Big)}{n(h^{\prime}-nsq_{0})\left(\alpha^{h^{\prime}}+\lambda_{\max}\left(\sum^{n}_{l=1}\bm{P}_{th^{\prime}+1,l}\right)\right)tr(\bm{P}_{th^{\prime}+1})}
=\frac{\sigma\,tr\left[\left(\sum^{n}_{l=1}\bm{P}_{th^{\prime}+1,l}\right)^{2}\bm{H}_{z_{t}}\right]}{n(h^{\prime}-nsq_{0})\left(\alpha^{h^{\prime}}+\lambda_{\max}\left(\sum^{n}_{l=1}\bm{P}_{th^{\prime}+1,l}\right)\right)tr(\bm{P}_{th^{\prime}+1})}.

The rest of the proof follows the proofs of Lemma 4.4 and Theorems 4.6 and 4.8, with the notation $D_{\mathcal{G}}$ replaced by $nsq_{0}$. This completes the proof of Theorem 5.3.

Remark 5.5.

From Theorem 5.3 (and also Theorems 4.6 and 4.8), we see that our results are obtained without imposing independence or stationarity assumptions on the regression signals, which makes it possible to apply the distributed FFLS algorithm to practical feedback systems.

6 Concluding Remarks

This paper proposed a distributed FFLS algorithm to collaboratively track an unknown time-varying parameter by minimizing a local loss function with a forgetting factor. By introducing a spatio-temporal cooperative excitation condition, we established the stability of the proposed distributed FFLS algorithm for the fixed undirected graph case. The theoretical results were then generalized to the case of Markovian switching directed graphs. The cooperative excitation condition reveals that the sensors can collaboratively accomplish the tracking task even when no individual sensor can do so alone. Since our theoretical results are established without independence or stationarity conditions on the regression vectors, a relevant research topic is how to combine distributed adaptive estimation with distributed control. How to establish the stability analysis of distributed algorithms in more complex settings, such as with quantization effects or time delays in communication channels, is another interesting research topic.

References

  • [1] W. Ren and R. Beard, “Consensus seeking in multiagent systems under dynamically changing interaction topologies,” IEEE Transactions on Automatic Control, vol. 50, no. 5, pp. 655–661, 2005.
  • [2] Y. Wang, L. Cheng, W. Ren, Z.-G. Hou, and M. Tan, “Seeking consensus in networks of linear agents: Communication noises and markovian switching topologies,” IEEE Transactions on Automatic Control, vol. 60, no. 5, pp. 1374–1379, 2015.
  • [3] K. Lu, H. Xu, and Y. Zheng, “Distributed resource allocation via multi-agent systems under time-varying networks,” Automatica, vol. 136, p. 110059, 2022.
  • [4] B. Wang, Q. Fei, and Q. Wu, “Distributed time-varying resource allocation optimization based on finite-time consensus approach,” IEEE Control Systems Letters, vol. 5, no. 2, pp. 599–604, 2021.
  • [5] Z. Lin, L. Wang, Z. Han, and M. Fu, “Distributed formation control of multi-agent systems using complex laplacian,” IEEE Transactions on Automatic Control, vol. 59, no. 7, pp. 1765–1777, 2014.
  • [6] Y. Zhi, L. Liu, B. Guan, B. Wang, Z. Cheng, and H. Fan, “Distributed robust adaptive formation control of fixed-wing uavs with unknown uncertainties and disturbances,” Aerospace Science and Technology, vol. 126, p. 107600, 2022.
  • [7] G. Battistelli and L. Chisci, “Kullback-Leibler average, consensus on probability densities, and distributed state estimation with guaranteed stability,” Automatica, vol. 50, no. 3, pp. 707–718, 2014.
  • [8] W. Chen, C. Wen, S. Hua, and C. Sun, “Distributed cooperative adaptive identification and control for a group of continuous-time systems with a cooperative pe condition via consensus,” IEEE Transactions on Automatic Control, vol. 59, no. 1, pp. 91–106, 2014.
  • [9] M. U. Javed, J. I. Poveda, and X. Chen, “Excitation conditions for uniform exponential stability of the cooperative gradient algorithm over weakly connected digraphs,” IEEE Control Systems Letters, vol. 6, pp. 67–72, 2022.
  • [10] F. Barani, A. Savadi, and H. S. Yazdi, “Convergence behavior of diffusion stochastic gradient descent algorithm,” Signal Processing, vol. 183, p. 108014, 2021.
  • [11] I. D. Schizas, G. Mateos, and G. B. Giannakis, “Distributed LMS for consensus-based in-network adaptive processing,” IEEE Transactions on Signal Processing, vol. 57, no. 6, pp. 2365–2382, 2009.
  • [12] L. Zhang, Y. Cai, C. Li, and R. C. de Lamare, “Variable forgetting factor mechanisms for diffusion recursive least squares algorithm in sensor networks,” EURASIP Journal on Advances in Signal Processing, vol. 57, 2017, doi:10.1186/s13634-017-0490-z.
  • [13] N. Takahashi, I. Yamada, and A. H. Sayed, “Diffusion least-mean squares with adaptive combiners: Formulation and performance analysis,” IEEE Transactions on Signal Processing, vol. 58, no. 9, pp. 4795–4810, 2010.
  • [14] J. Lei and H. Chen, “Distributed estimation for parameter in heterogeneous linear time-varying models with observations at network sensors,” Communications in Information and Systems, vol. 15, no. 4, pp. 423–451, 2015.
  • [15] G. Mateos and G. B. Giannakis, “Distributed recursive least-squares: Stability and performance analysis,” IEEE Transactions on Signal Processing, vol. 60, no. 7, pp. 3740–3754, 2012.
  • [16] S. Xie and L. Guo, “Analysis of normalized least mean squares-based consensus adaptive filters under a general information condition,” SIAM Journal on Control and Optimization, vol. 56, no. 5, pp. 3404–3431, 2018.
  • [17] D. Gan and Z. Liu, “Performance analysis of the compressed distributed least squares algorithm,” Systems & Control Letters, vol. 164, p. 105228, 2022.
  • [18] ——, “Distributed order estimation of arx model under cooperative excitation condition,” SIAM Journal on Control and Optimization, vol. 60, no. 3, pp. 1519–1545, 2022.
  • [19] D. Gan, S. Xie, and Z. Liu, “Stability of the distributed Kalman filter using general random coefficients,” Science China Information Sciences, vol. 64, pp. 172 204:1–172 204:14, 2021.
  • [20] L. Guo, “Estimation, control, and games of dynamical systems with uncertainty,” SCIENTIA SINICA Informationis, vol. 50, no. 9, pp. 1327–1344, 2020.
  • [21] D. Gan and Z. Liu, “Convergence of the distributed SG algorithm under cooperative excitation condition,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–15, 2022, doi:10.1109/TNNLS.2022.3213715.
  • [22] S. Xie, Y. Zhang, and L. Guo, “Convergence of a distributed least squares,” IEEE Transactions on Automatic Control, vol. 66, no. 10, pp. 4952–4959, 2021.
  • [23] S. Xie and L. Guo, “Analysis of distributed adaptive filters based on diffusion strategies over sensor networks,” IEEE Transactions on Automatic Control, vol. 63, no. 11, pp. 3643–3658, 2018.
  • [24] O. Macchi and E. Eweda, “Compared speed and accuracy of RLS and LMS algorithms with constant forgetting factors,” Traitement du Signal, vol. 22, pp. 255–267, 1988.
  • [25] Y. Hatano and M. Mesbahi, “Agreement over random networks,” IEEE Transactions on Automatic Control, vol. 50, no. 11, pp. 1867–1872, 2005.
  • [26] S. Kar, J. M. F. Moura, and K. Ramanan, “Distributed parameter estimation in sensor networks: Nonlinear observation models and imperfect communication,” IEEE Transactions on Information Theory, vol. 58, no. 6, pp. 3575–3605, 2012.
  • [27] I. Matei, N. Martins, and J. S. Baras, “Almost sure convergence to consensus in markovian random graphs,” in Proceedings of the 47th IEEE Conference on Decision and Control, Cancun, Mexico, December 2008, pp. 3535–3540.
  • [28] K. You, Z. Li, and L. Xie, “Consensus condition for linear multi-agent systems over randomly switching topologies,” Automatica, vol. 49, no. 10, pp. 3125–3132, 2013.
  • [29] Y. Wang, L. Cheng, W. Ren, Z.-G. Hou, and M. Tan, “Seeking consensus in networks of linear agents: Communication noises and markovian switching topologies,” IEEE Transactions on Automatic Control, vol. 60, no. 5, pp. 1374–1379, 2015.
  • [30] M. Meng, L. Liu, and G. Feng, “Adaptive output regulation of heterogeneous multiagent systems under markovian switching topologies,” IEEE Transactions on Cybernetics, vol. 48, no. 10, pp. 2962–2971, 2018.
  • [31] Q. Zhang and J.-F. Zhang, “Distributed parameter estimation over unreliable networks with markovian switching topologies,” IEEE Transactions on Automatic Control, vol. 57, no. 10, pp. 2545–2560, 2012.
  • [32] Q. Liu, Z. Wang, X. He, and D. Zhou, “Event-based distributed filtering over markovian switching topologies,” IEEE Transactions on Automatic Control, vol. 64, no. 4, pp. 1595–1602, 2019.
  • [33] G. Zielke, “Inversion of modified symmetric matrices,” Journal of the Association for Computing Machinery, vol. 15, no. 3, pp. 402–408, 1968.
  • [34] L. Guo, “Stability of recursive stochastic tracking algorithms,” SIAM Journal on Control and Optimization, vol. 32, no. 5, pp. 1195–1225, 1994.
  • [35] ——, Time-varying stochastic systems, stability and adaptive theory, Second edition.   Science Press, Beijing, 2020.
  • [36] C. Godsil and G. Royle, Algebraic Graph Theory.   Springer-Verlag, 2001.
  • [37] S. Karlin and H. Taylor, A Second Course in Stochastic Processes.   New York: Academic, 1981.