Failure Detection and Isolation in Integrator Networks
Abstract
Detection and isolation of link failures under the Laplacian consensus dynamics have been the focus of our previous study. Our results relate the failure of links in the network to jump discontinuities in the derivatives of the output responses of the nodes and exploit that relation to propose failure detection and isolation (FDI) techniques, accordingly. In this work, we extend the results to general linear networked dynamics. In particular, we show that with additional niceties of the integrator networks and the enhanced proofs, we are able to incorporate both unidirectional and bidirectional link failures. At the next step, we extend the available FDI techniques to accommodate the cases of bidirectional link failures and undirected topologies. Computer experiments with large networks and both directed and undirected topologies provide interesting insights as to the role of directionality, as well as the scalability of the proposed FDI techniques with the network size.
I Introduction
Multi-agent network systems have found promising applications in areas such as motion coordination of robots [1]. Cooperative dynamics over a network can be strongly affected by the network failures. Hence, studying the effects of link or node failures on the network dynamics is an important topic in network science and it has various practical implications [2, 3].
Fault Detection and Isolation (FDI) in networked systems is an active area of research with applications to power networks [4] and the security of cyber-physical systems [5]. In a systematic approach, the so-called FDI filter uses available measurements to generate a residual signal, which is then used for fault diagnosis [6]. Various observer and Kalman filter techniques are used to obtain the residual signal [7], which can then be compared with a threshold signal to detect faults. In [8] the residual and threshold signals are generated using filtering techniques that allow for noise suppression and less conservative detection thresholds. Failures and attacks are modeled as disturbances in the descriptor systems approach that is adopted in [5], and some fundamental limitations of detection and identification in cyber-physical systems are established from systems-theoretic or graph-theoretic perspectives.
The mathematical framework for investigating the ability to detect and distinguish failures based on the observed output responses is established in [9], where conditions in terms of the inter-nodal distances to the observation points are provided for the detectability of links. Subsequent results in [10] provide a method for detection and isolation of failures under the Laplacian network dynamics. These results relate the presence of discontinuities in the derivatives of particular orders in the output responses of a subset of nodes to occurrence and location of link failures; and coupled with efficient sensor placement algorithms they can be used to pin down the location of failures in the network. In this paper, we analyze the particular case of networked single integrator dynamics and extend the proofs and FDI algorithms to allow for efficient sensor location in a (directed or undirected) integrator network where link failures can be either unidirectional or bidirectional.
The remainder of this paper is organized as follows. In Section II we set up the notation and give the preliminaries on weighted digraphs and their algebraic properties. Our main analytical findings are set forth in Section III where we first prove the results relating link failures to jump discontinuities of the output derivatives for single-integrator networks, and then consider the case when the links can fail simultaneously in both forward and reverse directions. In Section IV we flesh out the necessary formalism to accommodate both cases of unidirectional and bidirectional link failures within the framework of the previously developed sensor placement algorithms. Computer experiments on large networks in Section V elucidate the results and offer interesting insights on the effects of the network size and edge directionality. Section VI concludes the paper.
II Preliminaries
Throughout the paper, $\varnothing$ is the empty set, $\mathbb{N}$ denotes the set of all natural numbers, and $\mathbb{R}$ denotes the set of all real numbers. Also, the set of integers is denoted by , and any other set is represented by a calligraphic capital letter. The cardinality of a set $\mathcal{X}$ is denoted by $|\mathcal{X}|$, and $\mathscr{P}(\mathcal{X})$ denotes the power-set of $\mathcal{X}$, which is the set of all its subsets. The difference of two sets $\mathcal{X}$ and $\mathcal{Y}$ is denoted by $\mathcal{X} \setminus \mathcal{Y}$ and is defined as $\{x : x \in \mathcal{X} \wedge x \notin \mathcal{Y}\}$, where $\wedge$ is the logical conjunction. In addition, the logical implication and bi-implication are denoted by $\Rightarrow$ and $\Leftrightarrow$, respectively, and $\vee$ denotes the logical disjunction. The fixed integer $n$ represents the number of nodes in the network. Matrices are represented by capital letters, vectors are expressed by boldface lower-case letters, and the superscript T denotes the matrix transpose. Moreover, $I$ denotes the identity matrix with proper dimension, $\mathbf{e}_i$ is the $i$-th unit vector in the standard basis of $\mathbb{R}^n$, and $[A]_{ij}$ indicates the element of matrix $A$ which is located at its $i$th row and $j$th column.
A directed graph or digraph is defined as an ordered pair of sets , where is a set of vertices and is a set of directed edges. In the graphical representations, each edge is depicted by a directed arc from vertex to vertex . Vertices and are referred to as the head and tail of the edge and a edge is called a self-loop on . Given an integer , a set of (possibly repeated) indices and two vertices , an ordered sequence of edges of the form , is called a walk with start-node , end-node and length . A cycle on node signifies a walk. For any , is the set of all walks in with length . Similarly for , i.e. the set of walks with length that include the edge . In the same vein, the integer is referred to as the distance from to in , and by convention, and if , . For ease of notation we usually use . The diameter of , denoted by , is defined as the maximum distance between any pair of nodes : . A matrix is called an in-weighting on if . If is an in-weighting on , then is referred to as the weight of walk with respect to . Given a set of walks in digraph and the in-weighting on , the function is defined, which finds use in the proof of the main results in Section III. Note that this function satisfies . It is also known that, given an in-weighting on and vertices , the following holds [11, 12]:
(1) |
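The relation between walk weights and powers of an in-weighting can be checked numerically on a small example. The sketch below is illustrative only: it assumes the convention that the entry `A[i][j]` is nonzero exactly when there is an edge from node `j` to node `i`, and the helper names (`mat_pow`, `walk_weight_sum`) are hypothetical. Under that assumption, the `(i, j)` entry of the `k`-th power of `A` equals the sum of the weights of all length-`k` walks from `j` to `i`.

```python
from itertools import product

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][m] * Y[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(A, k):
    """k-th power of a square matrix given as a list of rows."""
    n = len(A)
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        P = mat_mul(P, A)
    return P

def walk_weight_sum(A, start, end, k):
    """Brute-force sum of the weights of all length-k walks from start to end.

    Assumed convention: A[i][j] is the weight of the edge j -> i (zero if absent).
    """
    n = len(A)
    total = 0.0
    for mids in product(range(n), repeat=k - 1):
        nodes = (start, *mids, end)
        weight = 1.0
        for a, b in zip(nodes, nodes[1:]):
            weight *= A[b][a]  # weight of the edge a -> b
        total += weight
    return total

# Example weights: self-loop on 0, and edges 0 -> 1, 1 -> 2, 2 -> 0.
A = [[0.5, 0.0, 2.0],
     [1.0, 0.0, 0.0],
     [0.0, 3.0, 0.0]]

# Largest discrepancy between the two computations over small k.
max_gap = max(abs(mat_pow(A, k)[i][j] - walk_weight_sum(A, j, i, k))
              for k in range(1, 5) for i in range(3) for j in range(3))
```

Walks that do not exist in the digraph contribute zero automatically, because at least one factor in their weight product is a zero entry of `A`.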
The following definition comes in handy in the statement and proof of the main theorem in the next section. At each time and for all and , define as . The function , as defined, measures the jump in the th derivative of the response of agent at the time of failure , and the parameter is a fixed integer, denoting the highest order of derivatives to which the designer has access.
III Failures in Networks of Single-Integrators
Let us consider a network of single-integrators, where each integrator is described by a single state , with the following dynamics:
(2) |
where , , , and is an in-weighting of graph . We assume that the entries of are -differentiable. Let us assume link fails at time , resulting in a faulty connectivity graph for . The corresponding in-weighting of , denoted by , is a perturbed version of that satisfies , and for all ; i.e. entries on the -th row of are allowed to change while every other entry remains unaffected.
The following theorem characterizes the effect of link failures on the output derivatives of a network of single integrators:
Theorem 1.
Consider the dynamic network of single integrators in (2) with output equation , and assume link fails at time . Then, the output of node satisfies:
(3) |
for ; and , for .
Proof:
See the Appendix. ∎
The next lemma and the following corollary show in particular how the result of Theorem 1 agrees with and maps to the result of Theorem 1 in [13], where we consider the case of networked LTI agents. However, the proofs in [13] rely on Laplace domain techniques rather than the combinatorial arguments in the time domain that helped us prove Theorem 1; indeed, the flexibility of the latter approach facilitates some extensions that are of particular interest to FDI scenarios.
Lemma 1.
Consider the failure of link and suppose that . If the failure of link is to induce a jump discontinuity in the -th derivative of the output response of node , then it should be true that .
Proof:
See the Appendix. ∎
Corollary 1.
The condition stated in Lemma 1 is intuitive because holds true only if there exists a shortest path of length connecting to and with as its first edge. In other words, the failed link contributes to the flow of information from to as an element of a shortest path from node to node . These observations are in perfect agreement with the sufficient conditions previously studied in [9]. On the other hand, Corollary 1 shows how Theorem 1 may follow as a special case of Theorem 1 in [13] (up to a known constant multiplier), after setting the relative degree for the involved networked LTI systems. The proofs in the single-integrator case, however, admit additional niceties that we discuss next.
III-A Bidirectional Link Failures
To begin, note that the perturbed matrix is not constrained in the way its entries on the -th row are modified; hence, Theorem 1 continues to hold in the case where all or several of the edges incoming to node are lost simultaneously. Indeed, it is conceivable for faults in an agent’s hardware or internal structure to cause the failure of multiple links which are incoming to that agent. In the particular case of a faulty agent , which loses all its incoming links at the instant of failure , the system dynamics for is characterized by , . Hence, as a special case, Theorem 1 and the corresponding FDI techniques that are developed in Section IV may also be applied for the detection and isolation of single agent failures by mapping the isolated edges to their head vertices.
On the other hand, for certain applications, where communications are of a bidirectional nature, it is reasonable to consider link failures that simultaneously prevent both agents from communicating with each other. Such a failure corresponds to the simultaneous elimination of both links , defined earlier, and leading to as the information flow structure for . It is worth highlighting that undirected networks, where , , signify the special case that all, not just some, of the links are bidirectional. The FDI methods in this paper are designed to handle the cases where some of the links in the networks are bidirectional and the rest are unidirectional. It is worth pointing out that, as a modeling assumption, each link is considered either bidirectional or unidirectional, but not both. In other words, if and and the link between the nodes and is specified as bidirectional, then fails if, and only if, fails; otherwise, the two links and are regarded as separate, and their failures as independent events. Accordingly, it is assumed that the set of all bidirectional links in the network, is known to the designer beforehand.
After modeling the failure of a bidirectional link as the simultaneous failures of two directed links, and , the proof of Theorem 1 can be adapted to yield:
Proposition 1.
In the case of the simultaneous failure of and , (6) still holds true if we substitute by , .
Proof:
See the Appendix. ∎
We end this section with an intuitive remark: since each agent of the network system in (2) is a single-integrator, a jump discontinuity (caused by a sudden network failure) at point will appear at point after several “integrations”, one for each edge on the path between them. Thus, an agent at point needs to perform the same number of “differentiations” before observing the jump due to the failure at point . In what follows we shall see how to determine the observation points along with the required number of differentiations at each point so that the occurrence and location of failures are always inferable from the observed jump discontinuities.
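The integration and differentiation intuition can be illustrated on a directed chain. The following is a minimal sketch under simplifying assumptions that are not taken from the paper: the weights are constant in time, the entry `A[i][j]` is the weight of the edge from `j` to `i`, and the failure simply zeroes one entry. With constant weights the k-th derivative of the state is the k-th power of the state matrix applied to the state, so the jump at the failure instant is the difference of these quantities computed with the faulty and the nominal matrices. The names `kth_derivative` and `first_jump_order` are hypothetical.

```python
def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def kth_derivative(A, x, k):
    """For x' = A x with constant A, the k-th derivative of the state is A^k x."""
    for _ in range(k):
        x = mat_vec(A, x)
    return x

def first_jump_order(A, A_bar, x_f, node, max_order, tol=1e-12):
    """Smallest derivative order at which `node` sees a jump, or None.

    A is the nominal matrix, A_bar the faulty one, and x_f the (continuous)
    state at the instant of failure.
    """
    for k in range(1, max_order + 1):
        after = kth_derivative(A_bar, x_f, k)[node]
        before = kth_derivative(A, x_f, k)[node]
        if abs(after - before) > tol:
            return k
    return None

# Directed chain 0 -> 1 -> 2 -> 3 with unit weights.
n = 4
A = [[0.0] * n for _ in range(n)]
A[1][0] = A[2][1] = A[3][2] = 1.0

# The link 0 -> 1 fails: only the row of its head (node 1) changes.
A_bar = [row[:] for row in A]
A_bar[1][0] = 0.0

x_f = [1.0, 2.0, 3.0, 4.0]  # arbitrary state at the failure instant
orders = [first_jump_order(A, A_bar, x_f, node, max_order=3) for node in range(n)]
print(orders)  # [None, 1, 2, 3]
```

In this example node 1, the head of the failed link, sees the jump in its first derivative, while nodes 2 and 3, which are one and two hops further downstream, need two and three differentiations, respectively. Node 0 is upstream of the failure and observes no jump.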
IV Sensor Placement for Unidirectional and Bidirectional Links
It is assumed that at each instant of time, the designer is given access to the response of a subset of agents, as well as the nominal network information flow digraph (prior to the link failure) and the set of bidirectional links . Neither the location of the failure (nodes and ) nor the time of failure is known to the designer. In the case of detection, the designer is interested in determining the existence of any single link failure in the network at the instant of failure. For the isolation problem, however, the designer would like to determine “instantaneously” not only the existence of a failure, but also its location; that is, to determine which link, if any, has failed, at the very instant it fails. The significance of “instantaneous” detection and isolation is better understood upon noting that if the time of failure is random and has a continuous sample space, then “simultaneous” failure of more than one link is a measure zero event; this justifies the focus of investigation in this paper, which is on “single” (possibly bidirectional) link failures. Before shifting attention to the sensor placement problem, two assumptions are set forth:
Assumption 1.
For all pairs of nodes , the in-weighting on digraph satisfies , i.e., the sum of the weights of all shortest paths between them is nonzero.
Assumption 2.
Given the in-weightings and of the faultless and the faulty network, and , respectively, we have that , where is the head (or tail too, if the link is bidirectional) of the failed link and denotes the instant of failure.
The first assumption above is a provision of consistency that is assumed with regard to the in-weighting matrix . This assumption is satisfied almost surely for any assignment of weights on the graph. In particular, it holds true for the Laplacian consensus networks considered in [10]. The second assumption involves the values of the agents’ states at the time of failure . This condition also holds true, almost surely, for any in-weighting , its perturbed version , and a random time of failure , since specifies a low-dimensional hyperplane in the agents’ state space that the agents almost surely avoid given a random time of failure.
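Assumption 1 can be verified numerically for a given weight matrix. The sketch below relies on the fact that, when the exponent equals the distance between two nodes, every walk of that length between them is a shortest path, so the corresponding entry of the matrix power is the sum of the shortest-path weights. It assumes the convention that `A[i][j]` is the weight of the edge from `j` to `i`; the function names are hypothetical.

```python
from collections import deque

def mat_pow(A, k):
    n = len(A)
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        P = [[sum(P[i][m] * A[m][j] for m in range(n)) for j in range(n)]
             for i in range(n)]
    return P

def hop_distances(A, src):
    """Breadth-first distances from src; an edge j -> i exists iff A[i][j] != 0."""
    n = len(A)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        j = queue.popleft()
        for i in range(n):
            if A[i][j] != 0 and i not in dist:
                dist[i] = dist[j] + 1
                queue.append(i)
    return dist

def assumption1_violations(A, tol=1e-12):
    """Pairs (j, i) whose shortest-path weights from j to i sum to zero."""
    bad = []
    for j in range(len(A)):
        for i, d in hop_distances(A, j).items():
            if i != j and abs(mat_pow(A, d)[i][j]) <= tol:
                bad.append((j, i))
    return bad

# Diamond 0 -> 1 -> 3 and 0 -> 2 -> 3 with positive weights: no violation.
A_good = [[0.0] * 4 for _ in range(4)]
A_good[1][0] = A_good[2][0] = A_good[3][1] = A_good[3][2] = 1.0

# Flipping one sign makes the two shortest paths from 0 to 3 cancel.
A_bad = [row[:] for row in A_good]
A_bad[3][2] = -1.0
```

The second matrix is a deliberately constructed degenerate case; weights drawn from a continuous distribution would produce such an exact cancellation with probability zero, which is the sense in which the assumption holds almost surely.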
To enable the designer to handle the desired FDI tasks, she is given access to the output response of a subset of nodes as well as their derivatives up to the -th order. In this section, we offer efficient procedures for determining such a subset of nodes, given the network topology and parameter , in such a way that all link failures in the network can be detected or isolated from the occurrence of jump discontinuities in the observed outputs and their derivatives. Furthermore, we would like to achieve this goal using as few observation points (sensors) as possible. From Corollary 1 it follows that if the existence of a jump discontinuity in the th derivative of the output response of agent is to serve as the basis for a method to detect the failure of edge at time , then it should be true that . The latter happens only if there exists a shortest path of length connecting to and with as its first edge. In [10] we use this observation in the case of Laplacian network dynamics to define binary relations and between the sets and such that for all and if , then the failure of link produces a jump in the derivative of the response of node and if then the failure of edge does not produce a jump in any of the derivatives of the response of node up to the th order. We now redefine the binary relations per Proposition 1 to accommodate bidirectional link failures. Indeed, bidirectional links are treated specially, as either of the two edges in reverse directions can provide us with the required relation for detection when a bidirectional failure occurs.
Definition 1.
We define the binary relations and for , between and , as follows. For all and , we have that:
• If , then if, and only if, .
• If , then if, and only if, one of the following conditions is satisfied:
  – , or
  – .
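The shortest-path condition that underlies these relations can be sketched in code. The sketch rests on a reading of the text preceding the definition rather than on the definition itself: a sensor is taken to see the failure of an edge at derivative order k when the distance from the tail of the edge to the sensor equals k and exceeds the distance from the head to the sensor by exactly one. Edges are written here as `(tail, head)` tuples, which is a convention of this sketch, and for a bidirectional link either orientation is allowed to supply the relation. The names `bfs_dist` and `jump_order` are hypothetical.

```python
from collections import deque

def bfs_dist(n, edges, src):
    """Hop distances from src in a digraph given as (tail, head) tuples."""
    succ = {v: [] for v in range(n)}
    for tail, head in edges:
        succ[tail].append(head)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def jump_order(n, edges, edge, sensor, max_order, bidirectional=False):
    """Derivative order at which `sensor` sees the failure of `edge`, or None."""
    tail, head = edge
    orientations = [(tail, head)]
    if bidirectional:
        orientations.append((head, tail))  # the reverse edge fails as well
    for t, h in orientations:
        d_tail = bfs_dist(n, edges, t).get(sensor)
        d_head = bfs_dist(n, edges, h).get(sensor)
        if d_tail is None or d_head is None:
            continue  # sensor unreachable from this orientation
        if d_tail == d_head + 1 and d_tail <= max_order:
            return d_tail
    return None

# Undirected path 0 - 1 - 2, stored as two directed edges per link.
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
```

With node 0 as the sensor, the edge from 2 to 1 lies on a shortest path towards the sensor and is seen at order 2, whereas the edge from 1 to 2 points away from it and is not seen. Declaring the link between 1 and 2 bidirectional makes the failure visible through the orientation that points towards the sensor.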
The FDI problems can now be posed as follows.
Problem 1 (Detection).
Given a digraph , find a subset of nodes of minimum cardinality , such that for all , there exists a node such that .
Problem 2 (Isolation).
Given a digraph , find a subset of vertices with the smallest cardinality , such that .
The idea behind efficient sensor placement algorithms that approximate the solutions of the above problems is to count the number of edges that are not yet detectable or isolatable from the currently chosen nodes and to add new nodes to the existing sets in a greedy manner: with each addition of a new node to the existing sensor set, we aim to decrease the number of edges that are not yet detectable or isolatable as much as possible. To this end, we define a correspondence which maps an ordered pair , comprised of a sensor set and an edge , to the set of ordered pairs that specify the relations between edge and nodes in . Accordingly, those edges and which produce the exact same pattern of jumps and at the exact same order of derivatives in the output responses of the nodes in would satisfy ; and none of them can be identified using just the nodes in . We further define two set functions and which take a subset of nodes and map it to the number of edges that are, respectively, not detectable or not isolatable using the sensor set . In [13] these functions are shown to be supermodular; therefore, per the theory of submodular set coverings [14], adding nodes greedily with respect to these functions would guarantee that the chosen sensor set is within a factor of the minimal sensor sets that achieve detection or isolation goals (solutions of Problems 1 and 2). The following algorithms are proposed in [10], and included below for completeness, to implement this idea of supermodular greedy minimization.
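The greedy idea for the detection problem can be sketched as a generic greedy covering loop. This is not a transcription of the algorithms in [10]; it is a minimal illustration that assumes unidirectional links only, edges written as `(tail, head)` tuples, and the shortest-path rule that a sensor detects an edge when the distance from the tail to the sensor exceeds the distance from the head by one and does not exceed the highest available derivative order. All function names are hypothetical.

```python
from collections import deque

def bfs_dist(n, edges, src):
    succ = {v: [] for v in range(n)}
    for tail, head in edges:
        succ[tail].append(head)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def greedy_detection_set(n, edges, max_order):
    """Greedily pick sensors until no further edge becomes detectable.

    Returns the chosen sensors (in order of selection) and the set of edges
    that remain undetectable from every node.
    """
    dist = [bfs_dist(n, edges, s) for s in range(n)]

    def detects(sensor, edge):
        tail, head = edge
        d_tail, d_head = dist[tail].get(sensor), dist[head].get(sensor)
        return (d_tail is not None and d_head is not None
                and d_tail == d_head + 1 and d_tail <= max_order)

    covers = {v: {e for e in edges if detects(v, e)} for v in range(n)}
    undetected = set(edges)
    sensors = []
    while undetected:
        # Node whose addition removes the most currently undetected edges.
        best = max(range(n), key=lambda v: len(covers[v] & undetected))
        gain = covers[best] & undetected
        if not gain:
            break  # the remaining edges cannot be detected by any node
        sensors.append(best)
        undetected -= gain
    return sensors, undetected

chain = [(0, 1), (1, 2), (2, 3)]  # directed chain 0 -> 1 -> 2 -> 3
```

On the chain, allowing derivatives up to the third order lets the single downstream node 3 detect every link, while restricting the designer to first derivatives forces a sensor at the head of every link. An isolation variant would track, for each candidate sensor set, the groups of edges that share the same pattern of jump orders and greedily reduce the number of edges in groups of size larger than one.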
It was noted in Subsection III-A that the set of bidirectional links should be made known to the designer. To facilitate the application of Algorithms 1 and 2 to the case of bidirectional link failures in networked single-integrator agents, we define an equivalence relation on the set that identifies two parallel edges in reverse directions only if they are bidirectional. Specifically, for any such that and , set , while for any two edges that , iff . The task of the equivalence relation is to identify those edges that join the same pair of nodes in opposite directions, only if they are bidirectional. Every other edge in the network is distinguished and therefore identified only with itself. With the above-defined equivalence relation , for any edge , denotes the equivalence class of , and for any subset of edges , is the quotient of by , which is the set of all equivalence classes of the elements of . Last but not least is the issue of self-loops, which are specific to the case of single-integrator agents. In particular, every self-loop would always satisfy an relation with all nodes in the network and the proposed algorithms cannot be applied for the detection and isolation of a self-loop , although its value (weight) is allowed to change with the failure of a link incoming to node . In the sequel, the set of all self-loops in is denoted by . Next, changing the definitions of the correspondence , and the supermodular functions and as follows, allows us to apply Algorithms 1 and 2 to single-integrator networks, while properly identifying bidirectional links and accommodating self-loops. Define for all and any of the equivalence classes in :
(7) | |||
(8) | |||
(9) | |||
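The equivalence classes described above can be built directly from the edge list. The sketch below is one possible realization under stated assumptions: edges are `(tail, head)` tuples, the set of bidirectional links is given as unordered pairs, and self-loops are left out of the quotient because the text states that they cannot be detected or isolated. The function name `edge_classes` is hypothetical.

```python
def edge_classes(edges, bidirectional):
    """Group edges into equivalence classes.

    `bidirectional` is a set of frozenset({i, j}) pairs. Two opposite edges
    fall in the same class only if their pair is marked bidirectional; every
    other edge forms a class of its own. Self-loops are skipped.
    """
    edge_set = set(edges)
    classes = set()
    for tail, head in edge_set:
        if tail == head:
            continue  # self-loop: excluded from the quotient in this sketch
        reverse = (head, tail)
        if frozenset((tail, head)) in bidirectional and reverse in edge_set:
            classes.add(frozenset({(tail, head), reverse}))
        else:
            classes.add(frozenset({(tail, head)}))
    return classes

# Links 0 <-> 1 (bidirectional), 1 -> 2 and 2 -> 1 (independent), self-loop on 2.
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 2)]
classes = edge_classes(edges, bidirectional={frozenset((0, 1))})
```

Here the two edges between nodes 0 and 1 collapse into a single class that fails as a unit, while the two edges between nodes 1 and 2 remain separate classes whose failures are treated as independent events.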
V Computer Experiments with Large Networks
In the following subsections, the performance of the developed routines is tested for different random graph models and with varying model parameters.
V-A A Random Geometric Graph
In a random geometric graph model the nodes of the network are randomly and uniformly spread across a bounded region, and there is an undirected edge between a pair of nodes, wherever a certain distance threshold is met. The graph of Fig. 1(a) depicts one such graph instance with nodes and undirected edges, which are interpreted as pairs of bidirectional links. For this graph a total of nine nodes is sufficient for complete detection, whereas even with all of the nodes observed none of the bidirectional links can be isolated. In other words, for any bidirectional link in the network, there exists at least one other link whose removal will induce the same set of jumps in the entire node-set of the network.
The situation is rather different if the undirected edges of the network in Fig. 1(a) are regarded as unidirectional links. Then the output of Routine 1 has nodes that are indicated in Fig. 1(b), and by observing them the designer can isolate edges out of the total . Observing all of the nodes in the network decreases the cardinality of the set of unresolved edges from to just , out of the total . It is worth highlighting that with the change in the interpretation of the links from bidirectional to unidirectional, matrix of the network remains the same, and so does the required highest order of derivatives .
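A random geometric graph of the kind described in this subsection can be generated with a few lines of code. The sketch below assumes the unit square as the bounded region and the Euclidean distance as the metric; neither choice is stated in the text, and the function name and parameters are hypothetical. Each undirected edge is stored as a pair of opposite directed edges so that it can be interpreted either as one bidirectional link or as two unidirectional links.

```python
import math
import random

def random_geometric_graph(n, radius, seed=0):
    """Place n nodes uniformly in the unit square and connect close pairs.

    Returns the node positions and a set of directed (tail, head) edges that
    contains both orientations of every undirected edge.
    """
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(pos[i], pos[j]) <= radius:
                edges.add((i, j))
                edges.add((j, i))
    return pos, edges

pos, edges = random_geometric_graph(20, 0.3, seed=1)
```

Orienting each undirected edge at random, as done for the directed experiment in this subsection, would amount to keeping one of the two stored orientations per pair with equal probability.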


Next, each of the undirected edges in Fig. 1(a) is oriented randomly leading to a total of unidirectional edges in Fig. 2. In the latter, a total of nodes is sufficient for detection, and these nodes enable the isolation of all but edges of the digraph, which are highlighted in Fig. 3. For this directed network, by observing all of the nodes in the network, the cardinality of the set of unresolved links reduces to .

The preceding results suggest that while detection is achievable more easily in undirected networks, the increased diversity brought about by the directionality of the links improves the isolation task for the case of directed networks.

The focus of investigation in the following subsections is shifted to the Erdős-Rényi random graph model, for which the role of edge probability and graph size on the cardinality of the detection set and the highest order of derivatives required is explored.
V-B Erdős-Rényi Random Graphs: Directed versus Undirected
In an Erdős-Rényi random graph model every potential edge is either existent or not with a fixed probability , and independently of all the rest. This model is implemented for varying network sizes , and different edge probabilities . In Figs. 4 and 5, the cardinality of the detection sets in several randomly generated instances are recorded, averaged, and plotted. The sample means in each case are computed over random instances and the error bars indicate the sample standard deviations for those instances. The plots in all cases confirm the increased difficulty of the detection process for the case of directed networks. Moreover, the cardinality of the detection sets grows only slowly with the network size, an observation which is of practical significance for large networks and complements the theoretical guarantees that are available from the submodular set covering literature. In the case of edge probabilities, however, it is observed that as the edge probabilities approach leading to a complete graph, the number of nodes required for detection becomes increasingly large. It is worth highlighting that although for small the networks are sparse and can have large diameters, as the edge probability is increased beyond the network diameter remains constant at so that only the first three derivatives of any chosen sensor set need to be observed. Similarly for , as the network size is increased beyond , the network diameters remain fixed at and only the first four derivatives of the outputs in any chosen sensor set are observed.
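The directed and undirected variants of the Erdős-Rényi model can be sampled as follows. This is a minimal sketch with hypothetical names; it assumes that self-loops are not generated and that, in the undirected variant, a single coin flip decides both orientations of a link.

```python
import random

def erdos_renyi(n, p, directed, seed=0):
    """Sample an Erdős-Rényi graph as a set of directed (tail, head) edges.

    directed=True : every ordered pair (i, j), i != j, is an edge w.p. p.
    directed=False: every unordered pair is present w.p. p and, if present,
                    contributes both orientations.
    """
    rng = random.Random(seed)
    edges = set()
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-loops in this sketch
            if directed:
                if rng.random() < p:
                    edges.add((i, j))
            elif i < j and rng.random() < p:
                edges.add((i, j))
                edges.add((j, i))
    return edges

undirected = erdos_renyi(30, 0.2, directed=False, seed=3)
directed = erdos_renyi(30, 0.2, directed=True, seed=3)
```

Averaging the size of a greedily chosen detection set over many such samples, for a range of values of `n` and `p`, would reproduce the kind of experiment summarized in this subsection.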


VI Conclusions
In this paper, we developed FDI techniques for single-integrator networks that enable the designer to detect and isolate link failures based on the observed jumps in the derivatives of the output responses of a subset of nodes by relating the jumps in the derivatives at the time of failure to the distance of the failed link from the observation point. Our results covered both cases of unidirectional and bidirectional link failures. We also extended our previously developed sensor placement algorithms to accommodate both types of link failures. These algorithms were tested in large random networks, and the results suggest that link failures in directed networks are harder to detect but easier to isolate, as compared to undirected networks. The latter effect can be attributed to the increased diversity that is brought about by the directionality of the links. Moreover, both the cardinality of the detection sets and the required order of derivatives are shown to scale up reasonably well with the network size. This agrees with the performance guarantees available from the theory of submodular set coverings, which bound the size of the chosen sets to within a multiplicative factor of the minimal sensor set, where is the size of the edge set.
Appendix: Proofs of the Main Results
-A Theorem 1
Given an initial condition , the solution to (2) for is trivially given by:
(10) |
The evolution of states after failure is therefore governed by the state matrix , instead of , as follows
(11) |
where , i.e., the state of the faultless evolution at the instant right before failure.
For any fixed, differentiating (10) and (11) times and using the Leibniz integral rule yields for that:
(12) | |||
and for that,
(13) | |||
Next, note that by differentiability of and continuity of the states, and . Hence, subtracting the two equations in (12) and (13) for and yields:
(14) | |||
(15) | |||
(16) |
With , and fixed in the preceding and , for all , define , so that (16) can be rewritten as:
(17) |
Next, to compute , substitute for and from (1) to get:
By partitioning the sets and , the preceding expression can be rewritten as:
(18) |
Next note that none of the walks in or can include as an edge for any . This is true, since otherwise if there exists a walk that violates the above, then removing the segment of from to , which consists of at least two edges, one to reach from followed by the edge , yields a new walk with length at most . Now is a walk of length at most in , which is a contradiction, since . It next follows that , as none of the walks involved include any of the edges for , and these are the only edges at which the digraphs and or the in-weightings and differ. Hence, (18) simplifies into:
(19) | |||
(20) |
The last step in deriving a simplified expression for is to argue that . To see why, note that since is derived upon removal of the edge from , it follows that
(21) |
where the digraph argument for the distance function indicates that the distances are calculated with respect to the edge-removed digraph as opposed to the usual case where the distances are calculated with respect to the original digraph . The inequalities in (21) together with , imply that , so that none of the walks in or can include any edges, , and ; hence,
(22) |
which upon replacement in (17) yields:
(23) |
To complete the proof, note that for , , since for any , is a walk of length which contradicts . Hence, for , (23) simplifies into , thus completing the proof for the case of the failure of the single link . That for , also follows as for any such .
-B Lemma 1
Notice that any shortest path from to of length gives a path of length from to , whence given , it follows that:
(24) |
Also notice that if there is no path of length from to , i.e.
(25) | |||
(26) |
Now (24) and (26) together imply that for the right-hand side of (3) in Theorem 1 to be non-zero when , it should be true that , which is the same as the claimed condition.
-C Proposition 1
To see how Theorem 1 applies to the case of bidirectional link failures, in parallel with and in the preceding, let be an in-weighting on that denotes the perturbed version of following the simultaneous failure of links and . The perturbations only affect the entries of on its th and th rows, such that , while , and . For any agent and the evolution of the state of agent following the simultaneous failure of links and is given by (11) with substituted for . Repeating the same procedure as in the proof of Theorem 1 leads to the proof of the proposition as follows.
For the case of simultaneous failure of and , note that (12) to (18) continue to hold after substituting and for and , respectively. The transitions from (18) to (19) and (22) also carry through with the same replacements and upon the additional observation that the walks in , , and include neither any edges, as stated in the previous case, nor any edges, . The rest of the proof is identical to the previous case, except for which should be replaced with .
References
- [1] M. Mesbahi and M. Egerstedt, Graph Theoretic Methods in Multiagent Networks. Princeton University Press, 2010.
- [2] M. A. Rahimian and A. G. Aghdam, “Structural controllability of multi-agent networks: Robustness against simultaneous failures,” Automatica, vol. 49, no. 11, pp. 3149–3157, 2013.
- [3] J. Kleinberg, M. Sandler, and A. Slivkins, “Network failure detection and graph connectivity,” SIAM Journal on Computing, vol. 38, no. 4, pp. 1330–1346, 2008.
- [4] W. Pan, Y. Yuan, H. Sandberg, J. Gonçalves, and G.-B. Stan, “Real-time fault diagnosis for large-scale nonlinear power networks,” in Proceedings of the 52nd IEEE Conference on Decision and Control, 2013, pp. 2340–2345.
- [5] F. Pasqualetti, F. Dörfler, and F. Bullo, “Attack detection and identification in cyber-physical systems,” IEEE Transactions on Automatic Control, vol. 58, no. 11, pp. 2715–2729, 2013.
- [6] E. E. Tiniou, P. M. Esfahani, and J. Lygeros, “Fault detection with discrete-time measurements: An application for the cyber security of power networks,” in Proceedings of the 52nd IEEE Conference on Decision and Control, 2013, pp. 194–199.
- [7] P. Menon and C. Edwards, “Robust fault estimation using relative information in linear multi-agent networks,” IEEE Transactions on Automatic Control, vol. PP, no. 99, pp. 1–1, 2013.
- [8] C. Keliris, M. Polycarpou, and T. Parisini, “A distributed fault detection filtering approach for a class of interconnected continuous-time nonlinear systems,” IEEE Transactions on Automatic Control, vol. 58, no. 8, pp. 2032–2047, 2013.
- [9] M. A. Rahimian, A. Ajorlou, and A. G. Aghdam, “Digraphs with distinguishable dynamics under the multi-agent agreement protocol,” Asian Journal of Control, 2014, in press.
- [10] M. A. Rahimian and V. M. Preciado, “Detection and isolation of link failures under the agreement protocol,” in Proceedings of the 52nd IEEE Conference on Decision and Control, 2013, pp. 7364–7369.
- [11] N. Biggs, Algebraic Graph Theory. Cambridge University Press, 1994.
- [12] V. M. Preciado and A. Jadbabaie, “Moment-based spectral analysis of large-scale networks using local structural information,” IEEE/ACM Transactions on Networking, vol. 21, no. 2, pp. 373–382, Apr. 2013.
- [13] M. A. Rahimian and V. M. Preciado, “Detection and isolation of failures in directed networks of LTI systems,” arXiv preprint arXiv:1408.3164, 2014.
- [14] L. Wolsey, “An analysis of the greedy algorithm for the submodular set covering problem,” Combinatorica, vol. 2, no. 4, pp. 385–393, 1982.