Overview of Fault Tolerant Techniques in Underwater Sensor Networks

Lauri Vihman
Department of Computer Systems
Tallinn University of Technology
Tallinn, Estonia
lauri.vihman@taltech.ee

Maarja Kruusmaa
Department of Computer Systems
Tallinn University of Technology
Tallinn, Estonia
maarja.kruusmaa@taltech.ee

Jaan Raik
Department of Computer Systems
Tallinn University of Technology
Tallinn, Estonia
jaan.raik@taltech.ee

Abstract

Sensor networks provide services to a broad range of applications, from surveillance by intelligence services to weather forecasting. Most sensor networks are terrestrial. However, much of our planet is covered by water, and Underwater Sensor Networks (USN) are an emerging research area. One unavoidable and increasing challenge for modern technology is tolerating faults: accepting that hardware is imperfect and coping with it. Fault tolerance may have more impact underwater than on land. The terrestrial environment is more forgiving, whereas reaching malfunctioning devices underwater for replacement is harder and may be more costly. This paper is the first to investigate fault tolerance, particularly cross-layer fault tolerance, in USN-s.

Index Terms:

underwater, sensor network, resilient, fault tolerance, cross-layer, fault management, internet of things

I Introduction

This paper discusses applications, practices and central issues of Fault Tolerant Underwater Sensor Networks (USN-s) reported in previous research. The research community has not yet devoted much effort to fault tolerance in USN-s. For that reason the selection criteria were expanded, and papers covering only some parts of the topic were also taken into account. Many of the technologies, approaches and tools they describe may be adaptable for use in USN-s. Fig. 1 shows the fault tolerance tasks applicable to USN-s and how they affect each other. The design and initial deployment of USN-s contribute to Fault Prevention and Prediction. Runtime data collection techniques also contribute to the Fault Detection and Fault Recovery stages of the system. All of these are discussed in this paper.

The rest of this paper is organized as follows. The remainder of the current section explains the methodology for selecting papers and discusses possible fault sources and the categorization of techniques. The following sections are organized as shown in Fig. 1:
- Section II (Fault Prevention and Prediction) overviews design, deployment, data collection and testing frameworks.
- Section III (Fault Detection and Identification) discusses techniques for these tasks.
- Section IV (Fault Isolation, Masking and Recovery) overviews the relevant techniques.
- Section V discusses open research issues.
- Section VI concludes the paper.

I-A Methodology

To obtain a relevant sample of the field of fault tolerant USN-s, four online environments were searched: IEEE Xplore, Google Scholar, ScienceDirect and Espacenet. The following keywords were used in different combinations: “underwater”, “sensor network”, “internet of things”, “resilient”, “tolerant”, “fault management”, “cross-layer”. The top papers were selected in the relevance order offered by each environment’s algorithm, and sources on the topic cited in those papers were added. Other papers citing these sources were then searched, which yielded further papers. Related articles suggested by the IEEE Xplore and ScienceDirect algorithms were also taken into account.

The collected papers were then analyzed, classified and divided into marine and terrestrial categories. The number of papers addressing specific research areas is shown in the radar diagram in Fig. 2. In the context of Fig. 2, “localization” means location detection in space and “mobile” means capacity for movement. Among the papers found, Fig. 2 shows the following:
- A large share of marine research interest has been drawn to underwater wireless communication.
- Some marine research addresses underwater fault tolerance techniques, and almost none addresses underwater cross-layer fault tolerance.
- Underwater energy efficiency and scalability are covered more than underwater vehicles (mobility) and security.
- In line with the initial search criteria, terrestrial papers were more concerned with fault tolerance, including cross-layer fault tolerance, and less with energy efficiency or security.

The high research effort on marine wireless networking in Fig. 2 is consistent with the claim that research on the Internet of Underwater Things is progressing slowly. The cited reason is the challenges arising from the unique characteristics of underwater wireless sensor networks [1]. Specifically, the main challenges for the Internet of Underwater Things stem from the differences between Underwater Wireless Sensor Networks and Terrestrial Wireless Sensor Networks [1].

Figure 1: Fault tolerance tasks in USN-s

I-B Sources of Faults

A fault is defined [2] as an underlying defect of a system that leads to an error. An error is a faulty system state that may lead to a failure, and a failure is an error that affects system functionality. Faults may occur in different components and layers of a system for different reasons. The only type of fault possible in software is a design fault introduced during software development, i.e., a bug [3]. Software bugs can be addressed separately and are not covered further in this paper.

Fault sources can be categorized by the component in which they occur. In sensor networks, faults can occur in the sensor nodes, the network and the data sink [4]. Sensor networks share common failure issues with traditional networks and additionally introduce node failures as a new fault source [5].

USN-s additionally introduce faults caused by environmental conditions such as pressure, currents and underwater obstacles. These conditions may cause physical damage that results in failures, and they may also obstruct system functionality. For instance, in underwater acoustic networks, loss of connectivity and high bit error rates may be caused by shadow zones [6], which arise from various physical causes. Domingo and Vuran (2012) [7] distinguish up to five underwater propagation phenomena that may obstruct communication.

Faults can be either permanent or temporary [8]. Permanent faults may be caused by manufacturing defects, since variations among hardware components are physically inevitable [9]. Another factor that can introduce faults is aging and wear-out of hardware components [10]. Besides the components themselves, the interconnections between them also affect reliability and may cause faults [11]. Temporary faults, especially soft errors, are a particular challenge for fault management. A soft error is a temporary change of a signal value caused by ionizing particles [8], and it may lead to a failure. Because of high integration density, the soft error rate is expected to increase in the future [12].

I-C Fault Tolerance Techniques

A distributed system is defined [13] as a collection of independent computers that appears to its users as a single coherent system. A sensor network consists of a number of sensor nodes that together form the network and feed data to one or more data sinks. If the sensor nodes are autonomous, a sensor network can be seen as a distributed system. Faults in sensor networks can therefore be addressed with the same techniques as faults in distributed systems [13]. These techniques can be categorized into the following groups:

• Fault Prediction and Prevention aim to prevent faults from occurring and to avoid them proactively.
• Fault Detection and Identification are responsible for detecting and localizing faults.
• Fault Isolation, Masking and Recovery comprise techniques for repairing a fault, minimizing its effect, or preventing it from turning into a system failure.

II Fault Prevention and Prediction

Fault prevention and prediction in sensor networks depend on the initial deployment method of the sensor network and on the architectural design of the system. These are examined in the following subsections.

II-A Design of the Sensor Network

In Wireless Sensor Networks (WSN), dividing nodes into clusters is an energy-efficient and resilient alternative to a centralized homogeneous topology [14]. In such a design, dedicated cluster head nodes may have more energy and better communication capabilities. This allows them to act effectively as mediators between regular nodes and data sinks.
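The classic terrestrial LEACH protocol [16] takes a different approach from dedicated cluster heads. It rotates the cluster-head role so that the energy cost of acting as a mediator is spread across the nodes. The minimal sketch below shows only the well-known LEACH self-election threshold, T(n) = p / (1 - p (r mod 1/p)). This threshold applies to nodes that have not served as cluster head in the last 1/p rounds. The following are illustrative assumptions:
- the function names;
- the `last_head_round` bookkeeping;
- the seeded random generator.

Cluster joining and scheduling are omitted.

```python
import random

def leach_threshold(p: float, round_no: int, eligible: bool) -> float:
    """LEACH threshold T(n) for one node in a given round.

    p: desired fraction of nodes acting as cluster heads per round.
    eligible: True if the node has not been a cluster head in the last
    1/p rounds (the set G in the LEACH formulation).
    """
    if not eligible:
        return 0.0
    period = round(1 / p)  # rounds per rotation epoch
    return p / (1 - p * (round_no % period))

def elect_cluster_heads(node_ids, last_head_round, p, round_no, rng):
    """Return the ids of nodes that elect themselves cluster head this round.

    last_head_round: dict node_id -> last round served (updated in place).
    """
    period = round(1 / p)
    heads = []
    for node in node_ids:
        last = last_head_round.get(node)
        # Eligible if the node has not served within the last `period` rounds.
        eligible = last is None or round_no - last >= period
        if rng.random() < leach_threshold(p, round_no, eligible):
            heads.append(node)
            last_head_round[node] = round_no
    return heads
```

In the last round of an epoch the threshold reaches 1 for every node that has not yet served. As a result, each node acts as cluster head once per 1/p rounds, which spreads the energy cost evenly.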

To overcome issues caused by the varying environmental challenges of Underwater Wireless Sensor Networks (UWSN), nature-inspired algorithms may be used. For instance, clustering and routing can be performed with the Cuckoo Search algorithm and Particle Swarm Optimization [15]. These have behaved more resiliently in underwater conditions than the more common terrestrial Low Energy Adaptive Clustering Hierarchy (LEACH) protocol [16]. Pressure measurements have been used for UWSN routing [17] with floating depth-controlling sensors.

Fault Management tasks can also be distributed across the whole network. In a WSN with enough spare nodes, an energy-efficient grid can be formed [18]. In this grid, the node manager, gateway and sensing nodes are selected and can be changed, while spare nodes are put to sleep. This yields an energy-efficient and lightweight network but requires excess nodes.
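The spare-node grid of [18] can be illustrated with a small sketch. The deployment area is divided into grid cells, and one node per cell is kept active while the rest sleep. When the active node fails, a sleeping spare is woken. The following are assumptions made for illustration:
- the cell size;
- the rule of preferring the spare with the most energy;
- the data structures.

The sketch does not reproduce the manager and gateway roles of the original architecture.

```python
from dataclasses import dataclass

@dataclass
class Node:
    node_id: int
    x: float
    y: float
    energy: float
    alive: bool = True
    awake: bool = False

def cell_of(node: Node, cell_size: float) -> tuple:
    """Map a node position to a grid cell index."""
    return (int(node.x // cell_size), int(node.y // cell_size))

def build_grid(nodes, cell_size):
    """Group nodes by cell; keep the highest-energy node awake, others sleep."""
    grid = {}
    for n in nodes:
        grid.setdefault(cell_of(n, cell_size), []).append(n)
    for members in grid.values():
        for n in members:
            n.awake = False
        max(members, key=lambda n: n.energy).awake = True
    return grid

def handle_failure(grid, failed: Node, cell_size):
    """Mark `failed` as dead and make sure its cell still has an active node.

    Returns the active node of the cell, or None if no live node is left
    (a coverage hole that would have to be reported upwards).
    """
    failed.alive = False
    failed.awake = False
    members = [n for n in grid[cell_of(failed, cell_size)] if n.alive]
    if not members:
        return None
    active = [n for n in members if n.awake]
    if active:
        return active[0]  # a sleeping spare failed; coverage is unaffected
    replacement = max(members, key=lambda n: n.energy)
    replacement.awake = True  # wake the best remaining spare
    return replacement
```

The trade-off noted above is visible here. Fault recovery is a cheap local wake-up, but every cell needs more than one node to survive a failure.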

However, existing UWSN protocols have not yet been adequately compared in underwater field trials [19].

II-B Sensor Network Deployment

Deployment techniques are important for WSN, where deployment directly affects node locations and network availability. Even for terrestrial wireless sensor networks, an adaptable deployment method is essential for satisfactory network performance [20]. WSN sensor placement usually uses more sensors than the minimum required number, for redundancy [21]. The deployment costs and energy efficiency of WSN-s have been investigated, and no single solution has been found that can easily be applied in practice [22].
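The value of such redundancy can be reasoned about with a simple model. Assume nodes fail independently, each with the same probability q. A point covered by k sensors then remains covered with probability 1 - q^k. The sketch below computes the smallest k that meets a target coverage probability. The independence assumption and the parameter names are illustrative. Correlated failures are ignored, for example a single current or collision damaging several nearby nodes.

```python
import math

def coverage_survival(q: float, k: int) -> float:
    """Probability that at least one of k covering sensors survives,
    assuming each fails independently with probability q."""
    return 1.0 - q ** k

def min_redundancy(q: float, target: float) -> int:
    """Smallest k with 1 - q**k >= target (0 < q < 1, 0 < target < 1)."""
    if not (0.0 < q < 1.0 and 0.0 < target < 1.0):
        raise ValueError("q and target must lie strictly between 0 and 1")
    k = max(1, math.ceil(math.log(1.0 - target) / math.log(q)))
    # Correct possible floating-point rounding in either direction.
    while coverage_survival(q, k) < target:
        k += 1
    while k > 1 and coverage_survival(q, k - 1) >= target:
        k -= 1
    return k
```

For example, with a 20% failure probability per node, three covering sensors are needed for 99% coverage. Correlated underwater failures would push the required number higher.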

Wired sensor network deployment is less researched. A possible reason is that node locations in wired networks are constrained by the cable and largely predetermined, and node connectivity does not depend directly on location.

II-C Data Collection

Sensor networks tend to have limited network bandwidth, energy and storage capabilities, so filtering and aggregating sensor information is one way to meet these constraints. Raw sensor data near the source can be divided into informative, non-informative and outlier groups [23]. Only the needed data is then communicated or stored. Outlier data may result from noise, failures or disturbances, and may be useful for fault tolerance purposes.
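The three-way split of raw data can be approximated, for illustration only, by a much simpler heuristic:
- a reading far from a robust running estimate is an outlier;
- a reading that barely differs from the last reported value is non-informative;
- anything else is informative.

This is not the information-based filter of [23]. The thresholds `outlier_k` and `min_change` are assumptions, as is the use of a sliding-window median and median absolute deviation (MAD).

```python
from collections import deque
from statistics import median

def classify_readings(readings, window=8, outlier_k=5.0, min_change=0.05):
    """Label each reading 'informative', 'non-informative' or 'outlier'."""
    history = deque(maxlen=window)  # recent accepted readings
    last_reported = None
    labels = []
    for x in readings:
        if len(history) >= 3:
            med = median(history)
            mad = median(abs(h - med) for h in history)
            # Floor the spread so a perfectly flat signal does not turn
            # every tiny change into an "outlier".
            scale = max(mad, min_change)
            if abs(x - med) > outlier_k * scale:
                labels.append("outlier")  # kept aside for fault analysis
                continue  # outliers do not enter the reference window
        history.append(x)
        if last_reported is not None and abs(x - last_reported) < min_change:
            labels.append("non-informative")  # need not be transmitted
        else:
            labels.append("informative")
            last_reported = x
    return labels
```

Only readings labelled informative would be sent. Outliers can be forwarded to a fault-management component instead of being discarded.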

Different techniques for compressing and aggregating collected information in UWSN-s have been investigated [24]. The study found that aggregation is justified and that cluster-based aggregation techniques perform better than non-cluster-based or other techniques.
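The benefit of cluster-based aggregation can be made concrete with a simple comparison. The sketch counts the packets reaching the sink in two cases:
- every reading is forwarded individually;
- each cluster head first combines its members' readings into one summary (count, minimum, mean, maximum).

The summary fields and the one-packet-per-report cost model are assumptions. Real acoustic links also differ in range, delay and energy per bit, which the sketch ignores.

```python
from dataclasses import dataclass

@dataclass
class Summary:
    cluster_id: int
    count: int
    minimum: float
    mean: float
    maximum: float

def aggregate_cluster(cluster_id, readings):
    """Cluster head combines member readings into a single summary packet."""
    return Summary(cluster_id, len(readings), min(readings),
                   sum(readings) / len(readings), max(readings))

def packets_to_sink(clusters, aggregate):
    """Return the packets sent to the sink.

    clusters: mapping cluster_id -> list of member readings.
    Without aggregation every reading is forwarded as (cluster_id, value);
    with aggregation each non-empty cluster sends one Summary.
    """
    if aggregate:
        return [aggregate_cluster(cid, r) for cid, r in clusters.items() if r]
    return [(cid, x) for cid, r in clusters.items() for x in r]
```

Aggregation loses individual readings. A node-level fault may therefore be hidden inside a summary unless the cluster head also applies outlier filtering before summarizing.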

Security challenges must also be addressed. One way to minimize the risk of data tampering or interference is to process data locally. Where that is not possible, data can be communicated with end-to-end encryption [25].

II-D UWSN Testing Frameworks

Two frameworks have been developed for UWSN-s that allow simulation, emulation and testing of networks: DESERT (versions 1 and 2) [26] and SUNSET [27]. A comparative analysis [28] found SUNSET to be a more mature, flexible and robust framework for in-field testing than DESERT. However, DESERT v2 was released after that analysis. For security testing of acoustic UWSN-s, the SecFUN framework [29] has been proposed.

III Fault Detection and Identification

In essence, fault detection means determining that one or more bits in a computation differ from their correct value [30]. Such deviations can be detected through continuous monitoring of the network and node status. Some sources use the word “Diagnosis” in a broader sense than just detection and identification. Diagnosis has been defined as “characterizing the system’s state to locate the causes of errors, determine how the system is changing over time, and predict errors before they occur” [30]. This section covers different techniques for carrying out these tasks.
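The bit-level view of fault detection can be illustrated with the simplest form of information redundancy, a single even-parity bit. A stored or transmitted word is accompanied by one extra bit. Any single flipped bit, for example from a soft error (Section I-B), makes the recomputed parity disagree. The function names are illustrative. A single parity bit detects only an odd number of flipped bits and cannot locate the error.

```python
def parity(word: int) -> int:
    """Return 1 if `word` has an odd number of set bits, else 0."""
    return bin(word).count("1") % 2

def protect(word: int) -> tuple:
    """Store a data word together with its even-parity bit."""
    return word, parity(word)

def check(stored: tuple) -> bool:
    """True if the stored word is consistent with its parity bit."""
    word, p = stored
    return parity(word) == p

def flip_bit(word: int, position: int) -> int:
    """Simulate a soft error by inverting one bit of the word."""
    return word ^ (1 << position)
```

Detecting a mismatch says only that some bit is wrong. Deciding what to do about it belongs to the diagnosis and recovery stages.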

Distributed hierarchical fault management has been applied to WSN-s [31]. Agent fault detection devices collect information from power modules and sensors to determine failure conditions. They then diagnose the nature of the detected failure.
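The agent-side diagnosis step could be sketched as follows. The agent reads the power-module and sensor status reported by a node and maps them to a failure class. The status fields, thresholds and failure classes are hypothetical and are not taken from [31]. They only show how power and sensor information can be combined to tell apart energy depletion, a stuck sensor and a communication fault.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NodeStatus:
    node_id: int
    battery_v: Optional[float]  # None if no status report was received
    last_readings: list

# Hypothetical thresholds, chosen only for illustration.
LOW_BATTERY_V = 3.3
STUCK_WINDOW = 5

def diagnose(status: NodeStatus) -> str:
    """Classify a node's condition from its power and sensor reports."""
    if status.battery_v is None:
        return "communication-failure"  # node silent: link or node loss
    if status.battery_v < LOW_BATTERY_V:
        return "energy-depletion"
    recent = status.last_readings[-STUCK_WINDOW:]
    if len(recent) == STUCK_WINDOW and len(set(recent)) == 1:
        return "sensor-stuck"  # identical values over the whole window
    return "healthy"
```

The order of the checks matters. A silent node cannot report its battery state, so the sketch classifies it as a communication failure, even though the root cause could also be energy depletion.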

In industry, at higher abstraction levels, the SNMP protocol [32] has been widely used for fault detection in IP-networked devices, through querying and event triggering. There are multiple industry tools for generating failures. An example is Chaos Monkey from Netflix [33], which randomly terminates services in production environments to ensure their resiliency. Such tools do not handle faults themselves; they verify that repair mechanisms are in place and operable. The Intelligent Platform Management Interface (IPMI) [34] is an industry specification for hardware system management and monitoring.
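The idea behind failure-generation tools can be sketched without any real infrastructure. A fault injector randomly "kills" services, and the test checks afterwards that a supervisor has restored them. Everything here is hypothetical and does not reflect Chaos Monkey's actual interface, including the `Supervisor` class and its restart policy. The point is that the drill exercises the recovery mechanism rather than handling any particular fault.

```python
import random

class Supervisor:
    """Keeps a set of named services alive by restarting failed ones."""

    def __init__(self, names):
        self.running = {name: True for name in names}
        self.restarts = 0

    def kill(self, name):
        self.running[name] = False

    def reconcile(self):
        """Restart every service that is not running."""
        for name, up in self.running.items():
            if not up:
                self.running[name] = True
                self.restarts += 1

def chaos_drill(supervisor, rounds, rng):
    """Randomly terminate one service per round, then verify recovery."""
    for _ in range(rounds):
        victim = rng.choice(sorted(supervisor.running))
        supervisor.kill(victim)
        supervisor.reconcile()
        if not all(supervisor.running.values()):
            return False  # the repair mechanism did not restore service
    return True
```

A drill that returns False points to a missing or broken recovery path before a real fault exposes it.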

Neural-network-based schemes for sensor failure detection, identification and accommodation can also be used. These may tolerate larger deviations of operating conditions from theoretical models and estimates. A relatively simple and computationally light approach has been presented [35] in which a neural network serves as an on-line learning state estimator for detecting faults. The neural network itself can also be built to be fault tolerant [36], so that failing units of the network have minimal impact on the resulting data.
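Estimator-based detection can be illustrated with a deliberately reduced stand-in. A single linear neuron, trained online with the least-mean-squares (LMS) rule, predicts one sensor from the others. A fault is flagged when the prediction residual exceeds a threshold. This is not the network architecture of [35]. The learning rate, threshold, warm-up length and choice of inputs are assumptions. Learning is frozen on flagged samples so that a faulty sensor does not retrain the estimator to accept its own errors.

```python
class OnlineEstimator:
    """Single linear neuron trained online with the LMS rule."""

    def __init__(self, n_inputs, lr=0.05, threshold=1.0, warmup=200):
        self.w = [0.0] * n_inputs
        self.b = 0.0
        self.lr = lr
        self.threshold = threshold
        self.warmup = warmup  # samples learned unconditionally at start-up
        self.seen = 0

    def predict(self, x):
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def step(self, x, measured):
        """Return True if `measured` looks faulty; otherwise learn from it."""
        residual = measured - self.predict(x)
        self.seen += 1
        if self.seen > self.warmup and abs(residual) > self.threshold:
            return True  # flag a fault and skip the update
        # LMS: gradient step on the squared residual.
        self.w = [wi + self.lr * residual * xi for wi, xi in zip(self.w, x)]
        self.b += self.lr * residual
        return False
```

Because the estimator keeps learning on accepted samples, it can follow slow drift in operating conditions. A sudden jump in one sensor, by contrast, produces a large residual and is flagged.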

The Situational Awareness approach, a mechanism borrowed from humans, can be used to interpret Internet of Things (IoT) sensor data. It relies specifically on the processes of sensation, perception and cognition. Specification-based and learning-based approaches already exist. In addition to these, a perception-based approach utilizing Fuzzy Formal Concept was proposed [37] for Situational Awareness identification.

The Semantic Sensor Network Ontology has been proposed [38] for managing interoperability between sensing systems. The Semantic Ground describes information for interoperability and cooperation among agents [39]. To enhance resilience in Semantic Sensor Networks, monitoring nodes may forward observations to association nodes. These nodes develop Situational Awareness by mining association rules, for example with the nature-inspired Artificial Bee Colony algorithm [39].

Electrical power grids need efficient monitoring. WSN-based approaches for outage detection, environmental monitoring and fault diagnostics in power grids have been reviewed in [40]. Most of these approaches are also used in other applications.

IV Fault Isolation, Masking and Recovery

After fault detection, identification and diagnosis, the fault handling stage can be entered [31] to prevent further data corruption and system deterioration. The fault handling stage consists of Fault Isolation, Masking and Recovery. Fault handling can hide the occurrence of a fault from other components; the key techniques for such masking are information, time and physical redundancy [13]. Isolating a faulty component from the others can be facilitated by virtualization [13]. In large-scale distributed systems, frozen virtual images of healthy services have been used as checkpoints [41] for rolling back when a fault occurs.
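Physical redundancy as a masking technique is classically illustrated by triple modular redundancy (TMR). The same value is produced by three independent units, and a voter forwards the majority. A single faulty unit is thereby hidden from the rest of the system. The sketch below is a generic majority voter written for illustration. The tolerance `tol`, used to decide that two analog readings "agree", is an assumption.

```python
from typing import Optional, Sequence

def majority_vote(values: Sequence[float], tol: float = 0.0) -> Optional[float]:
    """Return a value agreed on by a strict majority of replicas, else None.

    Two replicas agree if they differ by at most `tol`; the returned value
    is the mean of the largest agreeing group.
    """
    best = []
    for v in values:
        group = [w for w in values if abs(w - v) <= tol]
        if len(group) > len(best):
            best = group
    if len(best) * 2 > len(values):
        return sum(best) / len(best)
    return None  # no majority: the fault can be detected but not masked
```

With three replicas, one wrong value is masked. If two replicas fail in different ways, no majority exists and the fault must be escalated instead.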

Fault Recovery ensures that a fault does not propagate to visible results. Examples are rolling back to a previous healthy state (checkpointing) and retrying failed operations (time redundancy). Further Fault Recovery techniques include Reconfiguration and Adaptation [30]. Reconfiguration changes the system’s state so that the same or a similar error is prevented from occurring again. Adaptation re-optimizes the system, for instance after Reconfiguration.
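Checkpointing and time redundancy can be combined in a few lines. The state is saved before an operation, a failed attempt rolls back to the checkpoint, and the operation is retried a bounded number of times. The sketch assumes that faults surface as exceptions and are transient. A permanent fault would exhaust the retries and must then be escalated, for example to Reconfiguration. The names `TransientFault`, `run_with_recovery` and `max_attempts` are illustrative.

```python
import copy

class TransientFault(Exception):
    """Raised by an operation that failed but may succeed if retried."""

def run_with_recovery(state: dict, operation, max_attempts: int = 3) -> dict:
    """Apply `operation` to a copy of `state`, rolling back and retrying on faults.

    `operation` mutates the dict it receives and may raise TransientFault
    part-way through, leaving that dict corrupted.
    """
    checkpoint = copy.deepcopy(state)  # last known healthy state
    for _ in range(max_attempts):
        working = copy.deepcopy(checkpoint)  # roll back before every attempt
        try:
            operation(working)
            return working  # success: this becomes the new healthy state
        except TransientFault:
            continue  # discard the corrupted copy and retry
    raise RuntimeError(f"operation failed {max_attempts} times; escalate")
```

Each attempt starts from the checkpoint, so partial updates from a failed attempt can never leak into the result.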

In sensor networks, various Fault Recovery approaches have been used. They differ in resource overhead, energy efficiency, scalability and supported network type. For both network and node fault recovery in wireless sensor networks, Mitra (2016) [42] compares the following approaches:
- checkpoint-based recovery (CRAFT);
- agent-based recovery (ABSR);
- the fault node recovery algorithm (FNR);
- cluster-based and hierarchical fault management (CHFM);
- the Failure Node Detection and Recovery Algorithm (FNDRA).

Some of these are specific to terrestrial wireless networks, but some principles, such as checkpointing, can also be used in wired and/or underwater environments. To reduce network bandwidth requirements, checkpoint backups can be moved to nearby nodes [43] and used to recover from faults.
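Moving checkpoint backups to nearby nodes can be sketched as follows. Each node pushes its latest versioned checkpoint to a small number of live neighbours, and a replacement restores the newest surviving copy. The following are assumptions for illustration:
- the neighbour selection;
- the replication factor `k`;
- the integer versioning scheme.

The sketch ignores the communication cost that motivates keeping the backups local.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

@dataclass
class SensorNode:
    node_id: int
    alive: bool = True
    # Backups held on behalf of other nodes: owner id -> (version, state)
    backups: Dict[int, Tuple[int, dict]] = field(default_factory=dict)

def replicate(owner: int, version: int, state: dict, neighbours, k: int = 2):
    """Store a versioned checkpoint of `owner` on up to k live neighbours."""
    for node in [n for n in neighbours if n.alive][:k]:
        node.backups[owner] = (version, dict(state))

def restore(owner: int, neighbours) -> Optional[dict]:
    """Return the newest surviving checkpoint of `owner`, if any."""
    copies = [n.backups[owner] for n in neighbours
              if n.alive and owner in n.backups]
    if not copies:
        return None
    return dict(max(copies, key=lambda c: c[0])[1])
```

Keeping more than one copy lets recovery survive the loss of a neighbour. The restored state may then be older than the latest checkpoint.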

In networks, error control schemes are commonly classified into three groups [7]:

• Automatic Repeat Request (ARQ) - the receiver requests retransmission of corrupted data
• Forward Error Correction (FEC) - the receiving end detects and corrects data corruption using redundancy added by the sender
• Hybrid ARQ (HARQ) - a combination of FEC and ARQ

These groups are similar to the node fault management techniques mentioned above. ARQ corresponds to time redundancy (retrying), and FEC corresponds to information redundancy.
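As a concrete FEC example, the Hamming(7,4) code adds three parity bits to every four data bits. This lets the receiver correct any single flipped bit without a retransmission. An ARQ scheme would instead only detect the error and request the packet again, and HARQ would request it only when decoding fails. The code below is a textbook Hamming(7,4) encoder/decoder chosen for illustration. It is not one of the schemes analysed in [7].

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Correct up to one flipped bit and return (data_bits, error_position).

    error_position is the 1-based index of the corrected bit, or 0 if the
    syndrome was zero. Two or more errors are silently miscorrected.
    """
    b = list(c)
    s1 = b[0] ^ b[2] ^ b[4] ^ b[6]
    s2 = b[1] ^ b[2] ^ b[5] ^ b[6]
    s3 = b[3] ^ b[4] ^ b[5] ^ b[6]
    pos = s1 + 2 * s2 + 4 * s3  # syndrome = position of the flipped bit
    if pos:
        b[pos - 1] ^= 1
    return [b[2], b[4], b[5], b[6]], pos
```

The price of correcting errors at the receiver is a fixed 75% overhead (three parity bits per four data bits) on every packet. This overhead is paid whether or not an error occurs, which is the trade-off HARQ tries to balance.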

A cross-layer approach benefits fault recovery significantly for two reasons [30]. Single-layer redundancy, such as hardware redundancy or application checkpointing, has very high costs. In addition, the wide variance in delay between fault occurrence and detection makes recovery difficult.

V Open Research Issues

V-A Security

Faults and security are interrelated concepts [41]. Preventing systems from being penetrated takes effort even when they work as intended. Faults add uncertainty and make prevention even harder. Faults can be created by an intrusion, and faults can also enable new intrusion vectors [44]: misbehaving devices violate key assumptions and create a number of new attack vectors. For example, soft errors, described in Section I-B, can be used to defeat cryptography [45].

V-B Energy-efficiency

Power dissipation has now reached a point where energy concerns limit the computation that can be deployed on a chip [44]. The aim is shifting from transistor density and speed to energy density and cost. Energy density and efficiency must also be addressed at larger scales. For instance, WSN-s may not have an unlimited power supply and need to use energy-efficiency strategies [22, 18, 14, 16]. For fault tolerance techniques, a cross-layer approach is considered more energy-efficient than a single-layer one [30]. Strategic redundancy in a cross-layer approach may allow systems to operate safely on the verge of failure [44]. This spends less energy without going over the edge.

V-C Scalability

One traditional benefit of scaling has been a decreasing cost per function [44]. However, mitigating reliability problems by replicating logic, voting and similar techniques means that scaled technology might not offer a reduction in energy or area. Some fault tolerance techniques may increase computing overhead, and not all approaches are scalable [42]. Large-scale fault tolerant systems have been researched [41] without special attention to energy and communication constraints.
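The cost side of voting-based redundancy can be quantified with the standard TMR reliability expression. The derivation assumes three identical modules that fail independently, each with reliability R over the mission time, and a perfect voter. The triplicated system then works if at least two modules work:

```latex
R_{\mathrm{TMR}} = R^{3} + 3R^{2}(1-R) = 3R^{2} - 2R^{3},
\qquad
R_{\mathrm{TMR}} - R = R\,(2R-1)\,(1-R).
```

The difference is positive only for 1/2 < R < 1. Triplication therefore improves reliability only when the individual modules are already fairly reliable, yet it always costs at least three times the module area and energy, plus the voter. For example, R = 0.9 gives R_TMR = 3(0.81) - 2(0.729) = 0.972.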

V-D Cross Layer Approach to Fault Tolerance

Faults are not going to disappear; they are likely to increase in the future [12]. One way to cope with them is to accept that imperfect devices fail and to compensate for failures at higher levels of the system stack [44]. This tolerates faults across layers, including circuit design, firmware, operating system and applications. Cross-layer fault tolerant systems distribute the responsibility for tolerating faults across multiple layers [46]. In this way they have the potential to implement reliable, high-performance and energy-efficient solutions without overwhelming cost [30].

If fault detection and fault recovery are implemented in different system layers, the following challenges arise [47]:

• Statistical validation and metrics: high-confidence, resource-light estimation of reliability and availability is needed.
• Verification of resilience techniques: it must be ensured that resilience techniques perform under all possible scenarios.
• Reliability grades: system-wide reliability and data integrity must be tested and graded, and reliability may change under workload.

In addition to the Cross Layer approach, a Multi Layer approach [48] has been proposed, in which system layers are adapted to each other to reduce error propagation. In the author’s opinion, this is not a separate approach but rather a small increment on the Cross Layer approach.

V-E Specifics of USN-s

The underwater environment differs mainly in its harsh physical conditions: pressure, difficult accessibility, and limited communication and energy resources. Many communication methods are unavailable underwater, and multiple phenomena [6, 7] obstruct communication. Because hardware may be flooded, more attention and resources should be devoted to physical security. On the other hand, faults caused by excessive heat should be rare and avoidable underwater.

Most common concepts should be adaptable for underwater use, but the environment is more demanding and unforgiving, and faults are more costly. Some resource-intensive approaches, such as cloud computing, may not make sense in USN-s. However, the author is not aware of any fault tolerant approach mentioned in this paper that has low network bandwidth and power requirements and still could not be used underwater. Cross-layer resilience seems to be one of the more promising approaches that could be adapted well to the constraints of the underwater environment. For unknown reasons, it lacks recent research papers even for terrestrial implementations.

VI Conclusion

This paper presented a survey of fault tolerant techniques in USN-s and pointed out open research issues in this field. In the underwater context, fault tolerance has been addressed for reliable UWSN networking [49, 50, 51, 7], localization in space [52] and monitoring of underwater pipelines [53]. This paper reviewed fault tolerant techniques that have been developed for underwater use or could be adapted to it. The techniques were divided into the groups used in distributed systems, and the papers employing them were discussed in the corresponding sections.

This paper is the first to investigate fault tolerance, particularly cross-layer fault tolerance, in USN-s. According to the survey, no existing research covers cross-layer fault tolerance for underwater sensor networks.

References

[1] C.-C. Kao, Y.-S. Lin, G.-D. Wu, and C.-J. Huang, “A study of applications, challenges, and channel models on the Internet of Underwater Things,” in 2017 Int. Conf. Appl. Syst. Innov., no. 2, pp. 1375–1378, IEEE, May 2017.
[2] S. Kumar and Balamurugan B, “Fault Tolerant Cloud Systems,” in Encycl. Inf. Sci. Technol. Fourth Ed. (M. Khosrow-Pour, D.B.A., ed.), ch. 93, pp. 1075–1090, IGI Global, 4th ed., 2018.
[3] W. Torres-Pomales, “Software Fault Tolerance: A Tutorial,” Tech. Rep., NASA, Oct. 2000.
[4] E. Jaynes and F. Cummings, “Fault Management in Wireless Sensor Networks,” Proc. IEEE, vol. 51, no. 1, pp. 89 – 109, 2013.
[5] L. Paradis and Q. Han, “A survey of fault management in wireless sensor networks,” J. Netw. Syst. Manag., vol. 15, no. 2, pp. 171–190, 2007.
[6] M. C. Domingo, “A topology reorganization scheme for reliable communication in underwater wireless sensor networks affected by shadow zones,” Sensors, vol. 9, no. 11, pp. 8684–8708, 2009.
[7] M. C. Domingo and M. C. Vuran, “Cross-layer analysis of error control in underwater wireless sensor networks,” Comput. Commun., vol. 35, no. 17, pp. 2162–2172, 2012.
[8] J. Henkel, L. Hedrich, A. Herkersdorf, R. Kapitza, D. Lohmann, P. Marwedel, M. Platzner, W. Rosenstiel, U. Schlichtmann, O. Spinczyk, M. Tahoori, L. Bauer, J. Teich, N. Wehn, H.-J. Wunderlich, J. Becker, O. Bringmann, U. Brinkschulte, S. Chakraborty, M. Engel, R. Ernst, and H. Härtig, “Design and architectures for dependable embedded systems,” in Proc. seventh IEEE/ACM/IFIP Int. Conf. Hardware/software codesign Syst. Synth. - CODES+ISSS ’11, (New York, New York, USA), p. 69, ACM Press, 2011.
[9] G. Georgakos, U. Schlichtmann, R. Schneider, and S. Chakraborty, “Reliability challenges for electric vehicles,” in Proc. 50th Annu. Des. Autom. Conf. - DAC ’13, (New York, New York, USA), p. 1, ACM Press, 2013.
[10] D. Lorenz, M. Barke, and U. Schlichtmann, “Efficiently analyzing the impact of aging effects on large integrated circuits,” Microelectron. Reliab., vol. 52, pp. 1546–1552, Aug. 2012.
[11] Z. Sauli, V. Retnasamy, S. Taniselass, A. H. Shapri, R. M. Hatta, and M. H. Aziz, “Polymer core BGA vertical stress loading analysis,” Proc. Int. Conf. Comput. Intell. Model. Simul., vol. 129, pp. 148–151, 2012.
[12] S. Rehman, Reliable Software for Unreliable Hardware – A Cross-Layer Approach. Doctoral dissertation, Karlsruhe Institute of Technology (KIT), 2015.
[13] A. S. Tanenbaum and M. Van Steen, Distributed systems: principles and paradigms. Prentice-Hall, 2007.
[14] S. P. Singh and S. C. Sharma, “A survey on cluster based routing protocols in wireless sensor networks,” Procedia Comput. Sci., vol. 45, no. C, pp. 687–695, 2015.
[15] S. A. Sofi and R. N. Mir, “Natural algorithm based adaptive architecture for underwater wireless sensor networks,” Proc. 2017 Int. Conf. Wirel. Commun. Signal Process. Networking, WiSPNET 2017, vol. 2018-January, pp. 2343–2346, 2018.
[16] S. Tyagi and N. Kumar, “A systematic review on clustering and routing techniques based upon LEACH protocol for wireless sensor networks,” J. Netw. Comput. Appl., vol. 36, no. 2, pp. 623–645, 2013.
[17] Y. Noh, U. Lee, S. Lee, P. Wang, L. F. Vieira, J. H. Cui, M. Gerla, and K. Kim, “HydroCast: Pressure routing for underwater sensor networks,” IEEE Trans. Veh. Technol., vol. 65, no. 1, pp. 333–347, 2016.
[18] M. Asim, H. Mokhtar, and M. Merabti, “A fault management architecture for wireless sensor network,” IWCMC 2008 - Int. Wirel. Commun. Mob. Comput. Conf., pp. 779–785, 2008.
[19] S. Jiang, “State-of-the-Art Medium Access Control (MAC) Protocols for Underwater Acoustic Networks: A Survey Based on a MAC Reference Model,” IEEE Commun. Surv. Tutorials, vol. 20, no. 1, pp. 96–131, 2018.
[20] C. H. Wu, K. C. Lee, and Y. C. Chung, “A Delaunay Triangulation based method for wireless sensor network deployment,” Comput. Commun., vol. 30, no. 14-15, pp. 2744–2752, 2007.
[21] V. Isler, S. Kannan, and K. Daniilidis, “Sampling Based Sensor-Network Deployment,” pp. 1780–1785, 2004.
[22] Z. Cheng, M. Perillo, and W. B. Heinzelman, “General network lifetime and cost models for evaluating sensor network deployment strategies,” IEEE Trans. Mob. Comput., vol. 7, no. 4, pp. 484–497, 2008.
[23] V. P. Bhuvana, C. Preissl, A. M. Tonello, and M. Huemer, “Multi-Sensor Information Filtering with Information-Based Sensor Selection and Outlier Rejection,” IEEE Sens. J., vol. 18, pp. 2442–2452, Mar. 2018.
[24] N. Goyal, M. Dave, and A. K. Verma, “Data aggregation in underwater wireless sensor network: Recent approaches and issues,” J. King Saud Univ. - Comput. Inf. Sci., 2017.
[25] J. Kohnstamm and D. Madhub, “Mauritius Declaration,” pp. 1–2, 2014.
[26] F. Campagnaro, R. Francescon, F. Guerra, F. Favaro, P. Casari, R. Diamant, and M. Zorzi, “The DESERT underwater framework v2: Improved capabilities and extension tools,” 3rd Underw. Commun. Netw. Conf. Ucomms 2016, 2016.
[27] C. Petrioli, R. Petroccia, J. R. Potter, and D. Spaccini, “The SUNSET framework for simulation, emulation and at-sea testing of underwater wireless sensor networks,” Ad Hoc Networks, vol. 34, pp. 224–238, 2015.
[28] R. Petroccia and D. Spaccini, “Comparing the SUNSET and DESERT frameworks for in field experiments in underwater acoustic networks,” Ocean. 2013 MTS/IEEE Bergen Challenges North. Dimens., 2013.
[29] G. Ateniese, A. Capossele, P. Gjanci, C. Petrioli, and D. Spaccini, “SecFUN: Security framework for underwater acoustic sensor networks,” MTS/IEEE Ocean. 2015 - Genova Discov. Sustain. Ocean Energy a New World, 2015.
[30] N. P. Carter, H. Naeimi, and D. S. Gardner, “Design techniques for cross-layer resilience,” 2010 Des. Autom. Test Eur. Conf. Exhib. (DATE 2010), pp. 1023–1028, 2010.
[31] T.-H. Liu, S.-C. Yi, and X.-W. Wang, “A fault management protocol for low-energy and efficient Wireless sensor networks,” J. Inf. Hiding Multimed. Signal Process., vol. 4, no. 1, pp. 34–45, 2013.
[32] J. D. Case, M. Fedor, M. L. Schoffstall, and J. Davin, “Simple network management protocol (SNMP),” tech. rep., 1990.
[33] H. S. Gunawi, T. Do, J. M. Hellerstein, I. Stoica, D. Borthakur, and J. Robbins, “Failure as a service (faas): A cloud service for large-scale, online failure drills,” Electr. Eng. Comput. Sci., pp. 1–8, 2011.
[34] Intel, “Intelligent Platform Management Interface,” 2004.
[35] M. R. Napolitano, C. Neppach, V. Casdorph, S. Naylor, M. Innocenti, and G. Silvestri, “Neural-network-based scheme for sensor failure detection, identification, and accommodation,” J. Guid. Control. Dyn., vol. 18, pp. 1280–1286, Nov. 1995.
[36] C. Neti, M. H. Schneider, and E. D. Young, “Maximally Fault Tolerant Neural Networks,” IEEE Trans. Neural Networks, vol. 3, no. 1, pp. 14–23, 1992.
[37] G. Benincasa, G. D’Aniello, C. De Maio, V. Loia, and F. Orciuoli, “Towards perception-oriented situation awareness systems,” Adv. Intell. Syst. Comput., vol. 322, pp. 813–824, 2014.
[38] M. Compton, P. Barnaghi, L. Bermudez, R. García-Castro, O. Corcho, S. Cox, J. Graybeal, M. Hauswirth, C. Henson, A. Herzog, V. Huang, K. Janowicz, W. D. Kelsey, D. Le Phuoc, L. Lefort, M. Leggieri, H. Neuhaus, A. Nikolov, K. Page, A. Passant, A. Sheth, and K. Taylor, “The SSN ontology of the W3C semantic sensor network incubator group,” J. Web Semant., vol. 17, pp. 25–32, 2012.
[39] G. D’Aniello, A. Gaeta, and F. Orciuoli, “Artificial bees for improving resilience in a sensor middleware for Situational Awareness,” TAAI 2015 - 2015 Conf. Technol. Appl. Artif. Intell., pp. 300–307, 2016.
[40] E. Fadel, V. C. Gungor, L. Nassef, N. Akkari, M. G. Abbas Malik, S. Almasri, and I. F. Akyildiz, “A survey on wireless sensor networks for smart grid,” Comput. Commun., vol. 71, pp. 22–33, 2015.
[41] V. Cristea, C. Dobre, F. Pop, C. Stratan, A. Costan, C. Leordeanu, and E. Tirsa, “A dependability layer for large-scale distributed systems,” Int. J. Grid Util. Comput., vol. 2, no. 2, p. 109, 2011.
[42] S. Mitra, “Comparative Study of Fault Recovery Techniques in Wireless Sensor Network,” pp. 19–21, Dec. 2016.
[43] I. Salera, A. Agbaria, and M. Eltoweissy, “Fault-tolerant mobile sink in networked sensor systems,” 2006 2nd IEEE Work. Wirel. Mesh Networks, WiMESH 2006, pp. 106–108, 2007.
[44] A. DeHon, H. M. Quinn, and N. P. Carter, “Vision for cross-layer optimization to address the dual challenges of energy and reliability,” Proc. Design, Autom. Test Eur. (DATE), pp. 1017–1022, 2010.
[45] J. Xu, S. Chen, Z. Kalbarczyk, and R. K. Iyer, “An experimental study of security vulnerabilities caused by errors,” Proc. Int. Conf. Dependable Syst. Networks, pp. 421–430, 2001.
[46] M. Veleski, R. Kraemer, and M. Krstic, “An Overview of Cross-Layer Resilience Design Methods,” in Fifth Int. Conf. Radiat. Appl. Var. Fields Res., 2017.
[47] S. Mitra, K. Brelsford, and P. N. Sanda, “Cross-layer resilience challenges: Metrics and optimization,” 2010 Des. Autom. Test Eur. Conf. Exhib. (DATE 2010), pp. 1029–1034, 2010.
[48] J. Henkel, L. Bauer, H. Zhang, S. Rehman, and M. Shafique, “Multi-Layer Dependability: From Microarchitecture to Application Level,” Proc. 51st Annu. Des. Autom. Conf. Des. Autom. Conf., pp. 47:1–47:6, 2014.
[49] J. Xu, K. Li, and G. Min, “Reliable and energy-efficient multipath communications in underwater sensor networks,” IEEE Trans. Parallel Distrib. Syst., vol. 23, no. 7, pp. 1326–1335, 2012.
[50] N. Z. Zenia, M. Aseeri, M. R. Ahmed, Z. I. Chowdhury, and M. Shamim Kaiser, “Energy-efficiency and reliability in MAC and routing protocols for underwater wireless sensor network: A survey,” J. Netw. Comput. Appl., vol. 71, pp. 72–85, 2016.
[51] C. Lal, R. Petroccia, M. Conti, and J. Alves, “Secure underwater acoustic networks: Current and future research directions,” 3rd Underw. Commun. Netw. Conf. Ucomms 2016, 2016.
[52] A. P. Das and S. M. Thampi, “Fault-resilient localization for underwater sensor networks,” Ad Hoc Networks, vol. 55, pp. 132–142, 2017.
[53] N. Mohamed, I. Jawhar, J. Al-Jaroodi, and L. Zhang, “Sensor network architectures for monitoring underwater pipelines,” Sensors, vol. 11, no. 11, pp. 10738–10764, 2011.