Resilience in Cyber-Physical Systems: 
Challenges and Opportunities 
Gabor Karsai 
Institute for Software-Integrated Systems 
Vanderbilt University 
SERENE 2014 – Autumn School
Acknowledgements 
 People: Janos Sztipanovits, Daniel Balasubramanian, 
Abhishek Dubey, Tihamer Levendovszky, Nag Mahadevan, 
and many others at the Institute for Software-Integrated 
Systems @ Vanderbilt University 
 Sponsors: AFRL, DARPA, NASA, NSF through various 
programs
Outline 
 Introduction 
 Cyber-physical Systems 
 Resilience 
 Building resilient CPS 
 System-level fault diagnostics 
 Software health management 
 Resilient architectures and autonomy 
 Conclusions
Cyber-Physical Systems
What is a Cyber-Physical System? 
 An engineered system that integrates physical and cyber 
components where relevant functions are realized 
through the interactions between the physical and cyber 
parts. 
 Physical = some tangible, physical device + environment 
 Cyber = computational + communicational
Cyber-Physical Systems (CPS): Integrating networked computational resources with physical systems
[Figure: collage of CPS application domains: automotive, power generation and distribution, military systems, transportation (air traffic control at SFO), avionics, telecommunications, factory automation, instrumentation (Soleil Synchrotron), building systems. Image credits: Kuka Robotics Corp., Siemens (E-Corner), Doug Schmidt, General Electric, Daimler-Chrysler.]
Courtesy of Ed Lee, UCB
CPS Examples
CPS Challenge Problem: Prevent This
A Typical Cyber-Physical System 
Printing Press 
• Application aspects 
• local (control) 
• distributed (coordination) 
• global (modes) 
• Ethernet network 
• Synchronous, Time-Triggered 
• IEEE 1588 time-sync protocol 
• High-speed, high precision 
• Speed: 1 inch/ms (~100km/hr) 
• Precision: 0.01 inch 
Bosch-Rexroth -> Time accuracy: 10us 
Courtesy of Ed Lee, UCB
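The 10 us figure can be checked from the speed and precision quoted above. The short calculation below is only a sanity check; the variable names are illustrative and not taken from the slides.

```python
# Web speed and registration precision quoted on the slide.
SPEED_INCH_PER_MS = 1.0
PRECISION_INCH = 0.01
INCH_IN_METERS = 0.0254  # exact, by definition of the international inch

# Time the paper needs to travel one precision unit.
time_budget_ms = PRECISION_INCH / SPEED_INCH_PER_MS
time_budget_us = time_budget_ms * 1000.0

# The same speed in km/h: inch/ms -> m/s -> km/h.
speed_m_per_s = SPEED_INCH_PER_MS * INCH_IN_METERS * 1000.0
speed_km_per_h = speed_m_per_s * 3.6

print(f"time budget: {time_budget_us:.0f} us")
print(f"speed: {speed_km_per_h:.2f} km/h")
```

A position error of 0.01 inch at 1 inch/ms corresponds to a timing error of 0.01 ms, which is the 10 us accuracy quoted for the drives; the speed works out to roughly 91 km/h, in line with the "~100km/hr" on the slide.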
Example – Flying Paster 
Source: http://offsetpressman.blogspot.com/2011/03/how-flying-paster-works.html 
Courtesy of Ed Lee, UCB 
[Figure: flying paster diagram. Legible labels: sensor (top dead center), active paper feed, paper cutter, idle rollers, drive rollers, dancer.]
Example: Medical Devices 
Emerging direction: Cell phone 
based medical devices for 
affordable healthcare 
e.g. “Telemicroscopy” project 
at Berkeley 
e.g. Cell-phone based blood 
testing device developed at 
UCLA 
Courtesy of Ed Lee, UCB
Example: Toyota autonomous vehicle technology 
roadmap, c. 2007 
Source: Toyota Web site
DARPA Robotics Challenge 
 http://www.theroboticschallenge.org/
The Good News… 
Networking and computing delivers unique precision and flexibility in 
interaction and coordination 
Computing/Communication Integrated CPS 
 Rich time models 
 Precise interactions across highly 
extended spatial/temporal 
dimension 
 Flexible, dynamic communication 
mechanisms 
 Precise time-variant, nonlinear 
behavior 
 Introspection, learning, reasoning 
 Elaborate coordination of 
physical processes 
 Hugely increased system size 
with controllable, stable 
behavior 
 Dynamic, adaptive architectures 
 Adaptive, autonomic systems 
 Self monitoring, self-healing 
system architectures and better 
safety/security guarantees.
…and the Challenges 
Fusing networking and computing with physical processes brings new problems
Computing/Communication Integrated CPS 
 Cyber vulnerability 
 New type of interactions across 
highly extended spatial/temporal 
dimension 
 Flexible, dynamic communication 
mechanisms 
 Precise time-variant, nonlinear 
behavior 
 Introspection, learning, reasoning 
 Physical behavior of systems 
can be manipulated 
 Lack of composition theories for 
heterogeneous systems: many 
unsolved problems 
 Vastly increased complexity 
and emergent behaviors 
 Lack of theoretical foundations 
for CPS dynamics 
 Verification, certification, and 
predictability face fundamentally 
new challenges.
Abstraction layers allow 
the verification of 
different properties.
Key Idea: Manage design complexity by creating abstraction 
layers in the design flow. 
Abstraction layers define 
platforms. 
Physical Platform 
Software Platform 
Computation/Communication Platform 
Abstractions are linked 
through mapping. 
Claire Tomlin, UC Berkeley 
Example for a CPS Approach
Abstraction layers and models: 
Real-time Software 
Sifakis et al.: “Building Models of Real-Time 
Systems from Application Software,” 
Proceedings of the IEEE, Vol. 91, No. 1, pp. 
100-111, January 2003 
Software models 
$f : [T \to In] \to 2^{[T \to Out]}$
• $f$: reactive program. Program execution 
creates a mapping between logical-time 
inputs and outputs. 
Real-time system models 
$f_R : [T_R \to In_R] \to 2^{[T_R \to Out_R]}$
• $f_R$: real-time system. Programs are 
packaged into interacting components. 
Schedulers control access to computational 
and communicational resources according 
to time constraints $P$. 
correctness: implementation, timing analysis ($P$) 
[The formulas relating $f_R$, $f$, $E$ and $P$ are garbled in the extracted text.]
In CPS, essential system properties 
such as stability, safety, 
performance are expressed in 
terms of physical behavior
Abstraction layers and models: 
Cyber-Physical Systems 
Physical models 
$p : [T_R \to In_R] \to 2^{[T_R \to Out_R]}$; $f_R : [T_R \to In_R] \to 2^{[T_R \to Out_R]}$
implementation 
Software models 
$f : [T \to In] \to 2^{[T \to Out]}$
correctness: implementation 
Real-time system models 
Re-defined Goals: 
• Compositional verification of 
essential dynamic properties 
− stability 
− safety 
• Derive dynamics offering 
robustness against 
implementation changes, 
uncertainties caused by faults 
and cyber attacks 
− fault/intrusion induced 
reconfiguration of SW/HW 
− network uncertainties 
(packet drops, delays) 
• Decreased verification 
complexity 
timing analysis ($P$) 
$f_R : [T_R \to In_R] \to 2^{[T_R \to Out_R]}$
[The formulas relating $f_R$, $f$, $E$ and $P$ are garbled in the extracted text.]
Why is CPS Hard? 
Software Control Systems 
package org.apache.tomcat.session; 
import org.apache.tomcat.core.*; 
import org.apache.tomcat.util.StringManager; 
import java.io.*; 
import java.net.*; 
import java.util.*; 
import javax.servlet.*; 
import javax.servlet.http.*; 
/** 
* Core implementation of a server session 
* 
* @author James Duncan Davidson [duncan@eng.sun.com] 
* @author James Todd [gonzo@eng.sun.com] 
*/ 
public class ServerSession { 
private StringManager sm = 
StringManager.getManager("org.apache.tomcat.session"); 
private Hashtable values = new Hashtable(); 
private Hashtable appSessions = new Hashtable(); 
private String id; 
private long creationTime = System.currentTimeMillis();; 
private long thisAccessTime = creationTime; 
private long lastAccessed = creationTime; 
private int inactiveInterval = -1; 
ServerSession(String id) { 
this.id = id; 
} 
public String getId() { 
return id; 
} 
public long getCreationTime() { 
return creationTime; 
} 
public long getLastAccessedTime() { 
return lastAccessed; 
} 
public ApplicationSession getApplicationSession(Context context, 
boolean create) { 
ApplicationSession appSession = 
(ApplicationSession)appSessions.get(context); 
if (appSession == null && create) { 
// XXX 
// sync to ensure valid? 
appSession = new ApplicationSession(id, this, context); 
appSessions.put(context, appSession); 
} 
// XXX 
// make sure that we haven't gone over the end of our 
// inactive interval -- if so, invalidate and create 
// a new appSession 
return appSession; 
} 
void removeApplicationSession(Context context) { 
appSessions.remove(context); 
} 
/** 
* Called by context when request comes in so that accesses and 
* inactivities can be dealt with accordingly. 
*/ 
void accessed() { 
// set last accessed to thisAccessTime as it will be left over 
// from the previous access 
lastAccessed = thisAccessTime; 
thisAccessTime = System.currentTimeMillis(); 
} 
void validate() 
Crosses Interdisciplinary Boundaries 
• Disciplinary boundaries need to be realigned 
• New fundamentals need to be created 
• New technologies and tools need to be developed 
• Education needs to be restructured
Resilience
Cyber-Physical Systems: 
Software Intensive Systems 
 Embedded software …. 
 is a crucial ingredient in modern 
systems 
 is the ‘universal system integrator’ 
 could exhibit faults that lead to 
system failures 
 complexity has progressed to the 
point that zero-defect systems 
(containing both hardware and 
software) are very difficult to build 
 needs to evolve while in operation 
The challenge is to build software intensive systems that 
anticipate change: uncertain environments, faults, updates, and 
exhibit resilience: they survive and adapt to changes, while 
being dependably functional.
Resilience 
 Webster: 
 Capable of withstanding shock without permanent deformation or 
rupture 
 Tending to recover from or adjust easily to misfortune or change. 
 Technical: 
 The persistence of the avoidance of failures that are unacceptably 
frequent or severe, when facing changes. [Laprie, ‘04] 
 A resilient system is trusted and effective out of the box in a wide 
range of contexts, and easily adapted to many others through 
reconfiguration or replacement. [R. Neches, OSD] 
 A resilient system detects anomalies in itself, diagnoses 
their causes, and is able to recover lost functionality. 
Research issues 
•Model-driven engineering of Resilient Software Systems 
•Design-time + Run-time aspects 
•Resilience to: (1) faults, (2) environmental changes, (3) updates 
•Target system category: distributed, real-time, embedded systems 
Objective: Model-based 
engineering approach 
and tools to build 
verifiably resilient 
systems
The definition…
Building Resilient Cyber-Physical 
Systems
Cyber-Physical Systems 
Layers and Interactions 
[Figure: layered view of a CPS. The physical layer contains physical objects coupled by physical interaction. Computational objects, coupled by computational interaction, are implemented on the platform layer (computational platforms and a communication platform). Cyber-physical objects span both. Layers are related by abstraction and by refinement/compilation. Fault markers appear at several elements.]
Cyber-Physical Systems 
Faults and resilience 
 In CPS faults can appear in (and cascade to) any place 
 Physical system 
 Hardware (computing and communication) system 
 Software (application and platform) system 
 In CPS physical and cyber elements are integrated 
 Many interaction pathways: P2P, P2C, C2C, P2C2P, C2P2P2C 
 Many modeling paradigms for physical systems 
 Consider engineering or physics! 
 Heterogeneous models need to be integrated for detailed analysis 
 In CPS recovery can take many forms 
 Physical action 
 Cyber restart 
 Software adaptation
CPS and Model-based Design 
Design of CPS layers via MDE 
 Software models 
 Platform models 
 Physical models 
Challenge: How to integrate the models so that cross-domain 
interactions can be understood and managed?
A Strategy for Resilient CPS 
 Overall scheme: 
 Faults can originate in any layer of a hierarchy, in any component 
 Anomalies caused by the fault can be detected in the same or a higher layer 
 Based on anomalies a fault source isolation (diagnosis) is performed. The 
diagnosis result may be reported to a higher layer, depending on the nature of 
the fault. 
 The fault is locally mitigated first, but when that mitigation fails the higher layer is 
informed about the anomaly, the diagnosed fault, and the mitigation action taken. 
 High-level view: Fault management is a control problem. 
 Faults are disturbances in the system whose effects prevent the system from 
providing the required service(s) 
 Anomalies are the sensory inputs, mitigation actions are the actuators of the 
fault management system 
 Fault mitigation must happen by considering (1) the current functional goals and 
(2) the actual state of the system, on the right level of abstraction
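The overall scheme above (mitigate locally first, escalate on failure) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `Layer`, `FaultReport` and `manage_fault` names and the callback signatures are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class FaultReport:
    """What a layer passes upward when its own mitigation fails."""
    layer: str
    anomaly: str
    diagnosed_fault: str
    attempted_action: str


@dataclass
class Layer:
    name: str
    diagnose: Callable[[str], str]    # anomaly -> diagnosed fault (hypothetical hook)
    action: str                       # name of the local mitigation action
    mitigate: Callable[[str], bool]   # fault -> True if the mitigation worked
    parent: Optional["Layer"] = None


def manage_fault(layer: Layer, anomaly: str) -> Tuple[Optional[str], List[FaultReport]]:
    """Try local mitigation first; on failure, inform the next higher layer."""
    reports: List[FaultReport] = []
    current: Optional[Layer] = layer
    while current is not None:
        fault = current.diagnose(anomaly)
        if current.mitigate(fault):
            return current.name, reports
        reports.append(FaultReport(current.name, anomaly, fault, current.action))
        current = current.parent
    return None, reports  # no layer could mitigate the fault


system = Layer("system", lambda a: "sensor_failed", "switch_to_backup", lambda f: True)
component = Layer("component", lambda a: "sensor_glitch", "restart", lambda f: False,
                  parent=system)
resolved_by, reports = manage_fault(component, "stale_data")
print(resolved_by, [r.attempted_action for r in reports])
```

In this toy run the component-level restart fails, so the system layer receives the anomaly, the local diagnosis, and the attempted action, and resolves the fault itself.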
A Strategy for Resilient CPS 
Layered fault management 
 Concepts: 
1. Faults propagate to neighboring layers via 
guaranteed behaviors 
2. Each layer includes pro-active and reactive fault 
management mechanisms 
 Each layer provides a ‘fault reporting’ and 
‘fault management’ interface 
 Fault management services are built into the 
‘middleware’: 
 Temporal/spatial Isolation 
 Fault Tolerant Clock Sync 
 Time-triggered Communications 
 Group Communication and Transactions 
 Fault-tolerant Resource Sharing 
 Component/Service Migration 
 Primary/Backup 
 Replication 
 Autonomous Failure Management 
 Safe Dynamic Composition of Components
System-level Fault Diagnostics
The need for resilience 
In complex systems even simple 
failures lead to complex cascades of 
events that are difficult to understand 
and manage. 
How to 
•detect and isolate faults? 
•react to faults to mitigate their effect?
FACT: 
A model-driven toolsuite for system-level diagnostics 
Visual modeling tool for creating: 
•System architecture models 
•Timed failure propagation graph models 
Run-time Platform (RTOS) 
Modular run-time environment contains: 
•Monitors detect anomalies in sensor data 
and track mode changes 
•TFPG Diagnostics Engine performs 
diagnosis and isolates the source(s) of 
observed anomalies 
•Reports are generated for operators and 
maintainers 
Modules can be used standalone on an 
embedded target processor with an RTOS 
TFPG DIAGNOSTICS ENGINE 
MONITORS 
OPERATOR 
MAINTAINER
Modeling Language 
Timed Failure Propagation Graphs 
•Failure modes 
•Discrepancies 
•Monitors/Alarms 
•Propagation links with: 
•Time delay 
•Mode 
Fault model: 
Known physical failure modes whose functional effects (discrepancies) are 
monitored. 
Diagnostic problem: 
Given a set of active monitors and their temporal activation sequence, 
which failure mode(s) explain the observations? 
A causal network-like 
model describing how 
component failure effects 
propagate across the 
system activating 
monitors. 
Failure propagation links 
and monitors could be 
mode-dependent.
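The diagnostic question above can be illustrated on a toy, untimed graph. The sketch below assumes a made-up three-failure-mode model and single-fault explanations only; it ignores timing, modes, and AND discrepancies, so it is not the TFPG reasoner described in this deck.

```python
from collections import defaultdict

# Hypothetical toy graph: failure modes (FM_*) propagate to discrepancies (D_*).
EDGES = [
    ("FM_pump", "D_low_pressure"),
    ("FM_leak", "D_low_pressure"),
    ("D_low_pressure", "D_low_flow"),
    ("FM_valve", "D_low_flow"),
]
FAILURE_MODES = {"FM_pump", "FM_leak", "FM_valve"}
# Monitored discrepancies and the alarm attached to each of them.
MONITORS = {"D_low_pressure": "A_pressure", "D_low_flow": "A_flow"}


def reachable_alarms(failure_mode):
    """Alarms that a single failure mode can activate by propagation."""
    successors = defaultdict(list)
    for src, dst in EDGES:
        successors[src].append(dst)
    seen, stack = set(), [failure_mode]
    while stack:
        node = stack.pop()
        for nxt in successors[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return {MONITORS[d] for d in seen if d in MONITORS}


def single_fault_candidates(active_alarms):
    """Failure modes whose propagation covers every active alarm."""
    return sorted(fm for fm in FAILURE_MODES
                  if set(active_alarms) <= reachable_alarms(fm))


print(single_fault_candidates({"A_flow"}))
print(single_fault_candidates({"A_pressure", "A_flow"}))
```

With only the flow alarm active, all three failure modes remain candidates; once the pressure alarm is also active, the valve failure no longer explains the observations on its own.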
Modeling Language 
Timed Failure Propagation Graphs 
Modeling variants 
•Untimed, causal network (no modes, propagation = [0..inf]) 
•Modal networks: edges are mode dependent 
•Timed models 
•Hierarchical component models 
Nodes: 
•Failure modes 
•Discrepancies 
• AND/OR 
• Monitored (option) 
Edges: 
•Propagation delay: [min, max] 
•Discrete Modes (activation) 
Example models (#components,#failuremodes,#alarms) 
•Trivial examples 
•Simplified fuel system (~30,~80,~100) 
•Realistic fuel system (~200,~400,~600) 
•Aircraft avionics (~2000,~8000,~25000) – generated
TFPG Example 
Example 
Not shown: 
- Timing on propagation links 
- Components/hierarchy 
- Modal propagation 
TFPG captures cause-effect 
relationships that can be modal 
and temporal. Effects may be 
cumulative and/or monitored. 
Legend 
Component
Timed Failure Propagation Graphs 
 Causal models that describe the system behavior in 
presence of faults. 
 Model is a labeled directed graph where 
 Nodes represent either failure modes or discrepancies 
 Edges between nodes in the graph represent causality 
 Edges are attributed with timing and mode constraints on failure 
propagation. 
 A discrepancy can be either monitored or unmonitored. 
 The monitor detects a sensory manifestation of an anomaly and 
generates alarms. 
[Figure: failure cascades. Failure modes connect through propagation links to discrepancies; alarm allocation maps monitored discrepancies to alarms.]
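The timing and mode constraints on edges can be made concrete with a small data structure. The following is a sketch under assumptions: the `Edge` type, the two helper functions, and the example delays are hypothetical and only show how a [min, max] delay and an enabling mode set constrain propagation.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    t_min: float            # minimum propagation delay
    t_max: float            # maximum propagation delay
    modes: FrozenSet[str]   # system modes in which the edge is enabled


def edge_consistent(edge: Edge, t_src: float, t_dst: float, mode: str) -> bool:
    """True if dst becoming active at t_dst can be explained by src at t_src."""
    if mode not in edge.modes:
        return False
    return edge.t_min <= (t_dst - t_src) <= edge.t_max


def path_window(path: List[Edge], t0: float, mode: str) -> Optional[Tuple[float, float]]:
    """Earliest and latest arrival time along a path, or None if the path is blocked."""
    earliest = latest = t0
    for edge in path:
        if mode not in edge.modes:
            return None
        earliest += edge.t_min
        latest += edge.t_max
    return earliest, latest


e1 = Edge("FM_pump", "D_low_pressure", 1.0, 3.0, frozenset({"nominal"}))
e2 = Edge("D_low_pressure", "D_low_flow", 2.0, 5.0, frozenset({"nominal", "startup"}))
print(path_window([e1, e2], 0.0, "nominal"))
print(path_window([e1, e2], 0.0, "startup"))
```

In the nominal mode the failure can reach the second discrepancy between 3 and 8 time units after onset; in the startup mode the first edge is disabled, so the path is blocked.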
TFPG Example 
Rocket Engine
TFPG Example 
Propellant Tank
TFPG Example 
Turbo Pump
TFPG Example 
Combustion Chamber
TFPG Reasoning 
On-line diagnostics: 
Input: Sequence of alarms and mode changes 
Output: Sequence of sorted and ranked hypotheses containing failure mode(s) 
that explain the observations (alarms, mode changes)
TFPG Hypothesis 
 TFPG Hypothesis: estimation of the current system state. 
 Directly, points to failure modes that “best” explain the 
current set of observed alarms. 
 Indirectly, points to failed monitored discrepancies; those 
with a state that is inconsistent with the (hypothesized) state 
of the failure modes 
 Structure 
 List of possible Failure Modes 
 List of alarms in each set: Consistent (C) / Inconsistent 
(IC) / Missing (M) / Expected (E) 
 Metrics: Plausibility / Robustness / Failure Rate
Hypotheses Evaluation Metrics 
Hypotheses are evaluated based on the following 
metrics: 
 Plausibility: reflects the support of a hypothesis based on the 
current observed alarm state. It answers the question: Which 
hypothesis to consider? 
 Robustness: reflects the potential of a hypothesis (evidence) 
to change based on remaining alarms. It answers the question: 
When to take an action? 
 Failure Rate: is a measure of how often a particular failure 
mode has occurred in the past.
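A hypothesis record with its alarm sets and metrics might look as follows. The deck does not give the metric formulas, so the two ratios below are placeholder assumptions chosen only to show how plausibility and robustness could be derived from the consistent, inconsistent, missing, and expected alarm sets.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Hypothesis:
    failure_modes: Set[str]
    consistent: Set[str] = field(default_factory=set)
    inconsistent: Set[str] = field(default_factory=set)
    missing: Set[str] = field(default_factory=set)
    expected: Set[str] = field(default_factory=set)  # alarms that may still fire
    failure_rate: float = 0.0                        # historical rate, if known

    def _decided(self) -> int:
        return len(self.consistent) + len(self.inconsistent) + len(self.missing)

    def plausibility(self) -> float:
        """Assumed formula: share of decided alarms that agree with the hypothesis."""
        decided = self._decided()
        return len(self.consistent) / decided if decided else 0.0

    def robustness(self) -> float:
        """Assumed formula: share of relevant alarms that are already decided."""
        total = self._decided() + len(self.expected)
        return self._decided() / total if total else 1.0


def rank(hypotheses: List[Hypothesis]) -> List[Hypothesis]:
    """Sort by plausibility, then robustness, then failure rate (best first)."""
    return sorted(hypotheses,
                  key=lambda h: (h.plausibility(), h.robustness(), h.failure_rate),
                  reverse=True)


h1 = Hypothesis({"FM_pump"}, consistent={"A1", "A2"}, expected={"A3"})
h2 = Hypothesis({"FM_valve"}, consistent={"A1"}, inconsistent={"A2"})
best = rank([h2, h1])[0]
print(best.failure_modes, best.plausibility(), round(best.robustness(), 2))
```

Here the first hypothesis is fully supported by the alarms seen so far but still waits for one expected alarm, so it ranks first on plausibility while its robustness is below 1.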
Run-time System 
Diagnostics Engine 
 Algorithm outline: 
 Check if new evidence is explained by 
current hypotheses. 
 If not, create a new hypothesis that 
assumes a hypothetical state of the 
system consistent with observations 
 Rank hypotheses for plausibility and 
robustness 
 Discard low-rank hypotheses, keep 
plausible ones 
 Fault state: ‘total state vector’ of the 
system, i.e. all failure modes and 
discrepancies 
 Alarms could be 
 Missing: should have fired but did not 
 Inconsistent: fired, but it is not consistent 
with the hypothesis 
 Robust diagnostics: tolerates missing and 
inconsistent alarms 
 Metrics: 
Plausibility: how plausible the 
hypothesis is w.r.t. alarm consistency 
Robustness: how likely it is that the 
hypothesis will change in the future
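The algorithm outline above can be sketched as a single update step. This is a simplified illustration, not the engine's actual algorithm: the `EXPLAINS` table, the plausibility ratio, the threshold, and the tie-breaking by hypothesis size (a stand-in for the parsimony principle) are all assumptions.

```python
# Hypothetical map: failure mode -> alarms it is expected to raise.
EXPLAINS = {
    "FM_pump": {"A_pressure", "A_flow"},
    "FM_valve": {"A_flow"},
    "FM_sensor": {"A_temp"},
}


def explained_alarms(failure_modes):
    """Union of the alarms explained by a set of failure modes."""
    out = set()
    for fm in failure_modes:
        out |= EXPLAINS[fm]
    return out


def plausibility(failure_modes, alarms):
    """Assumed metric: fraction of observed alarms the hypothesis explains."""
    if not alarms:
        return 1.0
    return len(alarms & explained_alarms(failure_modes)) / len(alarms)


def update(hypotheses, alarms, new_alarm, keep=3, threshold=0.5):
    """One step: check the new evidence, extend hypotheses if needed, rank, prune."""
    alarms = alarms | {new_alarm}
    explained = any(new_alarm in EXPLAINS[fm] for h in hypotheses for fm in h)
    if not explained:
        causes = [fm for fm, raised in EXPLAINS.items() if new_alarm in raised]
        base = hypotheses or [frozenset()]
        hypotheses = hypotheses + [h | {fm} for h in base for fm in causes]
    # Higher plausibility first; among equals prefer the smaller explanation.
    ranked = sorted(set(hypotheses),
                    key=lambda h: (-plausibility(h, alarms), len(h), sorted(h)))
    kept = [h for h in ranked if plausibility(h, alarms) >= threshold][:keep]
    return kept, alarms


hyps, seen = [], set()
for alarm in ["A_flow", "A_pressure", "A_temp"]:
    hyps, seen = update(hyps, seen, alarm)
print([sorted(h) for h in hyps])
```

After the third alarm, no single failure mode explains all observations, so the top-ranked hypothesis is a two-fault set; less plausible single-fault hypotheses are kept while they stay above the threshold.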
Run-time System 
Diagnostics Engine 
 Novel properties: 
 Multi-fault hypothesis is the default 
 Fault state == State of all failure 
modes/discrepancies 
 Reasoner works with sets of failure modes 
(instead of individual failure modes) 
 Robust algorithm: can tolerate 
missing/inconsistent alarms 
 Parsimony principle: Use simplest 
explanation 
 Time-dependent diagnosis 
 Reasoner can be asked to recompute 
diagnosis upon the advance of time 
 Extensions: 
 Modal edges: Propagation happens only if 
edge is enabled (controlled by system 
mode) 
 Diagnosis takes into consideration the last 
propagation effect 
 Non-monotonic alarms: 
 Alarm retraction triggers a re-computation 
of the diagnosis
Run-time System 
Discrete (TFPG) Diagnostics 
 Additional capabilities: 
 Intermittent failure modes 
 Consequence: alarm/s change to ‘Off’ 
 Assumption: low frequency intermittents 
 Upon alarm changing to ‘Off’, backtrack to 
last change to ‘On’ and re-evaluate 
 Maintain alternate branches (for alarms ‘On’ 
and ‘Off’) 
 Test alarms: can be considered only 
after activation 
 If inactive, it is an un-monitored 
discrepancy. 
 If activated, it is used but timing may be 
inconsistent (re: parent’s timing) 
 Metrics summary: 
Plausibility: 
Robustness:
Performance Evaluation 
 For n failure modes and m discrepancies, the maximum number of 
hypotheses is nm, but it is more likely to be O(n). 
 Updating a hypothesis is polynomial in the number of nodes and 
exponential in the number of sensor faults. 
Model | #C | #FM | #D | #A | #M | #P | #R | Avg. Time (sec)
#1 | 15 | 36 | 48 | 21 | 0 | 120 | 1 | 0.000311
#2 | 11 | 36 | 120 | 174 | 27 | 3 | 1 | 0.000445
#3 | 153 | 481 | 1973 | 270 | 9 | 3409 | 1 | 0.013589
#4 | 24 | 64 | 116 | 116 | 0 | 695 | 4 | 0.016
#5 | 21 | 100 | 282 | 69 | 0 | 431 | 18 | 0.00288
• Key: #C – Components / #FM – Failure Modes / #D – Discrepancies / #A – Alarms / #M – Modes / #P – Propagation links / #R – Regions 
• Avg. Time = average computational time taken by the reasoner (in seconds) after every event on a 
2.67GHz Intel Xeon® CPU, 8 GB RAM.
Tool Operations 
1. Modeling 
2. Desktop experimentation, 
validation 
3. Feedback 
4. Deployment on 
embedded platform 
Model 
Interpretation
Software Health-Management
Motivation: Software as Failure Source? 
Qantas 72 - Oct 7, 2008 – A330 (Australia) – ATSB Report 
At 1240:28, while the aircraft was cruising at 37,000 ft, the autopilot disconnected. From about 
the same time there were various aircraft system failure indications. At 1242:27, while the 
crew was evaluating the situation, the aircraft abruptly pitched nose-down. The aircraft reached a 
maximum pitch angle of about 8.4 degrees nose-down, and descended 650 ft during the event. 
After returning the aircraft to 37,000 ft, the crew commenced actions to deal with multiple 
failure messages. At 1245:08, the aircraft commenced a second uncommanded pitch-down event. 
The aircraft reached a maximum pitch angle of about 3.5 degrees nose-down, and descended 
about 400 ft during this second event. At 1249, the crew made a PAN urgency broadcast to air 
traffic control, and requested a clearance to divert to and track direct to Learmonth. At 1254, 
after receiving advice from the cabin of several serious injuries, the crew declared a MAYDAY. 
The aircraft subsequently landed at Learmonth at 1350. 
The investigation to date has identified two significant safety factors related to the pitch-down 
movements. Firstly, immediately prior to the autopilot disconnect, one of the air data 
inertial reference units (ADIRUs) started providing erroneous data (spikes) on 
many parameters to other aircraft systems. The other two ADIRUs continued to 
function correctly. Secondly, some of the spikes in angle of attack data were not 
filtered by the flight control computers, and the computers subsequently commanded 
the pitch-down movements. 
http://www.atsb.gov.au/publications/investigation_reports/2008/AAIR/pdf/AO2008070_interim.pdf
Understanding the Problem 
Embedded software is a complex engineering artifact that can have latent 
faults, uncaught by testing and verification. Such faults become apparent 
during operation when unforeseen modes and/or (system) faults appear. 
The problem: 
 General: How to construct a Software Health Management system that 
detects such faults, isolates their source/s, prognosticates their progression, 
and takes mitigation actions in the system context? 
 Specific: How to specify, design, and implement such a system using a model-based 
framework? 
The larger picture: 
 General: Software Health Management must be integrated with System 
Health Management – ‘Software Health Effects’ must be understood on the 
System (Vehicle) Level.
What is ‘Systems Health Management’? 
The ‘on-line’ view: 
1. Detection of anomalies in system or component behavior 
2. Identification and isolation of the fault source/s 
3. Prognostication of impending faults that could lead to system failures 
4. Mitigation of current or impending fault effects while preserving mission objective/s 
[Figure: health management loop. Observations feed detection, isolation, prognostics, and mitigation, which produce reports and corrections.]
Examples: 
- Automotive OBD (detection) 
- Boeing 777 CMC (detection + isolation) 
- Spacecraft fault protection (detection + isolation + mitigation) 
- Aircraft fleet (detection + isolation + prognostics)
Software Health Management 
 Software is a complex 
engineering artifact. 
 Software can have latent faults. 
 Faults appear during operation 
when unforeseen modes or 
interactions happen. 
 Techniques like Voting and Self- 
Checking pairs have 
shortcomings 
 Common mode faults 
 Fault cascades 
• SHM is the extension of FDIR 
techniques used in Physical systems to 
Software. 
[Figure: software component with stimuli and responses. Fault detection uses observed inputs and observed behavior together with environmental and domain assumptions; fault isolation and fault mitigation follow.]
Why ‘Software Health Management’? 
 Complexity of systems necessitates an additional layer ‘above’ SFT that 
manages ‘Software Health’ 
 Embedded software …. 
 is a crucial ingredient in aerospace systems 
 is a method for implementing functionality 
 is the ‘universal system integrator’ 
 could exhibit faults that lead to system failures 
 complexity has progressed to the point that zero-defect systems (containing 
both hardware and software) are very difficult to build 
 Systems Health Management is an emerging field that addresses precisely 
this problem: How to manage systems’ health in case of faults ? 
 ‘Software Health Management’ is not… 
 A replacement for existing and robust engineering processes and standards 
(DO-178B) 
 A substitute for hardware- and software fault tolerance 
 An ‘ultimate’ solution for fault tolerance
Software Health Management 
Key ideas 
 Use software components as units of fault management: detection, diagnosis, 
and mitigation 
 Components must be observable, provide fault isolation, and be capable of mitigation 
 Use a two-level architecture: 
 Component level: detect anomalies and mitigate locally 
 System level: receive anomaly reports, isolate faulty component(s), and mitigate 
on the component 
 Use models to represent 
 anomalous conditions 
 fault cascades 
 mitigation actions (when / what) 
 Use model-based generators to synthesize code artifacts 
 Developer can use higher-level abstractions to design and implement the 
software health management functions of a system
Software Component Framework 
The Component Model should enable: 
 Monitoring 
 Interfaces (synchronous/asynchronous calls) 
 Component state 
 Scheduling and timing (WCET) 
 Resource usage 
 Anomaly Detection via: 
 Pre/post conditions over call parameters, rates, and component state 
 Conditions over (1) timing properties, (2) resource usage (e.g. memory footprint), and (3) 
usage patterns 
 Combinations of the above 
 Mitigation: 
 Given the detected anomaly and the state of the component, take action 
 Can be time- or event-triggered 
 Actions: restart, initialize, block call, inject value, inject call, release resource, modify state; 
checkpoint/restore, combination of the above
Notional Component Model 
[Figure: a component with parameter, state, resource, and trigger interfaces, publish and subscribe event ports, and provided and required interfaces.]
A component is a unit (containing potentially many objects). The component is parameterized, has 
state, it consumes resources, publishes and subscribes to events, provides interfaces to 
and requires interfaces from other components. 
Publish/Subscribe: Event-driven, asynchronous communication (publisher does not wait) 
Required/Provided: Synchronous communication using call/return semantics. 
Triggering can be periodic or sporadic. 
Extension of a Component Model defined by OMG (CCM) : state, resource, trigger interfaces.
Example: Component Interactions 
[Figure: Sampler, GPS, and Display components connected through publish (P) and subscribe (S) ports.]
Components can interact via asynchronous/event-triggered and synchronous/call-driven connections. 
Example: The Sampler component is triggered periodically and it publishes an event upon each 
activation. The GPS component subscribes to this event and is triggered sporadically to obtain 
GPS data from the receiver, and when ready it publishes its own output event. The Display 
component is triggered sporadically via this event and it uses a required interface to retrieve the 
position data from the GPS component.
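The interaction pattern in this example can be mimicked with a tiny in-process event bus. This is a sketch only: the `EventBus` class, the topic names, and the fake receiver are hypothetical, and events are delivered synchronously for brevity, whereas the component model's publish/subscribe is asynchronous.

```python
from collections import defaultdict


class EventBus:
    """Minimal stand-in for the framework's publish/subscribe mechanism."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self._subscribers[topic]:
            handler(event)


class Sampler:
    def __init__(self, bus):
        self.bus = bus
        self.ticks = 0

    def on_period(self):
        """Periodic trigger: publish one event per activation."""
        self.ticks += 1
        self.bus.publish("sample", self.ticks)


class GPS:
    def __init__(self, bus, receiver):
        self.bus = bus
        self.receiver = receiver
        self.position = None
        bus.subscribe("sample", self.on_sample)

    def on_sample(self, tick):
        """Sporadic trigger: read the receiver, then announce new data."""
        self.position = self.receiver(tick)
        self.bus.publish("gps_ready", tick)

    def get_position(self):
        """Provided interface, called synchronously by the display."""
        return self.position


class Display:
    def __init__(self, bus, gps):
        self.gps = gps
        self.shown = []
        bus.subscribe("gps_ready", self.on_gps_ready)

    def on_gps_ready(self, tick):
        self.shown.append(self.gps.get_position())


bus = EventBus()
sampler = Sampler(bus)
gps = GPS(bus, receiver=lambda tick: (tick, -tick))  # fake receiver
display = Display(bus, gps)
sampler.on_period()
sampler.on_period()
print(display.shown)
```

Each periodic activation of the sampler triggers the GPS component, whose output event triggers the display, which then pulls the position through the call/return interface.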
Component Monitoring 
Component 
Monitor arriving events 
Monitor incoming calls 
Monitor published events 
Monitor outgoing calls 
Observe state 
Monitor resource usage 
Monitor control flow/triggering
ACM: 
The ARINC Component Model 
 Provide a CCM-like layer on top of ARINC-653 abstractions 
 Notional model: 
 Terminology: 
 Synchronous: call/return 
 Asynchronous: publish-return/trigger-process 
 Periodic: time-triggered 
 Aperiodic: event-triggered 
 Note: 
 All component interactions are realized via the framework 
 Process (method) execution time has deadline, which is monitored
ACM: 
The ARINC Component Model 
 Each ‘input interface’ has its own process 
 Process must obtain a read/write lock on the component 
 Asynchronous publisher (subscriber) interface: 
 Listener (publisher) process 
 Pushes (receives) one event (a struct), with a validity flag 
 Can be event-triggered or time-triggered (i.e. 4 variations) 
 Synchronous provided (required) interface: 
 Handles incoming synchronous RMI call 
 Forwards outgoing synchronous RMI call 
 Other interfaces: 
 State: to observe component state variables 
 Resource: to monitor resource usage 
 Trigger: to monitor execution timing
ACM: 
A Prototype Implementation 
 ARINC-653 Emulator 
 Emulates APEX services using Linux APIs 
 Partition -> Process, Process -> Thread 
 Module manager: schedules partition set 
 Partition level scheduler: schedules threads within partition 
 CORBA foundation 
 CCM Implementation 
 No modification 
 ACM component interactions 
 Mainly implemented via APEX 
 RMI interactions use threads
Implementation: Mapping ACM to APEX 
APEX abstraction -> Platform (Linux) 
Module -> Host/Processor 
Partition -> Process 
Process -> Thread 
ACM (ARINC Component Model) concept -> APEX realization -> APEX concepts used 
Component method 
  Periodic: periodic process 
  Sporadic: aperiodic process 
  APEX concepts used: process start, stop; semaphores 
Invocation: synchronous call-return 
  Periodic target, co-located: N/A 
  Periodic target, non-co-located: N/A 
  Sporadic target, co-located: caller method signals callee to release, then waits for callee until completion. APEX concepts used: event, blackboard 
  Sporadic target, non-co-located: caller method sends RMI (via CM) to release callee, then waits for RMI to complete. APEX concepts used: TCP/IP, semaphore, event 
Invocation: asynchronous publish-subscribe 
  Periodic target: callee is periodically triggered and polls the ‘event buffer’; a validity flag indicates whether data is stale or fresh 
    Co-located, APEX concepts used: blackboard 
    Non-co-located, APEX concepts used: sampling port, channel 
  Sporadic target, co-located: callee is released when an event is available. APEX concepts used: blackboard, semaphore, event 
  Sporadic target, non-co-located: caller notifies via TCP/IP, callee is released upon receipt. APEX concepts used: queuing port, semaphore, event
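The stale-or-fresh validity flag used for periodic publish-subscribe targets can be illustrated with a single-slot buffer. The class below is a conceptual sketch with hypothetical names; it is not the APEX blackboard or sampling-port API.

```python
class Blackboard:
    """Single-slot buffer; each read reports whether the value is still fresh."""

    def __init__(self, max_age):
        self.max_age = max_age      # seconds after which data counts as stale
        self._value = None
        self._written_at = None

    def write(self, value, now):
        """Publisher side: overwrite the slot and remember the write time."""
        self._value = value
        self._written_at = now

    def read(self, now):
        """Consumer side: return (value, valid); valid is False for stale or empty data."""
        if self._written_at is None:
            return None, False
        valid = (now - self._written_at) <= self.max_age
        return self._value, valid


bb = Blackboard(max_age=4.0)
print(bb.read(0.0))      # nothing written yet
bb.write("fix-1", 1.0)
print(bb.read(3.0))      # read within max_age
print(bb.read(6.0))      # read after max_age has passed
```

A periodically triggered consumer polls the slot on every activation and uses the flag to decide whether to process the value or to raise a validity violation.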
ACM: 
Modeling Language 
 Modeling elements: 
 Data types: primitive, structs, vectors 
 Interfaces: methods with arguments 
 Components: 
 Publish/Subscribe ports (with data type) 
 Provided/Required interfaces (with i/f type) 
 Health Manager 
 Assemblies 
 Deployment 
 Modules, Partitions 
 Component -> Partition
Example: Sensor/GPS/Display 
[Figure: Sensor, GPS, and NavDisplay components. Sensor publishes on data_out; GPS consumes on data_in, updates GPSValue, and publishes on data_out; NavDisplay consumes on data_in and invokes get on gps_data_source, which reads GPSValue through gps_data_src.]
Component | Port | Period | Time Capacity | Deadline
Sensor | data_out | 4 sec | 4 sec | Hard
GPS | data_out | aperiodic | 4 sec | Hard
GPS | data_in | 4 sec | 4 sec | Hard
GPS | gps_data_src | aperiodic | 4 sec | Hard
Navdisplay | data_in | aperiodic | 4 sec | Hard
Navdisplay | gps_data_src | aperiodic | 4 sec | Hard
component NavDisplay { 
  consumes SensorOutput data_in ; //APERIODIC 
  uses GPSDataSource gps_data_source ; 
}; 
component Sensor { 
  publishes SensorOutput data_out ; 
}; 
component GPS { 
  publishes SensorOutput data_out ; //APERIODIC 
  consumes SensorOutput data_in ; //PERIODIC 
  provides GPSDataSource gps_data_src ; 
}; 
struct SensorOutput 
{ 
  Timespec time ; 
  SensorData data ; 
}; 
struct SensorData 
{ 
  FLOATINGPOINT alpha ; 
  FLOATINGPOINT beta ; 
  FLOATINGPOINT gamma ; 
}; 
struct Timespec 
{ 
  LONGLONG tv_sec ; 
  LONGLONG tv_nsec ; 
}; 
interface GPSDataSource 
{ 
  void getGPSData (out GPSData d); 
};
Anomaly Detection 
 Model-Based Specification of 
monitoring expressions 
 Post/Pre condition violations: 
threshold, rate, custom filter 
(moving average) 
 Resource Violations: Deadline 
 Validity Violation: Stale data on 
a consumer 
 Concurrency Violations: Lock 
timeouts. 
 User code violations: reported 
error conditions from 
application code. 
 Code Generators 
 Synthesize code for 
implementing the monitors 
[Figure: component monitors. Port monitors: arriving events, incoming calls, published events, outgoing calls. Non-port monitors: state, resource usage, control flow/triggering.]
• Based on these local detections,
each component developer can
implement a local health
manager
• It is a reactive timed state
machine with pre-specified
actions.
• All alarms and actions are reported
to the system health manager
ACM: 
Modeling Language: Monitoring 
 Monitoring on component interfaces 
 Subscriber port  ‘Subscriber process’ and 
Publisher port  ‘Publisher process’ 
 Monitor: pre-conditions and post-conditions 
 On subscriber: Data validity (‘age’ of data) 
 Deadline (hard / soft) 
 Provided interface  ‘Provider methods’ and 
Required interface  ‘Required methods’ 
 Monitor: pre-conditions and post-conditions 
 Deadline (hard / soft) 
 Can be specified on a per-component basis 
 Monitoring language: 
 Simple, named expressions over input (output) 
parameters, component state, delta(var), and 
rate(var,dt). The expression yields a Boolean condition. 
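A minimal sketch of how such named monitoring expressions might be evaluated, assuming each monitored variable keeps its two most recent values. The helper names (delta, rate, MONITORS, violated) and the numeric bound are illustrative assumptions, not the ACM toolchain's generated code.

```python
def delta(history):
    """Change between the two most recent values (0.0 until two exist)."""
    return history[-1] - history[-2] if len(history) >= 2 else 0.0


def rate(history, dt):
    """delta(var) divided by the sampling interval dt."""
    return delta(history) / dt


# Named Boolean conditions over the component state (hypothetical examples).
MONITORS = {
    "timestamp_increases": lambda s: delta(s["time"]) > 0,
    "alpha_rate_bounded": lambda s: abs(rate(s["alpha"], s["dt"])) <= 10.0,
}


def violated(state):
    """Return the names of all monitoring conditions that evaluate to False."""
    return sorted(name for name, cond in MONITORS.items() if not cond(state))


# The timestamp did not advance between the two samples, so one monitor fires.
state = {"time": [1.0, 1.0], "alpha": [0.0, 2.0], "dt": 4.0}
violations = violated(state)
```

Each violated name would be raised as an alarm to the component-level health manager.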
Component-level Health Management 
 Manager’s behavioral model: 
 Finite-state machine 
 Triggers: monitored events, time 
 Actions: mitigation activities 
 Manager is local to component 
container (for efficiency) but shall be 
protected from the faults of functional 
components 
 Notional behavior: 
 Track component state changes via 
detected events and progression of 
time 
 Take mitigation actions as needed 
 Design issues: 
 Co-location with component (fault containment) 
 Local detection may implicate another component 
Component 
Monitor 
WCET 
Component Framework 
Manager 
Actions 
Events 
Events 
Idle 
Exec 
InvA 
start 
finish 
timeout 
/init 
invA_violation 
/reset
ACM - Modeling Language: 
Component Health Manager 
 Reactive Timed State Machine 
 Event trigger: 
 Predefined conditions (e.g. deadline violation, data validity violation)
 User-defined conditions (e.g. pre-condition violation) 
 Reaction: mitigation action (start, reset, refuse, ignore, etc.) 
 State: current state of the machine 
 (Event X State)  Action
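A minimal sketch of the (Event X State) to Action mapping as a table-driven state machine. The states, events and table entries are hypothetical; timing is assumed to arrive as ordinary events (e.g. a deadline violation raised by a monitor), and the log stands in for reporting to the system health manager.

```python
from enum import Enum


class State(Enum):
    IDLE = "Idle"
    EXEC = "Exec"


# (event, state) -> (mitigation action, next state); illustrative entries only.
TRANSITIONS = {
    ("start", State.IDLE): ("ignore", State.EXEC),
    ("finish", State.EXEC): ("ignore", State.IDLE),
    ("deadline_violation", State.EXEC): ("reset", State.IDLE),
    ("precondition_violation", State.EXEC): ("refuse", State.IDLE),
}


class ComponentHealthManager:
    def __init__(self):
        self.state = State.IDLE
        self.log = []  # alarms and actions that would be reported upward

    def on_event(self, event):
        # Unknown (event, state) pairs default to "ignore" and keep the state.
        action, next_state = TRANSITIONS.get(
            (event, self.state), ("ignore", self.state)
        )
        self.log.append((event, self.state.value, action))
        self.state = next_state
        return action


chm = ComponentHealthManager()
chm.on_event("start")
action = chm.on_event("deadline_violation")  # table entry yields "reset"
```

Keeping the behavior in a table rather than in branching code is what makes it straightforward to generate from a model.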
Component Health Management 
Available Actions 
Component Health Manager (High priority ARINC-653 process) 
[Figure: Component Health Manager architecture. Labels: component port processes (ARINC-653 processes), buffer of incoming events, error message / action, HM response delivered through a blocking read on a BlackBoard; manager states NOMINAL, ERROR CHECK, RESULT, FAILURE, with transitions "Action Successful" and "Timeout or Action Failed".]
Assembly Definition 
Validity(GPS.data_in)<4ms 
Delta(Nav.data_in.time)>0 
Rate(gps_data_src.data)>1 
Specified Monitoring Conditions 
 The Sensor component is triggered periodically and it publishes an event upon each 
activation. 
 The GPS component subscribes to this event and is triggered periodically to obtain GPS 
data from the receiver. It publishes its own output event. 
 The Nav Display component is triggered sporadically via this event and it uses a required 
interface to retrieve the position data from the GPS component.
System-level Health Management 
 Focus issue: Cascading faults 
 Hypothesis: Fault effects cascade via component interactions 
 Anomalies detected on the component level are not 
‘diagnosed’  can be caused by other components 
 Problem: 
 How to model fault cascades? 
 How to diagnose and isolate fault cascade root causes? 
 How to mitigate fault cascades?
Recap: Fault diagnosis 
 Model: Timed Failure Propagation Graphs 
Modeling variants 
•Untimed, causal network (no modes, propagation = [0..inf]) 
•Modal networks: edges are mode dependent 
•Timed models 
•Hierarchical component models 
Nodes: 
•Failure modes 
•Discrepancies 
• AND/OR (combination) 
• Monitored (option) 
Edges: 
•Propagation delay: [min, max] 
•Discrete Modes (activation) 
Example models (#components, #failuremodes, #alarms) 
•Trivial examples 
•Simplified fuel system (~30,~80,~100) 
•Realistic fuel system (~200,~400,~600) 
•Aircraft avionics (~2000,~8000,~25000) – generated
Recap: Fault diagnosis 
Fault diagnosis algorithm: 
• Outline: 
– Check if new evidence is explained 
by current hypotheses. 
– If not, create a new hypothesis that 
assumes a hypothetical state of the 
system consistent with observations 
– Rank hypotheses for plausibility and 
robustness metrics 
– Discard low-rank hypotheses, keep
plausible ones
Fault state: ‘total state vector’ of the system,
i.e. all failure modes and discrepancies
Alarms could be 
Missing: should have fired but did not 
Inconsistent: fired, but it is not consistent 
with the hypothesis 
Robust diagnostics: tolerates missing and 
inconsistent alarms 
Metrics: 
Plausibility: how plausible is the 
hypothesis w.r.t. alarm consistency 
Robustness: how likely it is that the
hypothesis will change in the future
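A minimal sketch of hypothesis ranking on the untimed, causal-network variant of a failure propagation graph. The graph, node names and the plausibility formula (consistent alarms divided by all alarms considered) are assumptions made for illustration; the actual TFPG reasoner also uses propagation intervals and modes.

```python
def reachable(graph, start):
    """All nodes reachable from start along propagation edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def rank_hypotheses(graph, failure_modes, monitored, observed):
    """Score each single-failure hypothesis against the observed alarms."""
    ranked = []
    for fm in failure_modes:
        expected = reachable(graph, fm) & monitored
        consistent = expected & observed
        missing = expected - observed        # should have fired but did not
        inconsistent = observed - expected   # fired, not explained by fm
        total = len(consistent) + len(missing) + len(inconsistent)
        score = len(consistent) / total if total else 0.0
        ranked.append((score, fm, sorted(missing), sorted(inconsistent)))
    ranked.sort(key=lambda r: (-r[0], r[1]))
    return ranked


# Hypothetical graph: failure modes (FM_*) cause discrepancies (D_*).
GRAPH = {
    "FM_sensor": ["D_sensor_out"],
    "D_sensor_out": ["D_gps_in"],
    "D_gps_in": ["D_nav_in"],
    "FM_gps": ["D_gps_out"],
    "D_gps_out": ["D_nav_in"],
}
MONITORED = {"D_sensor_out", "D_gps_in", "D_gps_out", "D_nav_in"}
ranked = rank_hypotheses(GRAPH, ["FM_sensor", "FM_gps"], MONITORED,
                         observed={"D_gps_out", "D_nav_in"})
```

Here FM_gps explains both observed alarms, while FM_sensor leaves two alarms missing and one inconsistent, so it ranks lower but is not ruled out outright.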
Modeling Cascading Faults 
 Not needed - the cascades can be computed from the 
component assemblies, if the anomaly types and their 
interactions are known. 
 Component ‘elements’ 
Every method belongs to one of these (7) 
 Fault cascades within component 
(A few of the 38 patterns)
Modeling Cascading Faults 
 Inter-component propagation is regular – always follows the 
same pattern 
 Intra-component propagation depends on the component!  
Need to model internal dataflow and control flow of the 
component. 
Note: Could be determined via source code analysis.
Modeling Cascading Faults 
 Fault Propagation Graph for GPS Example 
 Here: hand-crafted, but it is generated automatically in the 
system
System-level Fault Mitigation 
 Model-based system-level mitigation engine 
 Model-based diagnoser is automatically generated 
 Designer specifies fault mitigation 
strategies using a reactive state machine 
Advantages:
• Models are higher-level programs to specify (potentially complex) behavior - more readable and comprehensible
• Models lend themselves to formal analysis - e.g. model checking
[Figure: Diagnoser Engine and Mitigation Engine above Managed Components; each Managed Component has a Component CHM and a Component Fault Model and runs on a Component Platform. Labels: D, FM.]
System-level Fault 
Mitigation 
 Model-based mitigation specification at 
two levels 
 Component level: quick action 
 System level: Reactive action taking the 
system state into consideration 
 System designer specifies them as a 
parallel timed state machine. 
 Fixed set of mitigation actions are 
available 
 Runtime code is generated from 
models 
 Advantages: 
 Models are higher-level programs to 
specify (potentially complex) behavior – 
more readable and comprehensible 
 Models lend themselves to formal 
analysis – e.g. model checking 
List of predefined Mitigation Actions
HM Action            Semantics
CLHM: IGNORE         Continue as if nothing has happened
CLHM: ABORT          Discontinue current operation, but operation can run again
CLHM: USE PAST DATA  Use most recent data (only for operations that expect fresh data)
CLHM: STOP           Discontinue current operation.
                     Aperiodic processes (ports): operation can run again.
                     Periodic processes (ports): operation must be enabled by a future START HM action
CLHM: START          Re-enable a STOP-ped periodic operation
CLHM: RESTART        A macro for STOP followed by a START for the current operation
SLHM: RESET          Stop all operations, initialize state of component, start all periodic operations
SLHM: STOP           Stop all operations
Diagnoser Engine Mitigation Engine 
Alarms 
Alarms 
Alarms
System-level Health Management 
Functional components 
 1. Aggregator: 
 Integrates (collates) health information coming 
from components (typically in one hyperperiod) 
 2. Diagnoser: 
 Performs fault diagnosis, based on the fault 
propagation graph model 
 Ranks hypotheses 
 Component that appears in all hypotheses with 
the highest rank is chosen for mitigation 
 3. Response Engine: 
 Issues mitigation actions to components based 
on diagnosis results 
 Based on a state machine model that maps 
diagnostic results to mitigation actions 
These components are generated 
automatically from the models 
The Health Management Approach: 
1. Locally detected anomalies are mitigated 
locally first. – Quick reactive response. 
2. Anomalies and local mitigation actions are 
reported to the system level. 
3. Aggregated reports are subjected to 
diagnosis, potentially followed by a system-level 
mitigation action. 
4. System-level response commands are 
propagated down to components.
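A minimal sketch of the hand-off from Diagnoser to Response Engine: the component that appears in all of the highest-ranked hypotheses is selected, then mapped to a system-level action. The data shapes, the policy table and the default action are assumptions for illustration.

```python
def select_faulty_components(hypotheses):
    """hypotheses: list of (rank, set of implicated components).

    Returns the components common to all hypotheses with the highest rank.
    """
    if not hypotheses:
        return set()
    best = max(rank for rank, _ in hypotheses)
    top = [set(comps) for rank, comps in hypotheses if rank == best]
    return set.intersection(*top)


def respond(hypotheses, policy, default="IGNORE"):
    """Map each selected component to a mitigation action (e.g. RESET, STOP)."""
    return {
        comp: policy.get(comp, default)
        for comp in sorted(select_faulty_components(hypotheses))
    }


hypotheses = [
    (0.9, {"GPS", "Sensor"}),
    (0.9, {"GPS"}),
    (0.4, {"NavDisplay"}),
]
commands = respond(hypotheses, policy={"GPS": "RESET"})
```

In the approach described above the policy would be a state machine rather than a flat table, so the response can depend on what has already been tried.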
Example: 
2005 Malaysia Airlines Boeing 777 in-flight upset
 Low airspeed advisory. 
 Airplane’s autopilot experienced excessive acceleration values. 
 Vertical acceleration decreased to -2.3g within ½ second 
 Lateral acceleration decreased to -1.01g (left) within ½ second 
 Longitudinal acceleration increased to +1.2 g within ½ second 
 Autopilot pitched nose-up to 17.6 degrees and climbed at a vertical speed 
of 10,650 fpm. 
 Airspeed reduced to 241 knots. 
 Stick shaker activated at top of the climb. 
 Aircraft descended 4,000 ft. 
 Re-engagement of autopilot followed by another climb of 2,000 ft. 
 Maximum rate of climb = 4440 fpm.
B777 ADIRU Architecture 
• Designed to be serviceable with 
one fault in each FCA 
• Can fly but maintenance 
required upon landing with two 
faults in each FCA 
• Each ARINC 629 end unit voted 
on the processor data bit-by-bit. 
• Processors monitor the ARINC 
629 modules by full data wrap-around 
• Processors also monitor the 
power supplies, any one of 
which can power the entire unit 
• Accelerometer and gyro in 
skewed redundant configuration 
• A S(econdary)AARU also 
provided inertial data 
Based on Air Data Inertial Reference Unit (ADIRU) 
Architecture (ATSB, 2007, p.5)
Cause of Inflight Upset 
 June 2001: accelerometer 5 fails with high output value, ADIRU disregards it. 
 A power cycle on ADIRU occurs. A latent software bug disregards the faulty status 
of accelerometer 5. 
 Status of failed unit was recorded in on-board maintenance memory, but that memory was 
not checked by the software. 
 An inflight fault was recorded in accelerometer 6 and it was disregarded. 
 FDI software allowed use of accelerometer 5. 
 High acceleration value was passed to all computers. 
 Due to common-mode nature of fault, voters allowed high accelerometer data to 
go on all channels. 
 This high value was used by primary flight computer. 
 Mid value select function used by the flight computer lessened the effect of pitch 
motion. 
Problem: System relied on redundancy to mask a fault. But due to a latent software 
bug and a common-mode fault, the effect cascaded into a system failure 
Reading Material: The dangers of failure masking in fault-tolerant software: aspects of a recent in-flight upset event 
C.W. Johnson and C.M. Holloway, IET Conf. Pub. 2007, 60 (2007), DOI:10.1049/cp:20070442
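A minimal sketch of why voting cannot mask a common-mode fault: a bit-by-bit majority voter removes a deviation in one channel, but passes a wrong value through when every channel carries it. The function and the sample values are illustrative assumptions, not the ADIRU's actual voter.

```python
def bitwise_majority(words):
    """Majority vote per bit position over an odd number of integer words."""
    result = 0
    for bit in range(max(w.bit_length() for w in words)):
        ones = sum((w >> bit) & 1 for w in words)
        if ones * 2 > len(words):
            result |= 1 << bit
    return result


good, bad = 0b1010, 0b0110

# One faulty channel: the voter masks it.
single_fault = bitwise_majority([good, good, bad])

# All channels computed from the same bad accelerometer input: the voter agrees.
common_mode = bitwise_majority([bad, bad, bad])
```

The voter is working as designed in both cases; the second result is wrong only because the fault sits upstream of the redundancy.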
Case Study 
• Modeled the architecture as a 
software component assembly 
• Created the fault scenario 
• Only modeled part of the system 
to illustrate the point of SHM 
• Accelerometers are arranged on 
six faces of a dodecahedron. 
Used for regression Equations
ADIRU Assembly (Accelerometers) 
Runs at 20 Hz
ADIRU Assembly (Processors) 
Observer tracks the age 
of accelerometer data. 
Specified as timed state 
machine (with timeout) 
Runs at 20 Hz
ADIRU Assembly (Voters) 
Runs at 20 Hz
ADIRU Assembly (Display- Mimics PFC) 
Runs aperiodically
Deployment Model 
Each Module is a processor running the 
ARINC Component Runtime Environment
Execution 
Accelerometers 
Machine – durip02 
SHM 
Machine – durip09 
ADIRU Processors 
ADIRU Computers 
Machine – durip03 
Voter + Display Computer 
Machine – durip06 
Accelerometers 
SHM VOTERS + DISPLAY
System Health Manager 
other machines have similar specification 
These components are auto generated 
The hypothesis generated by the diagnoser is translated to the list of 
component(s) most likely to be faulty. This list is fed to the 
Response Engine, which triggers the mitigation state machine
Demonstration 
 Fault Scenario 
 Accelerometer 5 has initial fault 
 It is started which causes an alarm 
 Then Accelerometer 6 develops fault 
 Successful mitigation 
 Identifying the faulty components 
 Stopping the faulty components 
 Processors can still function with four accelerometers.
Demonstration: Faulty Scenario
Resilient architectures and 
autonomy
Resilience and autonomy 
 Model-based Software Health Management 
 Requires explicit specification of component-level and system-level 
health management (recovery) actions 
 Complex and error-prone… too many options! 
 Resilient systems should recover autonomously 
 Concepts: 
 Model the system architecture + functions. 
 Express what is needed from the system to implement 
functions. 
 Embed models into the run-time system 
 Use a reasoner to figure out how to recover function upon 
failures
Modeling 
Functional Requirements for IMU 
 Inertial Position 
• Determine inertial position. 
• Functional Requirement (AND) 
 GPS Position 
 Position Tracking 
 GPS Position 
• Sense GPS position for computing 
Inertial Position 
 Position Tracking 
• Continuously track position to compute 
Inertial position 
• Functional Requirement 
 Body Acceleration Measurement 
 Body Acceleration Measurement 
• Sense body acceleration for Position 
Tracking. 
Inertial 
Position 
GPS 
Position 
Position 
Tracking 
Body 
Acceleration 
Measurement
Modeling 
Complete Redundant Architecture
Modeling the Architecture 
Function Allocation 
Body Acceleration 
Measurement 
EXACTLY ONE (Primary /Secondary ADIRU 
Subsystem) 
ADIRU Subsystem has 
• Accelerometers (6) 
• ADIRU Computers (4) 
• Voters (3) 
Functional / Operational ADIRU Subsystem 
requires 
• ATLEAST 4 of 6 Accelerometers 
• ATLEAST 2 of 4 Filters or ADIRU 
computers 
• ATLEAST 1 of 3 Voter 
Inside one ADIRU:
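A minimal sketch of the ATLEAST k-of-n operational requirement for one ADIRU subsystem, assuming each component reports a Boolean health flag. The function names are hypothetical.

```python
def at_least(k, healthy_flags):
    """True if at least k of the given components are healthy."""
    return sum(bool(flag) for flag in healthy_flags) >= k


def adiru_operational(accelerometers, computers, voters):
    """ATLEAST 4 of 6 accelerometers, 2 of 4 ADIRU computers, 1 of 3 voters."""
    return (
        at_least(4, accelerometers)
        and at_least(2, computers)
        and at_least(1, voters)
    )


# Two accelerometer faults are tolerated; a third makes the subsystem inoperable.
two_faults = adiru_operational([True] * 4 + [False] * 2, [True] * 4, [True] * 3)
three_faults = adiru_operational([True] * 3 + [False] * 3, [True] * 4, [True] * 3)
```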
Modeling the Architecture 
Function Allocation 
GPS Position 
 EXACTLY ONE (Primary/Secondary 
GPS Subsystem) 
 GPS Subsystem includes 
 GPS Receiver (1) 
 GPS Processor (1) 
 Functional / Operational GPS 
subsystem requires 
 EXACTLY ONE of GPS Receiver 
 EXACTLY ONE of GPS Processor 
Inside one GPS Subsystem:
Modeling the Architecture 
Function Allocation 
POSITION TRACKING 
 ATLEAST ONE OF ( LEFT/ CENTER/ 
RIGHT PFC NavFilter Subsystem) 
 PFC NavFilter Subsystem includes 
 PFC Nav Filter (1) 
 PFC Processor (1) 
 Functional/ Operational Requirement 
for PFC Subsystem 
 EXACTLY ONE PFC NavFilter 
 EXACTLY ONE PFC Processor 
Inside one PFC Subsystem:
Component Operational Requirement 
 EXPLICIT – Local dependency 
 Display Subsystem 
 ATLEAST 1 of 3 Consumers (Left, Center, Right) 
 EXPLICIT – Local dependency 
 ADIRU Computer inside ADIRU Subsystem 
 ATLEAST 4 of 6 Consumer Port 
Implies 
 ATLEAST 4 of 6 Accelerometer Components
Component Operational Requirement 
 IMPLICIT – Inferred dependency 
 PFC NavFilter in PFC Subsystem 
 EXACTLY 1 of 1 Consumer Port AND 
 ATLEAST 1 of 1 Requires Port 
Implies 
 EXACTLY 1 of 2 ADIRU Subsystems AND 
 ATLEAST 1 of 2 GPS Subsystem
Component Operational Requirement 
 IMPLICIT – Inferred dependency 
 PFC Processor inside PFC Subsystem 
 EXACTLY 1 of 1 Consumer Port 
Implies 
 EXACTLY 1 of 1 PFC NavFilter 
 GPS Processor inside GPS Subsystem 
 EXACTLY 1 of 1 Consumer Port 
Implies 
 EXACTLY 1 of 1 GPS Receiver
Modeling the problem: 
Boolean SAT 
Functional Requirements + Function allocation + 
Component operational requirements + Component states 
 Encoded as Boolean (CNF) Expression for SATisfiability 
problem 
Solution: valid component architecture 
Size: #Variables: 493/ #Clauses: 1776 
FAULT / Scenario | SAT solver RECONFIG COMPUTE Time (s) | RECONFIG COMMANDS
Verifying Initial State | 0.004228 | No commands. Initial State accepted as satisfying/meeting functional requirements.
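A minimal sketch of how an ATLEAST k-of-n requirement can be turned into CNF clauses. The slides do not state which encoding was used; this is the naive textbook one (every subset of n-k+1 variables must contain a true variable), and the function names are assumptions.

```python
from itertools import combinations, product


def at_least_k_clauses(variables, k):
    """Naive CNF for 'at least k of variables are true' (positive literals only)."""
    n = len(variables)
    if k <= 0:
        return []          # trivially satisfied
    if k > n:
        return [[]]        # empty clause: unsatisfiable
    return [list(subset) for subset in combinations(variables, n - k + 1)]


def satisfies(clauses, assignment):
    """assignment maps variable -> bool; every clause needs one true literal."""
    return all(any(assignment[v] for v in clause) for clause in clauses)


# ATLEAST 4 of 6 accelerometers, using DIMACS-style variable numbers 1..6.
accelerometers = [1, 2, 3, 4, 5, 6]
clauses = at_least_k_clauses(accelerometers, 4)

# Exhaustive check that the clauses agree with simple counting.
agrees = all(
    satisfies(clauses, dict(zip(accelerometers, bits))) == (sum(bits) >= 4)
    for bits in product([False, True], repeat=6)
)
```

The naive encoding grows combinatorially with n, which is acceptable for groups of six but would call for a more compact cardinality encoding on larger groups.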
Initial Configuration
Fault: ADIRU Accelerometer 
 Fault introduced, anomaly detected, fault source 
component diagnosed, then: 
 Compute the new component architecture that satisfies 
the functional requirements AND minimizes the number 
of reconfiguration changes 
FAULT / Scenario | SAT solver RECONFIG COMPUTE Time (s) | RECONFIG COMMANDS
Primary_ADIRU_Subsystem_Accelerometer6 | 0.002989 | STOP Primary_ADIRU_Subsystem_Accelerometer6
Primary_ADIRU_Subsystem_Accelerometer5 | 0.003151 | STOP Primary_ADIRU_Subsystem_Accelerometer5
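A minimal sketch of the reconfiguration step on a deliberately reduced model: one primary/secondary switch and six primary accelerometers. It searches all configurations by brute force rather than with a SAT solver, and the names and the rule that a stopped primary has all its accelerometers stopped are simplifying assumptions.

```python
from itertools import product

NAMES = ["primary", "secondary"] + [f"acc{i}" for i in range(1, 7)]


def valid(cfg, faulty):
    """Check the reduced functional requirements for one configuration."""
    accs = [cfg[f"acc{i}"] for i in range(1, 7)]
    if any(cfg[name] for name in faulty):
        return False                       # faulty components must be stopped
    if cfg["primary"] == cfg["secondary"]:
        return False                       # EXACTLY ONE subsystem active
    if cfg["primary"]:
        return sum(accs) >= 4              # ATLEAST 4 of 6 accelerometers
    return not any(accs)                   # primary stopped: accelerometers off


def reconfigure(current, faulty):
    """Return START/STOP commands for the valid configuration closest to current."""
    best = None
    for bits in product([False, True], repeat=len(NAMES)):
        cfg = dict(zip(NAMES, bits))
        if not valid(cfg, faulty):
            continue
        changes = sum(cfg[n] != current[n] for n in NAMES)
        if best is None or changes < best[0]:
            best = (changes, cfg)
    if best is None:
        return None                        # no configuration meets the requirements
    return [("START" if on else "STOP", n)
            for n, on in best[1].items() if on != current[n]]


current = {name: True for name in NAMES}
current["secondary"] = False
first_fault = reconfigure(current, {"acc6"})
```

With one or two faulty accelerometers the cheapest valid change is to stop just those units; once fewer than four remain, the only valid configuration is the failover to the secondary subsystem.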
Primary ADIRU Subsystem 
Partial fault – Primary still functional
ADIRU Accelerometer Fault 
(contd.) 
 3rd fault  failover to secondary ADIRU 
FAULT / Scenario | SAT solver RECONFIG COMPUTE Time (s) | RECONFIG COMMANDS
Primary_ADIRU_Subsystem_Accelerometer4 | 0.020825 | STOP Primary_ADIRU_Subsystem_Accelerometer4; STOP Primary_ADIRU Subsystem (stop all accelerometers, ADIRU computers, Voters in Primary ADIRU subsystem); START Secondary_ADIRU Subsystem (start all accelerometers, ADIRU computers, Voters in Secondary ADIRU subsystem)
Primary ADIRU Subsystem 
Complete fault
Primary ADIRU Subsystem Faulty 
Failover to secondary ADIRU
GPS Fault 
FAULT / Scenario | SAT solver RECONFIG COMPUTE Time (s) | RECONFIG COMMANDS
Primary_GPS_Subsystem_GPSProcessor | 0.004720 | STOP Primary_GPS_Subsystem (stop GPS Receiver, GPS Processor); START Secondary_GPS Subsystem (start GPS Receiver, GPS Processor)
Primary GPS Subsystem Faulty
Reconfiguration after 
Primary GPS Subsystem becomes faulty
PFC NavFilter Faults 
FAULT / Scenario | SAT solver RECONFIG COMPUTE Time (s) | RECONFIG COMMANDS
Left_PFC_Subsystem_PFCNavFilter | 0.003107 | STOP Left_PFC_Subsystem (stop PFCNavFilter, PFC Processor)
Right_PFC_Subsystem_PFCNavFilter | 0.003089 | STOP Right_PFC_Subsystem (stop PFCNavFilter, PFC Processor)
Left PFC NavFilter Faulty
Right PFC NavFilter Faulty
Research challenges 
 Modeling and engineering of r-CPS 
 Modeling paradigm / verification paradigm / synthesis 
 Verify recoverability under all scenarios 
 Efficient recovery 
 Analytics: 
 Comparing architectures and solutions 
 Resilience against… 
 Cascading, cross-domain faults 
 Cyber attacks possibly with physical faults 
 Engineering process 
 ‘Simian army’ or systematic design? 
 Principles of multi-layer resilience

More Related Content

What's hot

1ST DISIM WORKSHOP ON ENGINEERING CYBER-PHYSICAL SYSTEMS
1ST DISIM WORKSHOP ON ENGINEERING CYBER-PHYSICAL SYSTEMS1ST DISIM WORKSHOP ON ENGINEERING CYBER-PHYSICAL SYSTEMS
1ST DISIM WORKSHOP ON ENGINEERING CYBER-PHYSICAL SYSTEMSHenry Muccini
 
SERENE 2014 School: Luigi pomante serene2014_school
SERENE 2014 School: Luigi pomante serene2014_schoolSERENE 2014 School: Luigi pomante serene2014_school
SERENE 2014 School: Luigi pomante serene2014_schoolHenry Muccini
 
Final cyber physical system (1)
Final cyber physical system (1)Final cyber physical system (1)
Final cyber physical system (1)vanisre jaiswal
 
Next Generation Standards - A Science-Based Discipline of Information Managem...
Next Generation Standards - A Science-Based Discipline of Information Managem...Next Generation Standards - A Science-Based Discipline of Information Managem...
Next Generation Standards - A Science-Based Discipline of Information Managem...Steve Ray
 
BDCAM: big data for context-aware Monitoring
BDCAM: big data for context-aware MonitoringBDCAM: big data for context-aware Monitoring
BDCAM: big data for context-aware Monitoringkitechsolutions
 
Cyber security for the smart grid, Clifford Neuman, Information Sciences Inst...
Cyber security for the smart grid, Clifford Neuman, Information Sciences Inst...Cyber security for the smart grid, Clifford Neuman, Information Sciences Inst...
Cyber security for the smart grid, Clifford Neuman, Information Sciences Inst...University of Southern California
 
IEEE 2014 JAVA NETWORK SECURITY PROJECTS Integrated security analysis on casc...
IEEE 2014 JAVA NETWORK SECURITY PROJECTS Integrated security analysis on casc...IEEE 2014 JAVA NETWORK SECURITY PROJECTS Integrated security analysis on casc...
IEEE 2014 JAVA NETWORK SECURITY PROJECTS Integrated security analysis on casc...IEEEGLOBALSOFTSTUDENTPROJECTS
 
Mark Minnucci: Deployment of MBSE and the Emergence of a Systems-Thinking Cul...
Mark Minnucci: Deployment of MBSE and the Emergence of a Systems-Thinking Cul...Mark Minnucci: Deployment of MBSE and the Emergence of a Systems-Thinking Cul...
Mark Minnucci: Deployment of MBSE and the Emergence of a Systems-Thinking Cul...EnergyTech2015
 
Bob Garrett: Network of Networks Analysis
Bob Garrett: Network of Networks AnalysisBob Garrett: Network of Networks Analysis
Bob Garrett: Network of Networks AnalysisEnergyTech2015
 
Clase 1 Ingenieria de Software
Clase 1 Ingenieria de SoftwareClase 1 Ingenieria de Software
Clase 1 Ingenieria de Softwarechristianben
 
Smart Grid Cyber Security
Smart Grid Cyber SecuritySmart Grid Cyber Security
Smart Grid Cyber SecurityJAZEEL K T
 
Cybersecurity Considerations for Power Substation SCADA Systems Using IEC 618...
Cybersecurity Considerations for Power Substation SCADA Systems Using IEC 618...Cybersecurity Considerations for Power Substation SCADA Systems Using IEC 618...
Cybersecurity Considerations for Power Substation SCADA Systems Using IEC 618...Power System Operation
 
Zpryme Report on Modeling and Simulation
Zpryme Report on Modeling and SimulationZpryme Report on Modeling and Simulation
Zpryme Report on Modeling and SimulationPaula Smith
 
Smart grid chinedu opara(m00560830)
Smart grid   chinedu opara(m00560830)Smart grid   chinedu opara(m00560830)
Smart grid chinedu opara(m00560830)Chinedu Opara
 
Zpryme Report on Distribution Sensors for QinetiQ
Zpryme Report on Distribution Sensors for QinetiQZpryme Report on Distribution Sensors for QinetiQ
Zpryme Report on Distribution Sensors for QinetiQPaula Smith
 
Semantic Web for Advanced Engineering
Semantic Web for Advanced EngineeringSemantic Web for Advanced Engineering
Semantic Web for Advanced EngineeringMarta Sabou
 

What's hot (19)

1ST DISIM WORKSHOP ON ENGINEERING CYBER-PHYSICAL SYSTEMS
1ST DISIM WORKSHOP ON ENGINEERING CYBER-PHYSICAL SYSTEMS1ST DISIM WORKSHOP ON ENGINEERING CYBER-PHYSICAL SYSTEMS
1ST DISIM WORKSHOP ON ENGINEERING CYBER-PHYSICAL SYSTEMS
 
SERENE 2014 School: Luigi pomante serene2014_school
SERENE 2014 School: Luigi pomante serene2014_schoolSERENE 2014 School: Luigi pomante serene2014_school
SERENE 2014 School: Luigi pomante serene2014_school
 
Final cyber physical system (1)
Final cyber physical system (1)Final cyber physical system (1)
Final cyber physical system (1)
 
Next Generation Standards - A Science-Based Discipline of Information Managem...
Next Generation Standards - A Science-Based Discipline of Information Managem...Next Generation Standards - A Science-Based Discipline of Information Managem...
Next Generation Standards - A Science-Based Discipline of Information Managem...
 
BDCAM: big data for context-aware Monitoring
BDCAM: big data for context-aware MonitoringBDCAM: big data for context-aware Monitoring
BDCAM: big data for context-aware Monitoring
 
Cyber security for the smart grid, Clifford Neuman, Information Sciences Inst...
Cyber security for the smart grid, Clifford Neuman, Information Sciences Inst...Cyber security for the smart grid, Clifford Neuman, Information Sciences Inst...
Cyber security for the smart grid, Clifford Neuman, Information Sciences Inst...
 
G41044251
G41044251G41044251
G41044251
 
IEEE 2014 JAVA NETWORK SECURITY PROJECTS Integrated security analysis on casc...
IEEE 2014 JAVA NETWORK SECURITY PROJECTS Integrated security analysis on casc...IEEE 2014 JAVA NETWORK SECURITY PROJECTS Integrated security analysis on casc...
IEEE 2014 JAVA NETWORK SECURITY PROJECTS Integrated security analysis on casc...
 
Mark Minnucci: Deployment of MBSE and the Emergence of a Systems-Thinking Cul...
Mark Minnucci: Deployment of MBSE and the Emergence of a Systems-Thinking Cul...Mark Minnucci: Deployment of MBSE and the Emergence of a Systems-Thinking Cul...
Mark Minnucci: Deployment of MBSE and the Emergence of a Systems-Thinking Cul...
 
Bob Garrett: Network of Networks Analysis
Bob Garrett: Network of Networks AnalysisBob Garrett: Network of Networks Analysis
Bob Garrett: Network of Networks Analysis
 
Clase 1 Ingenieria de Software
Clase 1 Ingenieria de SoftwareClase 1 Ingenieria de Software
Clase 1 Ingenieria de Software
 
Smart Grid Cyber Security
Smart Grid Cyber SecuritySmart Grid Cyber Security
Smart Grid Cyber Security
 
Alasiri Tosin
Alasiri TosinAlasiri Tosin
Alasiri Tosin
 
The arca of iris one
The arca of iris oneThe arca of iris one
The arca of iris one
 
Cybersecurity Considerations for Power Substation SCADA Systems Using IEC 618...
Cybersecurity Considerations for Power Substation SCADA Systems Using IEC 618...Cybersecurity Considerations for Power Substation SCADA Systems Using IEC 618...
Cybersecurity Considerations for Power Substation SCADA Systems Using IEC 618...
 
Zpryme Report on Modeling and Simulation
Zpryme Report on Modeling and SimulationZpryme Report on Modeling and Simulation
Zpryme Report on Modeling and Simulation
 
Smart grid chinedu opara(m00560830)
Smart grid   chinedu opara(m00560830)Smart grid   chinedu opara(m00560830)
Smart grid chinedu opara(m00560830)
 
Zpryme Report on Distribution Sensors for QinetiQ
Zpryme Report on Distribution Sensors for QinetiQZpryme Report on Distribution Sensors for QinetiQ
Zpryme Report on Distribution Sensors for QinetiQ
 
Semantic Web for Advanced Engineering
Semantic Web for Advanced EngineeringSemantic Web for Advanced Engineering
Semantic Web for Advanced Engineering
 

Viewers also liked

Cyber physical systems and robotics
Cyber physical systems and roboticsCyber physical systems and robotics
Cyber physical systems and roboticstrinhanhtuan247
 
Considering Execution Environment Resilience: A White-Box Approach
Considering Execution Environment Resilience: A White-Box ApproachConsidering Execution Environment Resilience: A White-Box Approach
Considering Execution Environment Resilience: A White-Box ApproachSERENEWorkshop
 
SERENE 2014 Workshop: Paper "Using Instrumentation for Quality Assessment of ...
SERENE 2014 Workshop: Paper "Using Instrumentation for Quality Assessment of ...SERENE 2014 Workshop: Paper "Using Instrumentation for Quality Assessment of ...
SERENE 2014 Workshop: Paper "Using Instrumentation for Quality Assessment of ...SERENEWorkshop
 
Hot Stand-By Disaster Recovery Solutions for Ensuring the Resilience of Railw...
Hot Stand-By Disaster Recovery Solutions for Ensuring the Resilience of Railw...Hot Stand-By Disaster Recovery Solutions for Ensuring the Resilience of Railw...
Hot Stand-By Disaster Recovery Solutions for Ensuring the Resilience of Railw...SERENEWorkshop
 
Towards Robust and Safe Autonomous Drones
Towards Robust and Safe Autonomous DronesTowards Robust and Safe Autonomous Drones
Towards Robust and Safe Autonomous DronesSERENEWorkshop
 
Cyber-Physical Systems - contradicting requirements as drivers for innovation
Cyber-Physical Systems - contradicting requirements as drivers for innovationCyber-Physical Systems - contradicting requirements as drivers for innovation
Cyber-Physical Systems - contradicting requirements as drivers for innovationMichael Heiss
 
RADIANCE @ DSN 2016
RADIANCE @ DSN 2016RADIANCE @ DSN 2016
RADIANCE @ DSN 2016Nuno Antunes
 
INTO-CPS: An integrated “tool chain” for comprehensive Model-Based Design of ...
INTO-CPS: An integrated “tool chain” for comprehensive Model-Based Design of ...INTO-CPS: An integrated “tool chain” for comprehensive Model-Based Design of ...
INTO-CPS: An integrated “tool chain” for comprehensive Model-Based Design of ...Alessandra Bagnato
 
[TALK] Exokernel vs. Microkernel
[TALK] Exokernel vs. Microkernel[TALK] Exokernel vs. Microkernel
[TALK] Exokernel vs. MicrokernelHawx Chen
 
SERENE 2014 Workshop: Paper "Adaptive Domain-Specific Service Monitoring"
SERENE 2014 Workshop: Paper "Adaptive Domain-Specific Service Monitoring"SERENE 2014 Workshop: Paper "Adaptive Domain-Specific Service Monitoring"
SERENE 2014 Workshop: Paper "Adaptive Domain-Specific Service Monitoring"SERENEWorkshop
 
SERENE 2014 School: Incremental Model Queries over the Cloud
SERENE 2014 School: Incremental Model Queries over the CloudSERENE 2014 School: Incremental Model Queries over the Cloud
SERENE 2014 School: Incremental Model Queries over the CloudSERENEWorkshop
 
SERENE 2014 Workshop: Paper "Combined Error Propagation Analysis and Runtime ...
SERENE 2014 Workshop: Paper "Combined Error Propagation Analysis and Runtime ...SERENE 2014 Workshop: Paper "Combined Error Propagation Analysis and Runtime ...
SERENE 2014 Workshop: Paper "Combined Error Propagation Analysis and Runtime ...SERENEWorkshop
 
SERENE 2014 Workshop: Paper "Advanced Modelling, Simulation and Verification ...
SERENE 2014 Workshop: Paper "Advanced Modelling, Simulation and Verification ...SERENE 2014 Workshop: Paper "Advanced Modelling, Simulation and Verification ...
SERENE 2014 Workshop: Paper "Advanced Modelling, Simulation and Verification ...SERENEWorkshop
 
SERENE 2014 Workshop: Paper "Simulation Testing and Model Checking: A Case St...
SERENE 2014 Workshop: Paper "Simulation Testing and Model Checking: A Case St...SERENE 2014 Workshop: Paper "Simulation Testing and Model Checking: A Case St...
SERENE 2014 Workshop: Paper "Simulation Testing and Model Checking: A Case St...SERENEWorkshop
 
Risk Assessment Based Cloudification
Risk Assessment Based CloudificationRisk Assessment Based Cloudification
Risk Assessment Based CloudificationSERENEWorkshop
 
SERENE 2014 Workshop: Panel on "Views on Runtime Resilience Assessment of Dyn...
SERENE 2014 Workshop: Panel on "Views on Runtime Resilience Assessment of Dyn...SERENE 2014 Workshop: Panel on "Views on Runtime Resilience Assessment of Dyn...
SERENE 2014 Workshop: Panel on "Views on Runtime Resilience Assessment of Dyn...SERENEWorkshop
 
SERENE 2014 School: System management overview
SERENE 2014 School: System management overviewSERENE 2014 School: System management overview
SERENE 2014 School: System management overviewSERENEWorkshop
 
SERENE 2014 Workshop: Paper "Verification and Validation of a Pressure Contro...
SERENE 2014 Workshop: Paper "Verification and Validation of a Pressure Contro...SERENE 2014 Workshop: Paper "Verification and Validation of a Pressure Contro...
SERENE 2014 Workshop: Paper "Verification and Validation of a Pressure Contro...SERENEWorkshop
 
Biological Immunity and Software Resilience: Two Faces of the Same Coin?
Biological Immunity and Software Resilience: Two Faces of the Same Coin?Biological Immunity and Software Resilience: Two Faces of the Same Coin?
Biological Immunity and Software Resilience: Two Faces of the Same Coin?SERENEWorkshop
 
Engineering Cross-Layer Fault Tolerance in Many-Core Systems
Engineering Cross-Layer Fault Tolerance in Many-Core SystemsEngineering Cross-Layer Fault Tolerance in Many-Core Systems
Engineering Cross-Layer Fault Tolerance in Many-Core SystemsSERENEWorkshop
 

SERENE 2014 School: Resilience in Cyber-Physical Systems: Challenges and Opportunities

  • 1. Resilience in Cyber-Physical Systems: Challenges and Opportunities Gabor Karsai Institute for Software-Integrated Systems Vanderbilt University SERENE 2014 – Autumn School
  • 2. Acknowledgements  People: Janos Sztipanovits, Daniel Balasubramanian, Abhishek Dubey, Tihamer Levendovszky, Nag Mahadevan, and many others at the Institute for Software-Integrated Systems @ Vanderbilt University  Sponsors: AFRL, DARPA, NASA, NSF through various programs
  • 3. Outline  Introduction  Cyber-physical Systems  Resilience  Building resilient CPS  System-level fault diagnostics  Software health management  Resilient architectures and autonomy  Conclusions
  • 5. What is a Cyber-Physical System?  An engineered system that integrates physical and cyber components where relevant functions are realized through the interactions between the physical and cyber parts.  Physical = some tangible, physical device + environment  Cyber = computational + communicational
  • 6. Cyber-Physical Systems (CPS): Integrating networked computational resources with physical systems. Application areas pictured: power generation and distribution; military systems; avionics; transportation (air traffic control at SFO); telecommunications; factory automation; instrumentation (Soleil Synchrotron); automotive; building systems. Image credits: Kuka Robotics Corp.; E-Corner, Siemens; Doug Schmidt; General Electric; Daimler-Chrysler; Ed Lee, UCB
  • 9. CPS Challenge Problem: Prevent This
  • 10. A Typical Cyber-Physical System Printing Press • Application aspects • local (control) • distributed (coordination) • global (modes) • Ethernet network • Synchronous, Time-Triggered • IEEE 1588 time-sync protocol • High-speed, high precision • Speed: 1 inch/ms (~100km/hr) • Precision: 0.01 inch Bosch-Rexroth -> Time accuracy: 10us Courtesy of Ed Lee, UCB
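The 10 us time accuracy quoted on this slide is consistent with the speed and precision figures listed just before it. A minimal sketch of that arithmetic, assuming a timing error translates directly into a position error at web speed (the variable names are illustrative and do not come from the slides):

```python
# Sketch: relate web speed and positional precision to the required time accuracy.
# The numeric values are from the slide; the names are illustrative.
INCH_IN_METERS = 0.0254

speed_inch_per_ms = 1.0   # web speed: 1 inch per millisecond
precision_inch = 0.01     # required positional precision

# Time the web needs to travel the allowed position error.
time_accuracy_ms = precision_inch / speed_inch_per_ms
time_accuracy_us = time_accuracy_ms * 1000.0

# Convert the web speed to km/h for comparison with the slide's "~100km/hr".
speed_m_per_s = speed_inch_per_ms * INCH_IN_METERS * 1000.0
speed_km_per_h = speed_m_per_s * 3.6

print(f"time accuracy: {time_accuracy_us:.1f} us")
print(f"web speed: {speed_km_per_h:.2f} km/h")
```

Under this assumption the position budget is used up in about 10 us, and 1 inch/ms works out to roughly 91 km/h, which the slide rounds to ~100 km/hr.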
  • 11. Example – Flying Paster Source: http://offsetpressman.blogspot.com/2011/03/how-flying-paster-works.html Courtesy of Ed Lee, UCB [Figure: flying paster diagram; legible labels: sensor, top dead center, active paper feed, paper cutter, idle roller, drive roller, dancer]
  • 13. Example: Medical Devices Emerging direction: Cell phone based medical devices for affordable healthcare e.g. “Telemicroscopy” project at Berkeley e.g. Cell-phone based blood testing device developed at UCLA Courtesy of Ed Lee, UCB
  • 14. Example: Toyota autonomous vehicle technology roadmap, c. 2007 Source: Toyota Web site
  • 15. DARPA Robotics Challenge  http://www.theroboticschallenge.org/
  • 16. The Good News… Networking and computing deliver unique precision and flexibility in interaction and coordination. Computing/Communication: rich time models; precise interactions across highly extended spatial/temporal dimension; flexible, dynamic communication mechanisms; precise time-variant, nonlinear behavior; introspection, learning, reasoning. Integrated CPS: elaborate coordination of physical processes; hugely increased system size with controllable, stable behavior; dynamic, adaptive architectures; adaptive, autonomic systems; self-monitoring, self-healing system architectures and better safety/security guarantees.
  • 17. …and the Challenges Fusing networking and computing with physical processes brings new problems. Computing/Communication: cyber vulnerability; new types of interactions across highly extended spatial/temporal dimension; flexible, dynamic communication mechanisms; precise time-variant, nonlinear behavior; introspection, learning, reasoning. Integrated CPS: physical behavior of systems can be manipulated; lack of composition theories for heterogeneous systems (many unsolved problems); vastly increased complexity and emergent behaviors; lack of theoretical foundations for CPS dynamics; verification, certification, and predictability face fundamentally new challenges.
  • 18. Example for a CPS Approach (Claire Tomlin, UC Berkeley) Abstraction layers allow the verification of different properties. Key Idea: Manage design complexity by creating abstraction layers in the design flow. Abstraction layers define platforms: Physical Platform, Software Platform, Computation/Communication Platform. Abstractions are linked through mapping.
  • 19. Abstraction layers and models: Real-time Software
      Source: Sifakis et al., “Building Models of Real-Time Systems from Application Software,” Proceedings of the IEEE, Vol. 91, No. 1, pp. 100-111, January 2003.
      Software models: f is a reactive program. Program execution creates a mapping between logical-time inputs and outputs.
      Real-time system models: f_R is a real-time system. Programs are packaged into interacting components. A scheduler controls access to computational and communication resources according to time constraints P.
      Diagram labels between the layers: implementation; correctness: timing analysis (P).
      In CPS, essential system properties such as stability, safety, and performance are expressed in terms of physical behavior.
      [Formulas on the slide: In/Out mapping signatures of f and f_R, and the timing-analysis relation over E, P, and R.]
  • 20. Abstraction layers and models: Cyber-Physical Systems
      Layers: physical models, software models (f), real-time system models (f_R), linked by implementation; correctness: timing analysis (P).
      Re-defined goals:
      – Compositional verification of essential dynamic properties: stability, safety.
      – Derive dynamics offering robustness against implementation changes and against uncertainties caused by faults and cyber attacks: fault/intrusion-induced reconfiguration of SW/HW; network uncertainties (packet drops, delays).
      – Decreased verification complexity.
      [Formulas on the slide: In/Out mapping signatures of the physical, software, and real-time models, and the timing-analysis relation over E, P, and R.]
  • 21. Why is CPS Hard? Software vs. Control Systems
      [Figure: a page of Java source code (org.apache.tomcat.session.ServerSession) shown as the software view, next to a control-systems view.]
      Crosses Interdisciplinary Boundaries
      • Disciplinary boundaries need to be realigned
      • New fundamentals need to be created
      • New technologies and tools need to be developed
      • Education needs to be restructured
  • 23. Cyber-Physical Systems: Software Intensive Systems  Embedded software ….  is a crucial ingredient in modern systems  is the ‘universal system integrator’  could exhibit faults that lead to system failures  complexity has progressed to the point that zero-defect systems (containing both hardware and software) are very difficult to build  need to evolve while in operation The challenge is to build software intensive systems that anticipate change: uncertain environments, faults, updates, and exhibit resilience: they survive and adapt to changes, while being dependably functional.
  • 24. Resilience
      Webster:
      – Capable of withstanding shock without permanent deformation or rupture.
      – Tending to recover from or adjust easily to misfortune or change.
      Technical:
      – The persistence of the avoidance of failures that are unacceptably frequent or severe, when facing changes. [Laprie, ‘04]
      – A resilient system is trusted and effective out of the box in a wide range of contexts, and easily adapted to many others through reconfiguration or replacement. [R. Neches, OSD]
      – A resilient system detects anomalies in itself, diagnoses their causes, and is able to recover lost functionality.
      Research issues:
      • Model-driven engineering of Resilient Software Systems
      • Design-time + Run-time aspects
      • Resilience to: (1) faults, (2) environmental changes, (3) updates
      • Target system category: distributed, real-time, embedded systems
      Objective: Model-based engineering approach and tools to build verifiably resilient systems
  • 27. Cyber-Physical Systems: Layers and Interactions
      [Figure: a Physical Layer of physical objects (physical interaction), a Platform Layer with computational and communication platforms, and computational objects (computational interaction); physical and computational objects combine into cyber-physical objects. Layers are linked by implementation, refinement/compilation, and abstraction. Faults are marked at four points in the diagram.]
  • 28. Cyber-Physical Systems Faults and resilience  In CPS faults can appear in (and cascade to) any place  Physical system  Hardware (computing and communication) system  Software (application and platform) system  In CPS physical and cyber elements are integrated  Many interaction pathways: P2P, P2C, C2C, P2C2P, C2P2P2C  Many modeling paradigms for physical systems  Consider engineering or physics!  Heterogeneous models need to be integrated for detailed analysis  In CPS recovery can take many forms  Physical action  Cyber restart  Software adaptation
  • 29. CPS and Model-based Design Design of CPS layers via MDE  Software models  Platform models  Physical models Challenge: How to integrate the models so that cross-domain interactions can be understood and managed?
  • 30. A Strategy for Resilient CPS
      Overall scheme:
      – Faults can originate in any layer of a hierarchy, in any component.
      – Anomalies caused by the fault can be detected in the same or a higher layer.
      – Based on the anomalies, fault source isolation (diagnosis) is performed. The diagnosis result may be reported to a higher layer, depending on the nature of the fault.
      – The fault is mitigated locally first, but when that mitigation fails the higher layer is informed about the anomaly, the diagnosed fault, and the mitigation action taken.
      High-level view: Fault management is a control problem.
      – Faults are disturbances in the system whose effects prevent the system from providing the required service(s).
      – Anomalies are the sensory inputs, and mitigation actions are the actuators, of the fault management system.
      – Fault mitigation must consider (1) the current functional goals and (2) the actual state of the system, at the right level of abstraction.
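The "mitigate locally first, then inform the higher layer" scheme can be sketched in a few lines of Python. This is only an illustration of the escalation pattern: the `Layer` class, the fault name `sensor_stuck`, and the two mitigation functions are hypothetical and do not come from the slides, and the diagnosis step is a stand-in for a real reasoner.

```python
class Layer:
    """One layer of a hypothetical fault-management hierarchy."""

    def __init__(self, name, mitigations, parent=None):
        self.name = name
        self.mitigations = mitigations   # fault name -> callable returning True on success
        self.parent = parent
        self.log = []                    # (fault, mitigation_succeeded) pairs

    def diagnose(self, anomaly):
        # Stand-in for a real diagnoser: trust the suspect named in the report.
        return anomaly.get("suspect", "unknown")

    def handle(self, anomaly):
        fault = self.diagnose(anomaly)
        action = self.mitigations.get(fault)
        succeeded = bool(action and action())
        self.log.append((fault, succeeded))
        if not succeeded and self.parent is not None:
            # Escalate the anomaly, the diagnosed fault, and the action already tried.
            report = dict(anomaly, suspect=fault, reported_by=self.name,
                          tried=action.__name__ if action else None)
            self.parent.handle(report)
        return succeeded


def restart_driver():
    return False      # pretend the local restart does not help


def switch_to_backup():
    return True       # pretend the system-level action works


system = Layer("system", {"sensor_stuck": switch_to_backup})
component = Layer("component", {"sensor_stuck": restart_driver}, parent=system)
handled_locally = component.handle({"alarm": "no_update", "suspect": "sensor_stuck"})
```

Here the component layer fails to recover and the system layer takes over; in a real design each layer would run its own diagnosis rather than reuse the suspect passed up from below.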
  • 31. A Strategy for Resilient CPS Layered fault management  Concepts: 1. Faults propagate to neighboring layers via guaranteed behaviors 2. Each layer includes pro-active and reactive fault management mechanisms  Each layer provides a ‘fault reporting’ and ‘fault management’ interface  Fault management services are built into the ‘middleware’:  Temporal/spatial Isolation  Fault Tolerant Clock Sync  Time-triggered Communications  Group Communication and Transactions  Fault-tolerant Resource Sharing  Component/Service Migration  Primary/Backup  Replication  Autonomous Failure Management  Safe Dynamic Composition of Components
  • 33. The need for resilience In complex systems even simple failures lead to complex cascades of events that are difficult to understand and manage. How to •detect and isolate faults? •react to faults to mitigate their effect?
  • 34. FACT: A model-driven toolsuite for system-level diagnostics Visual modeling tool for creating: •System architecture models •Timed failure propagation graph models Run-time Platform (RTOS) Modular run-time environment contains: •Monitors detect anomalies in sensor data and track mode changes •TFPG Diagnostics Engine performs diagnosis and isolates the source(s) of observed anomalies •Reports are generated for operators and maintainers Modules can be used standalone on an embedded target processor with an RTOS TFPG DIAGNOSTICS ENGINE MONITORS OPERATOR MAINTAINER
  • 35. Modeling Language Temporal Failure Propagation Graphs •Failure modes •Discrepancies •Monitors/Alarms •Propagation links with: •Time delay •Mode Fault model: Known physical failure modes whose functional effects (discrepancies) are monitored. Diagnostic problem: Given a set of active monitors and their temporal activation sequence, which failure mode(s) explain the observations? A causal network-like model describing how component failure effects propagate across the system activating monitors. Failure propagation links and monitors could be mode-dependent.
  • 36. Modeling Language Temporal Failure Propagation Graphs Modeling variants •Untimed, causal network (no modes, propagation = [0..inf]) •Modal networks: edges are mode dependent •Timed models •Hierarchical component models Nodes: •Failure modes •Discrepancies • AND/OR • Monitored (option) Edges: •Propagation delay: [min, max] •Discrete Modes (activation) Example models (#components,#failuremodes,#alarms) •Trivial examples •Simplified fuel system (~30,~80,~100) •Realistic fuel system (~200,~400,~600) •Aircraft avionics (~2000,~8000,~25000) – generated
  • 37. TFPG Example
      Not shown: timing on propagation links; components/hierarchy; modal propagation.
      TFPG captures cause-effect relationships that can be modal and temporal. Effects may be cumulative and/or monitored.
      [Figure: example graph with a legend for components.]
  • 38. Timed Failure Propagation Graphs
      – Causal models that describe the system behavior in the presence of faults.
      – The model is a labeled directed graph where nodes represent either failure modes or discrepancies, and edges between nodes represent causality.
      – Edges are attributed with timing and mode constraints on failure propagation.
      – A discrepancy can be either monitored or unmonitored.
      – The monitor detects a sensory manifestation of an anomaly and generates alarms.
      [Figure labels: Failure Modes, Discrepancies, Alarms, Propagation links, Failure Cascades, Alarm Allocation.]
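A minimal sketch of how such a graph might be represented, assuming Python dataclasses. The class names, the fuel-system node names, and the delay values are invented for illustration; the sketch treats every discrepancy as an OR node (the first arriving propagation activates it) and ignores AND nodes entirely.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    t_min: float                      # minimum propagation delay
    t_max: float                      # maximum propagation delay
    modes: frozenset = frozenset()    # empty set: edge is active in every mode


@dataclass
class TFPG:
    failure_modes: set = field(default_factory=set)
    discrepancies: dict = field(default_factory=dict)   # name -> monitored?
    edges: list = field(default_factory=list)

    def successors(self, node, mode):
        return [e for e in self.edges
                if e.src == node and (not e.modes or mode in e.modes)]


def propagation_windows(graph, failure_mode, mode):
    """Earliest/latest activation time of each node reachable from a failure
    mode that occurs at time 0, with OR semantics for every discrepancy."""
    windows = {failure_mode: (0.0, 0.0)}
    frontier = [failure_mode]
    while frontier:
        node = frontier.pop()
        lo, hi = windows[node]
        for e in graph.successors(node, mode):
            cand = (lo + e.t_min, hi + e.t_max)
            old = windows.get(e.dst)
            new = cand if old is None else (min(old[0], cand[0]), min(old[1], cand[1]))
            if new != old:
                windows[e.dst] = new
                frontier.append(e.dst)
    return windows


g = TFPG(
    failure_modes={"pump_degraded"},
    discrepancies={"low_pressure": True, "low_flow": True, "engine_starved": False},
    edges=[
        Edge("pump_degraded", "low_pressure", 1.0, 3.0),
        Edge("low_pressure", "low_flow", 2.0, 5.0),
        Edge("low_flow", "engine_starved", 0.5, 1.0, frozenset({"cruise"})),
    ],
)
cruise = propagation_windows(g, "pump_degraded", "cruise")
taxi = propagation_windows(g, "pump_degraded", "taxi")
```

The last edge is mode-dependent, so `engine_starved` is reachable in the `cruise` mode but not in `taxi`. A diagnoser works in the opposite direction: given alarm times, it checks which failure modes have windows consistent with them.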
  • 43. TFPG Reasoning On-line diagnostics: Input: Sequence of alarms and mode changes Output: Sequence of sorted and ranked hypotheses containing failure mode(s) that explain the observations (alarms, mode changes)
  • 44. TFPG Hypothesis  TFPG Hypothesis: estimation of the current system state.  Directly, points to failure modes that “best” explain the current set of observed alarms.  Indirectly, points to failed monitored discrepancies; those with a state that is inconsistent with the (hypothesized) state of the failure modes  Structure  List of possible Failure Modes  List of alarms in each set ( Consistent (C)/ Inconsistent (IC)/Missing (M) / Expected (E))  Metrics : Plausibility/ Robustness/ Failure Rate
  • 45. Hypotheses Evaluation Metrics Hypotheses are evaluated based on the following metrics:  Plausibility: reflects the support of a hypothesis based on the current observed alarm state. It answers the question: Which hypothesis to consider?  Robustness: reflects the potential of a hypothesis (evidence) to change based on remaining alarms. It answers the question: When to take an action?  Failure Rate: is a measure of how often a particular failure mode has occurred in the past.
  • 46. Run-time System Diagnostics Engine
      Algorithm outline:
      – Check if new evidence is explained by current hypotheses.
      – If not, create a new hypothesis that assumes a hypothetical state of the system consistent with observations.
      – Rank hypotheses for plausibility and robustness.
      – Discard low-rank hypotheses, keep plausible ones.
      Fault state: ‘total state vector’ of the system, i.e. all failure modes and discrepancies.
      Alarms could be:
      – Missing: should have fired but did not.
      – Inconsistent: fired, but not consistent with the hypothesis.
      Robust diagnostics: tolerates missing and inconsistent alarms.
      Metrics:
      – Plausibility: how plausible the hypothesis is w.r.t. alarm consistency.
      – Robustness: how likely it is that the hypothesis will change in the future.
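The outline above can be made concrete with a deliberately simplified, untimed sketch. Everything here is an assumption for illustration: the `signature` table (failure mode to the alarms it is expected to raise), the alarm names, and in particular the plausibility formula, which is a stand-in ratio and not the metric used by the actual TFPG reasoner. Robustness is not modeled.

```python
def expected_alarms(signature, hypothesis):
    """Union of the alarms that the failure modes in a hypothesis should raise."""
    out = set()
    for fm in hypothesis:
        out |= signature[fm]
    return out


def plausibility(signature, hypothesis, observed):
    # Stand-in metric: share of consistent alarms among consistent,
    # inconsistent (fired but unexplained) and missing (expected, not fired).
    expected = expected_alarms(signature, hypothesis)
    consistent = len(observed & expected)
    inconsistent = len(observed - expected)
    missing = len(expected - observed)
    total = consistent + inconsistent + missing
    return consistent / total if total else 0.0


def update(signature, hypotheses, observed, new_alarm, keep=3):
    observed = set(observed) | {new_alarm}
    pool = set(hypotheses)
    # Step 1: is the new evidence explained by a current hypothesis?
    if not any(new_alarm in expected_alarms(signature, h) for h in pool):
        # Step 2: create hypotheses that do explain it (single- and multi-fault).
        causes = [fm for fm, alarms in signature.items() if new_alarm in alarms]
        for fm in causes:
            pool.add(frozenset({fm}))
            pool.update(h | {fm} for h in hypotheses)
    # Steps 3-4: rank (ties broken by parsimony), keep only the best few.
    ranked = sorted(pool, key=lambda h: (-plausibility(signature, h, observed),
                                         len(h), sorted(h)))
    return ranked[:keep], observed


signature = {
    "pump_fail": {"A1", "A2"},
    "valve_stuck": {"A2", "A3"},
    "sensor_bias": {"A3"},
}
hypotheses, observed = [], set()
for alarm in ["A1", "A2", "A3"]:
    hypotheses, observed = update(signature, hypotheses, observed, alarm)
best = hypotheses[0]
```

After the third alarm no single failure mode explains all observations, so a two-fault hypothesis rises to the top, while single-fault hypotheses survive with lower plausibility because the metric tolerates inconsistent alarms.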
  • 47. Run-time System Diagnostics Engine  Novel properties:  Multi-fault hypothesis is the default  Fault state == State of all failure modes/discrepancies  Reasoner works with sets of failure modes (instead of individual failure modes)  Robust algorithm: can tolerate missing/inconsistent alarms  Parsimony principle: Use simplest explanation  Time-dependent diagnosis  Reasoner can be asked to recompute diagnosis upon the advance of time  Extensions:  Modal edges: Propagation happens only if edge is enabled (controlled by system mode)  Diagnosis takes into consideration the last propagation effect Non-monotonic alarms:  Alarm retraction triggers a re-computation of the diagnosis
  • 48. Run-time System Discrete (TFPG) Diagnostics  Additional capabilities:  Intermittent failure modes  Consequence: alarm/s change to ‘Off’  Assumption: low frequency intermittents  Upon alarm changing to ‘Off’, backtrack to last change to ‘On’ and re-evaluate  Maintain alternate branches (for alarms ‘On’ and ‘Off’)  Test alarms: can be considered only after activation  If inactive, it is an un-monitored discrepancy.  If activated, it is used but timing may be inconsistent (re: parent’s timing)  Metrics summary: Plausibility: Robustness:
  • 49. Performance Evaluation
      – For n failure modes and m discrepancies, maximum number of hypotheses is nm but more likely to be O(n).
      – Updating a hypothesis is polynomial in the number of nodes and exponential w.r.t. sensor faults.
      Model  #C   #FM  #D    #A   #M  #P    #R  Avg. Time (sec)
      #1     15   36   48    21   0   120   1   0.000311
      #2     11   36   120   174  27  3     1   0.000445
      #3     153  481  1973  270  9   3409  1   0.013589
      #4     24   64   116   116  0   695   4   0.016
      #5     21   100  282   69   0   431   18  0.00288
      Key: #C = components, #FM = failure modes, #D = discrepancies, #A = alarms, #M = modes, #P = propagation links, #R = regions.
      Avg. Time = average computational time taken by the reasoner (in seconds) after every event on a 2.67 GHz Intel Xeon® CPU with 8 GB RAM.
  • 50. Tool Operations 1. Modeling 2. Desktop experimentation, validation 3. Feedback 4. Deployment on embedded platform Model Interpretation
  • 52. Motivation: Software as Failure Source? Qantas 72 - Oct 7, 2008 – A330 (Australia) – ATSB Report At 1240:28, while the aircraft was cruising at 37,000 ft, the autopilot disconnected. From about the same time there were various aircraft system failure indications. At 1242:27, while the crew was evaluating the situation, the aircraft abruptly pitched nose-down. The aircraft reached a maximum pitch angle of about 8.4 degrees nose-down, and descended 650 ft during the event. After returning the aircraft to 37,000 ft, the crew commenced actions to deal with multiple failure messages. At 1245:08, the aircraft commenced a second uncommanded pitch-down event. The aircraft reached a maximum pitch angle of about 3.5 degrees nose-down, and descended about 400 ft during this second event. At 1249, the crew made a PAN urgency broadcast to air traffic control, and requested a clearance to divert to and track direct to Learmonth. At 1254, after receiving advice from the cabin of several serious injuries, the crew declared a MAYDAY. The aircraft subsequently landed at Learmonth at 1350. The investigation to date has identified two significant safety factors related to the pitch-down movements. Firstly, immediately prior to the autopilot disconnect, one of the air data inertial reference units (ADIRUs) started providing erroneous data (spikes) on many parameters to other aircraft systems. The other two ADIRUs continued to function correctly. Secondly, some of the spikes in angle of attack data were not filtered by the flight control computers, and the computers subsequently commanded the pitch-down movements. http://www.atsb.gov.au/publications/investigation_reports/2008/AAIR/pdf/AO2008070_interim.pdf
  • 53. Understanding the Problem Embedded software is a complex engineering artifact that can have latent faults, uncaught by testing and verification. Such faults become apparent during operation when unforeseen modes and/or (system) faults appear. The problem:  General: How to construct a Software Health Management system that detects such faults, isolates their source/s, prognosticates their progression, and takes mitigation actions in the system context?  Specific: How to specify, design, and implement such a system using a model-based framework? The larger picture:  General: Software Health Management must be integrated with System Health Management – ‘Software Health Effects’ must be understood on the System (Vehicle) Level.
  • 54. What is ‘Systems Health Management’ ? The ‘on-line’ view: 1. Detection of anomalies in system or component behavior 2. Identification and isolation of the fault source/s 3. Prognostication of impending faults that could lead to system failures 4. Mitigation of current or impending fault effects while preserving mission objective/s Reports Observations Corrections Detection Isolation Prognostics Mitigation Examples: - Automotive OBD (detection) - Boeing 777 CMC (detection + isolation) - Spacecraft fault protection (detection + isolation + mitigation) - Aircraft fleet (detection + isolation + prognostics)
  • 55. Software Health Management  Software is a complex engineering artifact.  Software can have latent faults.  Faults appear during operation when unforeseen modes or interactions happen.  Techniques like Voting and Self- Checking pairs have shortcomings  Common mode faults  Fault cascades • SHM is the extension of FDIR techniques used in Physical systems to Software. Stimuli Responses Fault mitigation Fault detection Environmental Assumptions Observed Behavior Domain Assumptions Observed Inputs Fault isolation
  • 56. Why ‘Software Health Management’?  Complexity of systems necessitates an additional layer ‘above’ SFT that manages ‘Software Health’  Embedded software ….  is a crucial ingredient in aerospace systems  is a method for implementing functionality  is the ‘universal system integrator’  could exhibit faults that lead to system failures  complexity has progressed to the point that zero-defect systems (containing both hardware and software) are very difficult to build  Systems Health Management is an emerging field that addresses precisely this problem: How to manage systems’ health in case of faults ?  ‘Software Health Management’ is not…  A replacement for existing and robust engineering processes and standards (DO-178B)  A substitute for hardware- and software fault tolerance  An ‘ultimate’ solution for fault tolerance
  • 57. Software Health Management: Key ideas
      – Use software components as units of fault management: detection, diagnosis, and mitigation. Components must be observable, provide fault isolation, and be capable of mitigation.
      – Use a two-level architecture. Component level: detect anomalies and mitigate locally. System level: receive anomaly reports, isolate faulty component(s), and mitigate on the component.
      – Use models to represent anomalous conditions, fault cascades, and mitigation actions (when / what).
      – Use model-based generators to synthesize code artifacts. The developer can use higher-level abstractions to design and implement the software health management functions of a system.
  • 58. Software Component Framework The Component Model should enable:  Monitoring  Interfaces (synchronous/asynchronous calls)  Component state  Scheduling and timing (WCET)  Resource usage  Anomaly Detection via:  Pre/post conditions over call parameters, rates, and component state  Conditions over (1) timing properties, (2) resource usage (e.g. memory footprint), and (3) usage patterns  Combinations of the above  Mitigation:  Given detected anomaly and state of the component take action  Can be time- or event-triggered  Actions: restart, initialize, block call, inject value, inject call, release resource, modify state; checkpoint/restore, combination of the above
  • 59. Notional Component Model Parameter Component Resource Trigger Subscribe (Event) Publish (Event) Provided (Interface) Required (Interface) State A component is a unit (containing potentially many objects). The component is parameterized, has state, it consumes resources, publishes and subscribes to events, provides interfaces to and requires interfaces from other components. Publish/Subscribe: Event-driven, asynchronous communication (publisher does not wait) Required/Provided: Synchronous communication using call/return semantics. Triggering can be periodic or sporadic. Extension of a Component Model defined by OMG (CCM) : state, resource, trigger interfaces.
  • 60. Example: Component Interactions Sampler Component GPS Component Display Component P S S Components can interact via asynchronous/event-triggered and synchronous/call-driven connections. Example: The Sampler component is triggered periodically and it publishes an event upon each activation. The GPS component subscribes to this event and is triggered sporadically to obtain GPS data from the receiver, and when ready it publishes its own output event. The Display component is triggered sporadically via this event and it uses a required interface to retrieve the position data from the GPS component.
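The Sampler/GPS/Display interaction can be sketched as follows. This is a toy, single-threaded illustration in Python: the `EventBus` class, the topic names, and the fake receiver are assumptions, and events are dispatched by direct call, whereas in the component model described here the publisher does not wait for subscribers.

```python
class EventBus:
    """Toy publish/subscribe dispatcher (direct calls, no real asynchrony)."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, topic, handler):
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic, event):
        for handler in self._subscribers.get(topic, []):
            handler(event)


class Sampler:
    def __init__(self, bus):
        self.bus = bus
        self.count = 0

    def on_period(self):                      # periodic trigger
        self.count += 1
        self.bus.publish("sampler.data_out", self.count)


class GPS:
    def __init__(self, bus, receiver):
        self.bus = bus
        self.receiver = receiver
        self.position = None
        bus.subscribe("sampler.data_out", self.on_sample)

    def on_sample(self, _event):              # triggered by the Sampler's event
        self.position = self.receiver()
        self.bus.publish("gps.data_out", None)

    def get_gps_data(self):                   # provided (synchronous) interface
        return self.position


class Display:
    def __init__(self, bus, gps_data_source):
        self.gps_data_source = gps_data_source   # required interface
        self.shown = []
        bus.subscribe("gps.data_out", self.on_gps_event)

    def on_gps_event(self, _event):           # sporadic trigger
        self.shown.append(self.gps_data_source.get_gps_data())


fixes = iter([(36.14, -86.80), (36.15, -86.81)])   # fake receiver output
bus = EventBus()
sampler = Sampler(bus)
gps = GPS(bus, receiver=lambda: next(fixes))
display = Display(bus, gps)
sampler.on_period()
sampler.on_period()
```

The event from the GPS carries no payload on purpose: the Display pulls the position through the synchronous interface, which mirrors the split between publish/subscribe ports and provided/required interfaces.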
  • 61. Component Monitoring Component Monitor arriving events Monitor incoming calls Monitor published events Monitor outgoing calls Observe state Monitor resource usage Monitor control flow/ triggering
  • 62. ACM: The ARINC Component Model  Provide a CCM-like layer on top of ARINC-653 abstractions  Notional model:  Terminology:  Synchronous: call/return  Asynchronous: publish-return/trigger-process  Periodic: time-triggered  Aperiodic: event-triggered  Note:  All component interactions are realized via the framework  Process (method) execution time has deadline, which is monitored
  • 63. ACM: The ARINC Component Model  Each ‘input interface’ has its own process  Process must obtain read-write/lock on component  Asynchronous publisher (subscriber) interface:  Listener (publisher) process  Pushes (receives) one event (a struct), with a validity flag  Can be event-triggered or time-triggered (i.e. 4 variations)  Synchronous provided (required) interface:  Handles incoming synchronous RMI call  Forwards outgoing synchronous RMI call  Other interfaces:  State: to observe component state variables  Resource: to monitor resource usage  Trigger: to monitor execution timing
  • 64. ACM: A Prototype Implementation
      ARINC-653 Emulator:
      – Emulates APEX services using Linux APIs
      – Partition → Process, Process → Thread
      – Module manager: schedules partition set
      – Partition level scheduler: schedules threads within partition
      CORBA foundation:
      – CCM Implementation
      – No modification
      ACM component interactions:
      – Mainly implemented via APEX
      – RMI interactions use threads
  • 65. Implementation: Mapping ACM to APEX APEX - Abstractions Platform (Linux) Module Host/Processor Partition Process Process Thread ACM: APEX Component Model APEX APEX Concept Used Component method Periodic Periodic process Process start, stop Semaphores Sporadic Aperiodic process Invocation Synchronous Call-Return Periodic Target Co-located N/A Non-co-located N/A Sporadic Target Co-located Caller method signals callee to release then waits for callee until completion. Event, Blackboard Non-co-located Caller method sends RMI (via CM) to release callee then waits for RMI to complete. TCP/IP, Semaphore, Event Asynchronous Publish-Subscribe Periodic Target Co-located Callee is periodically triggered and polls ‘event buffer’ – validity flag indicates whether data is stale or fresh Blackboard Non-co-located Sampling port, Channel Sporadic Target Co-located Callee is released when event is available Blackboard, Semaphore, Event Non-co-located Caller notifies via TCP/IP, callee is released upon receipt Queuing port, Semaphore, Event
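The co-located periodic publish-subscribe case, where the callee polls an event buffer and a validity flag tells it whether the data is stale or fresh, can be illustrated with a small Python analogue. The `Blackboard` class below is a hypothetical toy, not the APEX blackboard service or its API.

```python
class Blackboard:
    """Single-slot buffer: the latest value plus a freshness flag."""

    def __init__(self):
        self._value = None
        self._fresh = False

    def write(self, value):
        # Publisher side: overwrite the slot and mark it fresh.
        self._value = value
        self._fresh = True

    def read(self):
        # Periodic consumer side: return (value, fresh) and clear the flag,
        # so a second read without a new write reports stale data.
        value, fresh = self._value, self._fresh
        self._fresh = False
        return value, fresh


board = Blackboard()
board.write({"alpha": 0.1})
first = board.read()     # consumer period 1: new data arrived
second = board.read()    # consumer period 2: publisher missed its slot
```

A stale read is exactly the condition a "validity violation" monitor on a consumer port would flag.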
  • 66. ACM: Modeling Language
      Modeling elements:
      – Data types: primitive, structs, vectors
      – Interfaces: methods with arguments
      – Components: Publish/Subscribe ports (with data type); Provided/Required interfaces (with i/f type); Health Manager
      – Assemblies
      – Deployment: Modules, Partitions; Component → Partition
  • 67. Example: Sensor/GPS/Display
      Component   Port          Period     Time Capacity  Deadline
      Sensor      data_out      4 sec      4 sec          Hard
      GPS         data_out      aperiodic  4 sec          Hard
      GPS         data_in       4 sec      4 sec          Hard
      GPS         gps_data_src  aperiodic  4 sec          Hard
      Navdisplay  data_in       aperiodic  4 sec          Hard
      Navdisplay  gps_data_src  aperiodic  4 sec          Hard

      component Sensor {
        publishes SensorOutput data_out ;
      };
      component GPS {
        publishes SensorOutput data_out ;       //APERIODIC
        consumes SensorOutput data_in ;         //PERIODIC
        provides GPSDataSource gps_data_src ;
      };
      component NavDisplay {
        consumes SensorOutput data_in ;         //APERIODIC
        uses GPSDataSource gps_data_source ;
      };
      struct SensorOutput { Timespec time ; SensorData data ; };
      struct SensorData { FLOATINGPOINT alpha ; FLOATINGPOINT beta ; FLOATINGPOINT gamma ; };
      struct Timespec { LONGLONG tv_sec ; LONGLONG tv_nsec ; };
      interface GPSDataSource { void getGPSData (out GPSData d); };

      [Figure: Sensor data_out feeds GPS data_in; GPS data_out feeds Nav Display data_in; GPS updates GPSValue, and Nav Display invokes get on gps_data_src to read it.]
  • 68. Anomaly Detection  Model-Based Specification of monitoring expressions  Post/Pre condition violations: threshold, rate, custom filter (moving average)  Resource Violations: Deadline  Validity Violation: Stale data on a consumer  Concurrency Violations: Lock timeouts.  User code violations: reported error conditions from application code.  Code Generators  Synthesize code for implementing the monitors Monitor arriving events Monitor incoming calls Monitor published events Monitor outgoing calls Observe state Monitor resource usage Monitor control flow/ triggering Port Monitors Non-Port Monitors • Based on these local detection, each component developer can implement a local health manager • It is a reactive timed state machine with pre specified actions. • All alarms, actions are reported to the system health manager
  • 69. ACM: Modeling Language: Monitoring
      Monitoring on component interfaces:
      – Subscriber port → ‘Subscriber process’ and Publisher port → ‘Publisher process’. Monitor: pre-conditions and post-conditions; on subscriber: data validity (‘age’ of data); deadline (hard / soft).
      – Provided interface → ‘Provider methods’ and Required interface → ‘Required methods’. Monitor: pre-conditions and post-conditions; deadline (hard / soft).
      – Can be specified on a per-component basis.
      Monitoring language: simple, named expressions over input (output) parameters, component state, delta(var), and rate(var,dt). The expression yields a Boolean condition.
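To make the idea of named Boolean monitoring expressions concrete, here is a small Python sketch. The `Signals` class, the monitor names, and the sample values are all hypothetical; the slides describe a modeling language from which monitors are generated, not this API. `delta` is taken to mean the change since the previous sample, `rate` that change divided by a caller-supplied `dt`, and `age` the time since the last update, all of which are assumptions.

```python
class Signals:
    """Keeps the current and previous sample of each monitored variable."""

    def __init__(self):
        self._cur = {}
        self._prev = {}

    def update(self, var, value, t):
        if var in self._cur:
            self._prev[var] = self._cur[var]
        self._cur[var] = (value, t)

    def delta(self, var):
        # Raises KeyError until two samples exist.
        return self._cur[var][0] - self._prev[var][0]

    def rate(self, var, dt):
        return self.delta(var) / dt

    def age(self, var, now):
        return now - self._cur[var][1]


def violated(monitors, signals, now):
    """Names of monitors whose condition is currently false."""
    out = []
    for name, condition in monitors.items():
        try:
            ok = condition(signals, now)
        except LookupError:
            continue          # not enough samples yet to evaluate
        if not ok:
            out.append(name)
    return out


# Conditions that are expected to HOLD; times are in seconds.
monitors = {
    "gps_input_fresh": lambda s, now: s.age("GPS.data_in", now) < 0.004,
    "nav_time_advances": lambda s, now: s.delta("Nav.data_in.time") > 0,
    "gps_data_changing": lambda s, now: s.rate("gps_data_src.data", 1.0) > 1,
}

s = Signals()
s.update("GPS.data_in", 1, t=0.000)
s.update("Nav.data_in.time", 10.0, t=0.000)
s.update("Nav.data_in.time", 10.0, t=0.004)    # time stamp did not advance
s.update("gps_data_src.data", 0.0, t=0.000)
s.update("gps_data_src.data", 5.0, t=0.004)
alarms = violated(monitors, s, now=0.010)
```

Each violated name would be raised as an alarm to the component-level health manager.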
  • 70. Component-level Health Management  Manager’s behavioral model:  Finite-state machine  Triggers: monitored events, time  Actions: mitigation activities  Manager is local to component container (for efficiency) but shall be protected from the faults of functional components  Notional behavior:  Track component state changes via detected events and progression of time  Take mitigation actions as needed  Design issues:  Co-location with component (fault containment)  Local detection may implicate another component Component Monitor WCET Component Framework Manager Actions Events Events Idle Exec InvA start finish timeout /init invA_violation /reset
  • 71. ACM - Modeling Language: Component Health Manager
      Reactive Timed State Machine:
      – Event trigger: predefined conditions (e.g. deadline violation, data validity violation); user-defined conditions (e.g. pre-condition violation)
      – Reaction: mitigation action (start, reset, refuse, ignore, etc.)
      – State: current state of the machine
      – (Event × State) → Action
  • 72. Component Health Management: Architecture and Available Actions
      [Figure: the Component Health Manager runs as a high-priority ARINC-653 process. Component port processes (653 processes) place error messages in a buffer of incoming events and obtain the HM response through a blocking read on a blackboard. Manager states: NOMINAL, ERROR, CHECK RESULT, FAILURE. Transition labels: Error Message / Action; Action Successful; Timeout or Action Failed.]
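The (Event × State) → Action mapping lends itself to a table-driven implementation. The sketch below is an assumption-laden illustration in Python: the state names, event names, transition table, and timeout handling are invented for the example; only the action names are borrowed from the deck's list of predefined component-level actions.

```python
# (state, event) -> (next_state, action); hypothetical table for one component.
TRANSITIONS = {
    ("NOMINAL", "stale_data"): ("NOMINAL", "USE_PAST_DATA"),
    ("NOMINAL", "deadline_violation"): ("ERROR", "RESTART"),
    ("NOMINAL", "precondition_violation"): ("ERROR", "ABORT"),
    ("ERROR", "action_ok"): ("NOMINAL", "IGNORE"),
    ("ERROR", "timeout"): ("FAILURE", "STOP"),
}


class ComponentHealthManager:
    """Reactive timed state machine driven by monitor events and time."""

    def __init__(self, transitions, report, timeout):
        self.transitions = transitions
        self.report = report          # callback towards the system-level manager
        self.timeout = timeout        # max time allowed in ERROR, in seconds
        self.state = "NOMINAL"
        self.entered_at = 0.0

    def on_event(self, event, now):
        key = (self.state, event)
        if key not in self.transitions:
            return None               # event not relevant in this state
        next_state, action = self.transitions[key]
        if next_state != self.state:
            self.state, self.entered_at = next_state, now
        self.report((now, event, action))
        return action

    def on_tick(self, now):
        # Time-triggered part: recovery that takes too long becomes a timeout event.
        if self.state == "ERROR" and now - self.entered_at > self.timeout:
            return self.on_event("timeout", now)
        return None


reports = []
chm = ComponentHealthManager(TRANSITIONS, reports.append, timeout=2.0)
a1 = chm.on_event("stale_data", 0.5)
a2 = chm.on_event("deadline_violation", 1.0)
a3 = chm.on_tick(2.0)     # still within the recovery window
a4 = chm.on_tick(3.5)     # recovery took too long
```

Every alarm and action passes through `report`, matching the requirement that all alarms and actions are forwarded to the system health manager.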
  • 73. Assembly Definition
      Specified monitoring conditions: Validity(GPS.data_in)<4ms; Delta(Nav.data_in.time)>0; Rate(gps_data_src.data)>1
      – The Sensor component is triggered periodically and it publishes an event upon each activation.
      – The GPS component subscribes to this event and is triggered periodically to obtain GPS data from the receiver. It publishes its own output event.
      – The Nav Display component is triggered sporadically via this event and it uses a required interface to retrieve the position data from the GPS component.
  • 74. System-level Health Management
      Focus issue: Cascading faults
      – Hypothesis: Fault effects cascade via component interactions
      – Anomalies detected on the component level are not ‘diagnosed’ → they can be caused by other components
      Problem:
      – How to model fault cascades?
      – How to diagnose and isolate fault cascade root causes?
      – How to mitigate fault cascades?
  • 75. Recap: Fault diagnosis  Model: Timed Failure Propagation Graphs Modeling variants •Untimed, causal network (no modes, propagation = [0..inf]) •Modal networks: edges are mode dependent •Timed models •Hierarchical component models Nodes: •Failure modes •Discrepancies • AND/OR (combination) • Monitored (option) Edges: •Propagation delay: [min, max] •Discrete Modes (activation) Example models (#components, #failuremodes, #alarms) •Trivial examples •Simplified fuel system (~30,~80,~100) •Realistic fuel system (~200,~400,~600) •Aircraft avionics (~2000,~8000,~25000) – generated
  • 76. Recap: Fault diagnosis
      Fault diagnosis algorithm outline:
      – Check if new evidence is explained by current hypotheses.
      – If not, create a new hypothesis that assumes a hypothetical state of the system consistent with observations.
      – Rank hypotheses for plausibility and robustness metrics.
      – Discard low-rank hypotheses, keep plausible ones.
      Fault state: ‘total state vector’ of the system, i.e. all failure modes and discrepancies.
      Alarms could be:
      – Missing: should have fired but did not.
      – Inconsistent: fired, but not consistent with the hypothesis.
      Robust diagnostics: tolerates missing and inconsistent alarms.
      Metrics:
      – Plausibility: how plausible the hypothesis is w.r.t. alarm consistency.
      – Robustness: how likely it is that the hypothesis will change in the future.
  • 77. Modeling Cascading Faults  Not needed - the cascades can be computed from the component assemblies, if the anomaly types and their interactions are known.  Component ‘elements’ Every method belongs to one of these (7)  Fault cascades within component (A few of the 38 patterns)
  • 78. Modeling Cascading Faults  Inter-component propagation is regular – always follows the same pattern  Intra-component propagation depends on the component!  Need to model internal dataflow and control flow of the component. Note: Could be determined via source code analysis.
  • 79. Modeling Cascading Faults  Fault Propagation Graph for GPS Example  Here: hand-crafted, but it is generated automatically in the system
  • 80. System-level Fault Mitigation  Model-based system-level mitigation engine  Model-based diagnoser is automatically generated  Designer specifies fault mitigation strategies using a reactive state machine Advantages: • Models are higher-level programs to specify (potentially complex) behavior – more readable and comprehensible • Models lend themselves to formal analysis – e.g. model checking [Figure labels: Diagnoser Engine, Mitigation Engine, Platform, Managed Component, Component CHM, Component Fault Model]
  • 81. System-level Fault Mitigation  Model-based mitigation specification at two levels  Component level: quick action  System level: Reactive action taking the system state into consideration  System designer specifies them as a parallel timed state machine.  A fixed set of mitigation actions is available  Runtime code is generated from models  Advantages:  Models are higher-level programs to specify (potentially complex) behavior – more readable and comprehensible  Models lend themselves to formal analysis – e.g. model checking
    List of predefined Mitigation Actions
    HM Action | Semantics
    CLHM: IGNORE | Continue as if nothing has happened
    CLHM: ABORT | Discontinue current operation, but operation can run again
    CLHM: USE PAST DATA | Use most recent data (only for operations that expect fresh data)
    CLHM: STOP | Discontinue current operation. Aperiodic processes (ports): operation can run again. Periodic processes (ports): operation must be enabled by a future START HM action
    CLHM: START | Re-enable a STOP-ped periodic operation
    CLHM: RESTART | A macro for STOP followed by a START for the current operation
    SLHM: RESET | Stop all operations, initialize state of component, start all periodic operations
    SLHM: STOP | Stop all operations
    [Figure labels: Diagnoser Engine, Mitigation Engine, Alarms]
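To make the difference between periodic and aperiodic operations concrete, here is a small sketch of the component-level STOP / START / RESTART / ABORT semantics listed above. The `Operation` class and its `enabled` and `running` fields are hypothetical names; the actual runtime code is generated from models and is not shown in the slides.

```python
class Operation:
    """A component operation bound to a periodic or aperiodic process (port)."""

    def __init__(self, name, periodic):
        self.name = name
        self.periodic = periodic
        self.enabled = True    # may be scheduled again
        self.running = False   # currently executing

    def apply(self, action):
        if action == "IGNORE":
            return                       # continue as if nothing has happened
        if action == "ABORT":
            self.running = False         # operation can run again
        elif action == "STOP":
            self.running = False
            if self.periodic:
                self.enabled = False     # needs a future START
        elif action == "START":
            self.enabled = True
        elif action == "RESTART":        # macro: STOP followed by START
            self.apply("STOP")
            self.apply("START")
        else:
            raise ValueError(f"unknown HM action: {action}")


periodic_op = Operation("read_accelerometer", periodic=True)
periodic_op.running = True
periodic_op.apply("STOP")
stopped_state = (periodic_op.running, periodic_op.enabled)

aperiodic_op = Operation("handle_request", periodic=False)
aperiodic_op.running = True
aperiodic_op.apply("STOP")
```

After STOP, the periodic operation stays disabled until a START arrives, while the aperiodic one remains eligible to run on its next trigger.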
  • 82. System-level Health Management Functional components  1. Aggregator:  Integrates (collates) health information coming from components (typically in one hyperperiod)  2. Diagnoser:  Performs fault diagnosis, based on the fault propagation graph model  Ranks hypotheses  Component that appears in all hypotheses with the highest rank is chosen for mitigation  3. Response Engine:  Issues mitigation actions to components based on diagnosis results  Based on a state machine model that maps diagnostic results to mitigation actions These components are generated automatically from the models The Health Management Approach: 1. Locally detected anomalies are mitigated locally first. – Quick reactive response. 2. Anomalies and local mitigation actions are reported to the system level. 3. Aggregated reports are subjected to diagnosis, potentially followed by a system-level mitigation action. 4. System-level response commands are propagated down to components.
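The selection rule for the Diagnoser ("component that appears in all hypotheses with the highest rank") and the hand-off to the Response Engine can be sketched as below. The function names, the numeric ranks, and the default action are hypothetical; in the described system the response is driven by a generated state machine, not by a lookup table.

```python
def select_for_mitigation(ranked_hypotheses):
    """ranked_hypotheses: list of (rank, components blamed by that hypothesis).
    Returns the components common to every top-ranked hypothesis."""
    if not ranked_hypotheses:
        return set()
    top = max(rank for rank, _ in ranked_hypotheses)
    top_sets = [set(components) for rank, components in ranked_hypotheses
                if rank == top]
    return set.intersection(*top_sets)


def respond(components, policy, default="SLHM: STOP"):
    """Map diagnosed components to mitigation commands (assumed table lookup)."""
    return [(policy.get(c, default), c) for c in sorted(components)]


# Hypothetical diagnosis result: two hypotheses tie for the highest rank.
ranked = [
    (0.9, {"Accelerometer5", "Accelerometer6"}),
    (0.9, {"Accelerometer5"}),
    (0.4, {"Voter1"}),
]
chosen = select_for_mitigation(ranked)
commands = respond(chosen, policy={"Accelerometer5": "SLHM: STOP"})
```

Taking the intersection is conservative: a component is acted upon only when every best explanation agrees that it is involved.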
  • 83. Example: 2005 Malaysia Airlines Boeing 777 in-flight upset  Low airspeed advisory.  Airplane’s autopilot experienced excessive acceleration values.  Vertical acceleration decreased to -2.3g within ½ second  Lateral acceleration decreased to -1.01g (left) within ½ second  Longitudinal acceleration increased to +1.2g within ½ second  Autopilot pitched nose-up to 17.6 degrees and climbed at a vertical speed of 10,650 fpm.  Airspeed reduced to 241 knots.  Stick shaker activated at top of the climb.  Aircraft descended 4,000 ft.  Re-engagement of autopilot followed by another climb of 2,000 ft.  Maximum rate of climb = 4440 fpm.
  • 84. B777 ADIRU Architecture • Designed to be serviceable with one fault in each FCA • Can fly but maintenance required upon landing with two faults in each FCA • Each ARINC 629 end unit voted on the processor data bit-by-bit. • Processors monitor the ARINC 629 modules by full data wrap-around • Processors also monitor the power supplies, any one of which can power the entire unit • Accelerometer and gyro in skewed redundant configuration • A S(econdary)AARU also provided inertial data Based on Air Data Inertial Reference Unit (ADIRU) Architecture (ATSB, 2007, p.5)
  • 85. Cause of Inflight Upset  June 2001: accelerometer 5 fails with high output value, ADIRU disregards it.  A power cycle on ADIRU occurs. A latent software bug disregards the faulty status of accelerometer 5.  Status of failed unit was recorded in on-board maintenance memory, but that memory was not checked by the software.  An inflight fault was recorded in accelerometer 6 and it was disregarded.  FDI software allowed use of accelerometer 5.  High acceleration value was passed to all computers.  Due to common-mode nature of fault, voters allowed high accelerometer data to go on all channels.  This high value was used by primary flight computer.  Mid value select function used by the flight computer lessened the effect of pitch motion. Problem: System relied on redundancy to mask a fault. But due to a latent software bug and a common-mode fault, the effect cascaded into a system failure. Reading Material: The dangers of failure masking in fault-tolerant software: aspects of a recent in-flight upset event, C.W. Johnson and C.M. Holloway, IET Conf. Pub. 2007, 60 (2007), DOI:10.1049/cp:20070442
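A short illustration of why redundancy with voting masks a single bad channel but not a common-mode fault. This is a generic mid-value select over three hypothetical channel readings, not a reconstruction of the B777 voter or flight computer logic.

```python
from statistics import median


def mid_value_select(a, b, c):
    """Return the middle of three redundant channel values."""
    return median([a, b, c])


# One channel reports a wildly high acceleration: the outlier is masked.
single_fault = mid_value_select(1.0, 1.1, 9.0)

# The same bad value reaches every channel (common-mode fault):
# all inputs agree, so the selector passes the bad value through.
common_mode = mid_value_select(9.0, 9.0, 9.0)
```

The selector can only reject a value that disagrees with the others; once the faulty accelerometer data is on all channels, there is nothing left to disagree with.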
  • 86. Case Study • Modeled the architecture as a software component assembly • Created the fault scenario • Only modeled part of the system to illustrate the point of SHM • Accelerometers are arranged on six faces of a dodecahedron. Used for regression Equations
  • 88. ADIRU Assembly (Processors) Observer tracks the age of accelerometer data. Specified as timed state machine (with timeout) Runs at 20 Hz
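An observer that tracks the age of accelerometer data can be pictured as a small timed state machine. The sketch below uses explicit timestamps so it is deterministic; the state names and the 0.1 s timeout (two periods at 20 Hz) are assumptions, since the slide does not give the actual values.

```python
class DataAgeObserver:
    """Timed state machine: WAITING -> FRESH on data, FRESH -> STALE on timeout."""

    def __init__(self, timeout):
        self.timeout = timeout      # seconds of silence tolerated
        self.last_update = None
        self.state = "WAITING"

    def on_data(self, now):
        """Called whenever a new accelerometer sample arrives."""
        self.last_update = now
        self.state = "FRESH"

    def tick(self, now):
        """Called periodically (e.g. every 0.05 s at 20 Hz) to check the timeout."""
        if self.last_update is not None and now - self.last_update > self.timeout:
            self.state = "STALE"    # would raise an alarm to the health manager
        return self.state


observer = DataAgeObserver(timeout=0.1)
observer.on_data(now=0.0)
state_early = observer.tick(now=0.05)   # data is 0.05 s old
state_late = observer.tick(now=0.2)     # data is 0.2 s old
```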
  • 89. ADIRU Assembly (Voters) Runs at 20 Hz
  • 90. ADIRU Assembly (Display- Mimics PFC) Runs aperiodically
  • 91. Deployment Model Each Module is a processor running the ARINC Component Runtime Environment
  • 92. Execution Accelerometers Machine – durip02 SHM Machine – durip09 ADIRU Processors ADIRU Computers Machine – durip03 Voter + Display Computer Machine – durip06 Accelerometers SHM VOTERS + DISPLAY
  • 93. System Health Manager Other machines have similar specifications. These components are auto-generated. The hypothesis generated by the diagnoser is translated into the component(s) most likely to be faulty. This list is fed to the Response Engine, which triggers the mitigation state machine.
  • 94. Demonstration  Fault Scenario  Accelerometer 5 has an initial fault  It is started, which causes an alarm  Then Accelerometer 6 develops a fault  Successful mitigation  Identifying the faulty components  Stopping the faulty components  Processors can still function with four accelerometers.
  • 97. Resilience and autonomy  Model-based Software Health Management  Requires explicit specification of component-level and system-level health management (recovery) actions  Complex and error-prone… too many options!  Resilient systems should recover autonomously  Concepts:  Model the system architecture + functions.  Express what is needed from the system to implement functions.  Embed models into the run-time system  Use a reasoner to figure out how to recover function upon failures
  • 98. Modeling Functional Requirements for IMU  Inertial Position • Determine inertial position. • Functional Requirement (AND)  GPS Position  Position Tracking  GPS Position • Sense GPS position for computing Inertial Position  Position Tracking • Continuously track position to compute Inertial Position • Functional Requirement  Body Acceleration Measurement  Body Acceleration Measurement • Sense body acceleration for Position Tracking. [Figure: function tree with nodes Inertial Position, GPS Position, Position Tracking, Body Acceleration Measurement]
  • 100. Modeling the Architecture Function Allocation: Body Acceleration Measurement is allocated to EXACTLY ONE (Primary / Secondary ADIRU Subsystem). ADIRU Subsystem has • Accelerometers (6) • ADIRU Computers (4) • Voters (3) Functional / Operational ADIRU Subsystem requires • ATLEAST 4 of 6 Accelerometers • ATLEAST 2 of 4 Filters or ADIRU computers • ATLEAST 1 of 3 Voters [Figure: inside one ADIRU]
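The ATLEAST k-of-n operational requirement for one ADIRU subsystem can be written directly as a predicate over component health flags. The function names are hypothetical; the thresholds (4 of 6, 2 of 4, 1 of 3) are the ones stated on the slide.

```python
def at_least(k, healthy_flags):
    """True if at least k of the given components are healthy."""
    return sum(healthy_flags) >= k


def adiru_operational(accelerometers, computers, voters):
    """Operational requirement of one ADIRU subsystem."""
    return (at_least(4, accelerometers)
            and at_least(2, computers)
            and at_least(1, voters))


all_ok = [True] * 6
two_failed = [True, True, True, True, False, False]
three_failed = [True, True, True, False, False, False]

still_ok = adiru_operational(two_failed, [True] * 4, [True] * 3)
degraded = adiru_operational(three_failed, [True] * 4, [True] * 3)
```

With two accelerometers lost the subsystem still meets its requirement; a third loss violates the 4-of-6 rule, which is the point at which another subsystem has to take over the function.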
  • 101. Modeling the Architecture Function Allocation GPS Position  EXACTLY ONE (Primary/Secondary GPS Subsystem)  GPS Subsystem includes  GPS Receiver (1)  GPS Processor (1)  Functional / Operational GPS subsystem requires  EXACTLY ONE of GPS Receiver  EXACTLY ONE of GPS Processor Inside one GPS Subsystem:
  • 102. Modeling the Architecture Function Allocation POSITION TRACKING  ATLEAST ONE OF ( LEFT/ CENTER/ RIGHT PFC NavFilter Subsystem)  PFC NavFilter Subsystem includes  PFC Nav Filter (1)  PFC Processor (1)  Functional/ Operational Requirement for PFC Subsystem  EXACTLY ONE PFC NavFilter  EXACTLY ONE PFC Processor Inside one PFC Subsystem:
  • 103. Component Operational Requirement  EXPLICIT – Local dependency  Display Subsystem  ATLEAST 1 of 3 Consumers (Left, Center, Right)  EXPLICIT – Local dependency  ADIRU Computer inside ADIRU Subsystem  ATLEAST 4 of 6 Consumer Port Implies  ATLEAST 4 of 6 Accelerometer Components
  • 104. Component Operational Requirement  IMPLICIT – Inferred dependency  PFC NavFilter in PFC Subsystem  EXACTLY 1 of 1 Consumer Port AND  ATLEAST 1 of 1 Requires Port Implies  EXACTLY 1 of 2 ADIRU Subsystems AND  ATLEAST 1 of 2 GPS Subsystem
  • 105. Component Operational Requirement  IMPLICIT – Inferred dependency  PFC Processor inside PFC Subsystem  EXACTLY 1 of 1 Consumer Port Implies  EXACTLY 1 of 1 PFC NavFilter  GPS Processor inside GPS Subsystem  EXACTLY 1 of 1 Consumer Port Implies  EXACTLY 1 of 1 GPS Receiver
  • 106. Modeling the problem: Boolean SAT Functional Requirements + Function allocation + Component operational requirements + Component states  Encoded as Boolean (CNF) Expression for SATisfiability problem Solution: valid component architecture Size: #Variables: 493/ #Clauses: 1776
    FAULT / Scenario | SAT solver - RECONFIG COMPUTE Time (s) | RECONFIG COMMANDS
    Verifying Initial State | 0.004228 | No commands. Initial State accepted as satisfying/ meeting functional requirements.
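As a rough sketch of what such an encoding might look like, the code below turns an "at least k of n" requirement into CNF clauses and adds unit clauses for faulty components. The slides do not show the actual encoding or name the SAT solver, so the clause construction (one clause per subset of n-k+1 variables) and the brute-force satisfiability check that stands in for a real solver are both assumptions. Literals follow the common DIMACS convention: a positive integer means "component active", a negative one means "component inactive".

```python
from itertools import combinations, product


def at_least_k(variables, k):
    """CNF for 'at least k of variables are true': every subset of
    n-k+1 variables must contain at least one true variable."""
    n = len(variables)
    if k <= 0:
        return []
    if k > n:
        return [[]]                      # empty clause: unsatisfiable
    return [list(subset) for subset in combinations(variables, n - k + 1)]


def satisfiable(clauses, n_vars):
    """Brute-force check; a stand-in for a real SAT solver on tiny inputs."""
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False


accelerometers = [1, 2, 3, 4, 5, 6]            # one variable per accelerometer
requirement = at_least_k(accelerometers, 4)    # ATLEAST 4 of 6

two_faults = requirement + [[-5], [-6]]        # accelerometers 5 and 6 faulty
three_faults = two_faults + [[-4]]             # accelerometer 4 fails as well

ok_with_two = satisfiable(two_faults, n_vars=6)
ok_with_three = satisfiable(three_faults, n_vars=6)
```

This direct encoding needs C(6, 3) = 20 clauses for the 4-of-6 rule; it grows quickly with n, which is acceptable for the small redundancy groups in this example.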
  • 108. Fault: ADIRU Accelerometer  Fault introduced, anomaly detected, fault source component diagnosed, then:  Compute the new component architecture that satisfies the functional requirements AND minimizes the number of reconfiguration changes
    FAULT / Scenario | SAT solver - RECONFIG COMPUTE Time (s) | RECONFIG COMMANDS
    Primary_ADIRU_Subsystem_Accelerometer6 | 0.002989 | STOP Primary_ADIRU_Subsystem_Accelerometer6
    Primary_ADIRU_Subsystem_Accelerometer5 | 0.003151 | STOP Primary_ADIRU_Subsystem_Accelerometer5
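The "minimize the number of reconfiguration changes" objective can be illustrated by exhaustive search over a small configuration space. This is only a sketch: the function `reconfigure` and the short component names are hypothetical, and the real system obtains the result from a SAT solver over the full model, not by enumeration.

```python
from itertools import product


def reconfigure(current, faulty, required):
    """Find the configuration closest to `current` in which no faulty component
    is active and at least `required` components are active.
    Returns a list of (command, component), or None if no configuration exists."""
    names = sorted(current)
    best = None
    for bits in product([True, False], repeat=len(names)):
        candidate = dict(zip(names, bits))
        if any(candidate[f] for f in faulty):
            continue                              # faulty components must be off
        if sum(candidate.values()) < required:
            continue                              # operational requirement violated
        changes = sum(candidate[n] != current[n] for n in names)
        if best is None or changes < best[0]:
            best = (changes, candidate)
    if best is None:
        return None
    _, candidate = best
    return [("START" if candidate[n] else "STOP", n)
            for n in names if candidate[n] != current[n]]


current = {f"Accelerometer{i}": True for i in range(1, 7)}

one_fault = reconfigure(current, faulty={"Accelerometer6"}, required=4)
three_faults = reconfigure(
    current,
    faulty={"Accelerometer4", "Accelerometer5", "Accelerometer6"},
    required=4,
)
```

For a single fault the cheapest valid configuration differs from the current one only in the faulty accelerometer, so the result is a single STOP command. When no valid configuration exists inside the subsystem the search returns `None`, signalling that the function has to be recovered at a higher level.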
  • 109. Primary ADIRU Subsystem Partial fault – Primary still functional
  • 110. ADIRU Accelerometer Fault (contd.)  3rd fault  failover to secondary ADIRU
    FAULT / Scenario | SAT solver - RECONFIG COMPUTE Time (s) | RECONFIG COMMANDS
    Primary_ADIRU_Subsystem_Accelerometer4 | 0.020825 | STOP Primary_ADIRU_Subsystem_Accelerometer4; STOP Primary_ADIRU Subsystem (stop all accelerometers, ADIRU computers, Voters in Primary ADIRU subsystem); START Secondary_ADIRU Subsystem (start all accelerometers, ADIRU computers, Voters in Secondary ADIRU subsystem)
  • 111. Primary ADIRU Subsystem Complete fault
  • 112. Primary ADIRU Subsystem Faulty Failover to secondary ADIRU
  • 113. GPS Fault
    FAULT / Scenario | SAT solver - RECONFIG COMPUTE Time (s) | RECONFIG COMMANDS
    Primary_GPS_Subsystem_GPSProcessor | 0.004720 | STOP Primary_GPS_Subsystem (stop GPS Receiver, GPS Processor); START Secondary_GPS Subsystem (start GPS Receiver, GPS Processor)
  • 115. Reconfiguration after Primary GPS Subsystem becomes faulty
  • 116. PFC NavFilter Faults
    FAULT / Scenario | SAT solver - RECONFIG COMPUTE Time (s) | RECONFIG COMMANDS
    Left_PFC_Subsystem_PFCNavFilter | 0.003107 | STOP Left_PFC_Subsystem (stop PFCNavFilter, PFC Processor)
    Right_PFC_Subsystem_PFCNavFilter | 0.003089 | STOP Right_PFC_Subsystem (stop PFCNavFilter, PFC Processor)
  • 119. Research challenges  Modeling and engineering of r-CPS  Modeling paradigm / verification paradigm / synthesis  Verify recoverability under all scenarios  Efficient recovery  Analytics:  Comparing architectures and solutions  Resilience against…  Cascading, cross-domain faults  Cyber attacks possibly with physical faults  Engineering process  ‘Simian army’ or systematic design?  Principles of multi-layer resilience