Dezfuli.h

A “Systems/Case-based” Approach to
System Safety

Presented at NASA Project Management Challenge 2012

February 22-23, 2012

Homayoon Dezfuli, Ph.D.
NASA Technical Fellow for System Safety
Office of Safety and Mission Assurance (OSMA)
NASA Headquarters

Introduction
• We have developed a System Safety
Framework under which system safety
activities are conducted and
communicated
• The three elements of the framework
are:
– Safety objectives
– System safety activities
– Risk-Informed Safety Case (RISC)

• Guidance on the System Safety Framework is contained in the
NASA System Safety Handbook – Volume 1: System Safety
Framework and Concepts for Implementation (NASA/SP-2010-580)
• Volume 1 will be followed by Volume 2 on methods
Presented by Homayoon Dezfuli

Motivation
• Development of the System Safety Framework is motivated by
a desire to:
– Foster a systems view of safety (i.e., a holistic, systems
engineering view of safety)
– Improve integration and effectiveness of system safety activities
– Establish a process for defining “adequate safety”
– Establish a means for presenting a coherent case for the safety
of the system to decision makers
– Establish a process that is compatible with the growing trend
toward insight/oversight relationships with commercial providers


Safety Objectives


What is Safety?

“Safety is freedom from those conditions that can cause
death, injury, occupational illness, damage to or loss of
equipment or property, or damage to the environment”
NPR 8715.3

• The specific scope of safety is application-specific, and must
be clearly defined by the stakeholders in terms of the entities
to which it applies and the consequences against which it is
assessed
• The degree of safety that is considered acceptable is also
application-specific
– We strive to attain a degree of safety that fulfills obligations to
the at-risk communities and addresses agency priorities
– We do not expect to attain absolute safety (nor consider it
possible to do so)


Adequate Safety
• Achieving an adequately safe system requires adherence to the following
fundamental safety principles:
– The system meets or exceeds a minimum tolerable level of safety. Below this
level the system is considered unsafe
– The system is as safe as reasonably practicable (ASARP)

Achieve an adequately safe system
– Achieve a system that meets or exceeds the minimum tolerable level of safety
   – Design the system to meet or exceed the minimum tolerable level of safety
   – Build the system to meet or exceed the minimum tolerable level of safety
   – Operate the system to continuously meet or exceed the minimum tolerable level of safety
– Achieve a system that is as safe as reasonably practicable (ASARP)
   – Design the system to be as safe as reasonably practicable
   – Build the system to be as safe as reasonably practicable
   – Operate the system to continuously be as safe as reasonably practicable

• The minimum tolerable level of safety is not necessarily static, and may
evolve over the course of the system life cycle
• The principles of adequate safety must be maintained throughout all
phases of the system life cycle

NASA Safety Thresholds & Goals
• NASA’s minimum tolerable level of safety for human spaceflight
missions is articulated in NASA’s agency-level safety goals and
thresholds for crew transportation system missions to the ISS
• They reflect a tolerance for initial safety performance that is
acceptable but below long-term expectations


NASA Safety Thresholds and Goals:
Accounting for the Unknowns
“ There are known knowns; there are things we know we know.
We also know there are known unknowns; that is to say we know there are some
things we do not know.
But there are also unknown unknowns – the ones we don't know we don't know. ”
— Former United States Secretary of Defense Donald Rumsfeld, 2002

• Meeting quantitative safety requirements means more than simply showing
that the known-known risks do not exceed the applicable goal or threshold
• We must also be able to show that:
– The known-unknowns (risks that have been identified but are not quantifiable) and the
unknown-unknowns (risks that exist but have not been identified) are bounded
– The bounds of the unknowns do not threaten the quantitative safety requirements

• Methods for doing this include:
– Reliability growth analyses of US vehicles and other countries’ vehicles
– Analyses of historical precursors and anomalies
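One way to picture the bounding argument is the minimal Python sketch below. It assumes every risk is expressed as a per-mission probability of loss and combines the known-known estimate with a bound on the unknowns using the union bound, which is conservative whatever the dependence between the two. The class `MissionRisk`, the function `assess`, and all numbers are hypothetical illustrations, not NASA values; in practice the bound on the unknowns would come from analyses such as those listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MissionRisk:
    """Per-mission probability of loss; every number used here is hypothetical."""
    known_known: float     # risk quantified by the integrated safety analysis
    unknowns_bound: float  # assumed upper bound on known-unknown plus unknown-unknown risk

    def upper_bound(self) -> float:
        # Union bound: P(known or unknown) <= P(known) + P(unknown),
        # which holds whatever the dependence between the two sets of scenarios.
        return min(1.0, self.known_known + self.unknowns_bound)


def assess(risk: MissionRisk, threshold: float, goal: float) -> str:
    """Judge the bounded total risk, not just the known-known risk,
    against the threshold (maximum tolerable) and the stricter goal."""
    total = risk.upper_bound()
    if total > threshold:
        return "does not meet threshold"
    if total > goal:
        return "meets threshold, not yet goal"
    return "meets goal"


# Hypothetical threshold and goal, as maximum tolerable probabilities of loss.
THRESHOLD, GOAL = 0.004, 0.001

# The known-known risk alone (0.003) is under the threshold,
# but the bounded unknowns push the total over it.
print(assess(MissionRisk(known_known=0.003, unknowns_bound=0.0015), THRESHOLD, GOAL))
```

The point of the example is the one made on the slide: showing that the known-known risk is below the threshold is not sufficient on its own.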


As Safe As Reasonably Practicable (ASARP)
“ASARP entails weighing the safety performance of a
system against the sacrifice needed to further improve
it. A system is ASARP if an incremental improvement in
safety would require a disproportionate deterioration of
system performance in other areas.”
From NASA/SP-2010-580 (SS Handbook)

• The ASARP concept is closely related to the “as low as reasonably
achievable” (ALARA) and “as low as reasonably practicable” (ALARP)
concepts that are found in U.S. nuclear applications and U.K. Health and
Safety law
• ASARP implies that:
– A comprehensive spectrum of alternative means for achieving operational
objectives has been identified
– The performance of each alternative has been analyzed to determine the relative
gains and losses in performance (operational effectiveness, safety, cost, and
schedule) that would result from selecting one alternative over another
– Safety performance is given priority in the selection of an alternative, insofar as
the selection is within operational constraints

ASARP (Cont.)

• The ASARP region contains those
alternatives whose safety
performance is as high as can be
achieved without resulting in
intolerable performance in one or
more of the other mission
execution domains

– ASARP is a region of the trade space and can contain more than one
specific alternative
– The ASARP concept makes no explicit reference to the absolute value
of a system’s safety performance
– Improvements to cost, schedule, or technical performance beyond
minimum tolerable levels are not justifiable if they come at the expense
of safety performance
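The ASARP region can be illustrated with a small trade-space filter. This is a minimal Python sketch, not the handbook's method: it assumes each mission execution domain is summarized by one hypothetical number (probability of loss for safety, cost, schedule), treats the cost and schedule limits as minimum tolerable levels, and introduces an assumed `resolution` below which safety differences are treated as indistinguishable, so the region can contain more than one alternative. The minimum tolerable level of safety is applied separately, since ASARP itself makes no reference to absolute safety. All names and values are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alternative:
    name: str
    p_loss: float           # hypothetical safety metric: probability of loss (lower is safer)
    cost: float             # hypothetical cost, arbitrary units
    schedule_months: float  # hypothetical development duration


def asarp_region(alts: list[Alternative], max_cost: float, max_schedule: float,
                 resolution: float) -> list[Alternative]:
    """Alternatives whose safety is as high as can be achieved without
    intolerable cost or schedule. No absolute safety level is used here.
    `resolution` is an assumed analysis resolution on p_loss."""
    tolerable = [a for a in alts
                 if a.cost <= max_cost and a.schedule_months <= max_schedule]
    if not tolerable:
        return []
    best = min(a.p_loss for a in tolerable)
    return [a for a in tolerable if a.p_loss - best <= resolution]


def adequately_safe(region: list[Alternative], max_p_loss: float) -> list[Alternative]:
    """Adequate safety also requires meeting the minimum tolerable level of safety."""
    return [a for a in region if a.p_loss <= max_p_loss]


alts = [
    Alternative("A", p_loss=0.0020, cost=90, schedule_months=40),
    Alternative("B", p_loss=0.0012, cost=120, schedule_months=48),
    Alternative("C", p_loss=0.0011, cost=160, schedule_months=50),  # safest, but intolerable cost and schedule
    Alternative("D", p_loss=0.0013, cost=110, schedule_months=46),
]
region = asarp_region(alts, max_cost=130, max_schedule=49, resolution=0.0002)
print([a.name for a in region])
print([a.name for a in adequately_safe(region, max_p_loss=0.00125)])
```

Here C is excluded because it is intolerable in other domains, while A is excluded because its lower cost comes at the expense of safety performance; B and D form the region.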


Deriving Operational Safety Objectives
• The fundamental safety principles set the stage for the further
development of safety objectives, negotiated on an
application-specific basis
• Safety objectives are developed using an objectives hierarchy
down to a level where they can be clearly addressed by
system safety activities, thereby creating a link that:
– Assures that system safety activities are directed towards
accomplishing defined safety objectives
– Enables the system safety activities to be assessed in terms of
the degree to which their target safety objectives have been met

• The safety objectives at the bottom level of the objectives
hierarchy represent the operational definition of safety for the
system under consideration, and are referred to as
operational safety objectives
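The objectives hierarchy is a tree whose leaves are the operational safety objectives. The minimal Python sketch below, assuming the upper levels of the hierarchy from the Adequate Safety slide, collects those leaves; the `Objective` class and the `operational_objectives` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Objective:
    text: str
    children: list["Objective"] = field(default_factory=list)


def operational_objectives(node: Objective) -> list[str]:
    """Collect the leaves of the hierarchy, i.e. the operational safety objectives."""
    if not node.children:
        return [node.text]
    leaves: list[str] = []
    for child in node.children:
        leaves.extend(operational_objectives(child))
    return leaves


# Upper levels of the hierarchy only. A real application would decompose each
# leaf further, on an application-specific basis, until system safety
# activities can address it directly.
hierarchy = Objective("Achieve an adequately safe system", [
    Objective("Meet or exceed the minimum tolerable level of safety", [
        Objective("Design the system to meet or exceed the minimum tolerable level of safety"),
        Objective("Build the system to meet or exceed the minimum tolerable level of safety"),
        Objective("Operate the system to continuously meet or exceed the minimum tolerable level of safety"),
    ]),
    Objective("Be as safe as reasonably practicable (ASARP)", [
        Objective("Design the system to be ASARP"),
        Objective("Build the system to be ASARP"),
        Objective("Operate the system to continuously be ASARP"),
    ]),
])

for leaf in operational_objectives(hierarchy):
    print(leaf)
```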


System Safety Objectives Hierarchy


System Safety Activities


System Safety Activities as a Part of the
System Safety Framework

• System safety activities are conducted as part of the overall
systems engineering technical process activities

• System safety activities are designed to promote the development
of safe systems and to provide evidence to help demonstrate
through the Risk-Informed Safety Case (discussed later) that the
stated system safety objectives have been achieved

Figure: System Safety Objectives (define safety); system safety activities (achieve safety); Risk-Informed Safety Case (demonstrate safety); RISC Evaluation (confirm safety)

System Safety Activities – Early Design

1. Initial constraints focus on applicable safety requirements, design alternatives, operational constraints, and risk tolerances
2. The risk-informed decision making (RIDM) process provides models and results to evaluate trade-offs in the search for a final design that is ASARP
3. Integrated safety analyses (ISAs), which integrate hazard analysis, physical response analysis, and probabilistic analysis, provide the input needed to demonstrate that the system meets quantitative safety requirements
4. Under the ASARP objective, trade studies are performed to examine how variations (e.g., in design) affect not only safety but also the other mission execution domains

Figure: early-design activities: 1. Set Initial Constraints; 2. Conduct RIDM; 3. ISAs; 4. Trades; 5. Select Design; 6. Initialize; 7. Inform; 8. Allocate

Note: Interfaces are shown in some cases by nesting rather than by arrows.
The nesting format automatically implies an arrow from the smaller activities
within the nest to the larger activity surrounding it.

System Safety Activities – Early Design (Cont.)
5. The process of down-selecting from the design alternatives to one particular design concept is conducted through a risk-informed deliberation by the decision makers
6. The initialization role of continuous risk management (CRM) is to complete the risk modeling started during RIDM to include all hazards and associated scenarios that affect the risks
7. Informed compliance with requirements that have been developed historically and are recognized as best practices in their engineering disciplines tends to provide protection against known unknowns and unknown unknowns
8. The process for determining lower-level performance requirements involves a risk-informed allocation of requirements from system to subsystem level

System Safety Activities – Detailed Design
1. During detailed design, the role of CRM evolves to include the development and implementation of new controls when needed to counteract any new or changed risks
2. Program controls and commitments include management activities to promote an environment within which design opportunities for improving safety without incurring unreasonable cost, schedule, and technical impacts are sought out and implemented

Figure: design a safe system during detailed design.
Safety objectives: design the system to meet or exceed the minimum tolerable level of safety; design the system to be as safe as reasonably practicable; maintain allocation of requirements consistent with achievable safety performance; be responsive to new information during system design; risk-inform design solution decisions; comply with levied requirements that affect safety; minimize the introduction of potentially adverse conditions during system design.
Detailed design activities (within systems engineering): 1. Conduct CRM (analytic-deliberative process); 2. Program control and commitments; also conduct RIDM if major re-planning is needed. Supporting activities shown include implementing communication protocols, configuration management, safety programs, best practices, lessons learned, etc., and conducting verification and validation that safety requirements are being met. RISC; RISC Evaluation confirms safety.

Risk-Informed Safety Case


Risk-Informed Safety Case (RISC)
• The risk-informed safety case (RISC) is the means by which
the satisfaction of the system’s safety objectives is
demonstrated and communicated to decision makers at major
milestones such as Key Decision Points (KDPs)
• The RISC presents decision makers with a coherent case for
safety, rather than presenting them with a set of individual
safety analysis and safety management products


Risk-Informed Safety Case (RISC) (cont.)

“A risk-informed safety case (RISC) is a structured
argument, supported by a body of evidence, that
provides a compelling, comprehensible and valid case
that a system is or will be adequately safe for a given
application in a given environment. This is
accomplished by addressing each of the operational
safety objectives that have been negotiated for the
system, including articulation of the roadmap for the
achievement of safety objectives that are applicable to
later phases of the system life cycle.”

From NASA/SP-2010-580 (SS Handbook)

• The term ‘risk-informed’ is used to emphasize that adequate safety is
the result of a deliberative decision making process that involves an
assessment of risks, and strives for a proper balance between safety
performance and performance in other mission execution domains


Risk-Informed Safety Case (RISC) (cont.)
• The elements of the RISC are:
– An explicit set of safety claims about the system(s), for example,
the probability of an accident or a group of accidents is lower
than a specified value and/or as low as reasonably practicable
– Supporting evidence for the claim, for example, representative
operating history, redundancy in design, or results of analysis
– Structured safety arguments that link claims to evidence and that
use logically valid rules of inference

• RISCs produced by lower-level organizational units (e.g.,
subsystem-level units) can be used as sub-claims of the RISC at the next
higher level of the NASA hierarchy
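The claim, evidence, and argument structure above, including the use of lower-level RISCs as sub-claims, can be sketched as a tree. This minimal Python sketch assumes the simplest possible argument, a conjunction: a claim holds only if all of its sub-claims hold, and a leaf claim holds only if it cites evidence. Real RISC arguments use richer rules of inference. The `Claim` class and all claim and evidence strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    statement: str
    evidence: list[str] = field(default_factory=list)
    subclaims: list["Claim"] = field(default_factory=list)  # may be lower-level RISCs

    def unsupported(self) -> list[str]:
        """Leaf claims that lack evidence, under the assumed conjunctive rule."""
        if not self.subclaims:
            return [] if self.evidence else [self.statement]
        gaps: list[str] = []
        for sub in self.subclaims:
            gaps.extend(sub.unsupported())
        return gaps

    def supported(self) -> bool:
        # Under the conjunctive rule, a claim holds exactly when no leaf lacks evidence.
        return not self.unsupported()


# A hypothetical subsystem-level RISC reused as a sub-claim of the system RISC.
propulsion_risc = Claim("Propulsion subsystem is adequately safe", subclaims=[
    Claim("ISA shows the allocated threshold is met", evidence=["ISA report"]),
    Claim("Design includes appropriate redundancies", evidence=["design review record"]),
])
system_claim = Claim(
    "The system design meets or exceeds the minimum tolerable level of safety",
    subclaims=[
        propulsion_risc,
        Claim("Unknown and unquantified hazards do not significantly impact safety performance"),
    ],
)

print(system_claim.supported())
print(system_claim.unsupported())
```

Listing the unsupported leaves, rather than returning only a yes/no verdict, mirrors the aim of giving decision makers a coherent view of where the case is incomplete.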

RISC Life Cycle Considerations
• The RISC addresses the full system life cycle, regardless of
the particular point in the life cycle at which the RISC is
developed. This results in two types of safety claims:
– Claims related to the safety objectives of the current or previous
phases argue that the objectives have been met
– Claims related to the safety objectives of future phases argue
that necessary planning and preparation have been conducted,
and that commitments are in place to satisfy the objectives at the
appropriate time
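The two kinds of life cycle claims can be expressed as a simple rule keyed on phase order. This sketch assumes a hypothetical ordered phase list built from the design/build/operate objectives used earlier; `PHASES` and `claim_type` are illustrative names, not part of the handbook.

```python
# Hypothetical ordered life cycle phases, taken from the design/build/operate
# objectives for illustration only.
PHASES = ["design", "build", "operate"]


def claim_type(objective_phase: str, current_phase: str) -> str:
    """What the RISC argues for an objective, given the phase in which it is developed."""
    if PHASES.index(objective_phase) <= PHASES.index(current_phase):
        # Current or previous phase: argue the objective has been met.
        return "objective has been met"
    # Future phase: argue planning, preparation, and commitments are in place.
    return "planning, preparation, and commitments are in place"


for phase in PHASES:
    print(f"{phase}: {claim_type(phase, current_phase='design')}")
```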


Example RISC Safety Claims Derived from
Safety Objectives
• The claims made (and defended) by the RISC dovetail with the
safety objectives negotiated at the outset of system
formulation
• RISC Design Claims Derived from Design Objectives:
The system design is adequately safe
– The system design meets or exceeds the minimum tolerable level of safety
– The system design is as safe as reasonably practicable (ASARP)
Supporting claims:
– Appropriate historically-informed defenses against unknown and unquantified safety hazards have been incorporated into the design
– Design solution decisions have been risk informed
– Requirements have been allocated consistent with achievable safety performance

Example RISC Structure

• Claim: The system design meets or exceeds the minimum tolerable level of safety
– An ISA has been properly conducted
   – The design solution has been sufficiently well developed to support the ISA
      – Design solution elements: ConOps; DRMs; operating environments; system schematics; design drawings; ...
   – The ISA methods used are appropriate to the level of design solution definition and the decision context
      – ISA methods: identify hazards comprehensively; characterize initiating events and system control responses probabilistically; quantify events consistent with physics and available data; ...
   – The ISA analysts are fully qualified to conduct the ISA
– The ISA shows that the design solution meets the allocated safety goal/threshold requirements
– Unknown and unquantified safety hazards do not significantly impact safety performance
   – The design is robust against identified but unquantified hazards. The design incorporates: historically-informed margins against comparable stresses; appropriate redundancies; appropriate materials for intended use; appropriate inspection and maintenance accesses; ...
   – The design minimizes the potential for vulnerability to unknown hazards. The design incorporates: minimal complexity; appropriate TRL items; proven solutions to the extent possible; appropriate inspection and maintenance accesses; ...
   – Adjusted/waived requirements, standards, best practices do not significantly increase vulnerabilities to unknown/unquantified hazards

Example RISC Structure (cont.)

• Claim: Design solution decisions are risk informed
– RIDM has been conducted to select the design that maximizes safety without excessive performance penalties in other mission execution domains
   – Stakeholder objectives are understood and requirements (or imposed constraints) have been allocated from the level above
   – The RIDM methods used are appropriate to the life cycle phase and the decision context. RIDM methods: identify alternatives; analyze the risks associated with each alternative; support the risk-informed, deliberative selection of a design alternative
   – The RIDM analysts are fully qualified to conduct RIDM
– The tailored set of requirements, standards, and best practices to which the design complies supports a design solution that is as safe as reasonably practicable
   – The set of applicable requirements, standards, and best practices was comprehensively identified
   – There is an appropriate analytical basis for all adjustments/waivers to requirements, standards, and best practices
   – Adjusted/waived requirements, standards, best practices: improve the balance between analyzed performance measures; preserve safety performance as a priority; do not significantly increase vulnerabilities to unknown/unquantified hazards

Example RISC Structure (cont.)
• Claim: Appropriate historically-informed defenses against unknown and unquantified safety hazards are incorporated into the design
– The design is robust against identified but unquantified hazards. The design incorporates: historically-informed margins against comparable stresses; appropriate redundancies; appropriate materials for intended use; appropriate inspection and maintenance accesses; ...
– The design minimizes the potential for vulnerability to unknown hazards. The design incorporates: minimal complexity; appropriate TRL items; proven solutions to the extent possible; appropriate inspection and maintenance accesses; ...
– Adjusted/waived requirements, standards, best practices do not significantly increase vulnerabilities to unknown/unquantified hazards

• Claim: Requirements are allocated consistent with achievable safety performance
– Allocated requirements are consistent with achievable safety performance
   – Performance requirements are consistent with the performance commitments developed during RIDM
   – Allocated requirements have been negotiated between the requirements owner and the organization responsible for meeting the requirements

Independent Evaluation of the RISC
• It is good practice for an evaluator to have one or more checklists for determining
whether the evidence is sufficient to support a claim
• The checklist should be organized independently from the RISC and should tend
to be generically applicable rather than application-specific

EVALUATION BY ANALYSIS TYPE
Analysis types: Physical Responses | Hazards | Individual Risks | Aggregate Risks | Risk Drivers | Risk Allocations
Each analysis attribute below receives a Grade and a Comment for every analysis type:
– Important issues are identified and evaluated
– Models are graded according to the importance of the issue
– Tests support models and analysis of important issues
– Best available models are used for all risk-significant issues
– Etc.

PROGRAMMATIC CONTROL EVALUATION
Each item below receives a Grade and a Comment:
– Plans related to programmatic controls are comprehensively and clearly documented.
– Management will actively promote an environment within which design opportunities for improving safety without incurring unreasonable cost, schedule, and technical impacts are sought out and implemented during each phase.
– Protocols are in place that will promote effective and timely communication among design teams from different organizations working on different parts of the system.
– Etc.
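Because the checklist is organized independently of the RISC, it can be kept as a generic grid. The Python sketch below stores one Grade/Comment cell per analysis attribute and analysis type, and lists cells that are still ungraded or fall below a passing grade. The numeric grade scale, the passing value, and the sample grades and comment are assumptions; the slide does not specify a grading scale.

```python
ANALYSIS_TYPES = ["Physical Responses", "Hazards", "Individual Risks",
                  "Aggregate Risks", "Risk Drivers", "Risk Allocations"]
ATTRIBUTES = [
    "Important issues are identified and evaluated",
    "Models are graded according to the importance of the issue",
    "Tests support models and analysis of important issues",
    "Best available models are used for all risk-significant issues",
]


def blank_checklist() -> dict[tuple[str, str], dict]:
    """One Grade/Comment cell per (attribute, analysis type) pair."""
    return {(attr, kind): {"grade": None, "comment": ""}
            for attr in ATTRIBUTES for kind in ANALYSIS_TYPES}


def ungraded(checklist: dict) -> list[tuple[str, str]]:
    return [cell for cell, entry in checklist.items() if entry["grade"] is None]


def below(checklist: dict, passing: int) -> list[tuple[str, str]]:
    """Cells graded under a hypothetical numeric passing grade."""
    return [cell for cell, entry in checklist.items()
            if entry["grade"] is not None and entry["grade"] < passing]


checklist = blank_checklist()
checklist[(ATTRIBUTES[0], "Hazards")] = {"grade": 2, "comment": "hazard list incomplete"}
checklist[(ATTRIBUTES[0], "Risk Drivers")] = {"grade": 4, "comment": ""}
print(len(ungraded(checklist)), below(checklist, passing=3))
```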

Putting It All Together


Challenges Ahead
• Organizational challenges
– Integrating system safety personnel/activities more closely with
systems engineering, operations management, and risk
management

• Analytical challenges
– Integrating/refining existing analysis activities to support the
development of an integrated safety analysis (ISA)
– Meaningful accounting for unknown and under-evaluated risks in
determining whether safety thresholds and goals have been
achieved

• Procedural and regulatory challenges
– Development of standards and practices for formulating and
evaluating risk-informed safety cases (RISCs)
– Development of guidelines for excising unnecessary
requirements while maintaining safety-beneficial requirements

Independent Evaluation of the RISC
• A flowdown checklist for evaluating the RISC has the advantage of explicitly
showing how arguments based on evidence support claims.

1.0 TOP-LEVEL CLAIM
This flow-down checklist examines “how safe” the system is (or will be),* how well it is demonstrated, and what is being done to make sure that the top-level safety claim is true (or remains true).* This is the technical basis for the claim:
– Evidence, including operating experience, testing, associated engineering analysis, and a comprehensive, integrated design and safety analysis (IDSA), including scenario modeling using Probabilistic Safety Analysis (PSA)
– A credible set of performance commitments, deterministic requirements, and implementation measures
* The nature and specificity of the claim, and the character of the underlying evidence, depend on the life cycle phase at which the safety case is being applied.

Figure: levels covered by the checklist: safety performance measures; safety performance requirements (including goal and threshold); engineering requirements; process requirements.

1.1 The design intent is characterized in terms of design reference missions, CONOPS, and deterministic requirements to be satisfied. The design itself is characterized at a level of detail appropriate to the current life cycle phase.
   1.1.1 The design and mission intent is well characterized.*
      1.1.1.1 Concept of Operation; Design Reference Missions; Operation Environments
   1.1.2 The design for the current life cycle phase (including requirements and controls) is well specified.*
      1.1.2.1 Historically Informed Elements
      1.1.2.2 The nominal performance and dynamic responses in design reference phases are well understood and justified
      1.1.2.3 The performance tailoring and allocation are well understood and justified
      1.1.2.4 Hazard controls, crew survival methods (if applicable), deterministic requirements, and fault protection approaches have been formulated effectively in a risk-informed manner
1.2 The results of analysis have been clearly presented, conditional on an explicitly characterized baseline allocation of levels of performance, risk-informed requirements, and operating experience. An effective process for identifying departures from this baseline and/or addressing future emergent issues that are not addressed by this baseline has been developed.
   1.2.1 Analyses performed provide the following results: aggregate risk results; dominant accident scenarios; comparison with threshold/goal; established baseline for precursor analysis; …..
      1.2.1.1 In addition to reviewing existing information sources and operating experience, the best processes known for identifying previously unrecognized safety hazards have been applied.
      1.2.1.2 The limits of the safety models are recognized, the caliber of evidence used in the models has been evaluated, and uncertainty and sensitivity analyses have been performed.
      1.2.1.3 Completeness issue; understanding of key phenomenology and assumptions
   1.2.2 An effective process for addressing unresolved and non-quantified safety issues (issues invalidating the baseline case) has been identified
      1.2.2.1 A reasonable defense against unknown safety issues is included in the design and controls. What is credited is reasonable and justifiable.
1.3 It has been successfully demonstrated that no further improvements to the design or operations are currently net-beneficial (as safe as reasonably practicable).
   1.3.1 An effective process has been carried out to identify significant safety improvements, but no candidate measures have been formulated.
   1.3.2 It has been demonstrated that further improvements in safety would unacceptably affect schedule
   1.3.3 It has been demonstrated that further improvements in safety would incur excessive performance penalties
   1.3.4 It has been demonstrated that further improvements in safety would incur excessive cost
1.4 The implementation aspects needed to achieve the level of safety claimed are correctly understood, and the necessary measures have been committed to.
   1.4.1 It has been confirmed that allocated performance is feasible
   1.4.2 An effective process has been developed for monitoring and assuring ongoing satisfaction of allocated performance levels, and there are commitments to implement these measures
   1.4.3 A reasonable attempt has been made to identify and prioritize all significant risks in the risk management program
   1.4.4 An effective process has been developed for evaluating flight and test experience for the presence of accident precursors
