2. Lecture outline
• Course information
– Examination: project
• What is a “safety-critical embedded system”?
– Embedded systems
– Real-time systems
– Safety-critical systems
• Fundamental concepts of dependability
– The “dependability” concept
– Threats: fault, error, failure
– Attributes: reliability, availability
Lecture 1/2
3. Course information
• Contact
– Paul Pop, course leader and examiner
• Email: Paul.Pop@imm.dtu.dk
• Phone: 4525 3732
• Office: building 322, office 228
• Webpage
– All the information is on CampusNet
4. Course information, cont.
• Textbook: Israel Koren and C. Mani Krishna, Fault-Tolerant Systems, Morgan Kaufmann
• Full text available online, see the link on CampusNet
5. Course information, cont.
• Lectures
– Language: English
– 12 lectures
• Lecture notes: available on CampusNet as a PDF file the day before
• Dec. 1 is used for the project
• Two invited lectures, from Novo Nordisk and Danfoss
• Examination
– Project: 70% report + 30% presentation
• 7.5 ECTS points
6. Project
• Milestones
– End of September: Group registration and topic selection
• Email to paul.pop@imm.dtu.dk
– End of October: Project report draft
• Upload draft to CampusNet
– End of November: Report submission
• Upload final report to CampusNet
– Last lecture: Project presentation and oral opposition
• Upload presentation to CampusNet
7. Project, cont.
• Project registration
– E-mail Paul Pop, paul.pop@imm.dtu.dk
• Subject: 02228 registration
• Body:
– Name student #1, student ID
– Name student #2, student ID
– Project title
– Project details
• Notes
– Project approval
– Groups of max. 3 persons
8. Project, cont.
• Topic categories
1. Literature survey
• See the “references” and “further reading” in the course literature
2. Tool case-study
• Select a commercial or research tool and use it on a case-study
3. Software implementation
• Implement a technique, e.g., an error-detection or fault-tolerance technique
– Suggested topics available on CampusNet
9. Project, cont.
• Examples of last years’ projects
– ARIANE 5: Flight 501 Failure
– Hamming Correcting Code Implementation in Transmitting System
– Application of Fault Tolerance to a Wind Turbine
– Guaranteed Service in Fault-Tolerant Network-on-Chip
– Fault-tolerant digital communication
– Resilience in Mobile Multi-hop Ad-hoc Networks
– Fault-tolerant ALU
– Reliable message transmission in CAN, TTP and FlexRay
10. Project deliverables
1. Literature survey
– Report
• Document your work
2. Tool case-study
– Written report
– Case-study files
3. Software implementation
– Source code with comments
– Report
• Document your work
• Report structure
– Title, authors
– Abstract
– Introduction
– Body
– Conclusions
– References
• Deadline for draft: End of October
• Deadline for final version: End of November
11. Project presentation & opposition
• Poster presentation of project
– 15 min. + 5 min. questions
– Deadline: Last lecture
• Note!
– During the presentation you might be asked general questions that relate to any course topic
12. Embedded systems
• Computing systems are everywhere
• Most of us think of “desktop” computers
– PCs
– Laptops
– Mainframes
– Servers
• But there’s another type of computing system
– Far more common...
13. Embedded systems, cont.
• Embedded computing systems
– Computing systems embedded within electronic devices
– Hard to define. Nearly any computing system other than a desktop computer
– Billions of units produced yearly, versus millions of desktop units
– Perhaps 50 per household and per automobile
14. What is an embedded system?
• Definition
– An embedded system is a special-purpose computer system, part of a larger system which it controls.
• Notes
– A computer is used in such devices primarily as a means to simplify the system design and to provide flexibility.
– Often the user of the device is not even aware that a computer is present.
15. Characteristics of embedded systems
• Single-functioned
– Dedicated to performing a single function
• Complex functionality
– Often have to run sophisticated algorithms or multiple algorithms.
• Cell phone, laser printer.
• Tightly constrained
– Low cost, low power, small, fast, etc.
• Reactive and real-time
– Continually reacts to changes in the system’s environment
– Must compute certain results in real time without delay
• Safety-critical
– Must not endanger human life or the environment
16. Functional vs. non-functional requirements
• Functional requirements
– Output as a function of input
• Non-functional requirements
– Time required to compute the output
– Reliability, availability, integrity, maintainability, dependability
– Size, weight, power consumption, etc.
17. Real-time systems
• Time
– The correctness of the system behavior depends not only on the logical results of the computations, but also on the time at which these results are produced.
• Real
– The reaction to outside events must occur during their evolution. The system time must be measured using the same time scale used for measuring time in the controlled environment.
19. Safety-critical systems
• Definitions
– Safety is a property of a system that will not endanger human life or the environment.
– A safety-related system is one by which the safety of the equipment or plant is ensured.
• A safety-critical system is:
– a safety-related system, or
– a high-integrity system
20. System integrity
• Definition
– The integrity of a system is its ability to detect faults in its own operation and to inform the human operator.
• Notes
– The system will enter a failsafe state if faults are detected
– High-integrity system
• Failure could result in large financial loss
• Examples: telephone exchanges, communication satellites
21. Failsafe operation
• Definition
– A system is failsafe if it adopts “safe” output states in the event of failure and inability to recover.
• Notes
– Example of failsafe operation
• Railway signaling system: failsafe corresponds to all the lights on red
– Many systems are not failsafe
• Fly-by-wire system in an aircraft: the only safe state is on the ground
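The railway example can be sketched in code. The sketch below is illustrative and not from the course material: a signal controller that, once it detects an internal fault it cannot recover from, forces every signal into the safe output state (all lights red) and keeps it there.

```python
# Illustrative sketch (not from the slides): a failsafe railway signal
# controller. On a detected, unrecoverable fault it adopts the safe
# output state — every light red — and refuses further commands.

class SignalController:
    SAFE_STATE = "red"  # all lights red => all trains stop => safe

    def __init__(self, num_signals):
        self.lights = ["green"] * num_signals
        self.failed = False

    def set_light(self, i, color):
        if self.failed:
            return  # once failsafe, outputs stay in the safe state
        self.lights[i] = color

    def report_fault(self):
        # Failsafe reaction: adopt the safe output state on every signal.
        self.failed = True
        self.lights = [self.SAFE_STATE] * len(self.lights)

controller = SignalController(num_signals=3)
controller.set_light(0, "green")
controller.report_fault()
print(controller.lights)  # all three signals are now red
```

By contrast, a fly-by-wire aircraft has no such output state to fall back to in flight, which is why it cannot be made failsafe this way.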
22. Preliminary topics
• Fundamental concepts of dependability
• Means of achieving dependability
• Hazard and risk analysis
• Reliability analysis
• Hardware redundancy
• Information and time redundancy
• Software redundancy
• Checkpointing
• Fault-tolerant networks
23. Dependability: an integrating concept
• Dependability is a property of a system that justifies placing one’s reliance on it.
• Attributes: Availability, Reliability, Safety, Confidentiality, Integrity, Maintainability
• Means: Fault prevention, Fault tolerance, Fault removal, Fault forecasting
• Threats: Faults, Errors, Failures
24. Threats: Faults, Errors & Failures
• Fault: cause of error (and failure)
• Error: unintended internal state of a subsystem
• Failure: deviation of actual service from intended service
25. Threats: Faults, Errors & Failures, cont.
• Fault
– Physical defect, imperfection, or flaw that occurs within some hardware or software component.
– Examples
• Shorts between electrical conductors
• Physical flaws or imperfections in semiconductor devices
• Program loop that, when entered, can never be exited
– Primary cause of an error (and, perhaps, a failure)
• Does not necessarily lead to an error, e.g., a bit in memory flipped by radiation
– can cause an error if the next operation on the memory cell is “read”
– causes no error if the next operation on the memory cell is “write”
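The radiation example can be made concrete with a toy memory model (hypothetical code, not from the slides): the flipped bit is the fault, but it becomes an error only if the cell is read before the next write overwrites it.

```python
# Toy model of the radiation example: a bit flip (fault) becomes an
# error only if the next operation on the memory cell is a read;
# a write masks the fault before it can be observed.

memory = {0x10: 0b0000}

def flip_bit(addr, bit):
    """Inject the fault: radiation flips one bit of the stored word."""
    memory[addr] ^= (1 << bit)

def read(addr):
    return memory[addr]

def write(addr, value):
    memory[addr] = value

flip_bit(0x10, 2)            # fault injected
assert read(0x10) == 0b0100  # next op is a read: the fault becomes an error

write(0x10, 0b0000)
flip_bit(0x10, 2)            # same fault again...
write(0x10, 0b1111)          # ...but the next op is a write
assert read(0x10) == 0b1111  # the flipped bit was masked: no error
```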
26. Threats: Faults, Errors & Failures, cont.
• Error
– An incorrect internal state of a computer
• Deviation from accuracy or correctness
– Example
• A physical short results in a line in the circuit permanently being stuck at logic 1. The physical short is a fault in the circuit. If the line is required to transition to logic 0, the value on the line will be in error.
– The manifestation of a fault
– May lead to a failure, but does not have to
27. Threats: Faults, Errors & Failures, cont.
• Failure
– Denotes a deviation between the actual service and the specified or intended service
– Example
• A line in a circuit is responsible for turning a valve on or off: a logic 1 turns the valve on and a logic 0 turns the valve off. If the line is stuck at logic 1, the valve is stuck on. As long as the user of the system wants the valve on, the system will be functioning correctly. However, when the user wants the valve off, the system will experience a failure.
– The failure is an event (i.e., it occurs at some time instant, if ever) caused by an error
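The valve example traces the whole fault → error → failure chain, and a minimal simulation makes the distinction visible (all names are illustrative): the stuck-at-1 fault produces an erroneous line value, but a failure is observed only when the intended service is “valve off”.

```python
# Sketch of the valve example: a stuck-at-1 fault on the control line
# is observed as a failure only when the intended service is "off".

STUCK_AT_1 = True  # the fault: a physical short pins the line to logic 1

def line_value(intended_level):
    """The driven control line; the fault overrides the intended level."""
    return 1 if STUCK_AT_1 else intended_level

def valve_state(intended_level):
    # logic 1 turns the valve on, logic 0 turns it off
    return "on" if line_value(intended_level) == 1 else "off"

# User wants the valve on: the erroneous line value happens to match
# the intent, so the delivered service is correct — no failure.
assert valve_state(intended_level=1) == "on"

# User wants the valve off: the error now reaches the service
# interface, and actual service deviates from intended service.
assert valve_state(intended_level=0) == "on"   # should be "off": a failure
```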
29. Three-universe model
1. Physical universe: where the faults occur
– Physical entities: semiconductor devices, mechanical elements, displays, printers, power supplies
– A fault is a physical defect or alteration of some component in the physical universe
2. Informational universe: where the errors occur
– Units of information: bits, data words
– An error has occurred when some unit of information becomes incorrect
3. External (user’s) universe: where failures occur
– The user sees the effects of faults and errors
– The failure is any deviation from the desired or expected behavior
30. Causes of faults
• Problems at any stage of the design process can result in faults within the system.
31. Causes of faults, cont.
• Specification mistakes
– Incorrect algorithms, architectures, or hardware or software design specifications
• Example: the designer of a digital circuit incorrectly specified the timing characteristics of some of the circuit’s components
• Implementation mistakes
– Implementation: the process of turning the hardware and software designs into physical hardware and actual code
– Poor design, poor component selection, poor construction, software coding mistakes
• Examples: a software coding error; a printed circuit board constructed such that adjacent lines of a circuit are shorted together
32. Causes of faults, cont.
• Component defects
– Manufacturing imperfections, random device defects, component wear-out
– Most commonly considered causes of faults
• Examples: bonds breaking within the circuit, corrosion of the metal
• External disturbances
– Radiation, electromagnetic interference, operator mistakes, environmental extremes, battle damage
• Example: lightning
34. Failure modes, cont.
• Failure domain
– Value failures: incorrect value delivered at the interface
– Timing failures: right result at the wrong time (usually late)
• Failure consistency
– Consistent failures: all nodes see the same, possibly wrong, result
– Inconsistent failures: different nodes see different results
• Failure consequences
– Benign failures: essentially loss of utility of the system
– Malign failures: significantly more than loss of utility of the system; catastrophic, e.g., an airplane crash
• Failure oftenness (failure frequency and persistence)
– Permanent failure: system ceases operation until it is repaired
– Transient failure: system continues to operate
• Frequently occurring transient failures are called intermittent
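The failure-domain split above can be expressed as a small check (a hypothetical helper, not course code): given the delivered value, its delivery time, the expected value, and the deadline, classify the outcome.

```python
# Illustrative classifier for the failure-domain taxonomy: a delivered
# result can be correct, a value failure, a timing failure, or both.

def classify(value, time, expected_value, deadline):
    value_ok = (value == expected_value)
    timing_ok = (time <= deadline)
    if value_ok and timing_ok:
        return "correct"
    if value_ok:
        return "timing failure"   # right result at the wrong time
    if timing_ok:
        return "value failure"    # incorrect value at the interface
    return "value and timing failure"

assert classify(42, time=5,  expected_value=42, deadline=10) == "correct"
assert classify(42, time=15, expected_value=42, deadline=10) == "timing failure"
assert classify(41, time=5,  expected_value=42, deadline=10) == "value failure"
```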
35. Failure modes, cont.
• Consistent failures
– Fail-silent
• The system produces correct results or remains quiet (no delivery)
– Fail-crash
• The system produces correct results or stops quietly
– Fail-stop
• The system produces correct results or stops (made known to others)
• Inconsistent failures
– Two-faced failures, malicious failures, Byzantine failures
37. Dependability attributes
• Availability: readiness for correct service
• Reliability: continuity of correct service
• Safety: absence of catastrophic consequences on the user(s) and the environment
• Confidentiality: absence of unauthorized disclosure of information
• Integrity: absence of improper system alterations
• Maintainability: ability to undergo modifications and repairs
• Security: the concurrent existence of (a) availability for authorized users only, (b) confidentiality, and (c) integrity, with ‘improper’ taken as meaning ‘unauthorized’.
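Two of these attributes are commonly quantified. The sketch below uses the standard formulas from the fault-tolerance literature, which are not given on this slide: steady-state availability A = MTTF / (MTTF + MTTR), and reliability R(t) = e^(-λt) under a constant failure rate λ (exponential lifetime model).

```python
import math

def availability(mttf_hours, mttr_hours):
    """Steady-state availability: the fraction of time the service is
    ready, A = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)

def reliability(t_hours, failure_rate_per_hour):
    """R(t) = exp(-lambda * t): probability of uninterrupted correct
    service up to time t, assuming a constant failure rate lambda."""
    return math.exp(-failure_rate_per_hour * t_hours)

# A system that fails on average every 1000 h and takes 10 h to repair
# is available about 99% of the time.
print(round(availability(1000, 10), 4))   # 0.9901

# With lambda = 1/1000 per hour, the probability of running 100 h
# without failure is about 90%.
print(round(reliability(100, 1e-3), 3))   # 0.905
```

Note the distinction the definitions draw: a system that fails often but recovers instantly can have high availability yet low reliability over any given mission time.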