4. RAMS - DEFINITIONS
Reliability – the probability that an item can perform a required
function under given conditions for a given time interval.
Availability – the ability of a product to be in a state to perform a
required function under given conditions at a given instant of time
or over a given time interval assuming that the required external
resources are provided.
Maintainability – the probability that a given maintenance action, for
an item under given conditions of use can be carried out within a
stated time interval when the maintenance is performed under
stated conditions and using stated procedures and resources.
Safety – freedom from unacceptable risk of harm. (EN50126)
Quality – a user's perception of the attributes of a product.
(EN50129) NOTE: Quality is NOT testing!
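The probabilistic definitions above can be made concrete with a small sketch. It assumes a constant failure rate and a constant repair rate (exponential models), which the definitions themselves do not require; the function names and figures are illustrative only.

```python
import math


def reliability(failure_rate_per_hour: float, hours: float) -> float:
    """Probability of performing the function for `hours` without failure.

    Assumption: constant failure rate (exponential model).
    """
    return math.exp(-failure_rate_per_hour * hours)


def maintainability(repair_rate_per_hour: float, hours: float) -> float:
    """Probability that a maintenance action is completed within `hours`.

    Assumption: constant repair rate (exponential model).
    """
    return 1.0 - math.exp(-repair_rate_per_hour * hours)


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time the item is able to perform its function."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical item: one failure per 10 000 h on average, 2 h mean repair time.
    print(reliability(1e-4, 1000))                 # chance of surviving 1000 h
    print(maintainability(0.5, 4))                 # chance a repair finishes in 4 h
    print(steady_state_availability(10_000, 2))    # long-run availability
```

Availability rises either by failing less often (higher MTBF) or by repairing faster (lower MTTR), which is why reliability and maintainability are specified together.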
5. IS IT A FAULT, AN ERROR, OR A
FAILURE? (1)
Fault
• An abnormal condition that could lead to an error in a system. A fault can be
random or systematic.
Examples: a defective hardware component or a software bug.
Error
• A deviation from the intended design which could result in unintended system
behaviour or state within the system boundary. E.g. excessive stress on a
hardware component due to a fault in another component, or a handled
software exception (say, divide by zero).
Failure
• A deviation from the specified performance of a system visible at the
system boundary. A failure is a consequence of a fault or error in a system.
Failures may be graded depending on their effect on the operation of the
system e.g. minor, significant, major etc. E.g. unnecessary emergency brake
application in an ATP system.
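The divide-by-zero example can be sketched in code to show where each term applies. This is an illustration only: the function names are hypothetical, and falling back to the last good value is just one possible reaction, not a recommended safety measure.

```python
def average_speed(distance_m: float, elapsed_s: float) -> float:
    # FAULT: there is no guard for elapsed_s == 0 (a systematic fault, i.e. a bug).
    return distance_m / elapsed_s


def speed_with_handling(distance_m: float, elapsed_s: float, last_good: float) -> float:
    """The fault is activated, but the resulting error stays inside the system."""
    try:
        return average_speed(distance_m, elapsed_s)
    except ZeroDivisionError:
        # ERROR: unintended internal state, handled within the system boundary.
        # Nothing abnormal is visible from outside, so there is no failure.
        return last_good


def speed_without_handling(distance_m: float, elapsed_s: float) -> float:
    # FAILURE: the exception propagates across the system boundary to the caller.
    return average_speed(distance_m, elapsed_s)
```

With `elapsed_s = 0` the first wrapper returns `last_good`, while the second raises `ZeroDivisionError` to its caller: the same fault, once contained as an error and once visible as a failure.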
6. IS IT A FAULT, AN ERROR, OR A
FAILURE? (2)
Dormant (or latent) faults/errors
• Are faults/errors that have occurred but lie undetected and do not lead to a
failure (unless perhaps in a combination with other faults/errors).
So what is a HAZARD?
Hazard – A physical situation with the potential to cause harm
• A hazard is NOT an accident, e.g. electrocution is
not a hazard, it is an accident
• A hazard is NOT an event
• A hazard IS a “state of a system”, e.g. an exposed voltage is a hazard
• It is an error or a failure
7. FAULTS, ERRORS, AND FAILURES
– WHAT IS WHAT?
Sub-System: Fault -> Error -> Failure
System: Fault -> Error -> Failure -> Hazard -> Accident
• Faults cannot be avoided but failures can be prevented
• Unrecognised faults become failures
8. WHY NOT DETECTING A SINGLE
FAULT IS FATAL
[Figure: three channels feeding a voter; 0 = right, 1 = straight]
No fault: channels 1 1 1 -> Voted: 1
FAULT 1 (undetected): channels 0 1 1 -> Voted: 1
Some time later…
FAULT 2 (undetected): channels 0 0 1 -> Voted: 0 -> FAILURE
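The voting sequence on this slide can be replayed with a minimal majority-voter sketch. It assumes three binary channels where 1 (straight) is the correct output; the function names are illustrative and not taken from any real system.

```python
from collections import Counter
from typing import Sequence


def vote(channels: Sequence[int]) -> int:
    """Return the majority value of an odd number of channel outputs."""
    value, _count = Counter(channels).most_common(1)[0]
    return value


def has_disagreement(channels: Sequence[int]) -> bool:
    """Fault detection: True as soon as any channel deviates from the others."""
    return len(set(channels)) > 1


if __name__ == "__main__":
    for label, channels in [
        ("no fault", (1, 1, 1)),
        ("fault 1", (0, 1, 1)),   # masked by the voter
        ("fault 2", (0, 0, 1)),   # two wrong channels outvote the good one
    ]:
        print(label, "voted:", vote(channels),
              "disagreement:", has_disagreement(channels))
```

The voter masks the first fault, so the output alone gives no warning. Only a check such as `has_disagreement` reveals it in time to react before a second fault turns the majority.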
9. SAFETY INTEGRITY LEVEL
SIL4 means roughly 25+
years of continuous
operation without any
safety-critical FAILURE
THR … Tolerable Hazard Rate
10. FAULT TREE ANALYSIS (FTA)
FTA is a top down analysis technique used for finding the causes of the top
event
The top event is usually a system hazard
The analysis proceeds by considering the immediate, necessary and
sufficient causes of the top event
These causes are drawn on the tree using logic gates to show their
combination
When all immediate causes have been identified then the analysis moves
down to these causes and finds what were their immediate causes
The analysis completes when it gets down to the basic events that cannot
be broken down any further
FTA can be quantified by assigning the probabilities to the basic events and
using Boolean algebra to calculate the probability of the top event
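As a sketch of the quantification step, the gates can be evaluated numerically. This assumes statistically independent basic events; the tree structure, event names and probabilities below are hypothetical.

```python
from math import prod


def and_gate(*probabilities: float) -> float:
    """Output event occurs only if ALL input events occur (independent inputs)."""
    return prod(probabilities)


def or_gate(*probabilities: float) -> float:
    """Output event occurs if ANY input event occurs (independent inputs)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical tree: the top event occurs if both redundant channels fail,
# OR if a common power supply fails.
p_channel_a = 1e-3
p_channel_b = 1e-3
p_power_supply = 1e-6

p_both_channels = and_gate(p_channel_a, p_channel_b)
p_top = or_gate(p_both_channels, p_power_supply)

if __name__ == "__main__":
    print(f"P(top event) = {p_top:.3e}")
```

In this example the single power supply contributes as much to the top event as the redundant pair, which is the kind of result the quantified tree is meant to expose. If basic events are not independent (common-cause failures), these simple products no longer hold.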
19. RISK REDUCTION
METHODS (OVERVIEW)
Measures to be considered in priority order are
1st – Elimination: Remove the hazard or the causes of the hazard
or eliminate the effects at the design phase
(e.g. operate at a safe working voltage).
2nd – Substitution: A hazardous element is substituted with a
nonhazardous element, e.g. specify fireproof
cables when fire is a hazard.
3rd – Engineering controls: Safety guards/safety barriers are inserted to
minimise the exposure or probability of a
hazard, i.e. isolating the hazard. The hazard
remains and becomes active if the defence is
for any reason removed. Examples of measures are
• simplification
• decoupling
• redundancy
4th – Administrative controls
5th – Providing protective
systems/subsystems/products/equipment.
20. EN50126
“Railway applications – The specification and demonstration
of Reliability, Availability, Maintainability and Safety (RAMS)”
• General discussion of RAMS
• Introduces risk assessment and the risk assessment matrix
• Introduces Safety Integrity Levels
• Defines a system life-cycle made up of fourteen phases and
describes typical general, RAM and Safety tasks in each
phase.
• Describes the V representation of the life-cycle
22. EN 50128
“Railway applications – Communications, signalling and processing
systems – Software for railway control and protection systems”
• Describes software development lifecycle and the inputs,
requirements and outputs for each phase
• Annex A (normative) provides tables of techniques and measures
to be applied at each phase according to SIL of the software (SIL
0 to SIL4)
• Each technique/measure is given a rating from Mandatory, Highly
Recommended, Recommended, No Recommendation to Not
Recommended
• Some tables give sets of techniques/measures that can be used
in combinations to meet a particular SIL
• Annex B (informative) gives a brief description of each of the
techniques
27. EN 50129
“Railway applications – Communications, signalling and processing
systems – Safety related electronic systems for signalling”
• Describes the structure and expected content of a safety case
• Annex A (normative) describes how Safety Integrity Levels are
determined and gives the SIL versus THR table.
• Annex B (normative) gives detailed technical requirements for the
content of the Technical Safety Report part of the safety case
• Annex C (normative) describes expected failure modes of
components
• Annex D (informative) gives information on analysing
independence of items
• Annex E (informative) gives techniques recommended for
different stages in the development life-cycle against SIL0 to SIL4
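The SIL versus THR relationship mentioned for Annex A can be sketched as a simple lookup. The bands below are the commonly quoted values (THR per hour and per function) and should be checked against the standard itself before use; the function name is illustrative.

```python
from typing import Optional

# (lower bound inclusive, upper bound exclusive, SIL); THR per hour and per function.
# Assumption: commonly quoted bands, to be verified against the standard.
SIL_BANDS = [
    (1e-9, 1e-8, 4),
    (1e-8, 1e-7, 3),
    (1e-7, 1e-6, 2),
    (1e-6, 1e-5, 1),
]


def sil_for_thr(thr_per_hour: float) -> Optional[int]:
    """Return the SIL whose band contains the given THR, or None if outside the table."""
    for low, high, sil in SIL_BANDS:
        if low <= thr_per_hour < high:
            return sil
    return None


if __name__ == "__main__":
    for thr in (5e-9, 3e-8, 2e-7, 5e-6, 1e-3):
        print(f"THR {thr:.0e}/h -> SIL {sil_for_thr(thr)}")
```

A THR outside the table returns `None` here rather than being rounded to the nearest level; how such cases are treated is a matter for the safety case, not for this sketch.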
28. SOME MORE…..
EN 50121-3-2 / IEC 62236-3-2 Railway applications – Electromagnetic
compatibility. Part 3-2: Rolling stock – Apparatus
EN 50121-4 / IEC 62236-4 Railway applications – Electromagnetic
compatibility. Part 4: Emission and immunity of the signalling and
telecommunications apparatus
EN 50124-1 Railway applications - Insulation coordination - Part 1: Basic
requirements - Clearances and creepage distances for all electrical and
electronic equipment
EN 50125-1 Environmental conditions for equipment - Part 1: Equipment
on board rolling stock
EN 50125-3 Environmental conditions for equipment - Part 3: Equipment
for signalling and telecommunications.
EN 50153 Rolling stock - Protective provisions relating to electrical
hazards
EN 50155 Railway applications - Electronic equipment used on rolling
stock
29. WHAT IS
VERIFICATION?
Confirmation by examination and provision of objective evidence that the
specified process requirements have been fulfilled (EN50126)
Activity of determination, by review and inspection, that the output of each
phase of the life-cycle fulfils the requirements of the previous phase
(EN50128)
The activity of determination, by review and inspection, at each phase of the
lifecycle, that the requirements of the phase under consideration meet the
output of the previous phase and that the output of the phase under
consideration fulfils the requirements (EN50129)
Conclusions?
• Verification can be review or inspection
• It is specific to a particular object (e.g. document, module of code etc.) or
lifecycle phase
• It makes sure the object has been produced according to the specified inputs
30. WHAT IS
VALIDATION?
Confirmation by examination and provision of objective evidence that the
particular requirements for a specified intended use have been fulfilled
(EN50126)
Activity of demonstration, by analysis and test, that the product meets, in all
respects, its specified requirements (EN50128)
The activity applied in order to demonstrate, by test and analysis, that the
product meets in all respects its specified requirements (EN50129)
Conclusions?
• Validation can be analysis or test
• Validation involves demonstration
• Validation applies to a complete product or system
• Validation ensures the product or system meets its specified requirements
31. TESTING TYPES
Functional testing
Performance testing
• Aims to check the quantified system requirements, e.g. does it do what it is
supposed to do in the required time, or under maximum load/stress, or
without using more power than it is allowed to, etc.
Usability testing
• Examines how people use a system in order to find problems and
possible improvements
Destructive testing
• To find the limits of operation.
Robustness testing
• E.g. turn the main supply off – will it start up again properly?
Degraded mode testing
• E.g. Tests with some parts of the system failed.
32. TEST PHASES (1)
Sub-System testing
• aims to find problems with sub-systems where test coverage is
easier to manage and faults easier to localize, rather than attempting
the same thing in a system test
Integration testing
• To ensure sub-systems interface together correctly
System Tests
• With the complete system in the laboratory to exercise as much of the
system requirements as feasible
Product Qualification Tests
• Type tests e.g. heat, cold, damp, EMC, vibration, pollutants etc.
• Special tests e.g. re-type testing a product from the manufacturing
line to show initial type tests are still valid
Manufacturing Tests
33. TEST PHASES (2)
Factory Acceptance Test
• A test to ensure the system is ready to be taken to site
Site Acceptance Test
• An acceptance test for and with the customer
Field Trials
• Environmental conditions
• Operating procedures
Set-to-work testing
• To ensure sub-system or system at least performs its basic
functions, as a prerequisite to more extensive testing
Installation testing
• To find installation errors (bell tests, insulation tests)
34. TEST PHASES (3)
Commissioning tests
• Correspondence tests (e.g. right light at right cable branch?)
Safety Qualification Test
• Testing in operation but with additional safety controls in place (e.g.
limited speed, backup monitoring systems etc.)
Field Operational Performance Tests
• E.g. headway and schedule running tests
RAM Proving Tests
• Obtaining real RAM figures for the system in operation to
demonstrate the results of the RAM analysis
35. AUTOMATIC TESTING
Wherever feasible, automatic testing is to be preferred. The
benefits are
• Doesn’t suffer from human errors caused by boredom,
fatigue, lack of motivation, repetition etc.
• Makes 100% regression tests feasible
• Repeatability
• Can work 24 hours a day
But there are issues too
• You need to design the test system first!
• Verification of the test data
• Validation of the test system
• What SIL do the simulators need to be?
• May be slow to set up, which delays early testing
Not much used in this industry today, but slowly coming
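A minimal sketch of what "100% regression tests" can mean for a small unit: every input combination is checked against an independently written expectation on every run. The unit under test here (a 2-out-of-3 majority function) and all names are hypothetical.

```python
from itertools import product


def majority(a: int, b: int, c: int) -> int:
    """Unit under test: 2-out-of-3 majority of three binary inputs."""
    return (a & b) | (a & c) | (b & c)


def run_regression() -> int:
    """Exhaustively check all input combinations; return the number of cases run."""
    checked = 0
    for a, b, c in product((0, 1), repeat=3):
        # Oracle written independently of the implementation under test.
        expected = 1 if a + b + c >= 2 else 0
        assert majority(a, b, c) == expected, f"mismatch for inputs {(a, b, c)}"
        checked += 1
    return checked


if __name__ == "__main__":
    print(f"{run_regression()} cases passed")
```

The oracle is itself test data that needs verification, which is the point made by the issues listed above: an automatic test is only as trustworthy as its expected results and its test system.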
36. TOOLS, AND WHY TO SELECT
THEM CAREFULLY
Tool Classes T1-T3 (EN50128:2011)
Class T1
• generates no outputs which can directly or indirectly contribute to the
executable code (including data) of the software
Class T2
• supports the test or verification of the design or executable code,
where errors in the tool can fail to reveal defects but cannot directly
create errors in the executable software
Class T3
• generates outputs which can directly or indirectly contribute to the
executable code (including data) of the safety related system
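The three class definitions can be read as a short decision procedure, sketched below. The two yes/no questions are a simplification of the definitions above, and the example tools in the usage section are illustrative assumptions, not a classification taken from the standard.

```python
from enum import Enum


class ToolClass(Enum):
    T1 = "T1"
    T2 = "T2"
    T3 = "T3"


def classify_tool(contributes_to_executable: bool,
                  supports_test_or_verification: bool) -> ToolClass:
    """Simplified reading of the T1-T3 definitions.

    contributes_to_executable: tool output can directly or indirectly end up
        in the executable code (including data).
    supports_test_or_verification: tool errors could fail to reveal defects
        but cannot create errors in the executable software.
    """
    if contributes_to_executable:
        return ToolClass.T3
    if supports_test_or_verification:
        return ToolClass.T2
    return ToolClass.T1


if __name__ == "__main__":
    # Illustrative examples only.
    print("compiler:", classify_tool(True, False).value)
    print("coverage measurement tool:", classify_tool(False, True).value)
    print("text editor:", classify_tool(False, False).value)
```

The check for T3 comes first because a tool that both generates code and supports verification must be treated according to its most critical use.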
37. TOOL CLASS REQUIREMENTS
(EN50128)
“All tools in classes T2 and T3 shall have a specification or manual which clearly
defines the behaviour of the tool and any instructions or constraints on its use”
“For each tool in class T3, evidence shall be available that the output of the tool
conforms to the specification of the output or failures in the output are detected.
Evidence may be based on the same steps necessary for a manual process as a
replacement for the tool and an argument presented if these steps are replaced by
alternatives (e. g. validation of the tool). Evidence may also be based on
• a) a suitable combination of history of successful use in similar environments and for
similar applications (within the organisation or other organisations),
• b) tool validation as specified in 6.7.4.5,
• c) diverse redundant code which allows the detection and control of failures resulting
in faults introduced by a tool,
• d) compliance with the safety integrity levels derived from the risk analysis of the
process and procedures including the tools,
• e) other appropriate methods for avoiding or handling failures introduced by tools.”
39. MAIN PROBLEMS (2)
• Single-Pass V life-cycle
• Testing manual, late in the project
• Long setup-phase for project
• Extensive reviews
• Traceability
• Documentation
• Documentation
• Documentation
• Did I mention:
• Documentation?
40. STRATEGY USING
AN AGILE APPROACH
Reduce cycle-time (1 month vs 1-3 years) to:
• reduce batch-size
• manage complexity step by step
• perform activities as early and often as possible
• provide feedback