PvA 06-2015 1
Technical Bulletin # 15
Real Time Monitoring of component condition degradation
June 2nd, 2015
1 Introduction.
The term “Real Time Monitoring” is often used nowadays. It is however not always clear what is monitored
and what is done with the gathered information. Process real time monitoring and component condition
real time monitoring may get mixed up although these two applications of real time monitoring are not the
same in their character and usage.
This bulletin will focus on monitoring of component condition and condition degradation of physical
components operating together in a system and what “Real Time Monitoring” in that case means.
By putting focus on the usage of real time monitoring for component condition this document does not
address the application of real time monitoring for operational usage such as for automated process
control or process operations support.
2 What is Real Time Monitoring?
Real time monitoring is the application of continuous, often sensor-based, measurements of a quantity that can
be measured and ‘calibrated’ to provide meaningful information about what is measured. The meaning lies in
the fact that the reading can be interpreted, manually or automatically, to determine
trends or abrupt changes. A reading has a set of boundaries, or acceptance and rejection criteria, that allow
us to assess whether it is within or beyond pre-set process limits.
An example is the measurement of pressure. It can be measured as a 4 – 20 mA signal, where the calibration
tells us humans that 4 mA is, for example, 120 PSI and 20 mA is 1500 PSI. As different pressure ranges may be
measured in a similar way, the calibration is an indispensable step in the ‘translation’ to meaningful
information. Without calibration the 4 – 20 mA signal has no meaning, as it does not by itself give a pressure
indication.
Process limits may be set at 140 PSI as the lower rejection criterion and 1000 PSI as the upper rejection
criterion. The translated mA measurement then tells us whether the pressure is within the defined process
boundaries (the threshold of operation or operating window).
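The calibration and limit check described above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes the transmitter is linear between the two calibration points given in the text, and the function names are hypothetical.

```python
def ma_to_psi(current_ma, lo_ma=4.0, hi_ma=20.0, lo_psi=120.0, hi_psi=1500.0):
    """Translate a 4-20 mA loop current to PSI with a two-point linear calibration.

    Assumption: the sensor is linear between the calibration points
    (4 mA = 120 PSI, 20 mA = 1500 PSI, the example values from the text).
    """
    if not lo_ma <= current_ma <= hi_ma:
        # A current outside the loop range usually points at a signal fault.
        raise ValueError(f"{current_ma} mA is outside the {lo_ma}-{hi_ma} mA range")
    fraction = (current_ma - lo_ma) / (hi_ma - lo_ma)
    return lo_psi + fraction * (hi_psi - lo_psi)


def within_process_limits(psi, lower=140.0, upper=1000.0):
    """Check a calibrated reading against the rejection criteria from the text."""
    return lower <= psi <= upper


reading = ma_to_psi(12.0)               # mid-scale current
print(reading, within_process_limits(reading))
```

The same 12 mA reading would mean a different pressure with another calibration, which is why the raw signal by itself carries no meaning.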
Pressure is an operational or process value. Continuous monitoring of a process value is meaningful as it
may provide a means to control the process, manually or automatically. Loss of the signal can mean that the
process has to stop or cannot be controlled properly.
3 Real time monitoring for condition – (degradation) assessment.
When applying real time monitoring, with the above in mind, to the condition of a physical component, one
needs to answer the questions: what affects the condition, how can that be measured, and how can that
measurement be translated or calibrated to provide meaningful information about the measured
condition? As above, it is important to set a threshold for what is an acceptable condition of that
component.
This requires the question to be asked: ‘why do we want to know a component’s condition?’
This goes back to the fact that for a lot of components the condition gives an indication of the ability to
perform according to specification and to support successful system functions.
Without going into mathematical detail: when a component degrades, its condition and the related
performance degrade with it. Assuming that components degrade over time, it makes sense that a component
will eventually lose its ability to perform as expected or required. This leads to what is referred
to as ‘functional failure’: the component has lost its ability to function according to what is required of it.
That brings us closer to what is most valuable to know: the reliability of a component when it is
used in a process. What we are interested in is how far the degradation has progressed, in order to
determine for how long it still makes sense to operate the component without the risk of failure and
system seizure as a result.
Reliability is the defined ability of a system or component to perform its required functions under stated
conditions for a specified period of time (source: www.reliabilityweb.com)
Process parameters may inform us about condition problems in some components that are used to operate
that process, but they do not give detailed information about what is degrading or how, which component it
is and how far it is from complete failure. Process information is also lagging: the indication we get from
process parameters is the end result of the failing component; it is the effect, not the cause.
Component condition monitoring is important to assess the reliability of a component in order to be able
to react in time and prevent a component failure with system seizure as a result. Economic
aspects can also be taken into account when a system stops or needs to be stopped due to a component failure.
The following string indicates the logic of aspects we can link:
Figure 1
In this string there are no process parameters. It is possible to link the process parameters to the risk level
side of this string, coming from “the other side”, so that it becomes clear what the process risk is if
a single component is failing. Component failure itself will affect the function in which that component is
operating, and the functional effect on the whole system of which that function is a part will determine the
severity of that failing component and the risk level it causes.
Component failure of a similar component in a different function within a system can have a different
effect and lead to a different risk level.
Maintenance, e.g. reliability-informed maintenance, focuses on influencing the degradation of components and
is therefore intended to affect the reliability, condition and risk of failure by preventing a component from
failing.
Maintenance and maintenance effects are not described in this document but with the above in mind it
becomes apparent that maintenance must affect a degradation mechanism in order to be successful and
therefore effective.
A component can fail in more than one way; hence there is more than one failure mode leading to
failure, and there are also multiple ways of degradation that eventually lead to a failed state.
In order to be complete and exact we should measure every degradation mechanism leading to all the
failure modes. This is perhaps not feasible, economically or technically, and may also not be very
effective. Without knowing what we want to measure we can easily end up with a massive amount of
(calibrated) data that makes it impossible to trend towards a condition, reliability or risk level.
Evaluating and determining the most dominant failure mode is a must-do at the beginning of
component condition monitoring. Coming from the risk side: which failure causes the highest risk? That
creates focus on where we should (practically) start. An often used method is a Failure Mode and Effect
Analysis (FMEA). When done fully and correctly the FMEA lists all known functional failure modes, which are
an indication of the damage mechanisms. The list of all damage mechanisms can be filtered to identify the
most dominant one.
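As a rough sketch of such filtering, the FMEA rows can be ranked by a simple score. The scoring used here (relative degradation speed multiplied by a severity rating) is an assumption made for illustration and is not prescribed by this bulletin; the class name, field names and all values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    mechanism: str
    relative_speed: float  # hypothetical: how fast this mechanism degrades vs. the others
    severity: int          # hypothetical 1-10 rating of the effect of the failure


def rank_by_dominance(modes):
    """Sort failure modes so the assumed most dominant one comes first."""
    return sorted(modes, key=lambda m: m.relative_speed * m.severity, reverse=True)


# Illustrative values only; a real FMEA would supply these.
fmea_rows = [
    FailureMode("seal leak", "wear", relative_speed=5.0, severity=7),
    FailureMode("wall thinning", "corrosion", relative_speed=1.0, severity=8),
    FailureMode("deformation", "creep", relative_speed=0.5, severity=6),
]

for mode in rank_by_dominance(fmea_rows):
    print(mode.mechanism, mode.relative_speed * mode.severity)
```

The first row of the ranked list is where measurement effort would start; the lower rows are not dropped, they are simply measured less often.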
As an example: if wear of a component is the main degradation mechanism (multiple times faster than
other known degradation mechanisms for that component, such as creep or corrosion) then it has value
to measure the wear first and more often, and not so much the corrosion. Over time corrosion may however
reach its threshold limit of condition and reliability impact, so it must also be taken into account. An
inspection scheme based on the fastest degradation mechanism will be able to keep track of the
corrosion as a condition parameter.
This brings us to the degradation speed. Not all degradation goes equally fast, and not every
component will experience the same speed of degradation for a similar degradation mechanism.
A simple example of this is the wear of door hinges: depending on the weight distribution over the
hinges one will wear more than the other, although both are used equally often in opening and closing.
Although both components in this example are of a similar type, their condition will differ over time.
In order to gain the highest effect of maintenance for these hinges the interval should differ
depending on the speed of degradation, while the applied maintenance effort (greasing in this case) stays
the same.
The usage of components in a function is also an aspect that comes into play when addressing condition
measurements. Often a system is built from redundant components or components that are only used
on demand. When one component is in operation the other one(s) are not operating, or a component is only in
operation at the limited moments in time when it is required.
Identical components, with identical degradation and damage mechanisms, operating in an identical system
may differ in their condition due to the difference in usage. Measuring usage (or ‘units of operation’)
together with the condition degradation is therefore important.
• Components have different failure modes, each with a degradation or damage mechanism leading to
that failure mode.
• One degradation mechanism has one best practice maintenance activity, but the application
frequency may differ due to the speed (or ‘degradation profile’) of that degradation mechanism.
• One degradation mechanism leads to one functional failure, but due to the speed of degradation some
components may fail sooner than others.
• There is a most dominant degradation mechanism that often goes multiple times faster than others
and leads more often to functional failure of that component. A properly executed and documented
FMEA provides valuable information for this.
• A functional failure (failed component) will therefore occur sooner in one case than in the other.
• A component (functional) failure may cause the system it operates in to stop: the risk of
component failure depends on the risk of that system stop.
• Every degradation mechanism has its own most effective way of measuring (e.g. sensor type).
• Every (sensor) measurement needs to be calibrated to be meaningful in degradation assessment.
• It is not feasible or effective to measure all degradation mechanisms.
• Degradation itself needs to be translated into an agreed condition level and the level of its
functionally failed state.
• Every measurement has its own acceptance threshold.
• Different usage causes a different degradation and condition, even for identical components. Usage
differences can occur e.g. when identical components operate in different system-functions or are
part of a redundant set-up.
• The same can be stated for load differences (see the hinge example). This often occurs when
similar components are used in different system-functions.
• What we refer to as ‘condition’ is the level of degradation of a component compared to its ‘new’
state. The period over which this degradation progresses towards functional failure is called the
“PF-interval”. “PF” stands for Potential to Failure in this case. PF indicates where a component is
on its degradation path.
Being able to plot a measured degradation value in a PF graph is the ideal world for condition based
monitoring.
Figure 2: the PF graph.
If we apply this to a group of similar components for one degradation mechanism, then the spread in
‘degradation speed’ is a measure of how close all components are to each other in their condition or on their
way to failure. (See aspect 12 in the above bullets.)
If several components are close together it means that they degrade equally fast, which makes it easier to
prevent failures in such a ‘population’, e.g. through maintenance. A huge spread in degradation speed makes
the application of maintenance more difficult. Differentiation in degradation may occur due to usage, as
described above.
This is one of the pitfalls of condition based maintenance: it is often more effective but also more difficult
from a planning and execution point of view. Condition based maintenance revolves around individual
components, and more often than not a constant and equal usage and degradation is assumed,
causing maintenance activities to be less effective.
In figure 3 a set of several identical components is plotted to show that they degrade differently and
therefore have a different time to failure.
Figure 3: difference in degradation speed: different PF.
The complexity rises due to the combination of all mentioned aspects.
On top of the previously listed aspects there is:
• Similar and non-similar components in every operating system with equal damage mechanisms.
• Similar components with equal degradation mechanisms have different degradation speeds
(figure 3) due to different usage (or load: different system-functions).
• Non-similar components may have similar degradation mechanisms but at different speeds of
degradation.
• For component failure anticipation in systems it is important to be able to ‘look ahead’ into the
next timeframe in which that component needs to perform. Can a component with a certain level
of degradation continue its function for the next X days without going towards a functional failure?
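The ‘look ahead’ question in the last bullet can be sketched as a trend extrapolation. The example below assumes, purely for illustration, that degradation grows linearly with time and that a higher value means a worse condition; real degradation profiles may follow other shapes. All names and numbers are hypothetical.

```python
import math


def days_until_threshold(days, values, threshold):
    """Fit a least-squares straight line through (day, degradation) samples and
    return the number of days from the last sample until the line reaches
    `threshold`. Returns math.inf when no upward trend is found.

    Assumption: linear degradation, higher value = more degraded.
    """
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, values))
    slope = sxy / sxx
    if slope <= 0:
        return math.inf
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - days[-1])


def can_continue(days, values, threshold, window_days):
    """True if the fitted trend stays below the threshold for the next window."""
    return days_until_threshold(days, values, threshold) > window_days


# Illustrative wear readings (mm) taken every 10 days; functional failure assumed at 5 mm.
sample_days = [0, 10, 20, 30]
wear_mm = [0.0, 1.0, 2.0, 3.0]
print(days_until_threshold(sample_days, wear_mm, 5.0))
print(can_continue(sample_days, wear_mm, 5.0, window_days=14))
```

With these readings the trend reaches the assumed limit about 20 days after the last sample, so a 14 day window would be acceptable and a 30 day window would not.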
Adding all this together, it is not difficult to imagine that the amount of data that needs to be measured,
translated or calibrated, and assessed grows exponentially with the eagerness to measure everything in
order to detect condition or degradation.
Why would we use real time for condition degradation leading to functional failures?
Applying condition based monitoring and maintenance instead of, e.g., the time-based version already
gives a major growth in the data to be managed. Applying real time monitoring to that (without
automation/programming and a proper reaction time on potential failure) could make it unmanageable.
Why is reaction time important?
Going back to the PF interval we see that degradation takes time. This time to failure (the PF interval)
can be used to react. Reactions can be ordering parts, or preparing an operational stop, overhaul or
repair. The reaction time is an important value as it allows reacting to the measured degradation.
If the PF interval is short (fast degradation) there is little time to react; if the PF interval is longer there
is more time to react.
The shorter the PF interval, or the higher the speed of degradation, the more often one needs to measure in
order to allow for proper reaction time before the coming functional failure. Speed of degradation is
therefore directly linked to the measurement frequency. The interval between measurements is often called
‘sample time’.
Whether a short sample time adds value also depends on the risk of component failure. As described, the
risk of component failure depends on the risk of seizure of the system or function that component
is operating in.
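The link between PF interval, reaction time and sample time can be illustrated with a simple heuristic. The rule used here (require a chosen number of measurements inside the part of the PF interval that remains after reserving the reaction time) is an assumption for illustration, not a rule stated in this bulletin; the function and parameter names are hypothetical.

```python
def max_sample_time(pf_interval_h, reaction_time_h, samples_in_window=2):
    """Longest sample time (hours) that still yields `samples_in_window`
    measurements before the point where reaction must start.

    Assumed heuristic: usable window = PF interval - reaction time.
    """
    usable_h = pf_interval_h - reaction_time_h
    if usable_h <= 0:
        raise ValueError("reaction time is not shorter than the PF interval: "
                         "monitoring cannot give enough warning")
    return usable_h / samples_in_window


# Slow degradation: 30-day PF interval, 10 days needed to order parts and plan a stop.
print(max_sample_time(pf_interval_h=720, reaction_time_h=240))
# Fast degradation: 10-hour PF interval, 2 hours to react, 4 samples wanted.
print(max_sample_time(pf_interval_h=10, reaction_time_h=2, samples_in_window=4))
```

Under these assumptions the slow case tolerates a sample time of days, while only the fast case pushes the sample time down to hours; neither comes close to requiring millisecond sampling.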
• Whether or not to measure degradation in ‘real time’ (continuously, i.e. with a very short sample time of
milliseconds or shorter) depends on the speed of that degradation (the steepness of a PF graph)
and the time needed to react to extremely fast degradation and upcoming failure.
• In most cases degradation does not go that fast and a very fast reaction is not required.
• The risk of component failure adds an extra aspect to the value of a short sample time, e.g. through real
time condition monitoring.
• Real time condition monitoring is technically advanced, because automation and data management usually
have to be applied along with it, and it can become costly in comparison to what will be
saved on the risk side.
• Producing extreme amounts of data without consideration could lead to ‘too much’ data, which is then
not used at all.
That makes real time monitoring a less effective starting point for condition monitoring unless it is well
considered or immediately combined with automation to filter, trend and calibrate.
Automation adds another layer of complexity and cost to a real time condition monitoring system.
Real time monitoring for process control is different from that for condition measurement. Real time monitoring
for process control requires continuous input in order to control the process properly.
4 System and functions: multiple components, degradation speed and effect of failure.
So far this document has focused on single component degradation. A complete system performs by means
of several different functions.
Each function is designed to perform a part of the total system performance. That functional performance
is enabled by means of technical components that are ordered in a logical way to enable the functional
performance together. With this in mind the applicable components are brought together by a design
engineer.
Similar components can be part of different functions.
Different functions mean different condition degradation and a different failure pattern, effect and
probability, and therefore a different risk of failure.
If this is applied to e.g. a subsea Blow Out Preventer (BOP), there is a multitude of components and component
groups (e.g. “hoses” or “solenoid valves”) that together allow all the different BOP functions to perform.
The regulatory effect of a component failure on a specific function and on the whole BOP system is
determined by means of LRED’s “BOP risk model” and expressed in such a way that it assists in the
decision whether or not a BOP can stay in operation or needs to be repaired to remain compliant with
the applicable regulations (known as the risk model ‘risk criteria’).
In order to be able to use condition degradation to determine a risk level, the following needs to be identified:
• What is the failure effect of component failure? (This is needed to assess the impact of condition
degradation.)
• What are the failure modes?
• What causes each failure mode, i.e. what is the degradation mechanism?
• What is the minimum acceptable condition level for a component in a specific function? (This
differs per function and, as explained, is affected by the reaction time versus the degradation time.)
o Once this is documented a measuring application can be chosen in an economically and
technically feasible way. Of course the measurement needs to be ‘calibrated’ to become
meaningful; this is regarded as being part of the technical feasibility.
• A selection of major critical components and their degradation should be studied to identify and
document the economic and technical feasibility of condition monitoring, based on the above and
the cost and feasibility of measuring and calibrating the measurement.
It must be realized that not only the current condition status is important, but even more so the ability to
forecast condition and potential failure in the coming period. In BOP terms: can a BOP, in theory, operate
on the next well to be drilled without critical component failures, and if not, what needs to be done
now? Due to the mode of BOP operations it is important to assess the level of degradation against the
next operating window, to evaluate whether a component would survive that next operating period without
functionally failing.
Condition monitoring would then serve 2 purposes:
• Condition assessment to assist operations with the current risk level after component failure,
• Enabling the needed forecast for the next well to be drilled.
The following figure shows the current coverage of LRED’s BOP risk model (the brown-dashed line area).
Figure 4
The risk model coverage area is the ‘end result’ of a component failure on regulatory compliance.
• To build an effective condition monitoring application it may be sensible to evaluate systems that
have condition monitoring in their design and are able to forecast on condition data, rather than
systems that are designed for operations decision support.
• Whether or not Real Time monitoring of component condition is economically applicable depends, as
explained, on the speed of degradation, the reaction time on upcoming failure (in combination with the
operating mode) and component criticality (failure effect).
An important step between condition degradation and risk is the calculation of component reliability.
Why?
Reliability calculations allow for a forecast, whereas the risk aspect shows only the current status. (This
holds even for risk ‘what-if’ scenarios, since these do not address the level of condition degradation or
reliability; they only consider a component as having failed.)
Reliability is also a means to express the ability of a component to perform.
In order to be effective in that, the minimum reliability level must be evaluated and documented:
it has little meaning to know that a component has a reliability of 88% if you do not know what the
minimum requirement is for that component in that function and have no idea how this 88% came
about: how long did it take, how was it used, etc.
This is required to evaluate future ‘survival’ or ‘failure’ of that component.
Becoming aware of that takes some time: after several measured conditions and calculated
reliability levels, awareness will grow of what is acceptable and what is not.
Adding mean or average failure values for whole component groups, such as MTTF, MTBF and MUTF,
will provide an average idea of component reliability. Without the ability to forecast failure these values
are of very little use, as you want to know how close you are to a potential failure based on condition
information.
Forecasting depends on the degradation profile and the way that profile is mathematically described.
(There are several different degradation distributions).
A Mean Time to Failure (MTTF) (the same holds for MTBF and MUTF) is the average or mean over the failures of a
component group or ‘population’. Being a mean implies that a lot of components will fail sooner or later than
that value, due to the difference in degradation and therefore condition after “x” time or usage. This will
happen in real life in a real operating system.
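A small numerical sketch makes the point about the mean concrete. The failure times below are invented for illustration and the function name is hypothetical; only the standard library `statistics` module is used.

```python
from statistics import mean, pstdev


def mttf_summary(failure_times):
    """Summarise observed times to failure for one component population."""
    mttf = mean(failure_times)
    early = sum(t < mttf for t in failure_times)
    return {
        "mttf": mttf,
        "spread": pstdev(failure_times),  # population standard deviation
        "fraction_failing_before_mttf": early / len(failure_times),
    }


# Invented operating hours at failure for five identical components.
hours_to_failure = [800, 900, 1000, 1100, 1700]
print(mttf_summary(hours_to_failure))
```

In this invented population three of the five components fail before the mean is reached, which is why a mean value alone says little about how close an individual component is to failure.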
This brings us to a third important aspect that will be addressed by component condition monitoring
(real time or not), although it takes a longer lead time to become apparent. Based on all
measured and calibrated condition degradation information of similar components, the MTTF / MTBF / MUTF
that was (most likely) chosen in an earlier stage of risk and reliability management can now be verified and
optimized for that specific system, leading to a higher accuracy in both operational decision making
and forecasting.
Having these data available for use would also put LRED ahead of current competitors.
A graphical view of how this looks is shown in figure 5.
Figure 5
From individual component degradation curves using a Weibull distribution to a bell-curve distribution and
MTBF for the evaluated component population and different degradation speeds.
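For readers who want to experiment with the Weibull description mentioned in the caption, the sketch below uses the standard two-parameter Weibull reliability function R(t) = exp(-(t/eta)^beta) and the conditional reliability for a further operating window, R(t + x) / R(t). The shape (beta) and scale (eta) values are placeholders, not values from this bulletin, and the function names are hypothetical.

```python
import math


def weibull_reliability(t, beta, eta):
    """Probability that a component survives beyond time t (two-parameter Weibull)."""
    return math.exp(-((t / eta) ** beta))


def window_reliability(age, window, beta, eta):
    """Probability of surviving a further `window`, given survival up to `age`."""
    return weibull_reliability(age + window, beta, eta) / weibull_reliability(age, beta, eta)


# Placeholder parameters: beta > 1 represents wear-out, eta is the characteristic life in days.
beta, eta = 2.0, 100.0
print(window_reliability(age=0, window=10, beta=beta, eta=eta))   # new component
print(window_reliability(age=50, window=10, beta=beta, eta=eta))  # same window, degraded component
```

With a wear-out shape (beta greater than 1) the same operating window is less likely to be survived by an older component, which is the kind of forecast a minimum reliability level can be checked against.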
5 Conclusion.
• Any application that is evaluated for use across the whole series of steps must be able to
handle all these steps and the associated information and calculations in a controlled, logical and
transparent fashion. (For the steps, see figure 1.)
• It is by now believed to be unlikely that a single system will be capable of handling all the
required aspects. A combination of best of breed systems with easy interfaces may need to be made.
• LRE does own several systems that, when combined, will be able to address all mentioned aspects.
• Although combining them has not yet been successful in the above respect, it makes sense to continue
putting effort into this.
• Can real time monitoring for condition measurements add value? Yes, but one needs to be
considerate and keep an economical mindset. Whether Real Time monitoring of component condition is
economically applicable depends on the speed of degradation and the reaction time on the effect of upcoming
failure (in combination with the operating mode).
• Condition monitoring in any form, and the data collection that comes with it, serves 3 purposes:
o Condition assessment to assist operations with the current risk level after component failure,
o Enabling the needed forecast for the next well to be drilled,
o Allowing for Mean Failure probability optimization (a competitive advantage if collected and
made re-usable).
A simple means to improve reaction time is to set the functional failure limit higher which gives more
reaction time (a tighter threshold for condition degradation).
6 The following was posted on OilPro as a post on a preventive maintenance topic:
Time based maintenance is not a bad idea on all occasions. It depends on the criticality of a component’s
function and the ability to maintain it in any other way than on a time schedule.
Regulations are most often also time based, as that is easier to schedule and enforce.
I am not defending time-based maintenance on all occasions, but in some specific areas there is hardly
another choice.
A good option is to start growing from time based to usage based; that is a major step for most asset
operators. It not only steps away from the ‘mean’ failure time for a whole component population,
but also allows differentiating between different functions with similar components.
A next step is to grow from usage to condition based. In that case the individual component in a circuit is
monitored for its condition. That condition is “translated” or “calibrated” with the level of degradation
for a single failure mode: we ask the question: how does it fail and how can we measure it?
The level of accuracy grows with each of the steps above, and so does the amount of data we need to crunch. That
means we need to be careful not to feed our eagerness as technicians to measure everything, as it
can become very costly, and since the data input will explode, turning it into useful information
gets harder with every extra GB. Real time monitoring, for example:
is that really what we want? What does it bring, or is ‘almost’ real time also OK? Let’s say a
measurement every minute instead of every second already gives 60 times fewer data points, but no
different results, I think (unless the PF-interval is much less than a minute, which is very unlikely).
Think about the effect of a measurement every millisecond….
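The arithmetic behind this can be checked quickly. The sketch below counts data points per sensor per day for a given sample time; the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(sample_time_s):
    """Number of data points one sensor produces per day at the given sample time."""
    if sample_time_s <= 0:
        raise ValueError("sample time must be positive")
    return round(SECONDS_PER_DAY / sample_time_s)


per_minute = samples_per_day(60)          # 1,440 points per day
per_second = samples_per_day(1)           # 86,400 points per day
per_millisecond = samples_per_day(0.001)  # 86,400,000 points per day

print(per_second // per_minute)       # 60 times more data than per-minute sampling
print(per_millisecond // per_minute)  # 60,000 times more data than per-minute sampling
```

These counts are per sensor; every additional monitored degradation mechanism multiplies them again.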
Time based may be a very effective way of maintenance, given the time base is calibrated with what one
wants to achieve.
Usage based and, even more so, condition based maintenance requires a lot more planning, preparation and
schedule alignment and is therefore more costly from an operations point of view. It needs a lot of
extra resources for data analysis, and its cost may not always be in line with the risk that is linked to a
specific component.
All maintenance strategies, incl. time based, are in the mix when a whole maintenance system is taken
into account. Using condition based maintenance is OK, but apply it with care: it must be feasible and
balanced in cost vs. risk, and it has quite some implications for a maintenance department’s resources.

Contenu connexe

En vedette (18)

Cartilla mastitis revision correccion2a1MSD Salud Animal Salud Lechera
Cartilla mastitis revision correccion2a1MSD Salud Animal Salud LecheraCartilla mastitis revision correccion2a1MSD Salud Animal Salud Lechera
Cartilla mastitis revision correccion2a1MSD Salud Animal Salud Lechera
 
ABS Certificate
ABS CertificateABS Certificate
ABS Certificate
 
Atest Paparazzi
Atest PaparazziAtest Paparazzi
Atest Paparazzi
 
V.BASIC
V.BASICV.BASIC
V.BASIC
 
EF ACADEMIC YEAR ABROAD DIPLOMA
EF ACADEMIC YEAR ABROAD DIPLOMAEF ACADEMIC YEAR ABROAD DIPLOMA
EF ACADEMIC YEAR ABROAD DIPLOMA
 
Jorge soto 09 30
Jorge soto 09 30Jorge soto 09 30
Jorge soto 09 30
 
Kinderopvang op Curacao
Kinderopvang op CuracaoKinderopvang op Curacao
Kinderopvang op Curacao
 
Tutor Flyer
Tutor FlyerTutor Flyer
Tutor Flyer
 
7
77
7
 
Patrick Galenza Scholarship Reference Letter
Patrick Galenza Scholarship Reference LetterPatrick Galenza Scholarship Reference Letter
Patrick Galenza Scholarship Reference Letter
 
Miedoalalibertaderichfromm
MiedoalalibertaderichfrommMiedoalalibertaderichfromm
Miedoalalibertaderichfromm
 
Los tiempos de jesús
Los tiempos de jesúsLos tiempos de jesús
Los tiempos de jesús
 
Asturias
AsturiasAsturias
Asturias
 
Fotonovela
FotonovelaFotonovela
Fotonovela
 
Pbl2008
Pbl2008Pbl2008
Pbl2008
 
Fuerzasdelanatura Gbrl
Fuerzasdelanatura GbrlFuerzasdelanatura Gbrl
Fuerzasdelanatura Gbrl
 
Kaartje1
Kaartje1Kaartje1
Kaartje1
 
Oapee.PresentacióN Feb 2008
Oapee.PresentacióN Feb 2008Oapee.PresentacióN Feb 2008
Oapee.PresentacióN Feb 2008
 

TechnicalBulletin--Real Time Monitoring 4

  • 1. PvA 06-2015 1 Technical Bulletin # 15 Real Time Monitoring of component condition degradation June 2nd 2015 1 Introduction. The term “Real Time Monitoring” is often used nowadays. It is however not always clear what is monitored and what is done with the gathered information. Process real time monitoring and component condition real time monitoring may get mixed up although these two applications of real time monitoring are not the same in their character and usage. This bulletin will focus on monitoring of component condition and condition degradation of physical components operating together in a system and what “Real Time Monitoring” in that case means. By putting focus on the usage of real time monitoring for component condition this document does not address the application of real time monitoring for operational usage such as for automated process control or process operations support. 2 What is Real Time Monitoring? Real time monitoring is the application of, often sensor based, continuous measurements of a unit that can be measured and ‘calibrated’ to provide meaningful information about what is measured: the meaning is in the fact that it is possible to interpret the reading, manual or automated, in such a way to determine trends or abrupt changes. A reading has a set of boundaries or acceptance and rejection criteria that allow to assess whether within process or beyond pre-set process limits. An example is measurement of i.e. pressure. It can be measured in a 4 – 20 mA range where the calibration tells us humans that 4 mA is i.e. 120 PSI and 20 mA is 1500 PSI. As different pressure ranges may be measured in similar way the calibration is an inevitable step into the ‘translation’ to meaningful information. Without calibration the 4 – 20 mA signal has no meaning as it does not by itself give a pressure indication. Process limits may be set between 140 PSI as lower rejection criterion and 1000 PSI as upper rejection criterion. 
The translated mA measurement will tell if the pressure is within the defined process boundaries (threshold of operation or operation window). Pressure is an operational or process value. Continuous monitoring of a process value is meaningful as it may provide a means to control the process, manually or automated. Loss of the signal can mean that the process has to stop or cannot be controlled properly. 3 Real time monitoring for condition – (degradation) assessment. When applying real time monitoring with the above in mind to the condition of a physical component one need to answer the questions: what affects the condition, how can that be measured and how can that measurement being translated or calibrated to provide meaningful information about the measured condition. Equal to the above it is important to set a threshold for what is acceptable for condition of that component. This requires the question to be asked: ‘why do we want to know a components’ condition?”. This goes back to the fact that for a lot of components the condition gives an indication on the ability to perform according its specifications and for successful system functions. Without going into mathematical detail: when a component is degrading its condition related to that performance degrades. Assuming that components degrade over time it makes sense that a component will lose its ability to perform as expected or required due to the degradation and leading to what is referred to as ‘functional failure’: the component has lost its ability to function according what is required from it. That brings us closer to what adds most value to be known: the reliability of a component in the event of usage in a process. What we are interested in is how far the degradation has progressed in order to determine for how long it still makes sense to operate the component without the risk of failure and system seizure as result. 
Reliability is defined as the ability of a system or component to perform its required functions under stated conditions for a specified period of time (source: www.reliabilityweb.com). Process parameters may inform us about condition problems in some of the components used to operate that process, but they do not give detailed information about what is degrading or how, which component it
  • 2. is and how far it is from complete failure. Process information is also lagging: the indication we get from process parameters is the end result of the failing component; it is the effect, not the cause. Component condition monitoring is important to assess the reliability of a component, so that one can react in time to prevent a component failure and the resulting system seizure. Economic aspects can also be taken into account when a system stops or needs to be stopped due to a component failure. The following string indicates the logic of the aspects we can link (Figure 1). In this string there are no process parameters. It is possible to link the process parameters to the risk level side of this string, coming from “the other side”, so that it becomes clear what the process risk is if a single component fails. A component failure affects the function in which that component operates, and the functional effect on the whole system of which that function is a part determines the severity of that failure and the risk level it brings. Failure of a similar component in a different function within a system can have a different effect and lead to a different risk level. Maintenance, e.g. reliability informed maintenance, focuses on affecting the degradation of components and therefore intends to affect the reliability, condition and risk of failure by preventing a component from failing. Maintenance and maintenance effects are not described in this document, but with the above in mind it becomes apparent that maintenance must affect a degradation mechanism in order to be successful and therefore effective. A component can fail in more than one way: there is more than one failure mode leading to failure, and there are also multiple ways of degradation that eventually lead to a failed state. 
In order to be complete and exact we should measure every degradation mechanism leading to every failure mode. This is perhaps not feasible, economically or technically, and may also not be very effective. Without knowing what we want to measure we can easily end up with a massive amount of (calibrated) data that makes it impossible to trend towards a condition, reliability or risk level. Evaluating and determining the most dominant failure mode is a must-do at the beginning of component condition monitoring. Coming from the risk side: which failure causes the highest risk? That creates focus on where we should (practically) start. An often used method is a Failure Mode and Effect Analysis (FMEA). When done fully and correctly, the FMEA lists all known functional failure modes, which are an indication of the damage mechanisms. The list of all damage mechanisms can be filtered to find the most dominant one. As an example: if wear of a component is the main degradation mechanism (multiple times faster than other known degradation mechanisms for that component, such as creep or corrosion), then it has value to measure the wear first and more often, and not so much the corrosion. Over time corrosion may, however, reach its threshold limit of condition and reliability impact, so it must also be taken into account; an
  • 3. inspection scheme based on the fastest degradation mechanism will be capable of keeping track of corrosion as a condition parameter. This brings us to degradation speed. Not all degradation goes equally fast, and not every component will experience the same speed of degradation for a similar degradation mechanism. A simple example is the wear of door hinges: depending on the weight distribution over the hinges one will wear more than the other, although both are used equally often in opening and closing. Although both components in this example are of a similar type, their condition will differ over time. In order to gain the highest effect of maintenance for these hinges the interval should differ depending on the speed of degradation; the applied maintenance effort (greasing in this case) stays the same. The usage of components in a function is also an aspect that comes into play when addressing condition measurements. Often a system is built from redundant components or components that are only used on-demand. When one component is in operation the other one(s) are not operating, or a component is only in operation at the limited moments in time when it is required. Identical components, with identical degradation and damage mechanisms, operating in an identical system may differ in their condition due to the difference in usage. Measuring usage (or ‘units of operation’) together with the condition degradation is therefore important. • Components have different failure modes → a degradation or damage mechanism for each of those failure modes. • One degradation mechanism has one best practice maintenance activity, but the application frequency may differ due to the speed (or ‘degradation profile’) of that degradation mechanism. • One degradation mechanism leads to one functional failure, but due to the speed of degradation some components may fail sooner than others. 
• There is a most dominant degradation mechanism that often goes multiple times faster than the others and more often leads to functional failure of that component. A properly executed and documented FMEA provides valuable information for this. • A functional failure (failed component) will therefore occur sooner in one case than in the other. • A component (functional) failure may cause the system it operates in to stop: the risk of component failure depends on the risk of that system stop. • Every degradation mechanism has its own most effective way of measuring (e.g. sensor type). • Every (sensor) measurement needs to be calibrated to be meaningful in degradation assessment. • It is not feasible or effective to measure all degradation mechanisms. • Degradation itself needs to be translated onto an agreed condition level and the level of its functionally failed state. • Every measurement has its own acceptance threshold. • Different usage causes different degradation and condition, even for identical components. Usage differences can occur, e.g., when identical components operate in different system-functions or are part of a redundant set-up. • The same can be stated for load differences (see the hinge example). This often occurs when similar components are used in different system-functions. • What we refer to as ‘condition’ is the level of degradation of a component compared to its ‘new’ state. This is called the “PF-interval”. “PF” stands for Potential to Failure in this case. PF indicates where a component is on its degradation path. Being able to plot a measured degradation value in a PF graph is the ideal world for condition based monitoring.
  • 4. Figure 2: the PF graph. If we apply this to a group of similar components for one degradation mechanism, then the spread in ‘degradation speed’ is a measure of how close all components are to each other in their condition, or on their way to failure (see aspect 12 in the above bullets). If several components are close together it means that they degrade equally fast, which makes it easier for maintenance to prevent failures in such a ‘population’. A large spread in degradation speed makes applying maintenance more difficult. Differences in degradation may occur due to usage, as described above. This is one of the pitfalls of condition based maintenance: it is often more effective but also more difficult from a planning and execution point of view. Condition based maintenance revolves around individual components, yet more often than not a constant and equal usage and degradation is assumed, causing maintenance activities to be less effective. In figure 3 a set of identical components is plotted to show that they degrade differently and therefore have a different time to failure. Figure 3: difference in degradation speed: different PF.
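The difference in degradation speed between similar components can be illustrated with a small sketch. It assumes, purely for illustration, that degradation is linear in time and that a calibrated degradation value and an agreed functional failure level exist; real degradation profiles may be far from linear, and all names and numbers below are hypothetical.

```python
def degradation_rate(times, values):
    """Least-squares slope of the calibrated degradation value against time."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def remaining_time(times, values, failure_level):
    """Time left until the failure level is reached, extrapolating linearly."""
    rate = degradation_rate(times, values)
    if rate <= 0:
        return float("inf")  # no measurable degradation trend
    return (failure_level - values[-1]) / rate


def survives(times, values, failure_level, window):
    """Can the component operate for the next `window` time units?"""
    return remaining_time(times, values, failure_level) >= window


days = [0, 10, 20, 30]
hinge_a = [0.0, 1.0, 2.0, 3.0]  # heavily loaded hinge (made-up wear values)
hinge_b = [0.0, 0.5, 1.0, 1.5]  # lightly loaded hinge (made-up wear values)
for name, wear in (("A", hinge_a), ("B", hinge_b)):
    print(name, remaining_time(days, wear, failure_level=5.0))
```

Two components of the same type, measured at the same moments, end up with different remaining times, which is the spread shown in figure 3.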
  • 5. The complexity rises due to the combination of all mentioned aspects. On top of the previously listed aspects there are: • Similar and non-similar components in every operating system with equal damage mechanisms. • Similar components with equal degradation mechanisms have different degradation speeds (figure 2) due to different usage (or load: different system-functions). • Non-similar components may have similar degradation mechanisms but at different speeds of degradation. • For component failure anticipation in systems it is important to be able to ‘look ahead’ into the next timeframe in which that component needs to perform. Can a component with a certain level of degradation continue its function for the next X days without reaching functional failure? Adding all this together, it is not difficult to imagine that the amount of data that needs to be measured, translated or calibrated, and assessed grows exponentially with the eagerness to measure everything in order to detect condition or degradation. Why would we use real time for condition degradation leading to functional failures? Applying condition based monitoring and maintenance instead of, e.g., the time-based version already gives a major growth in the data to be managed; applying real time monitoring for that (without automation/programming and proper reaction time on potential failure) could make it unmanageable. Why is reaction time important? Going back to the PF interval we see that degradation takes time. This time to failure (the PF interval) can be used to react. Reacting can mean ordering parts or preparing an operational stop, overhaul or repair. The reaction time is an important value as it allows reacting to the measured degradation. If the PF interval is short (fast degradation) there is little time to react; if the PF interval is longer there is more time to react. 
The shorter the PF interval, or the higher the speed of degradation, the more often one needs to measure in order to allow for proper reaction time before the coming functional failure. Speed of degradation is therefore directly linked to the frequency of measurement. The interval between measurements is often called ‘sample time’. Whether a short sample time adds value also depends on the risk of component failure. As described, the risk of component failure depends on the risk of seizure of the system or function that component is operating in. • Whether or not to measure degradation in ‘real time’ (continuous; i.e. a very short sample time, milliseconds or shorter) depends on the speed of that degradation (the steepness of the PF graph) and the time needed to react to extremely fast degradation and upcoming failure. • In most cases degradation does not go that fast and a very fast reaction is not required. • The risk of component failure adds an extra aspect to the value of a short sample time, e.g. through real time condition monitoring. • Real time condition monitoring is technically advanced due to the automation and data management that are often applied along with it, and can become costly in comparison to what is saved on the risk side. • Producing extreme amounts of data without consideration could lead to ‘too much’ data, which is then not used at all. That makes real time monitoring less effective as a starting point for condition monitoring, unless it is well considered or automation is immediately added to filter, trend and calibrate. Automation adds another layer of complexity and cost to a real time condition monitoring system. Real time monitoring for process control is different from that for condition measurement. Real time for process control requires continuous input in order to control the process properly.
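The link between PF interval, reaction time and sample time can be made concrete with a small sketch. The rule used here is an assumption for illustration, not a rule from this bulletin: the time left after subtracting the reaction time from the PF interval is divided over a chosen number of readings, so that degradation is seen more than once before it is too late to react. The function name and the default of two readings are hypothetical.

```python
def max_sample_interval(pf_interval, reaction_time, readings=2):
    """Longest sample time that still leaves room to react.

    All times are in the same unit (for example days). Returns None when
    the reaction time is as long as or longer than the PF interval: in that
    case no sample time is short enough and the failure limit or the
    reaction itself has to change.
    """
    margin = pf_interval - reaction_time
    if margin <= 0:
        return None
    return margin / readings


# Slow degradation: 90 day PF interval, 30 days needed to get parts on site.
print(max_sample_interval(90, 30))
# Fast degradation relative to the reaction: sampling cannot help.
print(max_sample_interval(10, 30))
```

With numbers like these a sample time of weeks is sufficient; a sample time of milliseconds only follows when the PF interval itself is of that order.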
  • 6. 4 System and functions: multiple components, degradation speed and effect of failure. So far this document has focused on single component degradation. A complete system performs by means of several different functions. Each function is designed to perform a part of the total system performance. That functional performance is enabled by technical components that are arranged in a logical way to deliver the function together. With this in mind the applicable components are brought together by a design engineer. Similar components can be part of different functions. Different functions mean different condition degradation and a different failure pattern, effect and probability, and therefore a different risk of failure. If this is applied to, e.g., a subsea Blow Out Preventer (BOP), there is a multitude of components and component groups (e.g. “hoses” or “solenoid valves”) that together allow all the different BOP functions to perform. The regulatory effect of a component failure on a specific function and on the whole BOP system is determined by means of LRED’s “BOP risk model” and expressed in such a way that it assists in the decision whether a BOP can stay in operation or needs to be repaired to remain compliant with the applicable regulations (known as the risk model ‘risk criteria’). In order to be able to use condition degradation to determine a risk level, the following needs to be identified: • What is the failure effect of component failure (in order to assess the impact of condition degradation)? • What are the failure modes? • What causes each failure mode: what is the degradation mechanism? • What is the minimum acceptable condition level for a component in a specific function? (This differs per function and, as explained, is affected by the reaction time versus the degradation time.) o Once this is documented a measuring application can be chosen in an economically and technically feasible way. 
Of course the measurement needs to be ‘calibrated’ to become meaningful; this is regarded as part of the technical feasibility. • A selection of major critical components and their degradation should be studied to identify and document the economic and technical feasibility of condition monitoring, based on the above and on the cost and feasibility of measuring and calibrating the measurement. It must be realized that not only the current condition status is important but, more so, the ability to forecast condition and potential failure in the coming period. In BOP terms: can a BOP theoretically operate on the next well to drill without critical component failures, and if not, what needs to be done now? Due to the mode of BOP operations it is important to assess the level of degradation against the next operating window, to evaluate whether a component would survive that next operating period without functionally failing. Condition monitoring would then serve 2 purposes: • Condition assessment to assist operations with the current risk level after component failure, • Enabling the needed forecast for a next well to drill. The following figure shows the current coverage of LRED’s BOP risk model (the brown-dashed line area)
  • 7. Figure 4. The risk model coverage area is the ‘end result’ of a component failure on regulatory compliance. • To build an effective condition monitoring application it may be sensible to evaluate systems that have condition monitoring in their design and are able to forecast on condition data, more so than systems that are designed for operations decision support. • Whether or not Real Time monitoring of component condition is economically applicable depends, as explained, on the speed of degradation, the reaction time on upcoming failure (in combination with the operating mode) and the component criticality: the failure effect. An important step between condition degradation and risk is the calculation of component reliability. Why? Reliability calculations allow for a forecast, whereas the risk aspect shows the current status. (This holds even for risk ‘what-if’ scenarios, since these do not address the level of condition degradation or reliability; they just consider the component as failed.) Reliability is also a means to express the ability of a component to perform. In order to be effective in that, the minimum reliability level must be evaluated and documented: it has little meaning to know that a component has a reliability of 88% if you do not know what the minimum requirement is for that component in that function and have no idea how this 88% came about: how long did it take, how was it used, etc. This is required to evaluate future ‘survival’ or ‘failure’ of that component. Becoming aware of that takes some time: after several measured conditions and calculated reliability levels, awareness will grow of what is acceptable and what is not. Adding mean or average failure measures for whole component groups, such as MTTF, MTBF and MUTF, will provide an average idea of component reliability. 
Without the ability to forecast failure these values have very little value, as you want to know how close you are to a potential failure based on condition information. Forecasting depends on the degradation profile and the way that profile is mathematically described. (There are several different degradation distributions.) A Mean Time to Failure (MTTF) (and likewise an MTBF or MUTF) is the average or mean of the failures of a component group or ‘population’. Being a mean implies that many components will fail sooner or later than that value, due to the difference in degradation and therefore condition after “x” time or usage. This will happen in real life in a real operating system. This brings us to a third important aspect that will be addressed with component condition monitoring (real time or not), though with a longer lead time before it becomes apparent. That aspect is that, based on all measured and calibrated condition degradation information of similar components, the MTTF / MTBF / MUTF that was (most likely) chosen at an earlier stage of risk and reliability management can now be verified and
  • 8. optimized for that specific system, leading to higher accuracy in both operational decision making and forecasting. Having these data available for use would also put LRED ahead of current competitors. A graphical view of how this looks is shown in figure 5. Figure 5: From individual component degradation curves using a Weibull distribution to a bell-curve distribution and MTBF for the evaluated component population and different degradation speeds.
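Why a mean value alone says little can be shown with the two-parameter Weibull distribution that figure 5 refers to. The sketch below uses the standard Weibull reliability function R(t) = exp(-(t/eta)^beta) and its mean, eta * Gamma(1 + 1/beta). The shape and scale values are made-up illustration numbers, not data from this bulletin.

```python
import math


def weibull_reliability(t, shape, scale):
    """Probability that a component survives beyond time t."""
    return math.exp(-((t / scale) ** shape))


def weibull_mttf(shape, scale):
    """Mean time to failure of a two-parameter Weibull distribution."""
    return scale * math.gamma(1.0 + 1.0 / shape)


shape, scale = 2.0, 1000.0  # hypothetical wear-out behaviour, hours
mttf = weibull_mttf(shape, scale)
failed_before_mean = 1.0 - weibull_reliability(mttf, shape, scale)
print(round(mttf, 1), round(failed_before_mean, 3))
```

For these assumed parameters roughly half of the population has already failed when the mean is reached, which is why an individual component needs its own condition information in addition to the population mean.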
  • 9. 5 Conclusion. • Any application that is evaluated for use across the whole series of steps must be able to handle all these steps and the associated information and calculations in a controlled, logical and transparent fashion (steps: see figure 1). • It is by now believed that it is unlikely that a single system will be capable of handling all the required aspects. Best of breed systems with easy interfaces between them may be needed. • LRE does own several systems that, when combined, will be able to address all mentioned aspects. • Although combining them has not yet been successful in the above respect, it makes sense to continue putting effort into this. • Can real time monitoring for condition measurements add value? Yes, but one needs to be considerate and keep an economical mindset. Whether Real Time monitoring of component condition is economically applicable depends on the speed of degradation and the reaction time on the effect of upcoming failure (in combination with the operating mode). • Condition monitoring in any form, together with data collection, serves 3 purposes: o Condition assessment to assist operations with the current risk level after component failure, o Enabling the needed forecast for a next well to drill, o Allowing for Mean Failure probability optimization (a competitive advantage if collected and made re-usable). A simple means to improve reaction time is to set the functional failure limit higher, which gives more reaction time (a tighter threshold for condition degradation). 6 The following was posted on OilPro as a post on a preventive maintenance topic: Time based maintenance is not on all occasions a bad idea. It depends on the criticality of a component’s function and the ability to be maintained in any other way than on a time schedule. Regulations are most often also time based, as that is easier to schedule and enforce. I am not defending time-based maintenance on all occasions, but in some specific areas there is hardly another choice. 
A good option is to start growing from time based to usage based, which is a major step for most asset operators. It not only steps away from the ‘mean’ failure time for a whole component population, but also allows differentiating between different functions with similar components. A next step is to grow from usage based to condition based. In that case the individual component in a circuit is monitored for its condition. That condition is “translated” or “calibrated” with the level of degradation for a single failure mode. We ask the question: how does it fail and how can we measure it? The level of accuracy grows with each of the steps above, and so does the amount of data we need to crunch. That means we need to be careful not to feed our eagerness as technicians to measure everything, as it can become very costly, and since the data input will explode, managing it into useful information gets harder with every extra GB. Real time monitoring, for example: is that really what we want? What does it bring, or is ‘almost’ real time also OK? Let’s say a measurement every minute instead of every second already gives 60 times fewer data points… but no different results, I think (unless the PF-interval is much less than a minute, which is very unlikely). Think about the effect of a measurement every millisecond… Time based may be a very effective way of maintenance, provided the time base is calibrated with what one wants to achieve. Usage based and, more so, condition based maintenance require a lot more planning, preparation and schedule alignment and are therefore more costly from an operations point of view, need a lot of extra resources for data analysis, and may not always be in line with the risk that is linked to a specific component. All maintenance strategies, incl. time based, are in the mix when a whole maintenance system is taken into account. Using condition based maintenance is OK, but apply it with care: it must be feasible and cost vs. risk balanced, and it has quite some implications for a maintenance department’s resources.
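The remark about a measurement every minute, second or millisecond is easy to put in numbers. The sketch below only counts data points per sensor per day; how many bytes each point takes depends on the system and is left out.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def samples_per_day(sample_time_ms):
    """Number of readings one sensor produces per day at a given sample time."""
    return MS_PER_DAY // sample_time_ms


for label, sample_time_ms in (("every minute", 60_000),
                              ("every second", 1_000),
                              ("every millisecond", 1)):
    print(label, samples_per_day(sample_time_ms))
```

One sensor sampled every millisecond produces 86.4 million readings per day, against 1440 when sampled every minute; multiplied by the number of components and degradation mechanisms, this is the data growth the post warns about.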