A multi phase decision on reliability growth with latent failure modes

A Multi-Phase Decision on
Reliability Growth with Latent
Failure Modes
(Multi-Level Decision Making for Reliability Growth Planning in Next-Generation Manufacturing)
Tongdan Jin, Ph.D.
©2014 ASQ
http://www.asqrd.org

2
A Multi-Phase Decision on Reliability Growth
with Latent Failure Modes
Tongdan Jin, Ph.D.
Ingram School of Engineering
Texas State University, TX 78666, USA
6pm Pacific Time on Feb. 9, 2014

3
Contents
• The Need for Reliability Growth Planning
• Reliability Growth Considering Latent Failure Modes
• Multi-Phase Reliability Growth Management
• Applications to Electronic Equipment
• Conclusion

4
Topic I:
Reliability Growth Test (RGT)
Vs.
Reliability Growth Planning (RGP)

5
Reliability Growth for Capital Equipment
• Large and complex capital goods
• Long service time
• Prohibitive downtime cost
• Expensive maintenance, repair, and overhaul (MRO)
• Integrated product-service system

6
Reliability Growth Management
Product Life Cycle: Design and Development; Prototype and Pilot Phase; Volume Production, Field Use and After-Sales Support
Reliability Growth Testing (RGT)
Reliability Growth Planning (RGP)

7
Why Do We Need RGP?
• Shorter Time-To-Market
• Cuts in Testing Budget
• Dispersed Design, Manufacturing, and Integration
• Usage Diversity
• Variable System Configuration
[Figure 3: Compressed System Design Cycle. Timeline from t0 to t4 showing basic subsystems 1 to 3, advanced subsystems 4 to 6, the basic design stage, and the volume manufacturing and shipping stage.]

8
Reliability Post New Product Introduction
[Figure: system MTBF and field system population (system install base) versus chronological time, with the target MTBF marked.]

9
Different MTBF Scenarios
[Figure: MTBF versus time (month) under three scenarios: forecasted, observed by OEM, and experienced by customer.]

10
System Failure Mode Categories
[Figure: Failures Breakdown by Root-Cause Category. Bar chart (0% to 50%) of the failure share in each category (Hardware, Design, Mfg, Process, Software, NFF) for four different modules A, B, C, and D. Data from >100 systems shipped within one year.]

11
RGP Program: A Synergy of ECO and CA
[Diagram: ECO loop and retrofit loop. Elements: Product Design & Manufacturing, New System Shipping and Installation, In-service Systems, Repair Center, Spare Inventory, Spare Batch, Retrofit Team.]
1. Failure mode analysis
2. Reliability growth prediction
3. CA implementations
ECO = Engineering Change Order
CA = Corrective Actions

12
Topic II:
Reliability Prediction Based on
Surfaced Failure Modes

13
Failure Intensity Rate without Latent Failures
$$\lambda_s(t \mid t_c) = \sum_{i=1}^{m} \lambda_{A,i}(t) + \sum_{i=1}^{n} \lambda_{B,i}(t)$$
Where:
λA,i(t) = failure intensity for failure mode i in A
λB,i(t) = failure intensity for failure mode i in B
m = number of failure modes in A by time tc
n = number of failure modes in B by time tc
[Figure: failure intensity versus time from t0 to tc for four failure modes, λ1(t) to λ4(t) (curves a to d), grouped into modes with no trends and modes with trends.]

14
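Crow/AMSAA Growth Model
Failure Intensity:
$$\hat{\lambda}(t) = \hat{\alpha}\hat{\beta}\, t^{\hat{\beta}-1}$$
Parameter estimates:
$$\hat{\beta} = \frac{N}{\sum_{i=1}^{N} \ln\left(t_s / t_i\right)}, \qquad \hat{\alpha} = \frac{N}{t_s^{\hat{\beta}}}$$
Hypothesis Testing:
H0: β = 1, HPP
H1: β ≠ 1, NHPP
Reject H0 where (α/2 in the chi-square subscripts is the tail probability of the test, not the scale parameter)
$$\frac{2N}{\hat{\beta}} < \chi^2_{2N,\,1-\alpha/2} \quad \text{or} \quad \frac{2N}{\hat{\beta}} > \chi^2_{2N,\,\alpha/2}$$
[Figure: Various Failure Intensity Models. Failure intensity versus time for beta = 1, beta = 0.5, and beta = 1.5, with the scale parameter equal to 1 for all curves.]
ts = termination time, ti = ith failure arrival time
HPP = Homogeneous Poisson Process
NHPP = Non-homogeneous Poisson Process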
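The Crow/AMSAA estimators on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming time-terminated data, the power-law intensity alpha * beta * t^(beta - 1), and the estimators beta_hat = N / sum(ln(ts / ti)) and alpha_hat = N / ts^beta_hat. The failure times and the function names are hypothetical and do not come from the slides.

```python
import math

def crow_amsaa_mle(failure_times, t_s):
    """Time-terminated estimates (alpha_hat, beta_hat) for the power-law model."""
    n = len(failure_times)
    # beta_hat = N / sum_i ln(t_s / t_i)
    beta_hat = n / sum(math.log(t_s / t_i) for t_i in failure_times)
    # alpha_hat = N / t_s ** beta_hat
    alpha_hat = n / t_s ** beta_hat
    return alpha_hat, beta_hat

def failure_intensity(alpha, beta, t):
    """Failure intensity alpha * beta * t ** (beta - 1) at time t."""
    return alpha * beta * t ** (beta - 1.0)

# Hypothetical failure arrival times (hours) and termination time.
times = [120.0, 450.0, 900.0, 1400.0, 1900.0]
t_s = 2000.0

alpha_hat, beta_hat = crow_amsaa_mle(times, t_s)
lam_end = failure_intensity(alpha_hat, beta_hat, t_s)
print(f"alpha_hat={alpha_hat:.4g}, beta_hat={beta_hat:.4g}, intensity at t_s={lam_end:.4g}")
```

A beta_hat below 1 suggests a decreasing failure intensity (reliability growth) and a value above 1 suggests deterioration. The chi-square test on the slide decides whether the departure from beta = 1 is significant.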

15
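Failure Intensity Function
$$\lambda_s(t \mid t_c) = \sum_{i=1}^{m} \lambda_i + \sum_{i=1}^{n} \alpha_i \beta_i t^{\beta_i - 1} \qquad \text{Eq. (2)}$$
First sum: failure modes with constant intensity. Second sum: failure modes following the Crow/AMSAA model.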

16
Topic III:
Reliability Prediction considering
Latent Failure Modes

17
What Is a Latent Failure Mode?
1. Also known as a dormant failure mode
2. Hibernated
3. Depends on customer usage
4. May be caused by design weakness
5. Software bugs
6. Electrostatic discharge (ESD)
7. Others

18
Surfaced & Latent Failure Modes for a Product
Failure Mode \ Days 7  14  21  84 105 161 168 210 231 266 287 315 343 350
Open Diode          1   1   2   6   7   7   7   9  15  16  16  17  17  17
Power Supply        0   1   1   1   2   4   4   4   4   4   6   6   7   7
Corrupt ID PROM     0   2   2   2   2   2   2   2   2   2   2   2   2   2
Cold Solder         0   0   1   1   1   1   1   1   1   1   1   1   1   1
NFF                 0   0   0   1   1   2   3   3   4   6   6   6   6   6
Flux Contam         0   0   0   1   1   1   1   1   1   1   1   1   1   1
SMC Limit Table     0   0   0   0   1   1   1   1   1   1   1   1   1   1
Capacitor           0   0   0   0   0   1   1   1   1   1   1   1   1   1
PPMU                0   0   0   0   0   0   1   1   1   1   1   2   2   2
Missing Solder      0   0   0   0   0   0   1   1   1   1   1   1   1   1
Mfg Defect          0   0   0   0   0   0   0   1   1   1   1   1   1   1
Bad ASIC            0   0   0   0   0   0   0   0   1   1   1   1   1   1
Fuse                0   0   0   0   0   0   0   0   0   1   1   1   1   1
Open Trace          0   0   0   0   0   0   0   0   0   0   1   1   1   1
Op-Amp              0   0   0   0   0   0   0   0   0   0   0   2   2   2
Timing Generator    0   0   0   0   0   0   0   0   0   0   0   1   1   1
Solder Short        0   0   0   0   0   0   0   0   0   0   0   0   1   1
Total               1   4   6  12  15  19  22  25  33  37  40  45  47  47
Note: Each cell gives the cumulative number of failures of that mode observed up to the given day; the counts never decrease along a row, and each Total entry equals the sum of its column.

19
Surfaced and Latent Failure Modes
[Figure: surfaced and latent failure modes.]
A latent failure mode becomes a surfaced failure mode once it has occurred.

20
Reliability Model with Latent Failure Modes
$$\lambda_s(t \mid t_c) = \sum_{i=1}^{m} \lambda_i + \sum_{i=1}^{n} \alpha_i \beta_i t^{\beta_i - 1} + \sum_{j=1}^{k} \gamma_j(t)$$
The third sum is the projected latent failure intensity after tc.
Where
• k = the number of new latent failure modes that occur in T.
• γj(t) = the failure intensity for the jth latent failure mode.
• t > tc.

21
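Estimate Cumulative Latent Failure Intensity
$$\lambda_s(t \mid t_c) = \sum_{i=1}^{m} \lambda_i + \sum_{i=1}^{n} \alpha_i \beta_i t^{\beta_i - 1} + \Gamma_a(t) \qquad \text{Eq. (3)}$$
which replaces the sum over individual latent failure modes in
$$\lambda_s(t \mid t_c) = \sum_{i=1}^{m} \lambda_i + \sum_{i=1}^{n} \alpha_i \beta_i t^{\beta_i - 1} + \sum_{j=1}^{k} \gamma_j(t)$$
with the cumulative latent failure intensity Γa(t).
[Eq. (4) and Eq. (5): expressions for k and Γa(t) in terms of kc, T, Tc, t, and tc, where kc = number of latent failure modes that occurred in Tc; see Jin, Liao, and Kilari (2010).]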

22
Summary of Latent Failure Mode Prediction
• Step 1: Estimate λi(t) for each surfaced failure mode i at tc using the Crow/AMSAA model
• Step 2: Obtain λs(t|tc) using Eq. (2) on slide 15
• Step 3: Estimate k and Γa(t) using Eq. (4) and Eq. (5)
• Step 4: Obtain the reliability growth model, Eq. (3)
For more details, please also refer to T. Jin, H. Liao, M. Kilari, “Reliability growth modeling
for in-service systems considering latent failure modes,” Microelectronics Reliability, vol. 50,
no. 3, 2010, pp. 324-331.
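Steps 2 and 4 above amount to summing per-mode intensities. The sketch below evaluates the system failure intensity as constant-intensity modes, plus Crow/AMSAA modes of the form alpha * beta * t^(beta - 1), plus a cumulative latent term. The function name and all numbers are hypothetical. The latent term is passed in as a given value because its estimation (Step 3) is not reproduced here.

```python
def system_intensity(t, constant_modes, trend_modes, latent_intensity=0.0):
    """System failure intensity at time t.

    constant_modes   : list of constant intensities lambda_i
    trend_modes      : list of (alpha_i, beta_i) pairs for Crow/AMSAA modes
    latent_intensity : cumulative latent failure intensity, assumed given
    """
    constant_part = sum(constant_modes)
    trend_part = sum(a * b * t ** (b - 1.0) for a, b in trend_modes)
    return constant_part + trend_part + latent_intensity

# Hypothetical inputs (failures per hour); not taken from the slides.
constant_modes = [2.0e-5, 1.5e-5]
trend_modes = [(1.0e-3, 0.6), (4.0e-6, 1.3)]
gamma_a = 5.0e-5

lam_surfaced = system_intensity(1000.0, constant_modes, trend_modes)
lam_total = system_intensity(1000.0, constant_modes, trend_modes, gamma_a)
print(f"surfaced only: {lam_surfaced:.4g}, with latent term: {lam_total:.4g}")
```

Leaving the latent term at zero gives the surfaced-only prediction, which shows how much a forecast would understate the failure intensity if latent modes were ignored.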

23
Topic IV:
Reliability Growth Planning
Under
Budget/Cost Constraints

24
Why We Need the CA Effectiveness Estimate
[Diagram: resources ($) spent on CA through (1) retrofit and (2) ECO feed the CA effectiveness function, which links the dollars spent on CA to the percentage reduction of a failure mode.]

25
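Modeling CA (or Fix) Effectiveness
Effectiveness Model:
$$h(x) = \left(\frac{x}{c}\right)^{b}, \qquad 0 \le x \le c$$
b and c to be determined
[Figure: effectiveness h(x), from 0 to 1, versus CA budget x ($), from 0 to c, for b > 1, b = 1, and b < 1.]
$$\text{Effectiveness} = \frac{\text{Failure rate before CA} - \text{Failure rate after CA}}{\text{Failure rate before CA}}$$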
For more details on effectiveness function, please refer to T. Jin, Y. Yu, and F. Belkhouche, “Reliability growth using retrofit or
engineering change order-a budget-based decision making,” in Proceedings of IERC Conference, 2009, pp. 2152-2157.
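The effectiveness model h(x) = (x / c)^b is easy to explore numerically. The sketch below is an illustration only: the function name and the budget figures are hypothetical, and it assumes c > 0, b > 0, and a budget x between 0 and c.

```python
def ca_effectiveness(x, c, b):
    """CA effectiveness h(x) = (x / c) ** b for a budget x in [0, c]."""
    if c <= 0 or b <= 0:
        raise ValueError("c and b must be positive")
    if not 0.0 <= x <= c:
        raise ValueError("budget x must lie between 0 and c")
    return (x / c) ** b

# Hypothetical case: half of the budget c that would remove the failure mode entirely.
c = 100_000.0
x = 50_000.0
for b in (0.5, 1.0, 2.0):
    print(f"b={b}: h(x)={ca_effectiveness(x, c, b):.4f}")
```

With b < 1 the first dollars are the most effective, with b > 1 the effect comes mostly from the last dollars, and with b = 1 effectiveness is proportional to spending.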

26
An Example: ECO or Retrofit
A type of relay used on a PCB module fails constantly due to a known failure mechanism. Two options are available for corrective action:
1. Replace all on-board relays when a failed module is returned
2. Proactively recall all modules and replace the relays with new types having much higher reliability
CA Option   Cost ($)   CA Effectiveness
ECO         Low        Low
Retrofit    High       High

27
An Illustrative Example
The current failure rate of a type of relay is $2\times10^{-8}$ faults per hour. Upon the implementation of CA, the rate is reduced to $5\times10^{-9}$.
The CA effectiveness can be expressed as 0.75, that is
$$\frac{2\times10^{-8} - 5\times10^{-9}}{2\times10^{-8}} = 0.75$$

28
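Incorporate h(x) into λs(t|tc)
Effectiveness function:
$$h(x) = \left(\frac{x}{c}\right)^{b}$$
Model without CA:
$$\lambda_s(t \mid t_c) = \sum_{i=1}^{m} \lambda_i + \sum_{i=1}^{n} \alpha_i \beta_i t^{\beta_i - 1} + \Gamma_a(t)$$
Model with CA budget vector x:
$$\lambda_s(\mathbf{x}; t) = \sum_{i=1}^{m} \left(1 - \left(\frac{x_i}{c_i}\right)^{b_i}\right) \lambda_i + \sum_{i=1}^{n} \left(1 - \left(\frac{x_i}{c_i}\right)^{b_i}\right) \alpha_i \beta_i t^{\beta_i - 1} + \Gamma_a(t)$$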

29
Optimization Formulation
Min (RGP budget):
$$g(\mathbf{x}) = \sum_{i=1}^{m} x_i$$
Subject to (target reliability):
$$\lambda_s(\mathbf{x}; t) \le \lambda_0$$
$$x_i \ge 0 \quad \text{for } i = 1, 2, \ldots, m$$
Where
xi = CA budget for failure mode i, for i = 1, 2, …, m.
λ0 = target system failure intensity
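For intuition, the formulation above can be solved by hand in one simplified case. The sketch below assumes b = 1 for every failure mode, deterministic intensities, an upper bound xi ≤ ci, and a latent term that CA cannot reduce. Under those assumptions the problem is linear with a single constraint, so spending first on the mode with the largest intensity reduction per dollar is optimal. All names and numbers are hypothetical. This is not the solution method of the cited paper and is not expected to reproduce the allocations reported in the numerical example.

```python
def min_cost_allocation(lams, costs, latent, target):
    """Greedy budget allocation for the b = 1 case.

    Minimizes sum(x) subject to
        sum((1 - x[i] / costs[i]) * lams[i]) + latent <= target,
        0 <= x[i] <= costs[i].
    """
    needed = sum(lams) + latent - target  # intensity reduction still required
    x = [0.0] * len(lams)
    if needed <= 0:
        return x  # target already met without any CA spending
    if needed > sum(lams):
        raise ValueError("target cannot be reached by CA on surfaced modes")
    # Rank modes by intensity reduction per dollar, best first.
    order = sorted(range(len(lams)), key=lambda i: lams[i] / costs[i], reverse=True)
    for i in order:
        if needed <= 0:
            break
        reduction = min(lams[i], needed)
        x[i] = costs[i] * reduction / lams[i]
        needed -= reduction
    return x

# Hypothetical data: intensities (failures/hour) and full-fix costs ($).
lams = [2.0e-4, 2.0e-5, 5.0e-5]
costs = [400_000.0, 100_000.0, 50_000.0]
latent = 1.0e-4
target = 2.0e-4

x = min_cost_allocation(lams, costs, latent, target)
achieved = sum((1 - xi / ci) * li for xi, ci, li in zip(x, costs, lams)) + latent
print([round(v) for v in x], f"total=${sum(x):,.0f}", f"achieved={achieved:.3g}")
```

For b ≠ 1 or for uncertain intensities the problem becomes nonlinear, and a general nonlinear programming method would be needed instead of this greedy rule.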

30
Topic V:
Numerical Example
(Driving Electronic Equipment
Reliability)
The example is taken from the following paper:
T. Jin, Y. Yu, H.-Z. Huang, “A multiphase decision model for reliability growth considering
stochastic latent failures,” IEEE Transactions on Systems, Man and Cybernetics, Part A,
vol. 43, no. 4, 2013, pp. 958-966.

31
Overview of the Planning Horizon
Phase 1 (Day 1-90)
• Collect field data
• Identify surfaced failure modes
• Reliability prediction for Phase 2
• Resource allocation for Phase 2
Phase 2 (Day 91-220)
• Identify latent failure modes
• Reliability prediction
• Implement CA/ECO
• Next: Phase 3
Phase 3 (Day 221-350)
• Identify new latent failure modes
• Reliability prediction
• Implement CA/ECO
• Next: Phase 4
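The phase boundaries above can be combined with the surfaced and latent failure mode table on slide 18 to count how many new failure modes surface in each phase. The sketch below assumes that table describes the same product as this example, and it takes the first day on which each mode shows a nonzero count as the day the mode surfaced. The function and variable names are hypothetical.

```python
# First day with a nonzero count for each failure mode, read from the slide 18 table.
first_seen_day = {
    "Open Diode": 7, "Power Supply": 14, "Corrupt ID PROM": 14,
    "Cold Solder": 21, "NFF": 84, "Flux Contam": 84,
    "SMC Limit Table": 105, "Capacitor": 161, "PPMU": 168,
    "Missing Solder": 168, "Mfg Defect": 210,
    "Bad ASIC": 231, "Fuse": 266, "Open Trace": 287,
    "Op-Amp": 315, "Timing Generator": 315, "Solder Short": 343,
}

# Planning horizon from the overview slide (inclusive day ranges).
phases = {"Phase 1": (1, 90), "Phase 2": (91, 220), "Phase 3": (221, 350)}

def modes_surfacing_in(day_range, first_seen):
    """Failure modes whose first observed failure falls inside day_range."""
    first_day, last_day = day_range
    return [mode for mode, day in first_seen.items() if first_day <= day <= last_day]

new_modes = {name: modes_surfacing_in(rng, first_seen_day) for name, rng in phases.items()}
for name, modes in new_modes.items():
    print(f"{name}: {len(modes)} new modes: {', '.join(modes)}")
```

Modes that first appear in Phase 2 or Phase 3 were latent at the end of the preceding phase, which is why each phase's forecast carries a separate latent failure term.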

32
Failure Inter-Arrival Times in Phase 1
i Days 7 14 15 21 84 85 87 89
1 Open Diode 1 1 1 1 1 1
2 Power Supply 1
3 EEPROM 1 1
4 Cold Solder 1
5 NFF 1
6 Flux Contam 1
Note: Rows are failure modes; the numbers in the cells represent the failure quantity.

33
Reliability Forecasting for Phase 2
i  Failure Mode      α̂i        β̂i      E[λ̂i(t)]   var(λ̂i(t))
1  Open Diode        1.29E-6   1.413   2.28E-4    5.2174E-10
2  Power Supply      1.91E-5   1.00    1.91E-5    3.6398E-12
3  EEPROM            5.40E-3   0.544   1.42E-5    2.0126E-12
4  Cold Solder       1.91E-5   1.00    1.91E-5    3.6398E-12
5  NFF               1.91E-5   1.00    1.91E-5    3.6398E-12
6  Flux Contam       1.91E-5   1.00    1.91E-5    3.6398E-12
7  Latent Failures   2.39E-4   1.021   3.07E-4    9.4433E-10

34
Optimal CA Allocation in Phase 2
i  Failure Mode                ci ($)    bi    xi ($)
1  Open Diode                  430,000   1     412,790
2  Power Supply                150,000   1     0
3  EEPROM                      250,000   1     0
4  Cold Solder                 75,000    1     19,510
5  NFF                         370,000   1     0
6  Flux Contamination          45,000    1     27,700
7  Latent Failures (Phase 2)   N/A       N/A   N/A
N/A = not applicable
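The Phase 2 forecast and allocation tables can be combined into a quick arithmetic check. The sketch below sums the expected per-mode intensities from the forecasting table, then applies the effectiveness function with b = 1 to the allocated budgets. Treating the expected intensities as the quantities that CA scales down, and leaving the latent term untouched, are simplifying assumptions made here; the cited paper's model also accounts for the variance column.

```python
# (mode, expected intensity from the Phase 2 forecast, c_i in $, x_i in $); b_i = 1 for all.
modes = [
    ("Open Diode",   2.28e-4, 430_000, 412_790),
    ("Power Supply", 1.91e-5, 150_000, 0),
    ("EEPROM",       1.42e-5, 250_000, 0),
    ("Cold Solder",  1.91e-5,  75_000, 19_510),
    ("NFF",          1.91e-5, 370_000, 0),
    ("Flux Contam",  1.91e-5,  45_000, 27_700),
]
latent = 3.07e-4  # expected latent failure intensity from the same forecast table

total_budget = sum(x for _, _, _, x in modes)
before = sum(lam for _, lam, _, _ in modes) + latent
# With b = 1, h(x) = x / c, so each mode keeps a fraction (1 - x / c) of its intensity.
after = sum((1 - x / c) * lam for _, lam, c, x in modes) + latent

print(f"total CA budget: ${total_budget:,}")
print(f"forecast without CA: {before:.4g} failures/hour")
print(f"forecast with CA:    {after:.4g} failures/hour")
```

Under these assumptions the allocations add up to $460,000, and most of the remaining intensity comes from the latent term, which CA on surfaced modes cannot reduce.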

35
Failure Inter-arrival Times in Phase 2
i Days 105 161 162 168 209 210
1 Open Diode 1 1 1
2 Power Supply 1 1 1
3 EEPROM
4 Cold Solder
5 NFF 1 1
6 Flux Contam
7 SMC Limit Table 1
8 Capacitor 1
9 PPMU 1
10 Missing Solder 1
11 Mfg Defects 1
FailureMode

36
Reliability Forecasting for Phase 3
i   Failure Mode                α̂i         β̂i      E[λ̂i(t)]   var(λ̂i(t))
1   Open Diode                  2.30E-4    0.903   6.40E-5    4.09E-11
2   Power Supply                2.66E-5    1.02    3.40E-5    1.16E-11
3   EEPROM                      2.51E-2    0.374   4.49E-6    2.02E-13
4   Cold Solder                 8.27E-6    1.00    8.27E-6    6.83E-13
5   NFF                         4.22E-11   2.14    9.46E-5    8.94E-11
6   Flux Contam                 8.27E-6    1.00    8.27E-6    6.83E-13
7   SMC Limit Table             8.27E-6    1.00    8.27E-6    6.83E-13
8   Capacitor                   8.27E-6    1.00    8.27E-6    6.83E-13
9   PPMU                        8.27E-6    1.00    8.27E-6    6.83E-13
10  Missing Solder              8.27E-6    1.00    8.27E-6    6.83E-13
11  Mfg Defects                 8.27E-6    1.00    8.27E-6    6.83E-13
    Latent Failure in Phase 3   8.46E-6    1.208   1.07E-4    1.15E-10

37
Optimal CA Budget Allocation in Phase 3
i   Failure Mode                ci ($)    bi    xi ($)
1   Open Diode                  430,000   1     0
2   Power Supply                150,000   1     29,996
3   EEPROM                      0         0     0
4   Cold Solder                 0         0     0
5   NFF                         370,000   1     250,004
6   Flux Contam                 0         0     0
7   SMC Limit Table             20,000    1     0
8   Capacitor                   23,000    1     0
9   PPMU                        310,000   1     0
10  Missing Solder              9,000     1     0
11  Mfg Defects                 12,000    1     0
    Latent Failure in Phase 3   N/A       N/A   N/A

38
Prediction vs. Actual
[Figure: System Failure Intensity Function and its Prediction. Failures/hour (0 to 0.0006) versus days (0 to 350) across Phase 1, Phase 2, and Phase 3. Series: actual failure intensity, prediction for Phase 2, prediction for Phase 3.]

39
Conclusions
1. New designs are often subject to both component (hardware) and non-component failures. Some failure modes are dormant.
2. RGP is a multi-disciplinary, cross-functional team effort, as it involves design, manufacturing, testing, operation, and maintenance, as well as latent failures.
3. We propose a CA effectiveness function and integrate it into the RGP model to achieve the reliability target at a lower cost.
4. An accurate reliability growth prediction is useful, yet it is more beneficial to industry to know when the reliability goal will be reached and how much resource (labor and budget) is required.

40
References
1. D. S. Jackson, H. Pant, M. Tortorella, “Improved reliability-prediction and field-reliability-data analysis for field-
replaceable units,” IEEE Transactions on Reliability, vol. 51, no. 1, 2002, pp. 8-16.
2. J. T. Duane, “Learning curve approach to reliability monitoring,” IEEE Transactions on Aerospace, vol. 2, no. 2, 1964, pp.
563-566.
3. L. H. Crow, “Reliability analysis for complex, repairable systems,” SIAM Reliability and Biometry, 1974, pp. 379-410.
4. M. Xie, M. Zhao, “Reliability growth plot-an underutilized tool in reliability analysis,” Microelectronics and Reliability,
vol. 36, no. 6, 1996, pp. 797-805.
5. D. W. Coit, “Economic allocation of test times for subsystem-level reliability growth testing,” IIE Transactions on Quality
and Reliability Engineering, vol. 30, no. 12, 1998, pp. 1143-1151.
6. M. Krasich, J. Quigley, L. Walls, “Modeling reliability growth in the system design process,” in Proceedings of Annual
Reliability and Maintainability Symposium, 2004, pp. 424-430.
7. S. Inoue, S. Yamada, “Generalized discrete software reliability modeling with effect of program size,” IEEE Transactions on
Systems, Man and Cybernetics, Part A, vol. 37, no. 2, 2007, pp. 170-179.
8. P. M. Ellner, J. B. Hall, “An approach to reliability growth planning based on failure mode discovery and correction using
AMSAA projection methodology,” in Proceedings of Annual Reliability and Maintainability Symposium, 2006, pp. 266-
272.
9. T. Jin, H. Liao, M. Kilari, “Reliability growth modeling for in-service systems considering latent failure modes,”
Microelectronics Reliability, vol. 50, no. 3, 2010, pp. 324-331.
10. T. Jin, Y. Yu, H.-Z. Huang, “A multiphase decision model for reliability growth considering stochastic latent failures,” IEEE
Transactions on Systems, Man and Cybernetics, Part A, vol. 43, no. 4, 2013, pp. 958-966.
11. L. Attardi, G. Pulcini, “A new model for repairable systems with bounded failure intensity,” IEEE Transactions on
Reliability, vol. 54, no. 4, 2005, pp. 572-582.
12. M. S. Bazaraa, H. D. Sherali, C. M. Shetty, Nonlinear Programming: Theory and Algorithms, 3rd edition, 2006, John Wiley & Sons,
New York.

41
Thank you
And
Questions ?
Email: tj17@txstate.edu
