Risk Implications of Digital RPS
     Operating Experience
                      For Presentation at
   IAEA Technical Meeting on Common-Cause Failures in
       Digital Instrumentation and Control Systems of
                     Nuclear Power Plants

                    June 19-21, 2007
                Bethesda, Maryland, USA



             Dr. John H. Bickel
     Evergreen Safety & Reliability Technologies, LLC

Motivations for this work:

No prior risk or importance analysis of
existing digital RPS failure experience exists
Prior NRC Research reports concluded LER
data too sparse to use
– Only found: 18 microprocessor failures, 4 software failures
– Suggested need to consider data from aerospace, medical, transport
  systems
Lack of data implied: inability to risk-inform
digital I&C applications and issues
My belief:
     Much more data actually exists on CE CPCS
     Risks from CPCS experience should be assessed
                         JHBickel - ESRT, LLC
CE Digital Core Protection Calculator Basics:
CE High LPD, Low DNBR RPS design switched from analog
Thermal Margin/Low Pressure Trip to digital Core Protection
Calculators in the mid-1970s
Used 6 specially qualified minicomputers running stored
computer software and addressable constants
CPCS performs static/dynamic projections of local power
density and DNBR based upon:
     Ex-core neutron flux
     Pressurizer pressure
     Reactor Tcold, Thot
     RCP speed
     Control rod positions
CPCS generates: alarms, pre-trip, and trip safety actions
Original system was licensed on ANO-2 in 1978
Subsequently utilized at: SONGS-2/3, Waterford-3, Palo
Verde-1/2/3 …. and Korean Standard NPPs
CE Digital Core Protection Calculator Basics:
CPCS credited for reactor trip for following events:
     Uncontrolled Control Rod withdrawal from critical (>10^-4 power)
     Uncontrolled Boron Dilution from critical (>10^-4 power)
     Uncontrolled Control Rod withdrawal from power operation
     Dropped, or mis-positioned Control Rods
     Ejected Control Rods
     Single RCP loss of flow
     Single RCP shaft seizure
     4-RCP loss of flow
     Electrical grid under-frequency
     Excess secondary steam flow (including turbine bypass valve
     malfunction)
     Excess feedwater flow
     Loss of feedwater heater
     Steam line break
     Single MSIV closure
     Rapid increase in local power
CE Digital CPCS Software Basics:




Software Design: “One Good Version” not “N-Version”
CE Digital CPCS Interchannel Communications Basics:




 4 CPC computers evaluate LPD and DNBR using neutron flux, temperature,
 RCS flow and control rod position inputs in each quadrant
 2 CEA computers (CEACs) monitor all quadrants for CEA deviations within
 groups and generate Penalty Factors transmitted to all 4 CPCs
 CEACs communicate to CPCs via one-way “simplex” communication links
CE Reactor Protection System PRA Basics:
PRA Assessments of overall CE RPS have existed for some time (2001)
Component unavailabilities based on “time averaged” values
NUREG/CR-5500 Vol.10:
     Q_RPS = 7.2E-6 (Digital CPCS, w/o Operator Action)
     Q_RPS = 1.6E-6 (Digital CPCS, w/ Operator Action)
Relay and breaker CCF dominates predicted Q_RPS:
     CCF of master trip relays (K-1 through K-4)
     CCF of reactor trip breaker is not as significant on CE design
     due to configuration

How This Study was Carried Out:
Failure experience from on-line NRC LER data base
currently goes back to 1984
   (NOTE: misses first 6 years ANO-2 experience)

Post-1984 CPC LERs on CE plants were evaluated
CPCS Failure experience categorized by subsystem
Size of operating experience pool:
     141 LERs (1984 – 2005)
     ~145.5 Rx years (or: 1.27 x 10^6 Rx hr)
     70 actual CPC reactor trip demands
     26 events involving latent CCF (including: 1 latent software CCF)
Subsystem failure rates calculated via Bayesian
estimation using the Jeffreys non-informative prior
CCDP risk estimated via ASP approach
     Method highlights CCDP impact of “higher” than average unavailability
How Component Population Was Estimated:




Total CPCS subsystem operating time estimation was based upon
above component inventory per plant
Total CPCS operating time (for 4/4 Channel CCF estimation) was simply
total plant operating time.

How Subsystem Operating Time Was Estimated




   Each of 4 CPC Computers and 2 CEAC Computers contain: 1 processor
   board, 1 memory board, 1 multiplexer board, 1 external Watchdog Timer
   Each of 4 CPC Channels contains: 1 PZR pressure sensor, 3 ex-core
   neutron flux inputs, 4 RCP speed sensors, 2 Tcold and 2 Thot inputs
Subsystem failure rates were calculated via Bayesian
  estimation using the Jeffreys non-informative prior




   Technique allows bounding failure rate estimation for “0” observed failures
Failure Rate and Unavailability Estimation Issues

  Data Needs for Risk Estimation Process:
  Ability to estimate CCDP given specific event demands and
  event-conditional system unavailabilities (such as RPS)
  Includes conditional unavailability due to specific
  combinations of input conditions to digital system
       Certain software “bugs” only triggered by unusual input sets
  Overall RPS unavailability must consider combinations of
  random and CCF events
  Operating experience estimates failure rates: λ
  Conversion to RPS unavailability uses estimate of time to
  detect and restore: P = λ x (fault duration)
       In many cases for latent Digital CCFs fault durations are many months
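
The conversion P = λ x (fault duration) can be sketched in a few lines. This is a minimal illustration, not the author's tool: the function name, the cap at 1.0, and the example rate and durations are assumptions chosen only to show how a long latent period inflates unavailability. The product is a rare-event approximation and is reasonable only while λ x (fault duration) is much smaller than 1.

```python
def latent_unavailability(rate_per_hr: float, fault_duration_hr: float) -> float:
    """Rare-event estimate P = lambda x (fault duration), capped at 1.0."""
    if rate_per_hr < 0 or fault_duration_hr < 0:
        raise ValueError("rate and duration must be non-negative")
    return min(1.0, rate_per_hr * fault_duration_hr)


# Hypothetical illustration: one failure rate, two detection intervals.
ASSUMED_RATE = 1.0e-6  # per hour, illustrative value only (not from the slides)

p_shift = latent_unavailability(ASSUMED_RATE, 12.0)        # fault caught at a 12 hr check
p_months = latent_unavailability(ASSUMED_RATE, 6 * 730.0)  # fault latent for about 6 months

print(f"12 hr exposure:   P = {p_shift:.2e}")
print(f"6 month exposure: P = {p_months:.2e}")
print(f"ratio = {p_months / p_shift:.0f}")
```

With the rate held fixed, the unavailability scales directly with the time the fault stays undetected, which is why latent faults lasting months matter so much in the later slides.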

Actual Design Basis CPCS Trip Demands




Estimated CPCS Single Subsystem Failure Rates




CPCS Single Subsystem Failure Rates
Also important to note:
Failure modes of recent regulatory concern which have not occurred in
population exposure time
Recall failure rates can be estimated as: λ ~ 0.5/T
Faults propagated via inter-channel communication:
2 events noted involving loss of CPCS -> Plant Computer communications
link that resulted in failure to perform Tech. Spec. required cross-checks,
      λ = 2.5 / (6 x 1.27 x 10^6 hours) = 3.3 x 10^-7/hr
Other events in which communication link failure occurred without
operation impairment likely occurred but not reported in LER data base
Events involving a failure propagating to CPC or CEAC would be in LER
data base if they occurred
“0” events noted in which a communication link failure caused corruption to
CPC or CEAC channel, λ ~ 0.5 / (6 x 1.27 x 10^6 hours), or: ~ 6.6 x 10^-8/hr
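
The two rates on this slide follow from the posterior mean of a Poisson failure rate under the Jeffreys prior, (n + 0.5) / T. The sketch below reproduces them. The function and variable names are illustrative assumptions; the exposure of 6 x 1.27 x 10^6 hours and the event counts (2 and 0) are taken from the slide.

```python
HOURS_PER_YEAR = 8760.0


def jeffreys_rate(n_failures: int, exposure_hours: float) -> float:
    """Posterior-mean failure rate (per hour) for a Poisson process
    with the Jeffreys non-informative prior: (n + 0.5) / T."""
    if n_failures < 0 or exposure_hours <= 0:
        raise ValueError("need n_failures >= 0 and exposure_hours > 0")
    return (n_failures + 0.5) / exposure_hours


# Plant-level exposure quoted in the study: ~145.5 reactor-years.
plant_hours = 145.5 * HOURS_PER_YEAR   # about 1.27e6 reactor-hours

# Exposure used on the slide for the communication-link estimates.
link_hours = 6 * 1.27e6

rate_link_loss = jeffreys_rate(2, link_hours)    # 2 observed link-loss events
rate_corruption = jeffreys_rate(0, link_hours)   # 0 observed corruption events

print(f"plant exposure  = {plant_hours:.3e} hr")
print(f"link loss rate  = {rate_link_loss:.2e} /hr")
print(f"corruption rate = {rate_corruption:.2e} /hr")
```

The zero-event case is the reason for using this estimator: a simple n / T point estimate would return 0 and give no bound at all.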


Estimated CPCS Double Event Failure Rates




Estimated CPCS System CCF Failure Rates




Results: CPCS System CCF Failure Rates
Breakdown of Common Mode Failures (pie chart; legend categories):
     Computer Technicians insert Wrong Data Sets to all 4 CPCS Channels
     Reactor Vendor supplies Erroneous Data Sets input to all 4 CPCS Channels
     Reactor Vendor Supplies Software Update Containing Latent Software Error
     Operators Fail to Confirm ASI in all four CPCS Channels when Reactor Power > 20%
     Incorrect Acceptance Criteria Used for Excore Data Set Calibration Checks >80%
     Inaccurate Cross Calibration of Excore Data Sets (Cross Channel, COLSS, etc.)
     High Log Power Bypass Removal Setpoints (1E-4) Incorrect
     Inaccurate Cross Calibration of RCS Flow Data Sets (Cross Channel, COLSS, etc.)
     Operators Fail to Perform 12hr Auto-RESTART Surveillance on all CPCS Channels
     Operators Fail to Perform Refueling Interval Surveillance on all CPCS Channels
     Communication Data Link Failure to Plant Computer results in Missed Surveillances on both CEAC Channels
     2 of 2 CEACs Inoperable
     3 of 4 CPCS Neutron Flux Cross Channel Calibrations OOT
Slice values shown on chart: 26%, 11%, 11%, 8%, 8%, 8%, and seven slices of 4%



       The issue of latent software CCF represents only 4% of the CCF experience
       Calibration, generating, loading of incorrect data sets are the dominant
       sources of CCF
Types of Observed CCF Events:

Inaccurate cross-calibration of all Ex-core neutron flux
(7 events) or all RCS flow channels (2 events)

Computer technicians insert wrong addressable
constant data sets into all 4 CPCS channels (3 events)
Swapping addressable data sets between units
CE supplies erroneous data sets (2 events)
Software update provided to plant with incorrect logic
for processing of indicated failed sensors (1 event)



Risk significance of this failure experience?

None of actual CCF events resulted in core damage
(all were latent faults missing “triggering event”)
Need to consider CCDP implications of specific
failure modes
Intent: apply risk screening process similar to NRC
ASP program which focuses on higher than
average values of system unavailability
Use: ASP-type failure rate data, SPAR plant specific
risk models, actual observed unavailability
                  CCDP = Σ λ_i x P_CPCS-CCF x HEP_NR
                  P_CPCS-CCF = λ_CPCS-CCF x (duration of latent fault)


First: How sensitive is CCDP to RPS Logic CCF ?
How sensitive is CCDP to RPS Logic CCF ?

RPS failure considers:
– Mechanical CCF jamming of
  control rods
– Relay/Breaker CCF failure
– RPS Logic CCF
– Operators fail to manually trip
– Operators fail to trip MG sets
Loss of Offsite Power
generates reactor trip
without RPS
Sensitivity studies
conducted using NRC
SPAR PRA models

How sensitive is CCDP to RPS Logic CCF ?




Variations in RPS-LOGIC-CCF are not risk significant until > 1 x 10^-3
Some example risk assessments of
   actual Digital CCF events




1995 SONGS 2-3 Addressable Data Swapped
 Rod shadowing constants (on data disks) were swapped
 between adjacent SONGS units for 10,968 hours.
 Units at different power and burnup history, rod shadowing
 corrections thus different.
 Rod shadowing constants only impact power density
 predictions when control rods dropped, or partially inserted.
 P_CPCS-CCF = 2.75 x 10^-6/hr x 10,968 hr = 3.0 x 10^-2
 Summing over all initiating events involving dropped control
 rods and rod cycling tests, yields:
 CCDP < 0.488/yr x 3.0 x 10^-2 x 0.01 = 1.5 x 10^-4
 This represents a bounding, conservative estimate because
 better knowledge of the duty cycle of rod cycling tests would
 likely reduce it by a factor of 10 or more.
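
The bounding estimate above can be re-computed as a quick check. This is a sketch under stated assumptions: the function names are illustrative, and the third factor (0.01) is read as the operator non-recovery probability HEP_NR from the screening formula CCDP = Σ λ_i x P_CPCS-CCF x HEP_NR. All numeric inputs are the ones quoted on this slide.

```python
def latent_fault_probability(rate_per_hr: float, duration_hr: float) -> float:
    """P_CPCS-CCF = lambda_CPCS-CCF x (duration of latent fault)."""
    return rate_per_hr * duration_hr


def screening_ccdp(initiator_freq_per_yr: float, p_ccf: float, hep_nr: float) -> float:
    """Screening product: initiator frequency x P_CPCS-CCF x HEP_NR."""
    return initiator_freq_per_yr * p_ccf * hep_nr


# Values quoted on the slide for the 1995 SONGS 2-3 data swap.
p_ccf = latent_fault_probability(2.75e-6, 10_968)   # about 3.0e-2
ccdp_bound = screening_ccdp(0.488, p_ccf, 0.01)     # about 1.5e-4

# Effect of the "factor of 10 or more" reduction mentioned for the duty cycle.
ccdp_reduced = ccdp_bound / 10.0

print(f"P_CPCS-CCF     = {p_ccf:.2e}")
print(f"CCDP (bound)   = {ccdp_bound:.2e}")
print(f"CCDP (reduced) = {ccdp_reduced:.2e}")
```

Carrying the unrounded P_CPCS-CCF through gives a product slightly below the slide's rounded 1.5 x 10^-4, which is consistent with the "<" in the slide's expression.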


1984 Erroneous Fx,y factors supplied by CE
        and uploaded to SONGS-2
 Incorrect Fx,y factors generated by CE and used for CPCS
 LPD calculations from 2-7-84 to 3-20-84 (1,032 hrs).
 Events such as this have occurred twice.
 P_CPCS-CCF = 1.96 x 10^-6/hr x 1,032 hr = 2.0 x 10^-3
 CCDP = 0.488/yr x 2.0 x 10^-3 x 0.01 = 9.8 x 10^-6




2005 Software Design Error in Software
    Upgrade at Palo Verde 2 for 2,736 hrs.
Original software design:
     Trip CPC channel if sensor detected to be “Failed – Out of Range”
Software/hardware upgrade:
     Use inputs from two sets of instruments and multiplexers (primary and
     secondary)
Out of Range Sensor Failure:
     Primary detected sensor failure results in switchover to secondary.
     Out of Range Failure on secondary reverts to “last stored good value”
CCF of all sensors of one type could result in continuous use of
“last good value” in all 4 CPCS channels rather than TRIP.
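
The difference between the original and upgraded handling of an out-of-range sensor can be illustrated with a small sketch. This is a hypothetical model written only from the description above; it is not the CPCS code, and every name, signature, and range limit in it is an assumption made for illustration.

```python
TRIP = "TRIP"


def in_range(value: float, lo: float, hi: float) -> bool:
    """True when a reading lies inside its valid instrument range."""
    return lo <= value <= hi


def original_logic(sensor: float, lo: float, hi: float):
    """Original design as described: an out-of-range sensor trips the channel."""
    return sensor if in_range(sensor, lo, hi) else TRIP


def upgraded_logic(primary: float, secondary: float, last_good: float,
                   lo: float, hi: float):
    """Upgrade as described: fall back to secondary, then to last good value."""
    if in_range(primary, lo, hi):
        return primary
    if in_range(secondary, lo, hi):
        return secondary
    # Design error described on the slide: a stale value is used, no TRIP.
    return last_good


# Hypothetical readings: both instruments of one type fail out of range.
LO, HI = 0.0, 100.0
print(original_logic(9999.0, LO, HI))                # channel trips
print(upgraded_logic(9999.0, 9999.0, 50.0, LO, HI))  # stale 50.0 is used
```

In this sketch the upgraded path never returns TRIP, so a common-cause failure of both instruments leaves the channel computing on old data, which is the latent condition the slide describes.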
P_CPCS-CCF = 8 x P_Sensor-CCF x 2.75 x 10^-6/hr x 2,736 hr
           = 8 x 8.4 x 10^-4 x 2.75 x 10^-6/hr x 2,736 hr = 5.0 x 10^-5
Given CCF of instruments, no credit for operators, HEP=1.0
CCDP = 0.289/yr x 5.0 x 10^-5 x 1.0 = 1.44 x 10^-5
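
For this event the latent software fault only matters if a sensor CCF also occurs, so the fault probability carries an extra conditional factor. The arithmetic can be checked as below. The constant names are illustrative assumptions, and the multiplier of 8 is copied from the slide without interpretation because the slide does not state what it counts.

```python
# Inputs quoted on the slide for the 2005 Palo Verde 2 software event.
MULTIPLIER = 8                 # taken from the slide; meaning not stated there
P_SENSOR_CCF = 8.4e-4          # sensor common-cause failure probability
RATE_PER_HR = 2.75e-6          # failure rate used on the slide
FAULT_DURATION_HR = 2_736      # hours the software error was in service
INITIATOR_FREQ_PER_YR = 0.289  # initiating event frequency used on the slide
HEP = 1.0                      # no credit for operator action

# Conditional fault probability: the software error matters only with sensor CCF.
p_cpcs_ccf = MULTIPLIER * P_SENSOR_CCF * RATE_PER_HR * FAULT_DURATION_HR

# Screening CCDP using the same product form as the other events.
ccdp = INITIATOR_FREQ_PER_YR * p_cpcs_ccf * HEP

print(f"P_CPCS-CCF = {p_cpcs_ccf:.2e}")
print(f"CCDP       = {ccdp:.2e}")
```

Carrying the unrounded P_CPCS-CCF through gives a CCDP within about 2% of the slide's 1.44 x 10^-5, which was evidently computed from the rounded 5.0 x 10^-5.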
P_RPS-CCF values from single events span many decades
Fault duration times drive P_RPS-CCF values
Latent data uploading errors are dominant unavailability contributors
Data uploading errors larger than relay and breaker CCF found in
NUREG/CR-5500 (which used time-averaged values)

Event specific CCDP also dominated by data uploading errors
Latent software CCF event is smaller due to unlikelihood of
triggering condition.




Observations from this “Total Picture of RPS”:
Designers of Digital I&C not particularly surprised by relative
dominance of:
      Calibration problems and human errors uploading wrong data sets
      CCF due to errors by vendor in generating data sets
      These failure modes also existed in NPPs with Analog I&C
CCF Unavailability and event CCDP estimates from
operating experience are dominated by latent events with
very long fault duration intervals
Software-related CCF, while important, isn’t dominant CCF
source when actual operating experience is evaluated
      Likely because: software V&V processes more rigorous than
      operational controls after deployment at NPP
      Most-obvious software “bugs” generally caught by burn-in testing
      and qualification programs
      Software “bugs” triggered by highly unlikely input combinations are
      not key sources of RPS unavailability or CCDP risk
What is Concluded from all this?
To assess Digital I&C risk it’s necessary to view Total Picture of RPS,
not just “software” or “microprocessors”:
      Final trip relays and trip breakers - will still be there
      Problems cross calibrating nuclear with thermal - will still be there
      Human errors inputting set-points and coefficients - will still be there
When this is done - Total Picture of RPS risk emerges
NPPs with CPCS have been operating since 1978 in typical,
controlled, nuclear operations environment, which includes:
      Vendor generation of cycle specific constants, set-points
      Routine hardware, software upgrades developed and installed
      Routine operation, trouble alarms, and alarm response
      Impact of Technical Specifications, Testing, Calibrations
Actual nuclear field reliability experience is better source of data
than non-nuclear sources or theoretical models
Ability to estimate, or bound, risks of specific Digital I&C CCF
failure modes thus clearly exists

Contenu connexe

Tendances

Making of a PD Data Acqusition System
Making of a PD Data Acqusition SystemMaking of a PD Data Acqusition System
Making of a PD Data Acqusition System
Vishal Mathur
 
1.training lte ran kpi &amp; counters rjil
1.training lte ran kpi &amp; counters rjil1.training lte ran kpi &amp; counters rjil
1.training lte ran kpi &amp; counters rjil
Satish Jadav
 
60936529 55241452-kpi-3 g-3[1]
60936529 55241452-kpi-3 g-3[1]60936529 55241452-kpi-3 g-3[1]
60936529 55241452-kpi-3 g-3[1]
picaraza9
 
An_FPGA_Based_Passive_K_Delta_1_Sigma_Modulator
An_FPGA_Based_Passive_K_Delta_1_Sigma_ModulatorAn_FPGA_Based_Passive_K_Delta_1_Sigma_Modulator
An_FPGA_Based_Passive_K_Delta_1_Sigma_Modulator
Matthew Albert Meza
 
OCP Server Memory Channel Testing DRAFT
OCP Server Memory Channel Testing DRAFTOCP Server Memory Channel Testing DRAFT
OCP Server Memory Channel Testing DRAFT
Barbara Aichinger
 

Tendances (14)

Key Performance Indicators (KPI)
Key Performance Indicators (KPI)Key Performance Indicators (KPI)
Key Performance Indicators (KPI)
 
3 g ibs walk test report dhk_v1415_tems
3 g ibs walk test report dhk_v1415_tems3 g ibs walk test report dhk_v1415_tems
3 g ibs walk test report dhk_v1415_tems
 
Making of a PD Data Acqusition System
Making of a PD Data Acqusition SystemMaking of a PD Data Acqusition System
Making of a PD Data Acqusition System
 
1.training lte ran kpi &amp; counters rjil
1.training lte ran kpi &amp; counters rjil1.training lte ran kpi &amp; counters rjil
1.training lte ran kpi &amp; counters rjil
 
60936529 55241452-kpi-3 g-3[1]
60936529 55241452-kpi-3 g-3[1]60936529 55241452-kpi-3 g-3[1]
60936529 55241452-kpi-3 g-3[1]
 
An_FPGA_Based_Passive_K_Delta_1_Sigma_Modulator
An_FPGA_Based_Passive_K_Delta_1_Sigma_ModulatorAn_FPGA_Based_Passive_K_Delta_1_Sigma_Modulator
An_FPGA_Based_Passive_K_Delta_1_Sigma_Modulator
 
3 gpp lte-rlc (1)
3 gpp lte-rlc (1)3 gpp lte-rlc (1)
3 gpp lte-rlc (1)
 
My Profile - SHI
My Profile - SHIMy Profile - SHI
My Profile - SHI
 
Ijeet 06 08_008
Ijeet 06 08_008Ijeet 06 08_008
Ijeet 06 08_008
 
An access point based fec mechanism for video transmission over wireless la ns
An access point based fec mechanism for video transmission over wireless la nsAn access point based fec mechanism for video transmission over wireless la ns
An access point based fec mechanism for video transmission over wireless la ns
 
LTE KPIs and Formulae
LTE KPIs and FormulaeLTE KPIs and Formulae
LTE KPIs and Formulae
 
Machine Learning Based Session Drop Prediction in LTE Networks and its SON As...
Machine Learning Based Session Drop Prediction in LTE Networks and its SON As...Machine Learning Based Session Drop Prediction in LTE Networks and its SON As...
Machine Learning Based Session Drop Prediction in LTE Networks and its SON As...
 
Introduction to Genex Assistance
Introduction to  Genex AssistanceIntroduction to  Genex Assistance
Introduction to Genex Assistance
 
OCP Server Memory Channel Testing DRAFT
OCP Server Memory Channel Testing DRAFTOCP Server Memory Channel Testing DRAFT
OCP Server Memory Channel Testing DRAFT
 

Similaire à Jh Bickel Risk Implications Of Digital Rps Operating Experience

Soft Error Study of ARM SoC at 28 Nanometers
Soft Error Study of ARM SoC at 28 NanometersSoft Error Study of ARM SoC at 28 Nanometers
Soft Error Study of ARM SoC at 28 Nanometers
Wojciech Koszek
 
LTE KPI Optimization - A to Z Abiola.pptx
LTE KPI Optimization - A to Z Abiola.pptxLTE KPI Optimization - A to Z Abiola.pptx
LTE KPI Optimization - A to Z Abiola.pptx
ssuser574918
 
AndreaPetrucci_ACAT_2007
AndreaPetrucci_ACAT_2007AndreaPetrucci_ACAT_2007
AndreaPetrucci_ACAT_2007
Andrea PETRUCCI
 
System on Chip Based RTC in Power Electronics
System on Chip Based RTC in Power ElectronicsSystem on Chip Based RTC in Power Electronics
System on Chip Based RTC in Power Electronics
journalBEEI
 
Arm7 microcontroller based fuzzy logic controller for liquid level control sy...
Arm7 microcontroller based fuzzy logic controller for liquid level control sy...Arm7 microcontroller based fuzzy logic controller for liquid level control sy...
Arm7 microcontroller based fuzzy logic controller for liquid level control sy...
IAEME Publication
 
ITER-India_Hitesh.ppt
ITER-India_Hitesh.pptITER-India_Hitesh.ppt
ITER-India_Hitesh.ppt
AshokSharma541535
 
Umts call-flow-scenarios overview
Umts call-flow-scenarios overviewUmts call-flow-scenarios overview
Umts call-flow-scenarios overview
aritra321
 
Mi rna data analysis 2013
Mi rna data analysis 2013Mi rna data analysis 2013
Mi rna data analysis 2013
Elsa von Licy
 

Similaire à Jh Bickel Risk Implications Of Digital Rps Operating Experience (20)

Soft Error Study of ARM SoC at 28 Nanometers
Soft Error Study of ARM SoC at 28 NanometersSoft Error Study of ARM SoC at 28 Nanometers
Soft Error Study of ARM SoC at 28 Nanometers
 
676.v3
676.v3676.v3
676.v3
 
[EWiLi2016] Enabling power-awareness for the Xen Hypervisor
[EWiLi2016] Enabling power-awareness for the Xen Hypervisor[EWiLi2016] Enabling power-awareness for the Xen Hypervisor
[EWiLi2016] Enabling power-awareness for the Xen Hypervisor
 
3 gpp lte-pdcp
3 gpp lte-pdcp3 gpp lte-pdcp
3 gpp lte-pdcp
 
Self Organizing Network
Self Organizing NetworkSelf Organizing Network
Self Organizing Network
 
LTE KPI Optimization - A to Z Abiola.pptx
LTE KPI Optimization - A to Z Abiola.pptxLTE KPI Optimization - A to Z Abiola.pptx
LTE KPI Optimization - A to Z Abiola.pptx
 
AndreaPetrucci_ACAT_2007
AndreaPetrucci_ACAT_2007AndreaPetrucci_ACAT_2007
AndreaPetrucci_ACAT_2007
 
FVCAG: A framework for formal verification driven power modelling and verific...
FVCAG: A framework for formal verification driven power modelling and verific...FVCAG: A framework for formal verification driven power modelling and verific...
FVCAG: A framework for formal verification driven power modelling and verific...
 
System on Chip Based RTC in Power Electronics
System on Chip Based RTC in Power ElectronicsSystem on Chip Based RTC in Power Electronics
System on Chip Based RTC in Power Electronics
 
Arm7 microcontroller based fuzzy logic controller for liquid level control sy...
Arm7 microcontroller based fuzzy logic controller for liquid level control sy...Arm7 microcontroller based fuzzy logic controller for liquid level control sy...
Arm7 microcontroller based fuzzy logic controller for liquid level control sy...
 
ITER-India_Hitesh.ppt
ITER-India_Hitesh.pptITER-India_Hitesh.ppt
ITER-India_Hitesh.ppt
 
IRJET- Patient Health Monitoring System using Can Protocol
IRJET- Patient Health Monitoring System using Can ProtocolIRJET- Patient Health Monitoring System using Can Protocol
IRJET- Patient Health Monitoring System using Can Protocol
 
Coca1
Coca1Coca1
Coca1
 
OPAL-RT RT13 Conference: Rapid control prototyping solutions for power electr...
OPAL-RT RT13 Conference: Rapid control prototyping solutions for power electr...OPAL-RT RT13 Conference: Rapid control prototyping solutions for power electr...
OPAL-RT RT13 Conference: Rapid control prototyping solutions for power electr...
 
J05725055
J05725055J05725055
J05725055
 
Psoc
PsocPsoc
Psoc
 
Umts call-flow-scenarios overview
Umts call-flow-scenarios overviewUmts call-flow-scenarios overview
Umts call-flow-scenarios overview
 
Overview of DuraMat software tool development (poster version)
Overview of DuraMat software tool development(poster version)Overview of DuraMat software tool development(poster version)
Overview of DuraMat software tool development (poster version)
 
Mi rna data analysis 2013
Mi rna data analysis 2013Mi rna data analysis 2013
Mi rna data analysis 2013
 
RT15 Berkeley | Introduction to FPGA Power Electronic & Electric Machine real...
RT15 Berkeley | Introduction to FPGA Power Electronic & Electric Machine real...RT15 Berkeley | Introduction to FPGA Power Electronic & Electric Machine real...
RT15 Berkeley | Introduction to FPGA Power Electronic & Electric Machine real...
 

Dernier

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Dernier (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 

Jh Bickel Risk Implications Of Digital Rps Operating Experience

  • 1. Risk Implications of Digital RPS Operating Experience For Presentation at IAEA Technical Meeting on Common-Cause Failures in Digital Instrumentation and Control Systems of Nuclear Power Plants June 19-21, 2007 Bethesda, Maryland, USA Dr. John H. Bickel Evergreen Safety & Reliability Technologies, LLC 1
  • 2. Motivations for this work: No prior risk or importance analysis of existing digital RPS failure experience exists Prior NRC Research reports concluded LER data too sparse to use – Only found: 18 microprocessor failures, 4 software failures – Suggested need to consider data from aerospace, medical, transport systems Lack of data implied: inability to risk-inform digital I&C applications and issues My belief: Much more data actually exists on CE CPCS Risks from CPCS experience should be assessed 2 JHBickel - ESRT, LLC
  • 3. CE Digital Core Protection Calculator Basics:
      – CE High LPD, Low DNBR RPS design switched from the analog Thermal Margin/Low Pressure Trip to digital Core Protection Calculators in the mid-1970s
      – Used 6 specially qualified minicomputers running stored computer software and addressable constants
      – CPCS performs static/dynamic projections of local power density and DNBR based upon:
          – Ex-core neutron flux
          – Pressurizer pressure
          – Reactor Tcold, Thot
          – RCP speed
          – Control rod positions
      – CPCS generates: alarms, pre-trip, and trip safety actions
      – Original system was licensed on ANO-2 in 1978
      – Subsequently utilized at: SONGS-2/3, Waterford-3, Palo Verde-1/2/3 … and Korean Standard NPPs
  • 4. CE Digital Core Protection Calculator Basics:
      CPCS is credited for reactor trip for the following events:
      – Uncontrolled control rod withdrawal from critical (> $10^{-4}$ power)
      – Uncontrolled boron dilution from critical (> $10^{-4}$ power)
      – Uncontrolled control rod withdrawal from power operation
      – Dropped or mis-positioned control rods
      – Ejected control rods
      – Single RCP loss of flow
      – Single RCP shaft seizure
      – 4-RCP loss of flow
      – Electrical grid under-frequency
      – Excess secondary steam flow (including turbine bypass valve malfunction)
      – Excess feedwater flow
      – Loss of feedwater heater
      – Steam line break
      – Single MSIV closure
      – Rapid increase in local power
  • 5. CE Digital CPCS Software Basics: Software Design: “One Good Version” not “N-Version”
  • 6. CE Digital CPCS Interchannel Communications Basics:
      – 4 CPC computers evaluate LPD and DNBR, using neutron flux, temperature, RCS flow and control rod position inputs in each quadrant
      – 2 CEA computers (CEACs) monitor all quadrants for CEA deviations within groups and generate Penalty Factors transmitted to all 4 CPCs
      – CEACs communicate to CPCs via one-way “simplex” communication links
  • 7. CE Reactor Protection System PRA Basics:
      – PRA assessments of the overall CE RPS have existed for some time (2001)
      – Component unavailabilities are based on “time averaged” values
      – NUREG/CR-5500 Vol. 10:
          – $Q_{RPS} = 7.2 \times 10^{-6}$ (Digital CPCS, w/o Operator Action)
          – $Q_{RPS} = 1.6 \times 10^{-6}$ (Digital CPCS, w/ Operator Action)
      – Relay and breaker CCF dominates the predicted $Q_{RPS}$:
          – CCF of master trip relays (K-1 through K-4)
          – CCF of reactor trip breakers is not as significant on the CE design due to configuration
  • 8. How This Study was Carried Out:
      – Failure experience from the on-line NRC LER data base currently goes back to 1984 (NOTE: misses the first 6 years of ANO-2 experience)
      – Post-1984 CPC LERs on CE plants were evaluated
      – CPCS failure experience was categorized by subsystem
      – Size of operating experience pool:
          – 141 LERs (1984 – 2005)
          – ~145.5 Rx years (or: $1.27 \times 10^{6}$ Rx hr)
          – 70 actual CPC reactor trip demands
          – 26 events involving latent CCF (including 1 latent software CCF)
      – Subsystem failure rates calculated via Bayesian estimation using the Jeffreys non-informative prior
      – CCDP risk estimated via the ASP approach
      – Method highlights the CCDP impact of “higher” than average unavailability
  • 9. How Component Population Was Estimated:
      – Total CPCS subsystem operating time estimation was based upon above component inventory per plant
      – Total CPCS operating time (for 4/4 Channel CCF estimation) was simply total plant operating time
  • 10. How Subsystem Operating Time Was Estimated
      – Each of 4 CPC Computers and 2 CEAC Computers contains: 1 processor board, 1 memory board, 1 multiplexer board, 1 external Watchdog Timer
      – Each of 4 CPC Channels contains: 1 PZR pressure sensor, 3 ex-core neutron flux inputs, 4 RCP speed sensors, 2 Tcold and 2 Thot inputs
  • 11. Subsystem failure rates were calculated via Bayesian estimation using the Jeffreys non-informative prior
      – Technique allows bounding failure rate estimation for “0” observed failures
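The estimator named on this slide can be sketched in a few lines. This is a minimal illustration, not the author's actual calculation: it assumes failures follow a Poisson process, for which the Jeffreys prior gives a Gamma(n + 0.5, T) posterior with mean (n + 0.5)/T. The function name `jeffreys_rate` and the constant `HOURS_PER_YEAR` are hypothetical; the 145.5 reactor-year exposure is taken from slide 8.

```python
HOURS_PER_YEAR = 8760.0  # assumed conversion (365-day year)


def jeffreys_rate(n_failures: int, exposure_hours: float) -> float:
    """Posterior-mean failure rate (per hour) for a Poisson process.

    With the Jeffreys prior the posterior is Gamma(n + 0.5, T),
    whose mean is (n + 0.5) / T. For n = 0 this gives 0.5 / T,
    so a finite rate is obtained even with no observed failures.
    """
    if n_failures < 0 or exposure_hours <= 0:
        raise ValueError("need n_failures >= 0 and exposure_hours > 0")
    return (n_failures + 0.5) / exposure_hours


# Exposure pool quoted on slide 8: ~145.5 reactor-years.
reactor_hours = 145.5 * HOURS_PER_YEAR

# Zero observed failures over the whole pool still yields a bounding rate.
zero_event_rate = jeffreys_rate(0, reactor_hours)
print(f"exposure = {reactor_hours:.3e} hr, rate(0 events) = {zero_event_rate:.2e}/hr")
```

The point estimate shown is the posterior mean only; credible intervals would require Gamma quantiles, which are left out of this sketch.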
  • 12. Failure Rate and Unavailability Estimation Issues
      – Data needs for the risk estimation process:
          – Ability to estimate CCDP given specific event demands and event-conditional system unavailabilities (such as RPS)
          – Includes conditional unavailability due to specific combinations of input conditions to the digital system
          – Certain software “bugs” are only triggered by unusual input sets
      – Overall RPS unavailability must consider combinations of random and CCF events
      – Operating experience estimates failure rates: $\lambda$
      – Conversion to RPS unavailability uses an estimate of time to detect and restore: $P = \lambda \times (\text{fault duration})$
      – In many cases for latent Digital CCFs, fault durations are many months
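The conversion from failure rate to unavailability can be illustrated with a short sketch. It uses the slide's relation P = λ × (fault duration); the cap at 1.0 is an added assumption (the linear product is only a valid probability while it stays small), and the function name `latent_unavailability` is hypothetical. The rate and the two durations are values quoted on later slides of this deck (slides 24 and 26).

```python
def latent_unavailability(rate_per_hour: float, fault_duration_hours: float) -> float:
    """Unavailability from a latent fault: P = lambda * duration.

    The result is capped at 1.0, which is an assumption of this sketch
    and not something stated on the slide.
    """
    if rate_per_hour < 0 or fault_duration_hours < 0:
        raise ValueError("rate and duration must be non-negative")
    return min(rate_per_hour * fault_duration_hours, 1.0)


RATE = 2.75e-6  # per hour, CCF rate quoted on slide 24

# The same rate gives very different unavailabilities depending on
# how long the latent fault goes undetected.
p_short = latent_unavailability(RATE, 2_736.0)   # ~4 months
p_long = latent_unavailability(RATE, 10_968.0)   # ~15 months
print(f"P(2,736 hr) = {p_short:.2e}, P(10,968 hr) = {p_long:.2e}")
```

Because the relation is linear, the fault duration drives the result as strongly as the failure rate does.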
  • 13. Actual Design Basis CPCS Trip Demands
  • 14. Estimated CPCS Single Subsystem Failure Rates
  • 15. CPCS Single Subsystem Failure Rates
      – Also important to note: failure modes of recent regulatory concern which have not occurred in the population exposure time
      – Recall that failure rates for “0” observed failures can be estimated as: $\lambda \approx 0.5/T$
      – Faults propagated via inter-channel communication:
          – 2 events noted involving loss of the CPCS -> Plant Computer communications link that resulted in failure to perform Tech. Spec. required cross-checks: $\lambda = 2.5 / (6 \times 1.27 \times 10^{6}\ \text{hr}) = 3.3 \times 10^{-7}$/hr
          – Other events in which a communication link failure occurred without operation impairment likely occurred but were not reported in the LER data base
          – Events involving a failure propagating to a CPC or CEAC would be in the LER data base if they occurred
          – “0” events noted in which a communication link failure caused corruption to a CPC or CEAC channel: $\lambda \approx 0.5 / (6 \times 1.27 \times 10^{6}\ \text{hr})$, or $\approx 6.6 \times 10^{-8}$/hr
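The two communication-link rates on this slide follow from the Jeffreys estimate $(n + 0.5)/T$ with $n = 2$ and $n = 0$. The worked form below makes the intermediate step explicit. The subscripts are labels introduced here for illustration, and reading the factor of 6 as the six computers per plant (4 CPCs plus 2 CEACs) is an assumption; the slide does not say what it counts.

```latex
\begin{align*}
T &= 6 \times 1.27\times10^{6}\ \text{hr} = 7.62\times10^{6}\ \text{hr} \\
\lambda_{\text{link loss}} &\approx \frac{2 + 0.5}{T}
  = \frac{2.5}{7.62\times10^{6}\ \text{hr}} \approx 3.3\times10^{-7}/\text{hr} \\
\lambda_{\text{corruption}} &\approx \frac{0 + 0.5}{T}
  = \frac{0.5}{7.62\times10^{6}\ \text{hr}} \approx 6.6\times10^{-8}/\text{hr}
\end{align*}
```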
  • 16. Estimated CPCS Double Event Failure Rates
  • 17. Estimated CPCS System CCF Failure Rates
  • 18. Results: CPCS System CCF Failure Rates
      [Pie chart: Breakdown of Common Mode Failures. Slice values as extracted: 4%, 4%, 11%, 4%, 4%, 8%, 4%, 4%, 8%, 8%, 11%, 4%, 26%; the assignment of values to categories is not recoverable from the transcript.]
      Chart categories:
      – Computer Technicians insert Wrong Data Sets to all 4 CPCS Channels
      – Reactor Vendor supplies Erroneous Data Sets input to all 4 CPCS Channels
      – Reactor Vendor Supplies Software Update Containing Latent Software Error
      – Operators Fail to Confirm ASI in all four CPCS Channels when Reactor Power > 20%
      – Incorrect Acceptance Criteria Used for Excore Data Set Calibration Checks >80%
      – Inaccurate Cross Calibration of Excore Data Sets (Cross Channel, COLSS, etc.)
      – High Log Power Bypass Removal Setpoints (1E-4) Incorrect
      – Inaccurate Cross Calibration of RCS Flow Data Sets (Cross Channel, COLSS, etc.)
      – Operators Fail to Perform 12hr Auto-RESTART Surveillance on all CPCS Channels
      – Operators Fail to Perform Refueling Interval Surveillance on all CPCS Channels
      – Communication Data Link Failure to Plant Computer results in Missed Surveillances on both CEAC Channels
      – 2 of 2 CEACs Inoperable
      – 3 of 4 CPCS Neutron Flux Cross Channel Calibrations OOT
      Findings:
      – The issue of latent software CCF represents only 4% of the CCF experience
      – Calibration, generating, and loading of incorrect data sets are the dominant sources of CCF
  • 19. Types of Observed CCF Events:
      – Inaccurate cross-calibration of all ex-core neutron flux channels (7 events) or all RCS flow channels (2 events)
      – Computer technicians insert wrong addressable constant data sets into all 4 CPCS channels (3 events)
          – Swapping addressable data sets between units
      – CE supplies erroneous data sets (2 events)
      – Software update provided to plant with incorrect logic for processing of indicated failed sensors (1 event)
  • 20. Risk significance of this failure experience?
      – None of the actual CCF events resulted in core damage (all were latent faults missing a “triggering event”)
      – Need to consider CCDP implications of specific failure modes
      – Intent: apply a risk screening process similar to the NRC ASP program, which focuses on higher than average values of system unavailability
      – Use: ASP-type failure rate data, SPAR plant specific risk models, actual observed unavailability
          – $\mathrm{CCDP} = \sum_i \lambda_i \times P_{CPCS\text{-}CCF} \times \mathrm{HEP}_{NR}$
          – $P_{CPCS\text{-}CCF} = \lambda_{CPCS\text{-}CCF} \times (\text{duration of latent fault})$
      – First: How sensitive is CCDP to RPS Logic CCF?
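The screening formula on this slide can be sketched as a small function. This is an illustration under stated assumptions, not the SPAR-model calculation used in the study: the function name `ccdp_screen` is hypothetical, and the inputs in the usage example are the values quoted for the 1995 SONGS data-swap event on slide 24. Only the summed initiating-event frequency (0.488/yr) is given in the deck, so it is passed as a single-element list.

```python
from typing import Iterable


def ccdp_screen(initiator_freqs_per_year: Iterable[float],
                ccf_rate_per_hour: float,
                latent_duration_hours: float,
                hep_no_recovery: float) -> float:
    """Screening estimate: CCDP = sum(lambda_i) * P_CPCS-CCF * HEP_NR,
    with P_CPCS-CCF = lambda_CPCS-CCF * (duration of latent fault).
    """
    p_ccf = ccf_rate_per_hour * latent_duration_hours
    return sum(initiator_freqs_per_year) * p_ccf * hep_no_recovery


# Values quoted on slide 24 (SONGS 2-3 swapped addressable data).
songs_1995 = ccdp_screen(
    initiator_freqs_per_year=[0.488],  # summed frequency; per-initiator split not given
    ccf_rate_per_hour=2.75e-6,
    latent_duration_hours=10_968.0,
    hep_no_recovery=0.01,
)
print(f"CCDP ~ {songs_1995:.2e}")
```

The unrounded product is about 1.47 × 10^-4, in line with the bounding value of 1.5 × 10^-4 quoted on the slide.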
  • 21. How sensitive is CCDP to RPS Logic CCF?
      – RPS failure considers:
          – Mechanical CCF jamming of control rods
          – Relay/Breaker CCF failure
          – RPS Logic CCF
          – Operators fail to manually trip
          – Operators fail to trip MG sets
      – Loss of Offsite Power generates reactor trip without RPS
      – Sensitivity studies conducted using NRC SPAR PRA models
  • 22. How sensitive is CCDP to RPS Logic CCF? Variations in RPS-LOGIC-CCF are not risk significant until > $1 \times 10^{-3}$
  • 23. Some example risk assessments of actual Digital CCF events
  • 24. 1995 SONGS 2-3 Addressable Data Swapped
      – Rod shadowing constants (on data disks) were swapped between adjacent SONGS units for 10,968 hours.
      – The units were at different power and burnup history, so their rod shadowing corrections were different.
      – Rod shadowing constants only impact power density predictions when control rods are dropped or partially inserted.
      – $P_{CPCS\text{-}CCF} = 2.75 \times 10^{-6}/\text{hr} \times 10{,}968\ \text{hr} = 3.0 \times 10^{-2}$
      – Summing over all initiating events involving dropped control rods and rod cycling tests yields: $\mathrm{CCDP} < 0.488/\text{yr} \times 3.0 \times 10^{-2} \times 0.01 = 1.5 \times 10^{-4}$
      – This represents a bounding conservative estimate, because better knowledge of the duty cycle of rod cycling tests would likely reduce it by a factor of 10 or more.
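  • 25. 1984 Erroneous Fx,y factors supplied by CE and uploaded to SONGS-2
      – Incorrect Fx,y factors were generated by CE and used for CPCS LPD calculations from 2-7-84 to 3-20-84 (1,032 hrs).
      – Events such as this have occurred twice.
      – $P_{CPCS\text{-}CCF} = 1.96 \times 10^{-6}/\text{hr} \times 1{,}032\ \text{hr} = 2.0 \times 10^{-3}$
      – $\mathrm{CCDP} = 0.488/\text{yr} \times 2.0 \times 10^{-3} \times 0.01 \approx 9.8 \times 10^{-6}$ (the transcribed slide gives $1.5 \times 10^{-4}$, which does not match the product of the stated factors)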
  • 26. 2005 Software Design Error in Software Upgrade at Palo Verde 2 for 2,736 hrs.
      – Original software design: trip the CPC channel if a sensor is detected to be “Failed – Out of Range”
      – Software/hardware upgrade: use inputs from two sets of instruments and multiplexers (primary and secondary)
      – Out of Range Sensor Failure: a detected failure of the primary sensor results in switchover to the secondary. An Out of Range failure on the secondary reverts to the “last stored good value”
      – CCF of all sensors of one type could result in continuous use of the “last good value” in all 4 CPCS channels rather than a TRIP.
      – $P_{CPCS\text{-}CCF} = 8 \times P_{Sensor\text{-}CCF} \times 2.75 \times 10^{-6}/\text{hr} \times 2{,}736\ \text{hr} = 8 \times 8.4 \times 10^{-4} \times 2.75 \times 10^{-6}/\text{hr} \times 2{,}736\ \text{hr} = 5.0 \times 10^{-5}$
      – Given CCF of instruments, no credit for operators: HEP = 1.0
      – $\mathrm{CCDP} = 0.289/\text{yr} \times 5.0 \times 10^{-5} \times 1.0 = 1.44 \times 10^{-5}$
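The chain of factors in the Palo Verde estimate is easier to follow when each one is named. The sketch below simply multiplies the values quoted on this slide; the variable names are hypothetical, and the multiplier of 8 is taken directly from the slide, which does not state what it counts.

```python
# All numeric values are those quoted on slide 26.
SLIDE_MULTIPLIER = 8          # factor from the slide; its meaning is not stated there
P_SENSOR_CCF = 8.4e-4         # probability of CCF of all sensors of one type
SOFTWARE_CCF_RATE = 2.75e-6   # per hour
LATENT_DURATION_HR = 2_736.0  # hours the design error was present
INITIATOR_FREQ = 0.289        # per year
HEP = 1.0                     # no operator credit given instrument CCF

# The latent software error only matters if the sensor CCF also occurs,
# so the triggering-condition probability multiplies the latent unavailability.
p_cpcs_ccf = SLIDE_MULTIPLIER * P_SENSOR_CCF * SOFTWARE_CCF_RATE * LATENT_DURATION_HR
ccdp_palo_verde = INITIATOR_FREQ * p_cpcs_ccf * HEP

print(f"P_CPCS-CCF ~ {p_cpcs_ccf:.2e}, CCDP ~ {ccdp_palo_verde:.2e}")
```

Carrying full precision gives roughly 5.06 × 10^-5 and 1.46 × 10^-5; the slide's 1.44 × 10^-5 comes from rounding the intermediate value to 5.0 × 10^-5 first.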
  • 27. $P_{RPS\text{-}CCF}$ values from single events span many decades
      – Fault duration times drive $P_{RPS\text{-}CCF}$ values
      – Latent data uploading errors are the dominant unavailability contributors
      – Data uploading errors are larger than the relay and breaker CCF found in NUREG/CR-5500 (which used time-averaged values)
  • 28. Event-specific CCDP is also dominated by data uploading errors
      – The latent software CCF event is smaller due to the unlikelihood of its triggering condition.
  • 29. Observations from this “Total Picture of RPS”:
      – Designers of Digital I&C are not particularly surprised by the relative dominance of:
          – Calibration problems and human errors uploading wrong data sets
          – CCF due to errors by the vendor in generating data sets
          – These failure modes also existed in NPPs with Analog I&C
      – CCF unavailability and event CCDP estimates from operating experience are dominated by latent events with very long fault duration intervals
      – Software-related CCF, while important, isn’t the dominant CCF source when actual operating experience is evaluated
          – Likely because software V&V processes are more rigorous than operational controls after deployment at the NPP
          – Most-obvious software “bugs” are generally caught by burn-in testing and qualification programs
      – Software “bugs” triggered by highly unlikely input combinations are not key sources of RPS unavailability or CCDP risk
  • 30. What is Concluded from all this?
      – To assess Digital I&C risk it’s necessary to view the Total Picture of RPS, not just “software” or “microprocessors”:
          – Final trip relays and trip breakers - will still be there
          – Problems cross-calibrating nuclear with thermal - will still be there
          – Human errors inputting set-points and coefficients - will still be there
      – When this is done, the Total Picture of RPS risk emerges
      – NPPs with CPCS have been operating since 1978 in a typical, controlled, nuclear operations environment, which includes:
          – Vendor generation of cycle-specific constants and set-points
          – Routine hardware and software upgrades developed and installed
          – Routine operation, trouble alarms, and alarm response
          – Impact of Technical Specifications, testing, calibrations
      – Actual nuclear field reliability experience is a better source of data than non-nuclear sources or theoretical models
      – The ability to estimate, or bound, the risks of specific Digital I&C CCF failure modes thus clearly exists