By the end of this presentation, attendees will understand why their critical environment needs an Infrastructure Reliability and Risk Assessment, which systems the evaluation should include, how it should be performed to ensure tangible results, how it should be reported, and how to interpret and use the findings to their advantage.
Presentation Outline
1. What is an Infrastructure Reliability and Risk Assessment and what do I need one for?
2. Who should perform an Infrastructure Reliability and Risk Assessment?
3. What information should be included in an Infrastructure Reliability and Risk Assessment?
4. What building systems should be included? This will be an infrastructure system-by-system approach.
5. What are the key things to look for when my study is complete?
A. Reliability Level.
B. Single Points of Failure within Critical Systems.
C. Redundancy of Critical Systems.
D. System Integration.
E. Adequacy of Engineered Systems (Exhaust Points).
F. Adequacy of Operations, Maintenance and Testing Programs.
G. Benchmark Findings with Industry Standards.
6. Availability, MTBF Calculations and Probability of Failure Calculations. What are they, who does them, what do they mean?
7. Computational fluid dynamics (CFD) modeling.
8. How long should a study like this take?
9. Review of a sample study.
2. WHAT YOU NEED TO KNOW
AGENDA
• RISK ASSESSMENT
• INFRASTRUCTURE RELIABILITY
COOLING POWER
Morrison Hershfield Mission Critical – Infrastructure and Risk Assessments
3. RISK ASSESSMENTS
• WHY
• SITE EVALUATION
• METRICS
4. Causes of Critical Failures
• Location
• Design
• Redundancy level
• Construction
• Quality of equipment
• Age (lurking vulnerabilities)
• Operations & Maintenance program
• Personnel training
• Level of operator coverage
• Thoroughness of the commissioning program
6. Causes of Critical Failures
• Root cause not always easy to ascertain
• Combination of factors (Cascading Failures)
• Latent failures
• Most occur during change-of-state events
• More maintenance does not necessarily mean higher availability
• Non-fault-tolerant systems
7. Causes of Critical Failures
[Pie chart: share of critical failures by cause]
• Equipment Failure: 28%
• System Design: 20%
• Human Error: 18%
• Equipment Design: 13%
• Installation Error: 10%
• Commissioning or Test Deficiency: 4%
• Maintenance Oversight: 4%
• Natural Disaster: 3%
8. WHY DO A RISK ASSESSMENT
• Aligns business mission with facility performance expectations
• Quantifies the risk and exposure of the critical facilities to failure
• Identifies vulnerabilities and single points of failure
• First step in creating an action plan for site hardening
• Benchmark against the industry
• Assists in developing business case for capital expenditures
10. SITE EVALUATION
STEP 2
• Develop PRA model (Probabilistic Risk Assessment)
• Identify Single Points of Failure within critical systems
• Evaluate redundancy of critical systems
• Capacity and expandability analysis
• Adequacy of Engineered Systems
• Operation and maintenance policies, practices and procedures
• Adequacy of maintenance and testing programs
• Evaluate risks associated with site location
• Overall Risk Analysis
• Evaluate the adequacy of operations and maintenance programs
11. SITE EVALUATION
STEP 2 cont.
• Harmonics analysis
• EMF studies
• Short circuit & coordination studies
• Airflow modeling (CFD)
12. SITE EVALUATION
STEP 3
• Perform gap analysis
STEP 4
• Recommendations for upgrade/alteration to optimize facility performance
• Budget and schedule development
• Assess risk during implementation
• Benchmark findings with industry standards
13. RISK ASSESSMENT METRICS
• Probability of Failure/Reliability
• Availability
• MTTF
• MTTR
• Susceptibility to natural disasters
• Fault tolerance
• Single Points of Failure
• Maintainability
• Operational readiness
• Maintenance program
15. RELIABILITY
• “Reliability” is used as an umbrella definition
• May refer to Availability, Durability, Quality
• Five 9’s?
• Reliability = Probability of Successful Operation
16. RELIABILITY AND AVAILABILITY
• Reliability predicts how likely the system is to fail.
• Availability is a measure (or a future prediction) of what percentage of the time the system will be operating properly.
17. AVAILABILITY
Five 9’s refers to Availability
Availability (A) = average fraction of time that something is in service and performing its intended function.
99.999% availability means:
• 5.3 minutes of downtime each year
or
• 1.77 hours of downtime every 20 years
Availability does not specify how often an outage occurs
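The downtime figures above follow directly from the definition of availability. A minimal sketch of the arithmetic, assuming a 365-day year; the function name `downtime_minutes` is made up for illustration. Computed without intermediate rounding, the 20-year figure comes out slightly below the 1.77 hours quoted above, which is what results when the yearly value is first rounded to 5.3 minutes.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes(availability: float, years: float = 1.0) -> float:
    """Expected cumulative downtime, in minutes, over the given number of years."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR * years


five_nines = 0.99999
print(f"{downtime_minutes(five_nines):.2f} minutes per year")
print(f"{downtime_minutes(five_nines, years=20) / 60:.2f} hours per 20 years")
```

Note that the result is a total only: one long outage and many short ones give the same number.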
18. AVAILABILITY
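Availability (A) = MTTF/(MTTF + MTTR) = MTTF/MTBF
(often quoted as MTBF/(MTBF + MTTR); the two agree closely when MTTR is much smaller than MTTF)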
MTTF: Mean Time To Failure
MTBF: Mean Time Between Failures
MTTR: Mean Time to Repair or Downtime
MTBF=MTTF+MTTR
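A short sketch of how these quantities fit together, taking MTTF as the uptime between outages so that MTBF = MTTF + MTTR. The function name and both example systems are hypothetical, chosen only to show that two systems with very different outage frequencies can have the same availability.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time to failure and mean time to repair."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    mtbf_hours = mttf_hours + mttr_hours  # MTBF = MTTF + MTTR
    return mttf_hours / mtbf_hours


# Hypothetical systems: one fails ten times as often but is repaired ten times faster.
frequent_short = availability(mttf_hours=50_000, mttr_hours=4)
rare_long = availability(mttf_hours=500_000, mttr_hours=40)

print(f"frequent, short outages: {frequent_short:.6f}")
print(f"rare, long outages:      {rare_long:.6f}")
```

Both systems report the same availability, which is why availability alone says nothing about how often an outage occurs.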
19. RELIABILITY BATHTUB CURVE
[Figure: bathtub curve of failure rate versus time (t) in years, showing the early-life, useful-life and wear-out periods; axis marks at 0.5, 12 and 14 years]
20. RELIABILITY MODELING
• Used to compare system designs and assist in the evaluation of risk versus the cost to mitigate the risk.
• Failure and repair data comes from IEEE 493, Recommended Practice for Design of Reliable Industrial and Commercial Power Systems (IEEE Gold Book)
21. RELIABILITY MODELING
Components used for reliability modeling of the electrical system shown here:
• Utility power
• Generator
• Circuit breakers
• Switchboards
• Cables
• Automatic Transfer Switch
• UPS module
• Battery
• Static Bypass Switch
• Rack Power
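One common way to combine components like these is a reliability block diagram: blocks in series must all work, while redundant blocks in parallel need only one to work. The sketch below assumes independent components and uses made-up availability values, not figures from IEEE 493; treating the generator as a simple parallel source to the utility ignores start failures and transfer time. It is an illustration of the technique, not the model used in the study.

```python
from math import prod


def series(*availabilities: float) -> float:
    """All blocks must be available: multiply the availabilities."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Any one block is enough: one minus the chance that all are down."""
    return 1.0 - prod(1.0 - a for a in availabilities)


# Hypothetical component availabilities, for illustration only.
utility, generator = 0.9990, 0.9950
transfer_switch = 0.9999
ups_module, static_bypass = 0.9995, 0.9999
rack_power = 0.99999

power_source = parallel(utility, generator)        # utility backed by generator
conditioned = parallel(ups_module, static_bypass)  # UPS backed by static bypass
path_availability = series(power_source, transfer_switch, conditioned, rack_power)

print(f"single power path availability: {path_availability:.6f}")
```

The series result can never exceed its weakest block, which is why non-redundant elements such as a single transfer switch dominate the outcome.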
23. RELIABILITY MODELING
Shown below are the results of the calculations
[Figure: charts of the calculation results, with axes in hours]
24. THE TRADITIONAL CLASSIFICATION SYSTEM
The Uptime Institute
Tier 1 – Basic Non-Redundant Data Center
Single path for power and cooling distribution without redundant components
Tier 2 – Basic Redundant Data Center
Single path for power and cooling distribution with redundant components
Tier 3 – Concurrently Maintainable Data Center
Multiple paths for power and cooling distribution with only one path active and with redundant components
Tier 4 – Fault Tolerant Data Center
Multiple active power and cooling distribution paths with redundant components and fault tolerance
25. Tier Definitions
TIER REQUIREMENTS
Requirement                 Tier I   Tier II   Tier III              Tier IV
Number of Delivery Paths    1        1         1 Active, 1 Passive   2 Active
Redundancy                  N        N+1       N+1                   2N Minimum
Compartmentalization        No       No        No                    Yes
Concurrent Maintainability  No       No        Yes                   Yes
Fault Tolerance             No       No        No                    Yes
Availability (%)            99.67    99.75     99.982                99.995
Downtime in Hr/Yr           28.8     22        1.6                   0.4
26. Data Center Cost
From the Uptime Institute (UI):
• Tier I - $10,000 US/kW of Useable UPS Power Output
• Tier II - $11,000 US/kW of Useable UPS Power Output
• Tier III - $20,000 US/kW of Useable UPS Power Output
• Tier IV - $22,000 US/kW of Useable UPS Power Output
• Plus $225 US/SF of Computer Room
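These figures can be turned into a rough budget estimate by adding the per-kW cost for the chosen tier to the per-square-foot cost of the computer room. The sketch below assumes that reading of the list; the function name and the example facility (1,000 kW, 10,000 SF) are hypothetical.

```python
# Cost figures as listed above (US dollars).
COST_PER_KW = {"I": 10_000, "II": 11_000, "III": 20_000, "IV": 22_000}
COST_PER_SF = 225


def estimate_cost(tier: str, ups_kw: float, computer_room_sf: float) -> float:
    """Rough construction cost: per-kW tier cost plus per-SF computer room cost."""
    return COST_PER_KW[tier] * ups_kw + COST_PER_SF * computer_room_sf


# Hypothetical facility: 1,000 kW of useable UPS output, 10,000 SF of computer room.
for tier in COST_PER_KW:
    print(f"Tier {tier}: ${estimate_cost(tier, 1_000, 10_000):,.0f}")
```

For a facility like this the per-kW term dominates, so the step from Tier II to Tier III is where most of the cost increase appears.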
27. HOW MUCH REDUNDANCY IS ENOUGH?
28. Reliability Considerations
Assumptions
• Various configurations examined for single or dual utility feeders, UPS, generators, STSs, single or dual cords
• Compare reliability at 2000 kW and 4000 kW load
• 5 Year Probability of Failure
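A probability of failure over a fixed period can be approximated from an MTBF if the failure rate is assumed constant, as in the useful-life part of the bathtub curve. The sketch below uses that exponential model with hypothetical MTBF values in hours; a full study would normally use a more detailed model, so results from this shortcut should be read as indicative only.

```python
import math

HOURS_PER_YEAR = 8760  # 365-day year


def probability_of_failure(mtbf_hours: float, years: float) -> float:
    """Chance of at least one failure within the period, assuming a constant failure rate."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    exposure_hours = years * HOURS_PER_YEAR
    # 1 - exp(-t / MTBF), written with expm1 for accuracy at small probabilities
    return -math.expm1(-exposure_hours / mtbf_hours)


# Hypothetical MTBF values for two candidate designs.
for label, mtbf in [("design A", 100_000), ("design B", 300_000)]:
    print(f"{label}: 5-year probability of failure = {probability_of_failure(mtbf, 5):.1%}")
```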
38. Reliability Considerations
Emergency Diesel Generators
fail to start
fail after ½ hour
fail after 8 hours
fail after 24 hours
Study of emergency diesel generators at nuclear power plants, performed by Idaho National Engineering Laboratory, February 1996
39. Reliability Considerations
• 2(N+1) UPS/Generator with dual utility feeders - most reliable topology
• 2(N+1) UPS > 2N UPS by small margin
• 2N > Distributed Redundant by small margin
• Significant improvement if a second utility feeder is provided
• N+2 and/or 2N generator systems are more reliable than N+1
• Hybrid configuration in a hybrid facility is sometimes the best solution
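The benefit of an extra redundant generator can be illustrated with a simple k-out-of-n calculation: the plant succeeds if at least N of the installed units start and run. The sketch below assumes identical, independent units and a made-up per-unit success probability; it compares N, N+1 and N+2 only and does not attempt to model a 2N arrangement or any of the UPS topologies above.

```python
from math import comb


def prob_enough_units(required: int, installed: int, p_unit: float) -> float:
    """Probability that at least `required` of `installed` independent units work."""
    return sum(
        comb(installed, k) * p_unit**k * (1.0 - p_unit) ** (installed - k)
        for k in range(required, installed + 1)
    )


N = 4          # hypothetical: generators needed to carry the load
P_UNIT = 0.98  # hypothetical: chance that one generator starts and runs

configs = {"N": N, "N+1": N + 1, "N+2": N + 2}
results = {name: prob_enough_units(N, n, P_UNIT) for name, n in configs.items()}
for name, p in results.items():
    print(f"{name:>3}: {p:.6f}")
```

Under these assumptions each added unit cuts the chance of falling short by a large factor, which is consistent with the point that N+2 outperforms N+1.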
40. Reliability Considerations
• Assess the condition of the mechanical plant in conjunction with the electrical system
• The facility reliability will be driven by the least reliable component (typically the electrical infrastructure)
41. System Reliability Block
[Block diagram: an "Electrical System" block (the electrical system powering the critical load) with "Electrical" and "Mechanical" blocks (the mechanical system supporting the critical load)]
42. System Reliability Block
System                                    MTBF      Availability   Pf (3 years)
Electrical system alone                   330,184   0.99999        8.10%
Mechanical system alone                   178,611   0.999943       11.70%
Electrical system supporting mechanical   108,500   0.999985       21.40%
Overall mechanical system                 70,087    0.999931       29.20%
Combined electrical mechanical system     57,819    0.999922       36.90%
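The combined row can be cross-checked with a simple series model, in which the critical load needs both the electrical system and the overall mechanical system. The sketch below assumes the two blocks are independent, have constant failure rates, and that the MTBF figures are in hours; these are assumptions for illustration, not a description of how the study's numbers were produced.

```python
def series_mtbf(*mtbf_hours: float) -> float:
    """Failure rates add in series, so MTBF is the reciprocal of the summed rates."""
    return 1.0 / sum(1.0 / m for m in mtbf_hours)


def series_availability(*availabilities: float) -> float:
    """Both blocks must be up, so availabilities multiply."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


# Values from the table: electrical system alone, overall mechanical system.
electrical_mtbf, electrical_avail = 330_184, 0.99999
mechanical_mtbf, mechanical_avail = 70_087, 0.999931

combined_mtbf = series_mtbf(electrical_mtbf, mechanical_mtbf)
combined_avail = series_availability(electrical_avail, mechanical_avail)

print(f"combined MTBF:         {combined_mtbf:,.0f}")
print(f"combined availability: {combined_avail:.6f}")
```

Both results come out close to the combined row of the table, and the combined MTBF is lower than either block on its own.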
43. The Cost of Reliability
[Chart: reliability (%) versus cost; reliability axis marked 99.9999, 99.999, 99.99, 99.9, 99.0 and .9; cost axis running from $ to $$$$$]
44. Key Takeaways – Risk Assessment
• What reliability level do you really need, based on your business case?
• Minimize Single Points of Failure
• Concurrent Maintainability?
• Fault Tolerance?
• Ensure Adequacy of Operations, Maintenance and Testing Programs
• How to justify the cost to upgrade from present state?
45. Key Takeaways – Reliability
• Design objective – find optimum compromise between cost and reliability
• Size matters – larger facilities yield lower reliability
• System architecture and design implementation play a more important role than equipment selection
• Segregate the system into independent blocks
• Eliminate common source components to minimize fault propagation (e.g. LBS, hot-tie, manual bus ties)
• Move single points of failure as close to the load as possible
• Always maintain two independent sources of power to the critical load
• Optimize the design of monitoring and controls circuits
• Keep it simple/minimize human intervention/Utilize Automation
46. QUESTIONS?
Thank you and please feel free to contact me
Steven Shapiro, PE, ATD
SShapiro@MorrisonHershfield.com
914.420.3213
http://www.linkedin.com/in/stevenshapirope
References:
Uptime Institute White Papers:
Tier Myths and Misconceptions
Data Center Site Infrastructure Tier Standard: Topology
47. Building Areas/Systems Reviewed
• General Construction
• Electrical
• Mechanical
• Plumbing and Fire Protection
• Operation and Maintenance
• Security
• Load Density
48. Site Reliability
• Is Project Compatible With Zoning
• Natural Environment Issues
  - Seismic Zone
  - Geotechnical Reports
  - Subsurface Conditions
  - Tornado/Hurricane Risk
  - Site Flood Potential
  - Fire Potential
  - Site Topography
  - Weather Extremes
• Man-Made Environment Issues
  - Power/Data and Communication/Water Supply/Sanitary Sewer Availability
  - ISP Connectivity to Mirror and DR Sites
  - Proximity of Hazardous Operational Facilities, e.g. Nuclear Power Plants, Military Bases, Chemical Plants, Tank Farms, Water/Sewage Treatment Plants, Dams/Reservoirs, Gas Stations, etc.
  - Distance to Airports & Freeways
  - Distance to Emergency Services, e.g. Fire and Police Departments, Hospital
49. Building Areas/Systems Reviewed
• Building Utilities and Physical Issues
  - General building systems and area characteristics
  - Life safety and environmental
• Electrical Systems
  - Utility feeders
  - Service entry
  - Base building electrical distribution system including busways, step-down transformers, switchgear and distribution panels
  - Uninterruptible power supply (UPS) systems
  - Battery systems
  - Power Distribution System including the critical computer rooms
  - Emergency/standby generator and fuel system
  - Normal/standby power transfer switchgear
  - Grounding
  - Emergency Power Off Systems
  - Lightning protection system
  - Fire alarm and smoke detection systems
50. Building Areas/Systems Reviewed
• Mechanical Systems
  - Critical Systems Chilled Water Plant: Chillers, pumps, piping distribution system, controls, etc.
  - Critical Systems Condenser Water System: Cooling towers, pumps, piping, etc.
  - Critical Systems Air Handling Systems
  - Critical Systems Air Distribution
  - Critical Systems Secondary Chilled Water Loop
  - Fuel Oil Systems
  - Boiler Systems
  - Compressed Air Systems
• Plumbing Systems
  - Domestic Water Systems
  - Natural Gas Systems
  - Fire Suppression Systems (Water and Gaseous)
• Operation and Maintenance of the Critical Support Systems
  - Maintenance procedures and programs
  - Normal operating procedures
  - Emergency operating procedures
  - Training programs and methods
  - Spare parts
51. Building Areas/Systems Reviewed
• Building Automation
  - Building Automation Systems
  - Physical Security Systems
  - Access control
  - Intrusion detection
  - CCTV systems
  - ID badging systems
  - Intercom systems
  - Smoke Purge Systems
• Technology Systems
  - Entrance Facility Feeds
  - Telephone Company Services
• Systems Integration:
  - The integration, compatibility and interaction of the above systems with each other, as well as with the other building elements, will be reviewed to ensure that the systems are compatible and fully integrated.