Taking HPC beyond energy efficiency; sustainability, the new measure of success. This document discusses how high performance computing (HPC) centers are moving beyond just energy efficiency to focus on sustainability. It introduces new metrics like Energy Reuse Effectiveness (ERE) and Carbon Usage Effectiveness (CUE) to measure sustainability. HPC workloads and architectures provide unique opportunities to harvest waste energy and calculate the carbon savings from HPC simulations, working towards the goal of developing sustainable exascale supercomputers.
Taking HPC sustainability beyond energy efficiency
1. Taking HPC beyond energy efficiency;
sustainability, the new measure of success
Michael K Patterson, PhD, PE, DCEP
Eco-Technology Program Office
1
2. Why HPC is different
• Procurement and refresh timing
• Workload and Utilization
• HPC and infrastructure link
• Best opportunity for Energy Reuse
• Minimal UPS
• Availability needs lower than Enterprise / Financial / IPDC
• Performance measurements higher than Enterprise /
Financial / IPDC
2
3. PUE – simple and effective
PUE is defined in terms of total annual energy and total
annual IT energy, allowing a more valid site-to-site
3 comparison
4. PUEs: Reported and Calculated
PUE
EPA Energy Star Average 1.91
Intel Jones Farm, Hillsboro 1.41
T-Systems & Intel DC2020 Test Lab, Munich 1.24
Google 1.16
Leibniz Supercomputing Centre (LRZ) 1.15
National Center for Atmospheric Research (NCAR) 1.10
Yahoo, Lockport 1.08
Facebook, Prineville 1.07
National Renewable Energy Laboratory (NREL) 1.06
4
5. PUEs: Reported and Calculated
PUE
It’s all about the “1”!
EPA Energy Star Average 1.91
Intel Jones Farm, Hillsboro
1.06 1.41
Focus on driving the 0.06 down?
T-Systems & Intel DC2020 Test Lab, Munich 1.24
Or work with the 1.0? How?
Google 1.16
Energy reuse
Leibniz Supercomputing Centre (LRZ)
Continued improvement in compute 1.15
performance
National Center for Atmospheric Research (NCAR) 1.10
Yahoo, Lockport 1.08
Facebook, Prineville 1.07
National Renewable Energy Laboratory (NREL) 1.06
5
6. Tick Tock of Energy Efficiency
Moore’s Law and IA Innovation
2 Socket Volume-Server Power vs. Performance
450
2004
400 Intel®
Power: Lower is better
Xeon®
2006
350 3.8GHz
Intel®
2009 2010
2008 Intel® Xeon® Intel® Xeon®
Xeon®
Power (W)
300 5160
Intel® Xeon® X5570 X5670
E5450
250
200
150
100 Higher Efficiency
50 TICK TOCK TICK TOCK TICK
Core 65 45nm Nehalem 45 32nm Sandy Bridge
0
0 200,000 400,000 600,000 800,000 1,000,000
SSJ Ops
Source: SPECpower_ssj2008* 2 socket results from SPEC.org as of August 2010
Performance: Higher is better
Double Efficiency every 16 months (67% CAGR)
Source: SPECpower_ssj2008* 2 socket results from SPEC.org as of August 2010
Performance and power consumption results are based on certain tests measured on specific computer systems. Any difference in system
hardware, software or configuration will affect actual performance. Configurations: Two-socket Systems, Test Results for SPECpower_ssj2008,
6 Testing by Hewlett-Packard. For more information go to http://www.intel.com/performance
7. LRZ-Munich
IBM/Intel supplying the new LRZ-Munich system. 40C water
to the servers. Free cooling 100% of the year. PUE ~ 1.15
Supplier must pay the power bill for the first several years of
the system operations.
Power Bill
K- Euros/TF
€2308
€113
€7
Energy cost / Tflop plummeting, but
Power bill hitting “political” ceilings
7
8. Free cooling in
Atlanta?
Recall that LRZ is
using 40 °C cooling
water
8
9. Ever hear someone say something like:
“I am re-using waste heat from my
data center on another part of my
site and my PUE is 0.8!”
9
10. “I am reusing waste heat from my data center
on another part of my site and my PUE is 0.8!”
• While re-using excess energy from the data center can
be a good thing to do, it should not be rolled into PUE.
The definition of PUE does not allow this. PUE is ALWAYS
greater than or equal to 1.0
•But there is a new metric to do this; ERE
10
11. Energy Reuse Effectiveness
A new energy efficiency metric
Similar to PUE but accounts for
reuse energy
PUE and ERE can both provide
insight
• Different perspectives on
efficiency vs reuse
11
12. ERE – adds energy reuse to the PUE concept
Rejected
Energy
Reused
(a) Cooling (f) Energy
(e)
IT (g)
Utility
(b)
UPS (c) PDU (d)
12
13. ERE Definition
Total Energy
PUE =
IT Energy
Cooling + Power + Lighting + IT
PUE =
IT
Total Energy - Reused Energy
ERE =
IT Energy
Cooling + Power + Lighting + IT - Reused
ERE =
IT
13
14. ERE Alternate Development
Define energy reuse factor
Then:
(ERF) as:
Reuse Energy ERE = (1 - ERF) × PUE
ERF =
Total Energy
And finally:
Cool + Pwr + Light + IT - Reused
ERE = = (1 − ERF) × PUE
IT
ERF and PUE are mathematically related, but differ and
need to defined and reported clearly.
14
15. Comparison with PUE
One view of PUE is that is the “tax” or burden in energy costs you must pay
above the IT load to run the Data Center; ERE allows the same vision
PUE = 1.0 means 100% of the energy you bring in to the data center goes to the
IT
ERE = 1.0 means you only need to bring into the site an amount equal to 100%
of the IT energy to support the Data Center
We need both!
Case 1 Case 2
PUE = 2.0 PUE = 1.2
ERF= 0.55 ERF=0.25
ERE = 0.9 ERE=0.9
Case 1 focus on PUE, Case 2 focus on ERF
15
16. Towards the Net-Zero Data
Center: Development and
Application of an Energy
Reuse Metric
Technical Paper presented last
June: ASHRAE Summer Meeting,
Montreal
16
17. PUE & ERE resorted….
PUE Energy Reuse
EPA Energy Star Average 1.91
Intel Jones Farm, Hillsboro 1.41
T-Systems & Intel DC2020 Test Lab, Munich 1.24
Google 1.16
National Center for Atmospheric Research (NCAR) 1.10
Yahoo, Lockport 1.08
Facebook, Prineville 1.07
Leibniz Supercomputing Centre (LRZ) 1.15 ERE <1.0
National Renewable Energy Laboratory (NREL) 1.06 ERE <1.0
17
18. Water and Carbon
– increasing focus on sustainability
Two new metrics for Data
Center sustainability
Published by The Green Grid
Development of the Metrics will
give better focus on Data Center
sustainability
18
22. Exascale by 2019-2020
Business-as-usual (2X perf/watt
every 16 months) 140 MW
Target 20 MW
22
23. Exascale Challenges
140 MW just won’t work (~$140M / year to operate)
Target is 20 MW!
Even with Moore’s Law like improvements we can’t
get there (20 MW) by 2019-2020
Government / Industry / Academia partnerships
must happen to get us there
If we don’t, HPC growth will stall!
If we do, amazing things can happen…..
23
24. TECHNOLOGY AND THE ENVIRONMENT
Drive Computing to
Be More
~2%
Energy Efficient
~2%
Opportunity
Use Computing to Improve
Energy Savings Outside
Information and
Communications Technology
98% Does HPC have a Carbon ROI?
The Big Opportunity Should we explore it?
24
25. Carbon Accounting
We will talk about a lot of metrics but here is one that we
might want to develop…
ܶ ݁ݏݑ ܶܫ ݕܾ ݀݁ܿݑܴ݀݁ ܾ݊ݎܽܥ ݈ܽݐ
ܱܲܥ =
ܶܶܫ ݕܾ ݀݁ݏܷ ܾ݊ݎܽܥ ݈ܽݐ
25
27.
ORNL ReView “Supercomputers Help Model Cars in Collisions”
27
28. Opportunities for the Sustainable HPC Center
HPC is NOT like your typical data center
Harvest the differences
Focus on the 1.0!
Workload
Architecture
Carbon ROI
Exascale R&D
Beyond Energy Efficiency to Sustainability
Sustainability Triangles
Metrics – ERE, CUE, & WUE
28