Energy Efficiency in Large Scale Systems
1. Energy Efficiency in Large Scale Systems. Gaurav Dhiman, Raid Ayoub, Prof. Tajana Šimunić Rosing, Dept. of Computer Science
3. Cooling. By 2010, the US electricity bill for powering and cooling data centers will be ~$7B [1]. Electricity input to data centers in the US exceeds the electricity consumption of Italy! [1] Meisner et al., ASPLOS 2008
5. Effectiveness of DVFS. For energy savings: ER > EE. Factors in modern systems affecting this equation: performance delay (t_delay), idle CPU power consumption, and power consumption of other devices.
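A minimal numeric sketch of this tradeoff (all power and timing numbers below are invented for illustration, not measurements from the slides): total energy over a fixed window is active time at active power plus idle time at idle power, with other devices drawing constant power throughout.

```python
def total_energy(t_active, p_cpu_active, t_total, p_cpu_idle, p_other):
    """Energy over a window of t_total seconds: the CPU is active for
    t_active seconds and idle for the rest; other devices (memory,
    disks, fans) draw p_other watts the whole time."""
    active = t_active * (p_cpu_active + p_other)
    idle = (t_total - t_active) * (p_cpu_idle + p_other)
    return active + idle

# Nominal frequency: the job finishes in 10 s at 80 W, then the CPU idles.
e_nominal = total_energy(10.0, 80.0, 14.0, 10.0, 20.0)  # 1120.0 J
# Half frequency: CPU power drops to 25 W, but t_delay stretches the job
# to 14 s, so the whole window is spent active.
e_dvfs = total_energy(14.0, 25.0, 14.0, 10.0, 20.0)     # 630.0 J
```

Raising the other-device power or the performance delay shrinks (and can reverse) the savings, which is exactly the list of factors on this slide.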
6. Performance Delay. Lower t_delay => higher energy savings; depends on memory/CPU intensiveness. Experiments with SPEC CPU2000: mcf (highly memory intensive; expect low t_delay) and sixtrack (highly cache/CPU intensive; expect high t_delay). Two state-of-the-art processors: AMD quad-core Opteron with on-die memory controller (2.6 GHz) and DDR3; Intel quad-core Xeon with off-chip memory controller (1.3 GHz) and DDR2.
7. Performance Delay. mcf is much closer to the best case on the Xeon (due to its slower memory controller and memory) and much closer to the worst case on the AMD (due to its on-die memory controller and fast DDR3 memory).
20. Energy Proportional Computing. "The Case for Energy-Proportional Computing," Luiz André Barroso, Urs Hölzle, IEEE Computer, December 2007. Doing nothing well... NOT! Energy Efficiency = Utilization / Power. Figure 2: Server power usage and energy efficiency at varying utilization levels, from idle to peak performance. Even an energy-efficient server still consumes about half its full power when doing virtually no work.
21. Energy Proportional Computing. It is surprisingly hard to achieve high levels of utilization of typical servers (and your home PC or laptop is even worse). Figure 1: Average CPU utilization of more than 5,000 servers during a six-month period. Servers are rarely completely idle and seldom operate near their maximum utilization, instead operating most of the time at between 10 and 50 percent of their maximum.
22. Energy Proportional Computing. Doing nothing VERY well: design for a wide dynamic power range and active low-power modes. Energy Efficiency = Utilization / Power. Figure 4: Power usage and energy efficiency in a more energy-proportional server. This server has a power efficiency of more than 80 percent of its peak value for utilizations of 30 percent and above, with efficiency remaining above 50 percent for utilization levels as low as 10 percent.
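The shape of the efficiency curves in Figures 2 and 4 can be reproduced with a toy linear power model (a sketch only; real server power curves are not exactly linear, and the 50% and 10% idle fractions are illustrative):

```python
def efficiency(util, idle_frac):
    """Energy efficiency = utilization / power, with normalized power
    modeled as a linear ramp from idle_frac (fraction of peak power
    drawn at idle) up to 1.0 at full utilization."""
    power = idle_frac + (1.0 - idle_frac) * util
    return util / power

# Typical server: idles at ~50% of peak power (as in Figure 2).
# Energy-proportional server: idles at ~10% of peak power (as in Figure 4).
for util in (0.1, 0.3, 0.5, 1.0):
    print(util, round(efficiency(util, 0.5), 2), round(efficiency(util, 0.1), 2))
```

With a 10% idle fraction, the model gives efficiency 0.81 at 30% utilization, matching the Figure 4 caption's "more than 80 percent of its peak value for utilizations of 30 percent and above".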
23. Why not consolidate servers? Security, isolation, and the need to use the same OS stand in the way. Solution: use virtualization!
30. How to Save Energy? VM consolidation is a common practice: it increases resource utilization and lets idle machines be put into sleep mode. What about active machines? Active power management (e.g. DVFS) is less effective in newer lines of server processors (leakage, faster memories, low voltage range). Instead: make the workload run faster, exploit the similar average power across machines, and exploit workload characteristics to share resources efficiently.
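Consolidation itself is a bin-packing problem. A first-fit-decreasing sketch (a hypothetical helper, not the scheduler presented in these slides):

```python
def consolidate(vm_loads, capacity=1.0):
    """Pack normalized VM CPU loads onto as few hosts as possible using
    first-fit decreasing, so the remaining hosts can be put to sleep."""
    hosts = []  # summed load per active host
    for load in sorted(vm_loads, reverse=True):
        for i, used in enumerate(hosts):
            if used + load <= capacity:
                hosts[i] += load
                break
        else:  # no existing host fits this VM: keep another host awake
            hosts.append(load)
    return hosts

print(len(consolidate([0.3, 0.2, 0.6, 0.4, 0.1])))  # 2 active hosts
```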
61. Migrate if it does not reverse the imbalance. [Figure: a VM is migrated from the vgnode with nMPC > nMPC_th to the vgnode with minimum nMPC.]
62. Implementation. Xen 3.3.1 as the hypervisor; vgxen implemented as part of the stock Xen credit scheduler; vgdom implemented as a driver and application in Domain0, communicating with vgxen through a shared page. No modifications required to the guest OS! Testbed: dual quad-core Intel Xeon machines as vgnodes; a Linux-based desktop used as vgserv. [Figure: VM1, VM2, and Dom0 (hosting vgdom) running on Xen with vgxen.]
64. Compare against 'E+': Eucalyptus + state-of-the-art dynamic VM scheduling algorithms.
77. A reactive approach lowers cooling savings, cannot minimize the noise level, and impacts fan stability; a proactive approach provides better stability. Challenge: design of an efficient, proactive, cooling-aware dynamic workload management technique.
79. Migrate some of the active threads from the sockets with high fan speed to sockets with lower speed
80. Swap some of the hot threads from sockets with high fan speed with colder threads from sockets with lower speed. [Figure: virtual processors VP_A..VP_D and VP_W..VP_Z before and after the swap; the "high speed" and "low speed" fans both end up at moderate speed.]
82. If fan speed_M ≥ fan speed_N, we can swap the hot thread from socket N with colder threads from socket M, provided P_W ≤ P_C + P_D. [Figure: VP placement across the sockets before and after the swap, with fans at moderate and low speeds.]
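One way to read the swap condition is as a greedy search for the coldest threads whose combined power covers the hot thread's power, so each socket's total power, and hence its fan speed, stays roughly unchanged. The helper below is an illustrative assumption, not the algorithm from the slides:

```python
def pick_cold_swap(hot_power, cold_threads):
    """Pick the lowest-power ("coldest") threads from the other socket
    whose summed power is at least hot_power, mirroring the slide's
    P_W <= P_C + P_D condition. cold_threads is a list of
    (name, power_watts) pairs."""
    picked, total = [], 0.0
    for name, power in sorted(cold_threads, key=lambda t: t[1]):
        picked.append(name)
        total += power
        if total >= hot_power:
            return picked
    return None  # no set of cold threads covers the hot thread's power

print(pick_cold_swap(5.0, [("C", 3.0), ("D", 2.5), ("X", 4.0)]))  # ['D', 'C']
```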
91. Period ~ seconds, at the VP level. [Flowchart: traverse VMs/VPs → evaluate consolidation savings → mark if savings exist → schedule.]
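The loop on this slide can be sketched as one periodic tick (the two callbacks are hypothetical stand-ins for the slide's "evaluate" and "schedule" boxes):

```python
def cooling_tick(vps, evaluate_savings, schedule):
    """One control period (~seconds, at the virtual-processor level):
    traverse the VPs, evaluate consolidation savings for each, mark
    those with positive savings, and hand the marked set to the
    scheduler."""
    marked = [vp for vp in vps if evaluate_savings(vp) > 0]
    if marked:
        schedule(marked)
    return marked

# Example: pretend odd-numbered VPs have positive savings.
scheduled = []
print(cooling_tick([1, 2, 3, 4], lambda vp: vp % 2, scheduled.extend))  # [1, 3]
```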
94. Dynamic load balancing minimizes the differences in task queues across various levels. K. Skadron et al., Temperature-aware microarchitecture, ISCA 2003.
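A minimal sketch of queue-difference balancing (this is generic load balancing, not Skadron et al.'s temperature-aware policy):

```python
def rebalance(queues):
    """Repeatedly move one task from the longest queue to the shortest
    until queue lengths differ by at most one task."""
    queues = list(queues)
    while max(queues) - min(queues) > 1:
        queues[queues.index(max(queues))] -= 1
        queues[queues.index(min(queues))] += 1
    return queues

print(rebalance([5, 1, 0]))  # [2, 2, 2]
```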
Editor's notes
This figure shows a typical fan controller based on a classical closed-loop approach. The fan controller decides the required fan speed; its output is fed to the actuator, which actually adjusts the fan speed. Feedback is collected using thermal sensors (each CPU core has a dedicated thermal sensor), and the fan speed is proportional to the highest temperature. <click> Cooling optimization techniques up until now focused mainly on the fan controller without including workload management; we show later that including workload management can result in big cooling savings.
Current load balancing does not consider cooling costs. <click> Read the example to the audience (stop when you reach the equation). [The figure is the visual representation of the example.] The figure shows a case of dual sockets, each socket with 4 cores, where each core runs one thread (thr = workload thread or job). <click> Thermal imbalance leads to cooling inefficiencies due to the cubic relation between fan speed and power. <click> This indicates that better workload assignment can improve the thermal distribution and lower the cooling cost. The question is HOW and WHEN to schedule the workload.
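The "cubic relation" in this note can be made concrete with two lines of arithmetic (normalized speeds and a made-up fan constant, for illustration only):

```python
k = 10.0  # hypothetical fan power at full speed, in watts
# Fan power ~ k * speed**3. One fan at full speed plus one near idle:
unbalanced = k * (1.0**3 + 0.2**3)  # ~10.1 W
# Both fans at the average speed, 0.6, moving a similar total amount of air:
balanced = k * (2 * 0.6**3)         # ~4.3 W
```

Evening out the speeds cuts cooling power by more than half, which is why a better thermal distribution lowers cooling cost.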
We utilize the freedom to migrate the workload around to perform cooling-aware workload scheduling that minimizes cooling costs. <click> The good news is that the overhead of migrating threads between sockets is minor, since temperature changes quite slowly (on the order of seconds) compared to the migration time (on the order of microseconds). <click> In this example we show a case of thermal imbalance between two sockets (one fan runs at high speed while the other runs at low speed). <click> The challenge is deciding which threads to migrate to get a better thermal and cooling balance. Then read the second bullet in the yellow box.
The question we need to answer is: when should we trigger workload rescheduling? One way is to employ a reactive approach that acts when the system is already in a cooling-inefficient condition. The problem with this approach is that mitigating the inefficiencies requires time (temperature changes slowly), which reduces the cooling savings, worsens noise, and may generate instability in the fan system. <click> The alternative is proactive rescheduling that predicts and avoids cooling inefficiencies at an earlier point in time and reschedules accordingly. Read quickly through the benefits in the green box. <click> Read the challenge sentence.
In this slide and the following one we illustrate the fundamental ways to deliver cooling savings. This slide explains the "spreading the hot threads" concept, which obtains cooling savings by creating a better temperature distribution across the CPU sockets. This technique should be applied when there is an imbalance in the heat-sink temperature across the CPU sockets. To implement job spreading we can employ either job migration or swapping (read the two bullets briefly). <click> The example at the bottom clearly shows how spreading works. On the left we have a case of big imbalance. To solve it, we swap the hot threads (C, D) with the colder ones (W, X). The two fans now run at a moderate speed; savings are expected due to the cubic relation between fan power and speed.
Here we illustrate the second way to obtain cooling savings. The motivation is to concentrate more hot threads into fewer sockets while keeping their fan speeds almost the same. We apply this method when the average temperature across sockets is in a similar range (note that consolidation is not the opposite of spreading; it can be applied on top of it). Consolidation can be implemented in two ways. One is squeezing more hot jobs onto a fan that is already running faster than it needs to (fan speeds are discrete, e.g. 8 or 16 speeds). <click> The other is trading a hot thread from the socket with the lower fan speed for colder threads of similar total power from the socket with the higher fan speed, to maintain temperature balance. This helps lower the fan speed of the socket that receives the cold threads while keeping the higher fan speed almost the same. The example below illustrates this case.