CMG‘08 INTERNATIONAL
conference
Dr. Bruce Worthington
Principal Software Development Lead
Windows Server Performance
Microsoft Corporation
CMG‘08 INTERNATIONAL
conference
Server Power Ground Rules
 TANSTAAFL: Everything is a trade-off
 Performance, Power, Functionality, Capacity,
Cost, Reliability, Availability, Manageability,
Maintainability, Usability, Environmental
Impact, Lifetime, Footprint, Security, Morale
 Saving power vs. power efficiency
 More work at fixed power level, or
 Less power at fixed work level
 Shifting component power efficiencies
CMG‘08 INTERNATIONAL
conference
Outline
 Motivation
 Background
 Windows Server 2003 → 2008
 Windows Server 2008 R2
 Power Diagnostics and Control
 Summary
CMG‘08 INTERNATIONAL
conference
Rising Cost of Ownership
 From 2000 to 2006
 Computing performance:  25x
 Energy efficiency:  8x
 US electricity cost:  1.35x
 Power per $1K of server:  4x
 Server(+) world electricity: >2x
○ >1% of total world production
 Datacenters use 2% of all US electricity
CMG‘08 INTERNATIONAL
conference
Scale: Kilowatts → Megawatts
 Idle high-performance servers
 50-80% of max power draw
 2-sockets ~ 250 W
 4-sockets ~ 500 W
 8-sockets ~ 1000 W
 25 15Krpm 2.5” disks + SAN = 3U
 ~ 300/450 W (idle/active)
 10,000 2-socket 1U servers ~ 1-3 MW
 Datacenter “container” ~ 0.5 MW
 ~1500 servers + storage + infrastructure
CMG‘08 INTERNATIONAL
conference
Datacenter Energy Demand
 Data centers are energy intensive facilities
 Server racks now designed to carry 25 kW load
 Surging demand for data storage
 Typical facility ~ 1MW, can be > 20 MW (even 200 MW)
 Nationally 1.5% of US Electricity consumption in 2006
○ Doubling every 5 years
 Significant data center building boom,
 Power and cooling constraints in existing facilities
 Growing demand for compute cycles
 Growing computing performance
 Commoditized hardware
 Declining cost of computing
CMG‘08 INTERNATIONAL
conference
15 MW Datacenter Monthly
Costs
“Good” (PUE=1.7) Internet-scale datacenter with DAS
Servers: $3,000,000 | Infrastructure: $1,800,000 | Power: $1,000,000
(3-yr server and 15-yr infrastructure amortization)
CMG‘08 INTERNATIONAL
conference
Datacenter Costs Breakdown - 2
Datacenter electricity use: IT Equipment 50%, Cooling 25%, Air Movement 12%, Transformer/UPS 10%, Lighting, etc. 3%
Source: EYP Mission Critical Facilities Inc., New York
Other than a common power source, they are not connected.
CMG‘08 INTERNATIONAL
conference
Datacenter Costs Breakdown -
1
CMG‘08 INTERNATIONAL
conference
Electricity Use by End-Use: 2000 - 2006
CMG‘08 INTERNATIONAL
conference
Environmental Impact
 Governments, businesses, and
organizations are trying to reduce the
production of greenhouse gases
 New EPA Energy Star mandates for
enterprise server power efficiencies
CMG‘08 INTERNATIONAL
conference
Outline
 Motivation
 Background
 ACPI Power States
 Component Power
 Windows Server 2003 → 2008
 Windows Server 2008 R2
 Power Diagnostics and Control
 Summary
CMG‘08 INTERNATIONAL
conference
ACPI Power State
Definitions
 Performance states (P-states)
 Dynamic voltage and frequency scaling
 More than linear savings (roughly cubic; see the sketch after this list)
 Throttle states (T-states)
 Linear scaling of CPU clock
 “Power” states (C-states)
 Low-power idle (CPU “sleep”) states
 Turn off increasing amounts of silicon in package
 System sleep states (S-states)
 On, standby, hibernate, off
 MS has not encouraged S-state support for servers
○ Changing with the increased focus on power
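A rough sketch of why DVFS savings are better than linear (ignoring leakage and non-CPU power; the symbols C, V, and f are illustrative, not from the slide):
  Dynamic CMOS power: P ~ C * V^2 * f
  If voltage is lowered along with frequency (V ~ f), then P ~ f^3
  Example: running at 80% of max frequency (and voltage):
  P / Pmax ~ 0.8^3 ~ 0.51, i.e., roughly half the dynamic CPU power for a 20% frequency reduction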
CMG‘08 INTERNATIONAL
conference
ACPI Power State State Machine
• For the entire system
○ Global System States (G-States)
○ Sleeping States (S-States)
 Standby (S1–S3), Hibernate (S4), …
 For the processor only
 Processor Performance States (P-States)
○ Different processor frequency and voltage
 Processor Throttling States (T-States)
○ Processor clock throttling to reduce processor utilization (and capacity)
 Processor Power States (C-States)
○ Processor is executing instructions (C0)
○ Processor is idle (C1, C2, …)
 Other devices
 Device Power States (D-States)
○ Similar to C-states, but for devices other than processors
[State machine diagram: G0 (S0) Working; G1 Sleeping (S1–S4); G2 (S5) Soft Off; G3 Mechanical Off; transitions via wake events, BIOS routines, and power failure/power off. Within S0, the CPU moves between C0 (executing, with performance state Px and throttling) and idle states C1, C2, …, Cn; devices such as modem, HDD, and CDROM move between D0–D3.]
CMG‘08 INTERNATIONAL
conference
ACPI Specification Versions
 WS03 complies with ACPI 2.0
 WS08 complies with ACPI 3.0
 Multiprocessor
○ Dependent (ganged) and independent control
○ Independent control w/ dependent behavior
(may transition or not based on other
processors’ states)
 MS has some ideas for ACPI 3.5
CMG‘08 INTERNATIONAL
conference
ACPI Power State
Dependencies
 Dependency Domains for ACPI power states (assumes
S0)
 Logical processors in the same domain should have the same
C-state, P-state, or T-state
 No dependence between a processor’s C-state domain, P-state
domain, or T-state domain
 OS control mechanisms based on dependency
relationships
 Dependent control: Transitioning one processor to a new state
causes other processor(s) to transition to the same state
 Independent control: Transitioning one processor to a new
P-state or T-state does not affect other processors’ power states
 Independent control, dependent behavior: Transitioning one
processor to a new P-state or T-state may or may not transition
other processor(s) to the same state based on the current state
of the other processor(s) that share this relationship
CMG‘08 INTERNATIONAL
conference
P-States
 Windows processor performance states are
enabled by default
 Power policy allows flexible use of
performance states
 Values for min / max processor speed
 Expressed as a percentage of maximum
processor frequency
 Windows will round up to the nearest available state
 Processor- and workload-dependent impact
 E.g., one system configuration was determined to
have insignificant perf impact from capping P-states
at P1, but significant power savings
CMG‘08 INTERNATIONAL
conference
Power policy will always use DBS (demand-based switching) within
the range defined by min / max frequency
Full range or subset of available P-states
Policy may be set to use only one performance
state (min / max / intermediate)
Will not include linear clock throttle states
CMG‘08 INTERNATIONAL
conference
Example: Processor state power policy
Note: This is the default policy in WS08
Intended to minimize performance hit
State Freq % Type
0 2800 100 Performance
1 2520 90 Performance
2 2142 85 Performance
3 1607 75 Performance
4 964 60 Performance
5 482 50 Performance
Maximum Processor State
Minimum Processor State
CMG‘08 INTERNATIONAL
conference
P-State Policy Settings
 Example: Processor state power policy
 Using a subset of available states
 Can use any contiguous range
 Some performance loss (may not be significant) unless P0
included (targets minimal perf hit)
State Freq % Type
0 2800 100 Performance
1 2520 90 Performance
2 2142 85 Performance
3 1607 75 Performance
4 964 60 Performance
5 482 50 Performance
Maximum Processor State
Minimum Processor State
CMG‘08 INTERNATIONAL
conference
Example: Processor state power policy
Locking processor at one state
Any available state may be selected
Some performance loss (may not be significant) unless P0 is
the state chosen (a la High Perf mode)
State Freq % Type
0 2800 100 Performance
1 2520 90 Performance
2 2142 85 Performance
3 1607 75 Performance
4 964 60 Performance
5 482 50 Performance
Min & Max Processor State
CMG‘08 INTERNATIONAL
conference
Setting P-State Parameters
CMG‘08 INTERNATIONAL
conference
Use Perfmon to Monitor P-State
Processor Performance / % of Max Frequency
CMG‘08 INTERNATIONAL
conference
Linear clock throttle states (T-states)
Compared to P-states, T-states do not save
energy when performing identical workloads
However, throttle states may be useful for
some scenarios (thermal overload)
By default, WS08 uses T-states only if P-
states are unavailable or in case of thermal
overload
No DBS: only the Maximum Processor State
parameter is used
CMG‘08 INTERNATIONAL
conference
Default use of linear throttle states
Performance is directly affected by throttling
State Freq % Type
0 2800 100 Performance
1 2520 90 Performance
2 2380 85 Performance
3 2100 75 Performance
4 1680 60 Performance
5 1400 50 Performance
6 1400 50 Throttle
7 1120 40 Throttle
8 840 30 Throttle
9 560 20 Throttle
DBS
Allowed
No DBS
Allowed
CMG‘08 INTERNATIONAL
conference
Power Capping / Budgeting
 Enforcing per-server power limits (static or dynamic)
 Allocations calculated from nameplate ("plate") ratings are often over-provisioned
○ Stranded capacity
 OS may not be able to respond fast enough to enforce hard limits
when power spikes
 Typically lower-power P-states attempted, then T-states engaged
as necessary
○ OS might not get a good estimate of the resulting effective frequency
○ Monitoring applications and diagnostic tools may give incorrect data
○ Opposite strategy from OS, where P-states move towards higher
performance modes when load increases
 Potentially huge (and potentially unexpected) hit in performance
right when it is most vital
○ Sudden hardware throttling should be last resort
CMG‘08 INTERNATIONAL
conference
C-States
 Although hardware may support more than
3 C-states, Windows only utilizes a maximum
of 3. But that doesn’t mean Windows only
uses the first three hardware C-states:
 C1 = hardware C1
 C2 = hardware C?
○ Lowest-power consuming c-state with _CST of type 2
 C3 = hardware Cn
 Wouldn’t expect P-state to affect C-state
power, but it does on some processors
 WS08R2 handles this by providing the capability to
drop to Pn before transitioning to C-state
CMG‘08 INTERNATIONAL
conference
Processor Power Management -
1
 CPUs have an increasing number and range of P-states and C-states
 Ballpark expectations per socket:
 A few watts per P-state
 Tens of watts for lowest C-state(s)
 Varying impact to server throughput and
responsiveness
 Mature, reliable technology
 Significant deployments in mobile and
desktops
CMG‘08 INTERNATIONAL
conference
Processor Power Management -
2
 No user intervention required
 Managed by the operating system
 Balances power savings with CPU
utilization
 Kernel selects target P-state based on
processor utilization history, Windows power
policies, thread scheduler, system heuristics,
node/socket/HW thread hierarchy
 Transition processor to “sleep” C-states when
idle (i.e., no thread to run on that processor)
CMG‘08 INTERNATIONAL
conference
Processor Power Management -
3
 Windows’ power policy includes various
parameters that influence how the
kernel chooses target power states
 Low voltage/power processors must be
evaluated and targeted for the right
scenarios
 Reduces OS power management flexibility
 Additional servers are required if the
workload is CPU-bottlenecked
CMG‘08 INTERNATIONAL
conference
Hardware Support
 The correctness of all PPM tools and settings
relies on accurate hardware / firmware support
 Broken BIOSes found in some previous-generation
servers
 Reporting
○ Initialization of ACPI tables (e.g., power states, memory and I/O
controller locations)
○ P-state and C-state monitoring
 Controlling
○ PPM algorithm depends on correct historical information
○ HW should comply/cooperate with OS power state requests
CMG‘08 INTERNATIONAL
conference
Processor Power Management
Working together with OEMs/IHVs -
1
 Hardware must support PPM capabilities
 ACPI namespace must describe capabilities and contain
processor objects
 On a processor there may be multiple independently-
managed power planes, potentially shared between
components, such as:
 Cores, Caches, Memory Controllers, and Bus/Serial interface(s)
to other processors or IO components
 The performance impacts of turning off various pieces of silicon
must be carefully weighed and understood
○ Snooping caches must be flushed before being shut down
○ Memory or IO channels attached to a package must still be
accessible by other packages
○ Bus/Serial interfaces must be running for active caches, memory,
or IO
○ Different components have different power-up delays from the
various power states they support
CMG‘08 INTERNATIONAL
conference
Collaborative Power
Budgeting
 Ideal WS08R2 strategy
 Platform guarantees operation within the
allocated budget (HW Fail-safe)
 OS scales power/perf according to workload
and respects platform notifications
 New R2 Beta option: OS specifies target
utilization and HW selects P-states accordingly
 Otherwise, if the OS and HW are fighting for
power management control, both power and
performance will suffer
 Hardware-directed power control settings are on by
default in some BIOSes
CMG‘08 INTERNATIONAL
conference
Servers Defaulting to Hardware-
Controlled Power Mgmt
 Hardware-directed power control settings are on by
default in some BIOSes
 Platform alters P-states, C-states, T-states, and/or D-
states without OS information
○ One alternative is to have platform dynamically restrict the
available states and update the OS via ACPI (<= 2 Hz)
 May take over processor performance counters!
○ Obviously this is a big concern when using performance
monitoring tools that utilize the on-CPU counters
CMG‘08 INTERNATIONAL
conference
Outline
 Motivation
 Background
 ACPI Power States
 Component Power
 Windows Server 2003 → 2008
 Windows Server 2008 R2
 Power Diagnostics and Control
 Summary
CMG‘08 INTERNATIONAL
conference
Component Power Metering
• Only a small set of server models provide the
functionality of component power reporting
• Extra HW instrumentation (or fragile probing) is
needed to monitor the component power usages
for most platforms
• Simplest alternative is to populate and then
take away any removable components and track
the overall system power delta
CMG‘08 INTERNATIONAL
conference
Example Component Power Distribution
#1
Idle 3-Year-Old 4-Socket Single-Core Server
CMG‘08 INTERNATIONAL
conference
Example Component Power Distribution
#2
Idle 4-Socket Quad-Core Server
CMG‘08 INTERNATIONAL
conference
Example Component Power Distribution
#3
Component breakdown: CPU (2) 46%; Mobo + 8 GB RAM 18%; PCI Cards (3) 17%; SCSI HDD (4) 12%; Other 7%
Processor power management represents the best opportunity today
Source: Intel Server Products Power Budget Analysis Tool
http://www.intel.com/support/motherboards/server/sb/cs-016976.htm
CMG‘08 INTERNATIONAL
conference
Selecting Memory Components
 Lots of permutations for a given capacity
 Family (e.g., DDR#)
○ FB DIMMs draw more power
 DIMM count
○ Especially for FB, where bus may decrease frequency if
enough DIMMs
 Bus frequencies
 Ranks
 Density
 Data width
 Channel count
 Low power memory must be evaluated and targeted
for the right scenarios
 Additional servers are required if the workload is memory-
bottlenecked
CMG‘08 INTERNATIONAL
conference
Memory Power Savings
 Select the right type and number of DIMMs for
the workload
 Reduce memory accesses
 Overall
○ Smaller working set
○ Better cache hit ratios
○ Probably better performance, too
 More memory power states
 Compare server memory idle characteristics to
mobile memory
 Deeper self-refresh states
○ Takes memory longer to come out of deeper states
CMG‘08 INTERNATIONAL
conference
“Green Memory”
Tech   Marking   Datarate   Capacity   Density   DQ   Ranks   Power/DIMM
DDR2 PC2-5300 667Mhz 1GB 256Mb x4 DR 18.1W
DDR2 PC2-5300 667Mhz 1GB 256Mb x8 QR 18.6W
DDR2 PC2-5300 667Mhz 1GB 512Mb x4 SR 7.6W
DDR2 PC2-5300 667Mhz 1GB 512Mb x8 DR 7.8W
DDR2 ECC 667Mhz 1GB 1Gb x16 DR 6.1W
DDR2 No ECC 667Mhz 1GB 1Gb x16 DR 5.5W
No "by 16" part with 4Gb density
DDR2 PC2-5300 667Mhz 4GB 1Gb x4 DR 14.0W
DDR2 PC2-5300 667Mhz 4GB 1Gb x8 QR 14.4W
DDR2 PC2-5300 667Mhz 4GB 2Gb x4 SR 8.6W
DDR2 PC2-5300 667Mhz 4GB 2Gb x8 DR 8.8W
CMG‘08 INTERNATIONAL
conference
Networking Power
 NIC idle power (examples)
 100 Mb ~ 1 W
 1 Gb ~ 5 W
 Quad 1 Gb ~ 5-9 W
 10 Gb ~ 10-15 W
 Quad 10 Gb ~ 17 W
 Don’t forget network switch power
 Windows Networking Optimizations
 NDIS DPC timer period
 Wake-on-LAN (see content in WinHEC 2008)
 Low Power on Network Disconnect
CMG‘08 INTERNATIONAL
conference
Hard Disk Power
 Decreasing radius
 Cubic power relationship (Power ~ Radius^3)
 3.5” 15K RPM drive = ~12/18 W (idle/active)
 2.5” 15K RPM drive = ~6/9 W (idle/active)
 Decreasing rotational speed
 Quintic power relationship (Power ~ RPM^5)
 15K RPM = 2 ms avg rotational delay (serial workload)
 10K RPM = 3 ms avg (~3-4 W idle)
 7.2K RPM = 4 ms avg (may have slower seek as well)
 Frequently spinning down enterprise drives not
advisable (yet)
CMG‘08 INTERNATIONAL
conference
Storage Controller Power
 HBA / storage connection interface
 E.g., PCI-X and PCIe cards: 5-8 W idle
 Array Controller
 E.g., small SAN ctlr (2U) = 200/300 W
(idle/active in direct attached mode)
 Disk Interface
 SCSI: 80 → 160 → 320 MB/s
 FC: 1 → 2 → 4 → 8 Gb/s
 SAS/SATA: 1.5 → 3.0 → 6.0 Gb/s
CMG‘08 INTERNATIONAL
conference
PCI-Express Power
Management
 Support for Active State Power
Management (ASPM)
 a.k.a. Link State Power Management
 In-box power policy for ASPM state
 Requires OS control of PCI Express
features
 Available white paper
CMG‘08 INTERNATIONAL
conference
Power Supply Efficiency
 Power Factor: phase delta between input
voltage and current
 Active Power Factor Correction (PFC)
 Ratio of input:output power (AC→DC)
 Entropy means 100% efficiency is unobtainable
 Default supplies at 70%; new models up to 85%
 Previous power supplies were often
optimized for high workload levels, but
most servers run at 5-20% of capacity (for
now)
 Decreases power without decreasing perf
CMG‘08 INTERNATIONAL
conference
Power Supply Efficiency
“80 Plus”
 Requirement for Energy Star (July ‘08)
 80% minimum efficiency at 20%, 50%,
and 100% of rated output
 Previous power supplies often optimized for
high loads, but most servers run at 5-20%
 Minimum power factor of 0.9 or greater at
100% of rated output
 Decrease power without decreasing perf
CMG‘08 INTERNATIONAL
conference
Power Supply Waste Power
Efficiency (%)             Output Power   Required Input Power   Waste Power   Waste Power Cost per Annum
70 (default)               500 W          714 W                  214 W         $183.15
80 (near 80 Plus Bronze)   500 W          625 W                  125 W         $106.98
85 (80 Plus Silver)        500 W          588 W                  88 W          $75.31
90 (above 80 Plus Gold)    500 W          555 W                  55 W          $47.07
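Sketch of how each row is derived (assuming 24x7 operation; the electricity rate, roughly $0.098/kWh, is implied by the table rather than stated):
  Input power = Output power / Efficiency = 500 W / 0.70 ~ 714 W
  Waste power = 714 W - 500 W = 214 W
  Annual waste energy = 0.214 kW * 8760 h ~ 1875 kWh
  Annual waste cost ~ 1875 kWh * $0.098/kWh ~ $183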
CMG‘08 INTERNATIONAL
conference
Fan Power
 Fans in some 1U servers consume 15-
20% of overall system power
 Fixed vs. variable-speed fans
 Decrease power without decreasing perf
CMG‘08 INTERNATIONAL
conference
Outline
 Motivation
 Background
 Windows Server 2003 → 2008
 Overview
 Server Power Measurements
 Windows Server 2008 R2
 Power Diagnostics and Control
 Summary
CMG‘08 INTERNATIONAL
conference
Windows Server 2003
 ACPI 2.0 compliant
 Windows processor driver required for
specific CPU make/model
 Requires selecting appropriate power policy
 Each system power policy includes a
processor throttling policy
 Highest (default), lowest, or full range of P-states
 OEMs or server administrators may create
additional power plans
CMG‘08 INTERNATIONAL
conference
Windows Server 2008 - 1
 ACPI 2.0 and 3.0 compliant
 Native OS support for PPM on
multiprocessor systems
 Default power settings refined for each
release (including WS08R2)
 Windows Server 2008 & SP2
 Simplified configuration model
 Group Policy over power settings
 Power management enabled by default
(“Balanced Mode”)
CMG‘08 INTERNATIONAL
conference
Power Plans
Power Plans Min P-state Max P-state
Balanced 5% 100%
Power Saver 5% 50%
High Performance 100% 100%
CMG‘08 INTERNATIONAL
conference
Windows Server 2008 - 2
 T-states used only when no P-states
available
 Power management parameterization for
improved flexibility of P- and T-state
algorithms
 Additional tunings available for OEMs to
customize to processor, chipset, platform, role,
etc.
 Improved C3 support
 Very hard to generalize, but 2-10%
improvement in power efficiency observed
at mid-to-low utilization levels (vs. 2003)
CMG‘08 INTERNATIONAL
conference
Processor Power Management:
Windows Server Releases
 Fully supported by WS03, WS08, and WS08R2
 Feature parity with Windows client
operating systems
 For example, WS08 has full support for:
○ ACPI 2.0, 3.0 processor objects, Notify()
events
○ Power policy for tuning Operating System
(OS) target state algorithms
○ Deep idle C-states
CMG‘08 INTERNATIONAL
conference
Default Power Parameters -
1
* = May not appear in Control Panel options by default
PPM
parameters
Non-PPM
parameters
CMG‘08 INTERNATIONAL
conference
Default Power Parameters -
2
[Screenshot callouts: how frequently to evaluate; whether to change the P-state; how to change it]
CMG‘08 INTERNATIONAL
conference
Default Power Parameters -
3
Entry idle,
promote only
Deep idle,
demote only
CMG‘08 INTERNATIONAL
conference
Idle Improvement
Techniques
 Shut down unnecessary services,
applications, roles, devices, drivers
 Avoid polling and spinning in tight loops
 Avoid high-res periodic timers (<10 ms)
CMG‘08 INTERNATIONAL
conference
Outline
 Motivation
 Background
 Windows Server 2003 → 2008
 Overview
 Server Power Measurements
 Windows Server 2008 R2
 Power Diagnostics and Control
 Summary
CMG‘08 INTERNATIONAL
conference
Measuring Power
 Few existing Windows servers are equipped
with comprehensive power metering
capabilities
 In the future, servers are likely to have onboard
power meters
○ AC power (into the power supply)
○ DC power (out of the power supply)
○ For individual components (CPU, RAM, IO, fans, disks,
…)
 The Windows Server Performance team has
resorted to two strategies:
 Metering at the wall (AC)
 Directly probing specially manufactured server
motherboards (solder and data acquisition)
CMG‘08 INTERNATIONAL
conference
Measuring Power Efficiency
Which Watts/Amps to measure?
 Total server (wall)
power
 External power
 Network switches/hubs
 Storage (disks, array
controllers, SANs)
 Power distribution and
conditioning
 HVAC
 Internal component power
 Processor package
○ Threads, cores, caches, memory
controllers, cross-package
interconnect controllers, IO
controllers (e.g., PCI-E)
 Memory (controllers, DIMMs, ranks,
banks)
 Chipsets (north bridge, south bridge,
IO controllers)
 Power supplies
○ AC in, multiple DC out
○ Redundant (active/active,
active/passive)
 IO (network, storage, video, USB)
○ Embedded components and
expansion cards
 Fans and other internal misc.
CMG‘08 INTERNATIONAL
conference
Measuring Power Efficiency
 Traditional performance benchmarks optimize for
high throughput or low response time by using all
resources
 The load line approach tracks power use as load
varies
 Pick a power point and see how much load can be
handled
 Pick a load point and see how much power is required
 Workload breadth
 Database, web server, file server, etc.
 MS uses SPECpower (a la SPECjbb) and is adding
customer-accepted performance benchmarks
 TPC-C/E/H, SpecWeb, NetBench, SAP, SPEC, …
 Semi-internal: FSCT, LCW2, Web Fundamentals,
TermSrv, PerfGates, …
CMG‘08 INTERNATIONAL
conference
Measuring Power Efficiency
Which workloads to test?
 Workload breadth
 Database, web server, file server, etc.
○ Need to prioritize based on potential for power savings and for
broadest customer coverage
 Each has unique “work accomplished” metrics (e.g., ops per
second)
 Industry standard workloads, such as SPEC and TPC
 Custom workloads designed to test power scenarios
 Microsoft is currently using SPECpower and customer-
accepted performance benchmarks to convey power
efficiency
 TPC-C/E/H, SpecWeb, NetBench, SAP, SPEC, …
 Semi-internal: FSCT, LCW2, Web Fundamentals, TermSrv,
PerfGates, …
CMG‘08 INTERNATIONAL
conference
Industry Standard Workloads
 SPEC
 SPECpower is the only standardized benchmark at this point
○ Single workload defined to date
 Order processing for a wholesale supplier running typical Java business applications
 Basically SPECjbb with some changes
 Minimal I/O and kernel time
○ Other SPEC benchmarks could have a “power” version, and each one may
or may not be modified from the “perf” version
 TPC
 Could add a power metric to each of their existing benchmarks, but
details are still being worked out
○ What is server power vs. storage power?
○ What needs to be installed in the audited server?
 I suspect they will stick to the same approach used for pricing, in that the system has to
be available as a purchasable product
 What about the “price” of power?
○ Etc.
CMG‘08 INTERNATIONAL
conference
Measuring Power Efficiency
Windows Server Performance Lab
 Methodology for obtaining power load line data
for TPC-C, TPC-E, FSCT, and Web
Fundamentals has been demonstrated
 Benchmark loads varied by throttling number of active
users
 Multiple workloads tested in Hyper-V environment
 SPECpower has been successfully tuned
 Data has been gathered on 2-, 4-, and 8-socket
systems with various processors
 Wall-socket power measurements
 Component power measurement by brute force (device
extraction)
CMG‘08 INTERNATIONAL
conference
Varying Load Levels
Iteration   SPECpower (reduce load)   TPC-E (reduce users)   FSCT (increase users)
1 100% load 100% of max users 0 users
2 90% load ~90% of max users 10% of max users
3 80% load ~80% of max users 20% of max users
4 70% load ~70% of max users 30% of max users
5 60% load ~60% of max users 40% of max users
6 50% load ~50% of max users 50% of max users
7 40% load ~40% of max users 60% of max users
8 30% load ~30% of max users 70% of max users
9 20% load ~20% of max users 80% of max users
10 10% load ~10% of max users 90% of max users
11 0% load 0 users 100% of max users
Similar strategy used for Web Fundamentals
CMG‘08 INTERNATIONAL
conference
Testbed
CMG‘08 INTERNATIONAL
conference
HW and SW Test
Configurations
 Sample platforms
 2-socket and 4-socket quad-core
 8-socket dual-core
 x64 (AMD and Intel); ia64
 Hardware- and software-controlled power
management modes
 WS03, WS08, WS08SP2 (prerelease), and WS08R2
(prerelease)
 Windows power schemes
 Balanced, High Performance, Power Saver, …
 P-State settings and heuristics
 C-State settings and heuristics
 Parameterized power management optimizations
○ E.g., core parking, tick skipping
CMG‘08 INTERNATIONAL
conference
SPECpower: WS03 and
WS08
[Chart: power (% of max watts) vs. workload (% of max ssj_ops) for W2K3 SP1, W2K8 RTM, and W2K8 SP2 on a 2-socket, 8-core (total) system]
CMG‘08 INTERNATIONAL
conference
SPECpower & FSCT: WS03 and
WS08
SPECpower throughput and power
at different workload levels
on a 4-socket quad-core system
[Two charts: power (% of maximum watts) vs. workload (% of maximum throughput), Windows Server 2003 vs. Windows Server 2008]
FSCT throughput and power
at different workload levels
on a 2-socket dual-core system
CMG‘08 INTERNATIONAL
conference
TPC-E: WS03 and WS08
TPC-E power usage at
varying workload levels
TPC-E power efficiency (tpsE/Watt) at
varying workload levels
[Two charts: (1) watts (% of maximum) vs. workload (% of maximum tpsE); (2) tpsE/Watt (% of maximum) vs. workload (% of maximum tpsE); Windows Server 2003 vs. Windows Server 2008]
CMG‘08 INTERNATIONAL
conference
OOB Windows Server 2008
SPECpower throughput (ssj_ops) and power at varying workload levels
[Chart: power and ssj_ops per Watt (both as % of maximum) vs. workload (% of max ssj_ops)]
Processor utilization and frequency as SPECpower workload decreases over time
[Chart: average processor utilization and processor frequency (% of maximum) vs. time (minutes) with decreasing workload]
CMG‘08 INTERNATIONAL
conference
TPC-E: Windows Server
2008
Distribution of P-States as workload decreases over time
[Chart: cumulative P-state distribution (P0–P4, C1) vs. time (30–140 minutes)]
CMG‘08 INTERNATIONAL
conference
4 quad-core CPUs, 16 GB, RAID-5 array
(Measured: Avg Watts; Projected: kWh/yr, Cost, Kg of CO2)
Server Config   Active Clients   Avg Watts   kWh / yr   Cost   Kg of CO2
WS03, IIS6 0 468 4100 375 3190
WS08, IIS7 0 457 4000 357 3110
WS03, IIS6 20 537 4700 430 3660
WS08, IIS7 20 500 4380 401 3410
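Sketch of how the projected columns follow from the measured watts (assuming 24x7 operation; the per-kWh cost and CO2 factors are implied by the table, not stated):
  kWh/yr = 0.468 kW * 8760 h ~ 4100 kWh
  Cost ~ 4100 kWh * ~$0.09/kWh ~ $375
  CO2 ~ 4100 kWh * ~0.78 kg/kWh ~ 3190 kg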
CMG‘08 INTERNATIONAL
conference
Outline
 Motivation
 Background
 Windows Server 2003 → 2008
 Windows Server 2008 R2
 Server Energy Vision
 Idle Power Optimizations
 Core Parking
 Hyper-V (v2)
 Power Metering and Budgeting
 SSD
 Power Diagnostics and Control
 Summary
CMG‘08 INTERNATIONAL
conference
Windows Server Core Energy
Vision
 Dynamic Data Center
 Coordination across all data center
components to scale infrastructure and
computing according to business needs
 Scalable Node: Server power efficiency
 Low idle power consumption
 Power consumption should scale with load
CMG‘08 INTERNATIONAL
conference
Dynamic Data Center
 Holistic approach spanning all
infrastructure not just the computing nodes
 Reducing waste and optimizing
performance
 Scaling and migrating workloads
 Coordination with power and cooling systems
 Watch out for over-eager workload consolidation
or low-power component acquisition
 Building platform and management
infrastructure
CMG‘08 INTERNATIONAL
conference
Dynamic Data Center – The
Problem
 Addressing energy consumption in the data center
requires a holistic approach spanning all infrastructure not
just the computing nodes
 Many factors affect how a data center consumes energy
 Hardware, workload, time of day/week/year, locality, etc.
 Data centers are generally statically configured for peak load
 Tremendous opportunities for reducing waste and
optimizing performance exist
 Scaling and migrating workloads across groups of machines
 Coordination with power and cooling systems
 Opportunities also exist for unexpected reduction in computing
capacity through over-eager workload consolidation or low-
power component acquisition without proper planning / testing
CMG‘08 INTERNATIONAL
conference
Dynamic Data Center – The
Vision
 Enable the management of aggregate
servers in conjunction with data center
infrastructure
 Deliver this through building platform
and management infrastructure
 Power metering and budgeting
 Virtualization and workload migration
 Standards-based management technologies
 Coordination between in-band and out-of-
band management systems
CMG‘08 INTERNATIONAL
conference
Scalable Node
 Today power consumption does not scale in
line with server utilization
 Typical commodity servers consume 50-70% of the
maximum power when completely idle
 Basic approaches:
○ Increase server utilization via virtualization
○ Reduce power when full performance not needed
○ Power down / put to sleep excess servers
 Work with partners to provide the best power
and performance by managing the system
efficiently
 Windows power management improvements
CMG‘08 INTERNATIONAL
conference
Scalable Node – The
Problem
 Today power consumption does not scale in
line with server utilization
 Typical commodity servers consume 50-70% of
the maximum power when completely idle
○ Idle servers have low efficiency due to high idle power
○ Efficiency rises with utilization due to idle power
amortization
 Tremendous opportunities exist for reducing
energy needs
○ Reduce power when full performance is not required
○ Leverage virtualization solutions to increase server
utilization
○ Power down servers when they are not needed
CMG‘08 INTERNATIONAL
conference
Scalable Node – The Vision
 Work with partners to provide the best power
and performance by managing the system
efficiently
 Deliver this through improvements to
Windows Power Management
 Build on existing infrastructure and extend
Windows value
 Enhancements to processor power management
 Focus on idle and low-to-medium workload levels
 Support for device performance states
CMG‘08 INTERNATIONAL
conference
Windows Server 2008 R2 - 1
 Refined “Balanced Mode” defaults to
optimize power efficiency
 Takes advantage of advances in server
platform hardware (e.g., powering down
individual cores or sockets)
 Configurable power settings for new
features (e.g., core parking)
 P-state and C-state selection algorithms
updated
 Increased support for joint OS/HW power
management
CMG‘08 INTERNATIONAL
conference
Windows Server 2008 R2 - 2
 Simplified configuration model
 Group Policy control over all power settings
 Rich command line interface and refined UI
elements
 In-band WMI power metering and
budgeting support
 Remote manageability of power policy via
WMI
 Additional qualification logo to indicate
enhanced power management support
CMG‘08 INTERNATIONAL
conference
Windows Device Power
Management
 Extensible power policy infrastructure
 Allows easy incorporation of power
management-enabled devices
○ Device power settings integrate with Windows
system power policy
○ Device power settings can appear in
Advanced power UI
○ Rich notification support
 Allows for true OEM power management
innovation and value
CMG‘08 INTERNATIONAL
conference
Enhanced Power
Management Logo
 Additional Qualification logo for
“Enhanced Power Management” that
indicates support for the following:
 Processor power management through
Windows
 Power metering and budgeting
 Power On/Off via WS-Management
(SMASH)
CMG‘08 INTERNATIONAL
conference
Windows Server 2008 P-State
Parameters
Balanced Mode Settings   WS08   R2 Pre-Beta   WS08R2
Time Check 100 ms 100 ms 50 ms
Increase Time 100 ms 100 ms 50 ms
Decrease Time 300 ms 100 ms 50 ms
Increase Percentage 30% 70% 80%
Decrease Percentage 50% 30% 70%
Domain Accounting Policy   0 (On)   Always Off   Always Off
Increase Policy IDEAL (0) IDEAL (0) SINGLE (1)
Decrease Policy SINGLE (1) SINGLE (1) IDEAL (0)
CMG‘08 INTERNATIONAL
conference
Optimized for Low-to-Medium Loads
 Even though 100% utilization may have the
highest power efficiency, few servers run at
full capacity
 Servers at maximum utilization provide fewer
opportunities for power optimizations
 In the short term, targeting low utilization
servers will provide most benefit
 In medium term, targeting medium
utilization servers will provide increased
benefit
 E.g., consolidation and virtualization will increase
average utilization levels
CMG‘08 INTERNATIONAL
conference
Outline
 Motivation
 Background
 Windows Server 2003 → 2008
 Windows Server 2008 R2
 Server Energy Vision
 Idle Power Optimizations
 Core Parking
 Hyper-V (v2)
 Power Metering and Budgeting
 SSD
 Power Diagnostics and Control
 Summary
CMG‘08 INTERNATIONAL
conference
Get Idle; Stay Idle
 Shut down unnecessary services, applications, roles,
devices, drivers
 Avoid polling and spinning in tight loops
 Avoid high-res periodic timers (<10 ms)
 Timer Coalescing
 Intelligent Timer Tick Distribution (ITTD)
 Use NUMA-based affinity for threads and interrupts
 Thread (via APIs and tools): soft (IdealProc), hard (affinity mask); see the sketch after this list
 Interrupts (via IntPolicy.exe)
 Idle improvements extend to Hyper-V
 Significant reduction in platform interrupt activity
 Enables power savings and greater scalability
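A minimal sketch of the thread-side affinity calls mentioned above (SetThreadIdealProcessor for the soft hint, SetThreadAffinityMask for the hard mask). The processor index and mask are illustrative, and interrupt affinity is handled separately via IntPolicy.exe rather than from application code.

#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE thread = GetCurrentThread();

    /* Soft affinity: hint the scheduler to prefer logical processor 2
       (illustrative index); the thread may still run elsewhere. */
    if (SetThreadIdealProcessor(thread, 2) == (DWORD)-1)
        printf("SetThreadIdealProcessor failed: %lu\n", GetLastError());

    /* Hard affinity: restrict the thread to logical processors 0-3
       (e.g., one NUMA node on this illustrative topology). */
    if (SetThreadAffinityMask(thread, 0x0F) == 0)
        printf("SetThreadAffinityMask failed: %lu\n", GetLastError());

    /* ... perform NUMA-local work here ... */
    return 0;
}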
CMG‘08 INTERNATIONAL
conference
Timer Coalescing
 Platform energy efficiency can be improved by extending
idle periods
 New timer coalescing API enables callers to specify a tolerance for due time (see the sketch after this list)
 Enables the kernel to expire multiple timers at the same time
 Extensions should integrate with WS08R2 API/DDI
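A minimal sketch of the coalescable-timer API referenced above (SetWaitableTimerEx, available starting with Windows 7 / WS08R2): the caller supplies a tolerable delay, letting the kernel batch this expiration with other timers. The 1 s period and 100 ms tolerance are illustrative values.

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Auto-reset, unnamed waitable timer. */
    HANDLE timer = CreateWaitableTimer(NULL, FALSE, NULL);
    if (timer == NULL) {
        printf("CreateWaitableTimer failed: %lu\n", GetLastError());
        return 1;
    }

    LARGE_INTEGER dueTime;
    dueTime.QuadPart = -10000000LL;  /* first fire in 1 s (100-ns units; negative = relative) */

    /* Period 1000 ms, tolerable delay 100 ms: the kernel may delay
       expiration by up to 100 ms to coalesce it with other timers. */
    if (!SetWaitableTimerEx(timer, &dueTime, 1000, NULL, NULL, NULL, 100)) {
        printf("SetWaitableTimerEx failed: %lu\n", GetLastError());
        return 1;
    }

    for (int i = 0; i < 5; i++) {
        WaitForSingleObject(timer, INFINITE);
        printf("timer fired\n");
    }

    CloseHandle(timer);
    return 0;
}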
CMG‘08 INTERNATIONAL
conference
Intelligent Timer Tick Distribution
(Tick Skipping)
 Extend processor sleep states by not waking
the CPU unnecessarily
 CPU 0 handles the periodic system timer
tick; other processors are signaled as
necessary
 Non-timer interrupts will still wake sleeping
processors
 Not available on IA64
 Only enabled on systems with more C-states
than just C1
CMG‘08 INTERNATIONAL
conference
Background Process
Management
 Background activity on the macro scale (minutes, hours) is
also important for power
 E.g., disk defragmentation, AV scans
 Prevents low-power idle and sleep modes
 Will collapsing multiple background activities result in a
significantly heavier load during that interval and thus potentially
impede concurrent foreground activity?
 Unified Background Process Manager (UBPM)
 New WS08R2 infrastructure
 Drives scheduling of services and scheduled tasks
 Transparent to users, IT pros, and existing APIs
 Enables trigger-starting services
 Delivers usage data and metrics to Microsoft via CEIP
CMG‘08 INTERNATIONAL
conference
UBPM: Trigger-Start
Services
 Many services are configured to Autostart and then wait for rare
events
 UBPM enables Trigger-Start services based on
environmental changes
 Device arrival/removal, IP address change, domain join, etc.
 Examples
○ Bluetooth service is started only if a Bluetooth radio is currently attached
○ BitLocker encryption service started only when new volumes detected
 ISV Call to Action
 Leverage trigger-start capability for value-add services
 Validate performance impact with XPerf tools
○ Performance impact can be positive or negative
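A sketch, under stated assumptions, of how a service could be switched to trigger-start through the SCM API (ChangeServiceConfig2 with SERVICE_CONFIG_TRIGGER_INFO, new in WS08R2 / Windows 7). The service name "MyService" is a placeholder; the first-IP-address-arrival trigger is one of the documented trigger subtypes. The resulting configuration can be inspected afterwards with "sc.exe qtriggerinfo".

#define INITGUID        /* instantiate the SDK-defined trigger subtype GUIDs */
#include <windows.h>
#include <winsvc.h>
#include <stdio.h>

int main(void)
{
    SC_HANDLE scm = OpenSCManager(NULL, NULL, SC_MANAGER_CONNECT);
    if (!scm) { printf("OpenSCManager failed: %lu\n", GetLastError()); return 1; }

    /* "MyService" is a placeholder for a demand-start service you own. */
    SC_HANDLE svc = OpenService(scm, TEXT("MyService"), SERVICE_CHANGE_CONFIG);
    if (!svc) { printf("OpenService failed: %lu\n", GetLastError()); return 1; }

    /* Start the service when the first IP address becomes available,
       instead of auto-starting it at boot and letting it sit idle. */
    SERVICE_TRIGGER trigger = {0};
    trigger.dwTriggerType   = SERVICE_TRIGGER_TYPE_IP_ADDRESS_AVAILABILITY;
    trigger.dwAction        = SERVICE_TRIGGER_ACTION_SERVICE_START;
    trigger.pTriggerSubtype = (GUID *)&NETWORK_MANAGER_FIRST_IP_ADDRESS_ARRIVAL_GUID;

    SERVICE_TRIGGER_INFO info = {0};
    info.cTriggers = 1;
    info.pTriggers = &trigger;

    if (!ChangeServiceConfig2(svc, SERVICE_CONFIG_TRIGGER_INFO, &info))
        printf("ChangeServiceConfig2 failed: %lu\n", GetLastError());

    CloseServiceHandle(svc);
    CloseServiceHandle(scm);
    return 0;
}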
CMG‘08 INTERNATIONAL
conference
Coordinated Processor
Clocking Control
 New processor performance state interface
described via ACPI
 Feature enables OS and HW platform
coordination of processor power management
 Platform is in direct control of T-states and P-states
 OS dynamically specifies processor performance
requirements on per-processor basis as a
percentage of maximum frequency
 Platform is responsible for delivering requested
performance
○ In some cases, like a power budget condition, the
platform may underdeliver, but must report this
CMG‘08 INTERNATIONAL
conference
Outline
 Motivation
 Background
 Windows Server 2003 → 2008
 Windows Server 2008 R2
 Server Energy Vision
 Idle Power Optimizations
 Core Parking
 Hyper-V (v2)
 Power Metering and Budgeting
 SSD
 Power Diagnostics and Control
 Summary
CMG‘08 INTERNATIONAL
conference
Processor Core Parking
 This is a Windows scheduler optimization, not HW!
 Goals
 Save power on multi-processor systems by dynamically
scaling number of active cores to match workload
 Drop parked cores into deepest C-states
 Approach
 Use historical information to predict future workload
 Calculate number of cores needed
 Heuristically select the “unparked” cores
 Monitoring
 Perfmon and ETW
CMG‘08 INTERNATIONAL
conference
Processor (Logical) Core
Parking
 Logical core = HW thread (e.g., Intel®
Hyperthreading)
 Extension of Windows’ processor
performance state engine
 Configurable via power policy settings
 Parking may reduce performance,
depending on the parameter settings, by
reducing OS responsiveness to rising load
levels
 Parking could improve performance by
concentrating work onto a smaller number
of cores
CMG‘08 INTERNATIONAL
conference
Selecting Cores to Park - 1
 WS08R2 (Beta) approach:
 Leave one logical core unparked per NUMA node
 Other possible approaches, including
customizable minimum unparked entities
 Park entire packages at once
 Park logical cores individually, regardless of
packages
 Leave one logical core unparked per socket
 Leave one logical core unparked per physical core
 Affinitized activity does tend to unpark logical
cores that must be used (selection heuristic)
 Beta tracks affinitized threads, not DPCs / Interrupts
CMG‘08 INTERNATIONAL
conference
Selecting Cores to Park - 2
 Parking algorithm takes many inputs. At a minimum:
 Time since the last parking decision was made
 Average frequencies of each core over the last time interval
 Average CPU “utilization” over the last time interval
 Possible additional inputs depending on parameter
setting and final WS08R2 refinements:
○ Power state domains (i.e., groups of associated cores)
○ Current processor P-States
○ P-State change rate policies (SINGLE, ROCKET, IDEAL)
○ Affinitized DPCs / Interrupts
○ Time spent in affinitized activity
○ More comprehensive or longer historical information
○ More system component topology information
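The following is not the Windows parking algorithm, just a toy illustration of the kind of calculation the inputs above feed: estimate how many cores the recent load needs, then keep at least the policy minimum unparked (one per NUMA node in the WS08R2 Beta, per the previous slide). All names, thresholds, and the headroom factor are invented for illustration.

#include <math.h>

/* Toy sketch only -- NOT the Windows core-parking algorithm.
 * Inputs mirror the slide: per-core average utilization and frequency
 * over the interval since the last parking decision. */
typedef struct {
    double utilization;   /* 0.0 - 1.0 over the last interval */
    double frequency;     /* fraction of max frequency, 0.0 - 1.0 */
} core_sample_t;

/* Estimate how many logical cores to leave unparked. min_unparked would
 * be one per NUMA node under the Beta policy described above. */
int cores_needed(const core_sample_t *cores, int n, int min_unparked,
                 double headroom /* e.g. 1.25 to absorb rising load */)
{
    double work = 0.0;
    for (int i = 0; i < n; i++) {
        /* Frequency-weighted utilization approximates work actually done. */
        work += cores[i].utilization * cores[i].frequency;
    }
    int needed = (int)ceil(work * headroom);
    if (needed < min_unparked) needed = min_unparked;
    if (needed > n) needed = n;
    return needed;
}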
CMG‘08 INTERNATIONAL
conference
Outline
 Motivation
 Background
 Windows Server 2003 → 2008
 Windows Server 2008 R2
 Server Energy Vision
 Idle Power Optimizations
 Core Parking
 Hyper-V (v2)
 Power Metering and Budgeting
 SSD
 Power Diagnostics and Control
 Summary
CMG‘08 INTERNATIONAL
conference
Hyper-V Power
Management
 Full P-state/C-state management already
integrated between Windows root partition
and Hyper-V v1 (WS08)
 Enlightenments added in Hyper-V v2
(WS08R2)
 Hypervisor delivers child clocks without requiring
root interaction, plus Intelligent Timer Tick
Distribution (to children)
 Core parking enabled for all partitions
CMG‘08 INTERNATIONAL
conference
Web Fundamentals Dynamic: WS08
[Two charts: watts and throughput (requests/sec) vs. system utilization percentage; left: adding load to each guest, right: adding guests]
For these experiments, the highest system utilization under the WF
workload is ~80%. This issue has been subsequently resolved.
CMG‘08 INTERNATIONAL
conference
SPECpower: WS08
[Two charts: watts and throughput (in thousands) vs. system utilization percentage; left: adding load to each guest, right: adding guests]
CMG‘08 INTERNATIONAL
conference
SPECpower + WF (WS08)
SPECpower throughput and server power
usage versus total system utilization
Power usage for various
throughput levels
[Chart 1: SPECpower throughput (in thousands) and watts vs. system utilization percentage]
[Chart 2: power (% of maximum watts) vs. workload (% of maximum throughput)]
•4 Guests running WF (4940 requests/sec)
•~25% system utilization; ~35% guest virtual processor utilization
•4 Guests running SPECpower (similar efficiency as single workload)
CMG‘08 INTERNATIONAL
conference
Outline
 Motivation
 Background
 Windows Server 2003 → 2008
 Windows Server 2008 R2
 Server Energy Vision
 Idle Power Optimizations
 Core Parking
 Hyper-V (v2)
 Power Metering and Budgeting
 SSD
 Power Diagnostics and Control
 Summary
CMG‘08 INTERNATIONAL
conference
Power Metering and
Budgeting
 In the future, servers are likely to have
onboard power meters
 AC power (into the power supply)
 DC power (out of the power supply)
 For components (CPU, RAM, IO, fans, disks, …)
 WS08R2 provides the capability to monitor
such meters, as well as communicate with
power management logic, through
standard Windows and ACPI interfaces
 Power budget information is reported to OS
 Optional support for configuring the budget from
within Windows
CMG‘08 INTERNATIONAL
conference
Power Metering and
Budgeting
[Architecture diagram: WMI consumers (System Center, admin scripts, hardware management tools) → WMI namespace root\cimv2\power (Power Supply class, Power Meter class, Power Meter events) → user-mode Power Service and power WMI providers → standard Windows IOCTL interface → in-box ACPI-based implementation (vendors provide ACPI code in firmware; other vendor-specific implementations possible) → BMC hardware. Implemented in WS08R2.]
CMG‘08 INTERNATIONAL
conference
Power Metering and Budgeting
 WS08R2 introduces the ability to report power consumption
and budgeting information
 Server platform reports this in-band to the OS via ACPI
 No additional drivers are required or HW changes, only platform
support
 Power information is exposed via WMI
 Adheres to the DMTF Power Supply Profile v1.01
 Power budget information is reported to the OS
 Optional support for configuring the budget from within Windows
 Extendable to enable per-device metering
 WDM driver interface available
 Design goals
 Standard hardware and software interfaces
 Native infrastructure, easily extendable
 Leverages existing platform technology
CMG‘08 INTERNATIONAL
conference
Power Metering and Budgeting – Usage
 Statistical/inventory/auditing
 Data center can monitor power
consumption across nodes
 Administrator can write scripts to control
power policies and receive power condition
events
 Model can be extended to per-device
meters
 Another set of metrics for virtualization and
consolidation
CMG‘08 INTERNATIONAL
conference
Power Metering and Budgeting – WDM
 Standard Windows driver IOCTL interface
 Event model based on pending IO requests
(IRPs)
 Two separate device interfaces
 Consumed by the WMI providers
 An alternative to the ACPI implementation
 Future direction – potentially consumed by
the kernel power manager
 Documented on MSDN
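A minimal user-mode sketch of the IOCTL pattern described above: open the meter's device interface and issue a control request. The device path and IOCTL code below are placeholders invented for illustration; the real interface and control codes are the ones documented on MSDN.

#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

/* Placeholder control code -- NOT the documented power meter IOCTL. */
#define IOCTL_EXAMPLE_READ_POWER \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_READ_ACCESS)

int main(void)
{
    /* Placeholder path -- a real consumer would enumerate the meter's
       device interface (e.g., via SetupDiGetClassDevs) instead. */
    HANDLE meter = CreateFile(TEXT("\\\\.\\ExamplePowerMeter"), GENERIC_READ,
                              0, NULL, OPEN_EXISTING, 0, NULL);
    if (meter == INVALID_HANDLE_VALUE) {
        printf("CreateFile failed: %lu\n", GetLastError());
        return 1;
    }

    ULONG milliwatts = 0;
    DWORD bytes = 0;
    if (DeviceIoControl(meter, IOCTL_EXAMPLE_READ_POWER, NULL, 0,
                        &milliwatts, sizeof(milliwatts), &bytes, NULL))
        printf("current draw: %lu mW\n", milliwatts);
    else
        printf("DeviceIoControl failed: %lu\n", GetLastError());

    CloseHandle(meter);
    return 0;
}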
CMG‘08 INTERNATIONAL
conference
Power Metering and Budgeting –
ACPI
 Rationale
 Works as the abstraction layer to the underlying
platform technology (IPMI, WSMAN, etc.)
 Scales across different platforms
 Does not require special drivers
 Requires only firmware updates
 Currently being proposed to the ACPI 4.0
specification
 Delegate tasks to the BMC (e.g., rolling
average calculation, polling for events, etc.)
CMG‘08 INTERNATIONAL
conference
Power Metering and Budgeting –
ACPI
 Power supply device
 Extends the current power source device
 Control method to publish capabilities
 Power meter device
 Similar to control method for batteries
 A set of control methods to get capabilities
and set configuration parameters, trip points,
and configure hardware enforced limits
 Event notification via Notify codes
CMG‘08 INTERNATIONAL
conference
Power Metering and Budgeting –
ACPI
 WS08R2 will provide
 In-box driver to support power meter device(s)
described in ACPI
 In-box IPMI operation region handler as part of
the Microsoft IPMI driver – allowing ACPI control
methods to communicate with IPMI using the
KCS protocol
○ Format similar to the SMBUS OpRegion
○ 3rd-party IPMI drivers can register OpRegion
handler
for other IPMI protocol(s)
○ Also proposed to ACPI 4.0 specification
CMG‘08 INTERNATIONAL
conference
Outline
 Motivation
 Background
 Windows Server 2003 → 2008
 Windows Server 2008 R2
 Server Energy Vision
 Idle Power Optimizations
 Core Parking
 Hyper-V (v2)
 Power Metering and Budgeting
 SSD
 Power Diagnostics and Control
 Summary
CMG‘08 INTERNATIONAL
conference
WS08R2 Enables Improved Endurance for SSD Technology
 An SSD can identify itself differently from an HDD in ATA, as defined
through ATA8-ACS Identify Word 217: Nominal media rotation rate
 Reporting non-rotating media allows WS08R2 to turn scheduled defrag
off by default, improving device endurance by reducing writes
CMG‘08 INTERNATIONAL
conference
WS08R2 Enables Optimization
for SSD Technology
 Microsoft implementation of “Trim” feature
 NTFS will send down delete notification to the device
supporting “trim”
○ File system operations: Format, Delete, Truncate,
Compression
○ OS internal processes: e.g., Snapshot, Volume
Manager
 Three optimization opportunities for the device
 Enhancing device wear leveling by eliminating merge operations
for deleted data blocks
 Making early garbage collection possible for fast writes
 Keeping the device's pool of unused storage as large as possible,
leaving more room for device wear leveling
CMG‘08 INTERNATIONAL
conference
Outline
 Motivation
 Background
 Windows Server 2003 → 2008
 Windows Server 2008 R2
 Power Diagnostics and Control
 Perfmon/Resmon
 Powertst
 Powercfg
 Summary
CMG‘08 INTERNATIONAL
conference
Check Processor ACPI
States:
System Event ID 4
CMG‘08 INTERNATIONAL
conference
Kernel Debugger
!ppmperf
Provides P-state and T-state information
!ppmidle
Provides C-state information
CMG‘08 INTERNATIONAL
conference
Monitoring Power Status - 1
 System Event Log: ID 4
 Perfmon/Logman
 Processor
○ Provide average C-state information
 % C1/2/3 Time and C1/2/3 Transitions/sec
 Processor Information
○ Parking status
 Processor Performance
○ Only present if P-states are exposed
○ Provide current P-state information (e.g., avg freq)
 Resource Monitor
 CPU % Max Frequency average and graph
CMG‘08 INTERNATIONAL
conference
Perfmon: Processor
Frequency
CMG‘08 INTERNATIONAL
conference
Perfmon: Processor
Frequency vs. Utilization
CMG‘08 INTERNATIONAL
conference
Resmon: Processor
Frequency
CMG‘08 INTERNATIONAL
conference
Monitoring Power Status - 2
 ETW tracing (Windows Perf Tool Kit)
 Xperf –on power
 Pwrtest.exe
 Logs use of P-, T-, and C-states
 Pwrtest /ppm
○ Sampling P-state and C-state performance
 Pwrtest /ppm /live
○ Event-driven logging of all P-state and C-state transitions
CMG‘08 INTERNATIONAL
conference
Pwrtest.exe /info:ppm
C:\Program Files\Microsoft PwrTest> pwrtest /info:ppm
PROCESSOR_POWER_INFORMATION
CPU Number = 0
MaxMhz = xxxx
CurrentMhz = yyyy
MhzLimit = zzzz
MaxIdleState = M
CurrentIdleState = N
InstanceName: CPU Model X
(continued)
Processor Performance States
PerfStates:
Max Transition Latency: xx us
Number of States: yy
State Speed (Mhz) Type
0 aaaa (100%) Perf
1 bbbb ( ss%) Perf
2 cccc ( tt%) Perf
3 dddd ( uu%) Throttle
4 eeee ( vv%) Throttle
5 ffff ( ww%) Throttle
CMG‘08 INTERNATIONAL
conference
Pwrtest.exe in Logging Mode - 1
C:\Program Files\Microsoft PwrTest> pwrtest /ppm
Elapsed Idle C1 C2 C3 P- Freq Freq Perf/
Cpu [ms] [%] [%] [%] [%] State [%] [MHz] Throttle
--- ------- ---- --- --- --- ----- ---- ----- --------
0 5007 98 0 73 26 2 54 1000 P
1 5007 99 0 93 6 2 54 1000 P
0 10014 97 0 72 27 2 54 1000 P
1 10014 97 0 91 8 2 54 1000 P
0 15021 88 1 0 0 2 54 1000 P
1 15021 89 1 0 0 2 54 1000 P
0 20028 99 0 0 100 2 54 1000 P
CMG‘08 INTERNATIONAL
conference
Pwrtest.exe in Logging Mode - 2
C:\Program Files\Microsoft PwrTest> pwrtest /ppm /live
Waiting for PPM Events. Press 'Ctrl-C' to quit...
Timestamp Proc# Event Information
-------------------------------------------------------------------------------
21:27:41.133 0 Idle State Demotion (Old:2, New:1, Affinity:0x1)
21:27:41.133 1 Idle State Demotion (Old:2, New:1, Affinity:0x2)
21:27:41.196 1 Perf State Change (State:0, Speed:1833 Mhz)
21:27:41.196 1 Domain Perf State Change
(State:0, Speed:1833 Mhz, Affinity:0x3)
21:27:41.196 0 Idle State Demotion (Old:1, New:0, Affinity:0x1)
CMG‘08 INTERNATIONAL
conference
Setting P-State Parameters
CMG‘08 INTERNATIONAL
conference
Power Controls:
Powercfg.exe
 Configure power settings within a
specific power scheme (WS03+)
 WS08R2: Detect common energy
efficiency problems (via /ENERGY flag)
 USB device selective suspend
 Processor Power Management (PPM)
 Inefficient power policy settings
 Platform timer resolution
 Platform firmware problems
 …and more
CMG‘08 INTERNATIONAL
conference
Configure power settings within a specific power scheme
Set AC and DC values for individual settings
Every power setting belongs to a Subgroup
-setdcvalueindex used for battery scenario
C:\> powercfg.exe -setacvalueindex <SCHEME> <SUBGROUP> <SETTING> <VALUE>
C:\> powercfg.exe -setacvalueindex SCHEME_BALANCED SUB_SLEEP STANDBYIDLE 0
CMG‘08 INTERNATIONAL
conference
Power Efficiency
Diagnostics
 “Powercfg /ENERGY” to start tracing
 Close open applications and documents first
 Inbox with WS08R2 only
 Leverages new inbox ETW instrumentation
 Advanced users can run utility and view HTML output
 Automatically executed when the system is idle [Win7]
 Reports data to Microsoft via Customer Experience Improvement
Program (CEIP)
 Attend
for demo and details
CMG‘08 INTERNATIONAL
conference
Power Efficiency Diagnostics
CMG‘08 INTERNATIONAL
conference
Power Efficiency Diagnostics
CMG‘08 INTERNATIONAL
conference
Lab Issues: Processor Utilization is Based on Non-Idle Wall Time
 Idle == idle loop or HALT
 It doesn’t take frequency into account, so 100% CPU
utilization could be at P0 or at Pn
 There may actually be more performance on the table
 Idle time will include the time taken to return from C-
states (HALT), which could be microseconds
 CPU utilization will include cache warm-up effects if
the cache has been flushed to reach the deepest C-
states
 CPU utilization will include latencies caused by
remote memory being in low-power states
 In particular, AMD and future Intel processors where
memory is socket-attached
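A small sketch of the metric in question: conventional utilization from GetSystemTimes counts non-idle wall time only, so a frequency-weighted figure (multiplying by % of max frequency, e.g., from the Processor Performance counter) gives a rough sense of remaining headroom. The 54% frequency value here is an illustrative input, not something this API returns, and the weighting is only approximate since performance does not scale linearly with frequency.

#include <windows.h>
#include <stdio.h>

static ULONGLONG ft_to_u64(FILETIME ft)
{
    ULARGE_INTEGER u;
    u.LowPart  = ft.dwLowDateTime;
    u.HighPart = ft.dwHighDateTime;
    return u.QuadPart;
}

int main(void)
{
    FILETIME idle0, kern0, user0, idle1, kern1, user1;

    GetSystemTimes(&idle0, &kern0, &user0);
    Sleep(1000);                          /* sample over ~1 second */
    GetSystemTimes(&idle1, &kern1, &user1);

    /* Kernel time includes idle time, so busy = (kernel + user) - idle. */
    ULONGLONG idle  = ft_to_u64(idle1) - ft_to_u64(idle0);
    ULONGLONG total = (ft_to_u64(kern1) - ft_to_u64(kern0)) +
                      (ft_to_u64(user1) - ft_to_u64(user0));
    double utilization = total ? 1.0 - (double)idle / (double)total : 0.0;

    /* Illustrative: suppose perfmon reports the CPUs ran at 54% of max
       frequency over the same interval (e.g., capped at a low P-state). */
    double freq_pct = 0.54;
    printf("wall-time utilization: %.0f%%\n", utilization * 100.0);
    printf("frequency-weighted load (rough): %.0f%% of max capacity\n",
           utilization * freq_pct * 100.0);
    return 0;
}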
CMG‘08 INTERNATIONAL
conference
Lab Issues: OS vs. HW C-
States
 Only three C-states selected by the OS:
 C1: C1 in HW
 C2: lowest power “type 2” C-state reported
by HW
 C3: Cn in HW
 Perfmon shows OS perspective of C-
states
CMG‘08 INTERNATIONAL
conference
Outline
 Motivation
 Background
 Windows Server 2003 → 2008
 Windows Server 2008 R2
 Power Diagnostics and Control
 Summary
CMG‘08 INTERNATIONAL
conference
Summary
 Windows Server 2008 and 2008 R2 deliver
real energy savings for the data center
 New WS08R2 features deliver enhanced
power efficiency and better manageability
 Improvements to idle and low-to-medium
workload operating efficiency
 Management of power policy via WMI
 Power metering support provides energy
consumption information through Windows
CMG‘08 INTERNATIONAL
conference
Future Work Example:
Non-Volatile Memory (NVM)
 Solid State Disk (current server usage)
 Potential additional layer(s) in memory hierarchy
 Cache (a la ReadyBoost)
 DRAM complement
 Very low power when idle
 But low-power DRAM may narrow the gap significantly
 Poor performance of random writes
 Could be improved by coalescing and remapping writes
 Block orientation
 Difficult to use as DRAM complement
 Limited lifetime of Flash cells
 Future NVM technologies may improve on this
CMG‘08 INTERNATIONAL
conference
Call to Action - 1
 Make sure any reduction in server capabilities
is a planned-for and acceptable tradeoff
between power and performance (e.g.)
 TANSTAAFL, Do More With Less
 Reduce idle activity and power consumption
 Validate new platform power management using
Power Efficiency Diagnostics
 ISV/IHV Call to Action for Power: eliminate
activity during workload idle periods in
applications and drivers
 Target an average idle period of at least 100 ms
 Provide software with adjustable tradeoffs between
power and performance when appropriate
CMG‘08 INTERNATIONAL
conference
Call To Action - 2
 Build power efficient platforms and
solutions
 Expose complete processor (and memory and
device) information from BIOS
 Ensure drivers and applications work with core
parking enabled
 Speak with Microsoft about creating ACPI-based
power meter and supply devices
 Get the Enhanced Power Management logo
 Review microsoft.com power whitepapers
and presentations
CMG‘08 INTERNATIONAL
conference
The Power of WinHEC
2008!
Power-Performance Benchmarks, AMD, and
Scalable Windows with HP Integrity Servers, HP
CMG‘08 INTERNATIONAL
conference
Additional Resources
 WDK available with pre-Beta
 Web Resources:
 White papers and presentations at www.microsoft.com (search on “power”)
○ http://www.microsoft.com/whdc (search on “power”)
 Windows Hardware Developer Central – Power Management:
…/whdc/system/pnppwr/
 Processor Power Management in Windows Vista and Windows Server 2008:
…/whdc/system/pnppwr/powermgmt/ProcPowerMgmt.mspx
 ACPI / Power Management: …/whdc/system/pnppwr/powermgmt/default.mspx
 Recommendations for Power Budgeting with Windows Server:
…/whdc/system/pnppwr/powermgmt/Svr_PowerBudget.mspx
 Active State Power Management in Windows Vista: …/whdc/connect/pci/aspm.mspx
○ Windows Server 2008 Power Savings
http://download.microsoft.com/download/4/5/9/459033a1-6ee2-45b3-ae76-
a2dd1da3e81b/Windows_Server_2008_Power_Savings.docx
○ Designing Efficient Background Processes for Windows (Trigger-Start Services):
http://go.microsoft.com/fwlink/?LinkId=128622
 ACPI Specifications: http://www.acpi.info
 80 Plus Program for power supplies: http://www.80plus.org
 Energy Star Power Supply Specification Draft:
http://www.energystar.gov/ia/partners/prod_development/new_specs/downloads/Draft1_Server
_Spec.pdf
E-mail: Server Power Feedback alias srvpwrfb@microsoft.com
CMG‘08 INTERNATIONAL
conference
Sources
 Estimating Total Power Consumption by Servers in the U.S. and the
World – Jonathan G. Koomey, Ph.D.
 http://enterprise.amd.com/Downloads/svrpwrusecompletefinal.pdf
 Bureau of Labor Statistics
 http://data.bls.gov/cgi-bin/cpicalc.pl
 US Energy Information Administration
 http://www.eia.doe.gov/fuelelectric.html
 AFCOM Data Center Institute’s Five Bold Predictions, 2006
 http://www.afcom.com/News_Releases/Afcom_In_The_News_05010601.asp
 Intel Server Products Power Budget Analysis Tool
 http://www.intel.com/support/motherboards/server/sb/cs-016976.htm
 Data center TCO benefits of reduced air flow -- Malone, Vinson, and
Bash
 Various Gartner press releases
 Aperture Research Institute
 EYP Mission Critical Facilities Inc.
 Power In, Dollars Out: How to Stem the Flow in the Data Center
 http://www.microsoft.com/whdc/system/pnppwr/powermgmt/Svr_Pwr_ITAdmin.mspx
CMG‘08 INTERNATIONAL
conference
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market
conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
CMG‘08 INTERNATIONAL
conference
CMG‘08 INTERNATIONAL
conference
[Chart: monthly cost breakdown – Servers $3,000,000; Infrastructure $1,800,000; Power $1,000,000]
3 yr server and 15 yr infrastructure amortization
CMG‘08 INTERNATIONAL
conference
Growing Energy Demand
2004 Energy Consumption = ~ 100 quads
2004 Energy Expenditures = ~ $910 billion
[Chart: U.S. Energy Consumption 1949–2004, All Fuels (TBTU); Industrial = red, Transportation = purple, Residential = green, Commercial = blue]
CMG‘08 INTERNATIONAL
conference
Datacenter Costs Breakdown -
1
CMG‘08 INTERNATIONAL
conference
Electricity Use by End-Use: 2000 - 2006
CMG‘08 INTERNATIONAL
conference
CMG‘08 INTERNATIONAL
conference
CMG‘08 INTERNATIONAL
conference
CMG‘08 INTERNATIONAL
conference
Power Metering
 In the future, servers are likely to have
onboard power meters
 AC power (into the power supply)
 DC power (out of the power supply)
 For individual components (CPU, RAM, IO, fans,
disks, …)
 WS08R2 provides the capability to monitor
such meters, as well as communicate with
power management logic, through
standard Windows and ACPI interfaces
CMG‘08 INTERNATIONAL
conference
Power Metering and
Budgeting WS08R2 introduces the ability to report power consumption
and budgeting information
 Server platform reports this in-band to the OS via ACPI
 No additional drivers or HW changes are required, only platform
support
 Power information is exposed via WMI (see the sketch below)
 Adheres to the DMTF Power Supply Profile v1.01
 Power budget information is reported to the OS
 Optional support for configuring the budget from within Windows
 Extendable to enable per-device metering
 WDM driver interface available
 Design goals
 Standard hardware and software interfaces
 Native infrastructure, easily extendable
 Leverages existing platform technology
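Because the power data is surfaced through standard WMI, a management script can discover and read it with the in-box scripting host. The sketch below is illustrative only: it uses the new power namespace (root\cimv2\power), and the meter class name shown (Win32_PowerMeter) is an assumption, so enumerate the namespace first to see which classes your platform actually publishes.

  ' Minimal sketch: discover and read power metering data via WMI.
  ' Win32_PowerMeter is an assumed class name -- list the subclasses
  ' in the namespace to confirm what your platform exposes.
  Set objWMIService = GetObject("winmgmts:\\.\root\cimv2\power")

  ' List every class published in the power namespace
  For Each PowerClass in objWMIService.SubclassesOf()
      wscript.echo PowerClass.Path_.Class
  Next

  ' Dump all properties of each meter instance (assumed class name)
  For Each Meter in objWMIService.InstancesOf("Win32_PowerMeter")
      For Each Prop in Meter.Properties_
          wscript.echo Prop.Name & " = " & Prop.Value
      Next
  Next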
CMG‘08 INTERNATIONAL
conference
Power Metering and
Budgeting
[Architecture diagram: WMI Consumers layered on the power metering/budgeting infrastructure implemented in WS08R2]
CMG‘08 INTERNATIONAL
conference
Based on the DMTF management profiles
New power namespace – root\cimv2\power
1) Power supply device
Inventory information
Capabilities/characteristics
Redundancy information
CMG‘08 INTERNATIONAL
conference
2) Power meter device
Inventory information
Capabilities/characteristics
Latest meter measurements
OS-Configurable trip-points
Configurable platform enforced limit
3) Power supply/meter events
Notification for changes in configuration and capabilities
Notification for trip-points crossed and platform limit enforced
CMG‘08 INTERNATIONAL
conference
Statistical/inventory/auditing
Data center can monitor power consumption
across nodes
Administrators can write scripts to control power
policies and receive power condition events (see the sketch below)
Model can be extended to per-device meters
Another set of metrics for virtualization and
consolidation
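As a rough illustration of the scripting scenario above, the sketch below uses a standard WMI notification query to wait for power condition events. The event class name (Win32_PowerMeterEvent) is an assumption for illustration; substitute whichever event class the platform's meter implementation publishes in the power namespace.

  ' Minimal sketch: wait for power condition events via WMI.
  ' Win32_PowerMeterEvent is an assumed event class name.
  Set objWMIService = GetObject("winmgmts:\\.\root\cimv2\power")
  Set EventSource = objWMIService.ExecNotificationQuery( _
      "SELECT * FROM Win32_PowerMeterEvent")
  Do
      Set PowerEvent = EventSource.NextEvent()   ' blocks until an event arrives
      wscript.echo "Power event received: " & PowerEvent.Path_.Class
  Loop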
CMG‘08 INTERNATIONAL
conference
Standard Windows driver IOCTL interface
Event model based on pending IO requests
(IRPs)
Two separate device interfaces
Consumed by the WMI providers
An alternative to the ACPI implementation
Future direction – potentially consumed by the
kernel power manager
Documented on MSDN
CMG‘08 INTERNATIONAL
conference
Rationale
Works as the abstraction layer to the underlying
platform technology (IPMI, WSMAN, etc.)
Scales across different platforms
Does not require special drivers
Requires only firmware updates
Currently being proposed to the ACPI 4.0
specification
Delegate tasks to the BMC (e.g., rolling average
calculation, polling for events, etc.)
CMG‘08 INTERNATIONAL
conference
Power supply device
Extends the current power source device
Control method to publish capabilities
Power meter device
Similar to control method batteries
A set of control methods to get capabilities
and set configuration parameters, trip points,
and configure hardware enforced limits
Event notification via Notify codes
CMG‘08 INTERNATIONAL
conference
WS08R2 will provide
In-box driver to support power meter device(s)
described in ACPI
In-box IPMI operation region handler as part of the
Microsoft IPMI driver – allowing ACPI control
methods to communicate with IPMI using the KCS
protocol
Format similar to the SMBUS OpRegion
3rd-party IPMI drivers can register OpRegion handler
for other IPMI protocol(s)
Also proposed to ACPI 4.0 specification
CMG‘08 INTERNATIONAL
conference
Architecture Details
CMG‘08 INTERNATIONAL
conference
CMG‘08 INTERNATIONAL
conference
Flash SSD versus HDD (Jun
‘08)
                                    HDD                  Flash SSD
Endurance (write cycles per bit)    10^12                10^5 (SLC*), 10^6 (MLC*)
Cost per byte                       1x                   2.5x – 25x
Performance: small random reads     1x                   10 – 100x
Active Power (Watts/byte)           10-20x               1x
Shock Resistance (non-operating)    100g → 200g (2010)   1500g
Shock Resistance (operating)        ~10g                 100g
Thermal (°C)                        5-55                 0-70
* SLC – Single Level Cell
* MLC – Multi Level Cell
CMG‘08 INTERNATIONAL
conference
Flash Characteristics (Jun
’08)
 Chip bandwidth: Read 50 MB/s, Write 25 MB/s (scales with number of chips)
 Read Latency: 25 μs to start, 100 μs for 2KB “page”
 Write Latency: 200 to 300 μs for 2KB “page”; 2,000 μs to erase
 Active Power: 1-2 Watts for 8 chips + controller
CMG‘08 INTERNATIONAL
conference
SSD High-IOps Workload
TCO
 Decrease TCO for IOps-intensive systems
 IOps bottleneck causes customers to buy spindles
instead of capacity, driving up TCO and
operational complexity (e.g., workload balancing)
 SSDs provide less expensive systems for same
performance targets
 Smaller form factors
CMG‘08 INTERNATIONAL
conference
SSD Performance Concerns
- 1
 Random write perf
 Could be alleviated with next generation of
products
 New technological problems may arise with future
generations (no guarantee that it will stay at same
level)
 Potential bottleneck on erasing/block cleaning
 Mixing workloads creates unexpected
performance characteristics
 Read:write ratio, request sizes, sequentiality
CMG‘08 INTERNATIONAL
conference
SSD Performance Concerns
- 2
 First-pass performance might be better than
steady-state
 When nearing EOL, perf may degrade as blocks
are removed from pool
 Does mapping metadata have to be re-read/initialized
after power failure?
 Need enough onboard parallelism to keep
internal serial interfaces from becoming
bottlenecks
 Just like disk arrays, the wrong stripe unit size
can kill perf in an SSD array
CMG‘08 INTERNATIONAL
conference
WS08R2 Enables Improved
Endurance for SSD Technology
 SSD can identify itself differently from HDD in ATA
as defined by ATA8-ACS Identify Word 217:
Nominal media rotation rate
 Reporting non-rotating media will allow WS08R2
to turn off Defrag by default, improving device endurance
by reducing writes
CMG‘08 INTERNATIONAL
conference
WS08R2 Enables Optimization
for SSD Technology
 Microsoft implementation of “Trim” feature
 NTFS will send delete notifications down to devices
that support “Trim”
○ File system operations: Format, Delete, Truncate, Compression
○ OS internal processes: e.g., Snapshot, Volume Manager
 Three optimization opportunities for the device
 Enhancing device wear leveling by eliminating merge
operation for all deleted data blocks
 Making early garbage collection possible for fast write
 Keeping the device’s unused storage area as large as
possible, leaving more room for device wear leveling
CMG‘08 INTERNATIONAL
conference
Parallelism Tradeoffs
 No one scheme is optimal for all workloads
With a faster serial connection, intra-chip operations are less important
CMG‘08 INTERNATIONAL
conference
SSD Performance Trends
Source: a subset of sample data from internal lab
CMG‘08 INTERNATIONAL
conference
SSD Performance Trends
Source: a subset of sample data from internal lab
CMG‘08 INTERNATIONAL
conference
SSD Cost Trends
Source:
CMG‘08 INTERNATIONAL
conference
CMG‘08 INTERNATIONAL
conference
Hyper-V Power
Management
 Full P-state/C-state management already
integrated between Windows root partition
and Hyper-V v1 (WS08)
 Enlightenments, such as timer assist, added in
Hyper-V v2 (WS08R2)
 Hypervisor delivers child clocks without requiring
root interaction, plus ITTD
 Core parking enabled for all partitions
CMG‘08 INTERNATIONAL
conference
Overview
Scheduling virtual machines on a single server for
density as opposed to dispersion
This allows cores to be “parked” (put to sleep) by placing
them into deep C-states
Benefits
Significantly enhances Green IT by being able to
reduce power required for CPUs
Idle improvements extend to Hyper-V
Significant reduction in platform interrupt activity
Enables power savings and greater scalability
CMG‘08 INTERNATIONAL
conference
Windows Server 2008
16 LP Server
CMG‘08 INTERNATIONAL
conference
WS08R2 Hyper-V Core Parking
16 LP Server
CMG‘08 INTERNATIONAL
conference
CMG‘08 INTERNATIONAL
conference
Hyper-V Power Efficiency
Windows Server Performance Lab
 Testbed Configurations
 Single Workloads
 Web Fundamentals
 SPECpower
 Mixed Workloads
CMG‘08 INTERNATIONAL
conference
Hyper-V Power Efficiency
Workload configuration - 1
 Methodology for obtaining power load line
data for TPC-C, TPC-E, FSCT, and Web
Fundamentals has been demonstrated
 Benchmark loads varied by throttling number of
active users
 Multiple workloads tested in Hyper-V environment
 SPECpower has been successfully tuned
CMG‘08 INTERNATIONAL
conference
Workload Characteristics
 Web Fundamentals (WF)
 Dynamic scenario
 CPU-bound workload
 SPECpower (modified SPECjbb)
 Kit version 1.0
 Java version: JDK 1.6.0_02
 JVM options: -Xms1024m -Xmx1024m -XXaggressive -XXlargePages -XXthroughputCompaction -XXcallprofiling -XXlazyUnlocking -Xgc:genpar -XXgcthreads:2 -XXtlasize:min=8k,preferred=1024k
CMG‘08 INTERNATIONAL
conference
Hyper-V Power Efficiency
Workload configuration - 2
 Single workloads
 All the guests run the same workload
 Two scenarios:
○ Fixing the number of active guests and scaling the
load in each guest
○ Fixing the load in each guest and activating more
guests
 Mixed workloads
 Half of guests run each workload
○ Fixed load in WF guests (~35% CPU utilization each)
○ Varying load in SPECpower guests
CMG‘08 INTERNATIONAL
conference
HW and SW Test
Configurations
 Hardware
 2-socket quad-core processors
○ Minimal P-States
 16GB memory: 4x4GB 667MHz DIMMs
 External (wall) power monitor
 Software
 OS: Windows Server 2008
○ OS Power Management: Balanced mode
 Hyper-V v2 (pre-release build)
○ Configured with 8 guests
 Single virtual processor: 3.16GHz
 1.75GB memory
CMG‘08 INTERNATIONAL
conference
Web Fundamentals
Dynamic
 Adding Load to Each Guest
[Charts: Power usage for various throughput levels (Watts vs. throughput in requests/sec); Throughput and power usage versus total system utilization (Watts and throughput vs. system utilization percentage)]
For these experiments, the highest system utilization under the WF
workload is ~80%. This issue has been subsequently resolved.
CMG‘08 INTERNATIONAL
conference
Web Fundamentals
Dynamic
 Activating Guests - 1
[Charts: Power usage for various throughput levels (Watts vs. throughput in requests/sec); Throughput and power usage versus total system utilization (Watts and throughput vs. system utilization percentage)]
• Data points from left to right: 0 guests, 1 guest, 2 guests, …, 8 guests active
• Each active guest tries to run at the maximum load
CMG‘08 INTERNATIONAL
conference
Web Fundamentals
Dynamic
 Activating Guests - 2
[Chart: Virtual processor utilizations for different numbers of active guests – Virtual Processor Utilization (%) vs. Number of Guests (1–8), one series per guest]
• The maximum utilization of each guest decreases as more guests are
activated. Most of this decrease has been subsequently removed.
CMG‘08 INTERNATIONAL
conference
SPECpower
 Adding Load to Each Guest - 1
[Charts: Throughput and power usage versus total system utilization (Watts and throughput in thousands vs. system utilization percentage); Power usage for various throughput levels (power as % of max Watts vs. workload as % of maximum throughput)]
CMG‘08 INTERNATIONAL
conference
SPECpower
 Adding Load to Each Guest - 2
[Chart: Average processor frequency (MHz) for various workload levels (% of max throughput)]
CMG‘08 INTERNATIONAL
conference
SPECpower
 Adding Load to Each Guest - 3
[Charts: Virtual processor utilizations for various workload levels (Guest 1–8); Logical processor utilizations for various workload levels (Proc 1–8); Utilization (%) vs. load level from 100% down to Active Idle]
• All the guests change load levels concurrently.
• The VM scheduler is biased towards utilizing higher numbered processors.
CMG‘08 INTERNATIONAL
conference
SPECpower
 Activating Guests
[Charts: Throughput and power usage versus total system utilization (Watts and throughput in thousands vs. system utilization percentage); Power usage for various throughput levels (power as % of max Watts vs. workload as % of maximum throughput)]
Similar scalability behavior of power and throughput as when adding load.
CMG‘08 INTERNATIONAL
conference
SPECpower and WF
Mixed Workloads - 1
[Charts: SPECpower throughput and server power usage versus total system utilization; Power usage for various throughput levels (power as % of max Watts vs. workload as % of maximum SPECpower throughput)]
• 4 guests running WF (4940 req/sec): ~25% system utilization; ~35% guest virtual processor utilization
• 4 guests running SPECpower (similar efficiency as single workload)
CMG‘08 INTERNATIONAL
conference
SPECpower and WF
Mixed Workloads - 2
[Chart: Average processor frequency (MHz) for various levels of SPECpower workload (% of max SPECpower throughput)]
CMG‘08 INTERNATIONAL
conference
Hyper-V Power Efficiency
Future Experiments
 More workloads
 Different workload mix scenarios
 Different combinations of fixed and varying
workloads
 More VM configurations
 Multiple virtual processors per guest
 Oversubscription
CMG‘08 INTERNATIONAL
conference
CMG‘08 INTERNATIONAL
conference
Power WMI Provider
 Enables power policy configuration through
standard WMI interface
 Change power setting values
 Activate a given plan
 Conforms to DMTF data model
 To get started…
 Change a power setting: Win32_PowerSetting
 Activate a plan: Win32_PowerPlan.Activate() method
 Attend … for additional details
CMG‘08 INTERNATIONAL
conference
Configuration and Administration
 WMI interfaces to query and set configuration
settings
 Configuration of systems
 Global administration
 Management applications
 WMI interfaces to query current settings and hardware
capabilities
 3rd party applications
 Diagnostics
CMG‘08 INTERNATIONAL
conference
TargetSetting = "Microsoft:PowerSetting{3c0bc021-c8a8-4e07-a973-
6b14cbcb2b7e}" 'display blank timeout
Set objWMIService = GetObject("WinMgmts:.rootcimv2power")
Set SettingIndices = objWMIService.ExecQuery(“ASSOCIATORS OF {“ &
chr(34) &
“Win32_PowerSetting.InstanceID=“ & chr(34) & TargetSetting &
chr(34) & “} WHERE ResultClass = Win32_PowerSettingDataIndex”)
For Each SettingIndex in SettingIndices
Set Plan = objWMIService.ExecQuery(“ASSOCIATORS OF {“ & chr(34) &
SettingIndex.InstanceID & “} WHERE ResultClass = Win32_PowerPlan”)
If Plan.IsActive THEN
SettingIndex.SettingIndexValue = 120 ‘2 seconds
SettingIndex.Put_
Plan.Activate()
CMG‘08 INTERNATIONAL
conference
Remote Power
Manageability - 1
 WS08R2 supports the configuration of power
policy via WMI
 Local and remote management via WMI
 Adheres to DMTF conventions for setting data
 Scriptable
 Includes support for reading and writing of all
power plan and setting data
 Active power plan can be changed remotely
 Power actions can be carried out remotely (e.g., sending a node
to S3); see the sketch below
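As a minimal sketch of the remote scenario (not the exact tooling from the talk), the script below connects to another node's power namespace through the standard WMI locator and reports its active plan; "RemoteServer" is a placeholder host name.

  ' Minimal sketch: query a remote node's active power plan via WMI.
  ' "RemoteServer" is a placeholder; default credentials are assumed.
  strServer = "RemoteServer"
  Set Locator = CreateObject("WbemScripting.SWbemLocator")
  Set RemoteService = Locator.ConnectServer(strServer, "root\cimv2\power")

  For Each PowerPlan in RemoteService.InstancesOf("Win32_PowerPlan")
      If PowerPlan.IsActive Then
          wscript.echo strServer & " active plan: " & PowerPlan.ElementName
      End If
  Next
  ' Calling Activate() on a chosen Win32_PowerPlan instance would switch plans.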
CMG‘08 INTERNATIONAL
conference
Remote Power
Manageability - 2
CMG‘08 INTERNATIONAL
conference
Get the Active Plan:

Set objWMIService = GetObject("winmgmts:\\.\root\cimv2\power")
Set PowerPlans = objWMIService.InstancesOf("Win32_PowerPlan")
For Each PowerPlan in PowerPlans
    If PowerPlan.IsActive Then
        wscript.echo "Current Plan: " & PowerPlan.ElementName
    End If
Next

' To switch plans, call Activate() on the desired Win32_PowerPlan instance:
' PowerPlan.Activate()
CMG‘08 INTERNATIONAL
conference
Get all power settings in the Active Plan:
(Continued with PowerPlan from the previous loop)

EscapedInstanceID = Replace(PowerPlan.InstanceID, "\", "\\")
Set PowerSettingIndexes = objWMIService.ExecQuery( _
    "ASSOCIATORS OF {Win32_PowerPlan.InstanceID=" & chr(34) & _
    EscapedInstanceID & chr(34) & "}")
For Each PowerSettingIndex in PowerSettingIndexes
    EscapedInstanceID = Replace(PowerSettingIndex.InstanceID, "\", "\\")
    Set PowerSettings = objWMIService.ExecQuery( _
        "ASSOCIATORS OF {Win32_PowerSettingDataIndex.InstanceID=" & chr(34) & _
        EscapedInstanceID & chr(34) & "} WHERE ResultClass = Win32_PowerSetting")
    For Each PowerSetting in PowerSettings
        wscript.echo "Power Setting: " & PowerSetting.InstanceID
        wscript.echo "Description: " & PowerSetting.Description
        wscript.echo "Index Value: " & PowerSettingIndex.SettingIndexValue
    Next
Next
CMG'08 International Conference Server Power Optimization

  • 1. CMG‘08 INTERNATIONAL conference Dr. Bruce Worthington Principal Software Development Lead Windows Server Performance Microsoft Corporation
  • 2. CMG‘08 INTERNATIONAL conference Server Power Ground Rules  TANSTAAFL: Everything is a trade-off  Performance, Power, Functionality, Capacity, Cost, Reliability, Availability, Manageability, Maintainability, Usability, Environmental Impact, Lifetime, Footprint, Security, Morale  Saving Power Power Efficiency  More work at fixed power level, or  Less power at fixed work level  Shifting component power efficiencies
  • 3. CMG‘08 INTERNATIONAL conference Outline  Motivation  Background  Windows Server 2003  2008  Windows Server 2008 R2  Power Diagnostics and Control  Summary
  • 4. CMG‘08 INTERNATIONAL conference Rising Cost of Ownership  From 2000 to 2006  Computing performance:  25x  Energy efficiency:  8x  US electricity cost:  1.35x  Power per $1K of server:  4x  Server(+) world electricity: >2x ○ >1% of total world production  Datacenters use 2% of all US electricity
  • 5. CMG‘08 INTERNATIONAL conference Scale: Kilowatts  Megawatts  Idle high-performance servers  50-80% of max power draw  2-sockets ~ 250 W  4-sockets ~ 500 W  8-sockets ~ 1000 W  25 15Krpm 2.5” disks + SAN = 3U  ~ 300/450 W (idle/active)  10,000 2-socket 1U servers ~ 1-3 MW  Datacenter “container” ~ 0.5 MW  ~1500 servers + storage + infrastructure
  • 6. CMG‘08 INTERNATIONAL conference Datacenter Energy Demand  Data centers are energy intensive facilities  Server racks now designed to carry 25 kW load  Surging demand for data storage  Typical facility ~ 1MW, can be > 20 MW (even 200 MW)  Nationally 1.5% of US Electricity consumption in 2006 ○ Doubling every 5 years  Significant data center building boom,  Power and cooling constraints in existing facilities  Growing demand for compute cycles  Growing computing performance  Commoditized hardware  Declining cost of computing
  • 7. CMG‘08 INTERNATIONAL conference 15 MW Datacenter Monthly Costs “Good” (PUE=1.7) Internet-scale datacenter with DAS Servers $3,000,000Infrastructure $1,800,000 Power $1,000,000 3 yr server and 15 yr infrastructure amortization
  • 8. CMG‘08 INTERNATIONAL conference Air Movement 12% Electricity Transformer/ UPS 10% Lighting, etc. 3% Cooling 25% IT Equipment 50% Source: EYP Mission Critical Facilities Inc., New York Other than a common power source they are not connected. Datacenter Costs Breakdown - 2
  • 11. CMG‘08 INTERNATIONAL conference Environmental Impact  Governments, businesses, and organizations are trying to reduce the production of greenhouse gases  New EPA Energy Star mandates for enterprise server power efficiencies
  • 12. CMG‘08 INTERNATIONAL conference Outline  Motivation  Background  ACPI Power States  Component Power  Windows Server 2003  2008  Windows Server 2008 R2  Power Diagnostics and Control  Summary
  • 13. CMG‘08 INTERNATIONAL conference ACPI Power State Definitions  Performance states (P-states)  Dynamic voltage and frequency scaling  More than linear savings (cubic function)  Throttle states (T-states)  Linear scaling of CPU clock  “Power” states (C-states)  Low-power idle (CPU “sleep”) states  Turn off increasing amounts of silicon in package  System sleep states (S-states)  On, standby, hibernate, off  MS has not encouraged S-state support for servers ○ Changing with the increased focus on power
  • 14. CMG‘08 INTERNATIONAL conference ACPI Power State State Machine • For entire system ○ Global System States (G-States) ○ Sleeping States (S-States)  Standby (S1), Hibernate (S2), …  For processor only  Processor Performance States (P- States) ○ Different processor frequency and voltage  Processor Throttling States (T-States) ○ Processor clock throttling to reduce processor utilization (and capacity)  Processor Power States (C-States) ○ Processor is executing instructions (C0) ○ Processor is idle (C1, C2, …)  Other devices  Device Power States (D-States) ○ Similar as C-States, but are for devices other than processors G3 -Mech Off Legacy Wake Event G0 (S0) - Working G1 - Sleeping S4 S3 S2 S1 Power Failure/ Power Off G2 (S5) - Soft Off BIOS Routine C0 D0 D1 D2 D3 Modem D0 D1 D2 D3 HDD D0 D1 D2 D3 CDROM C2 C1 Cn Performance State Px Throttling C0 CPU
  • 15. CMG‘08 INTERNATIONAL conference ACPI Specification Versions  WS03 complies with ACPI 2.0  WS08 complies with ACPI 3.0  Multiprocessor ○ Dependent (ganged) and independent control ○ Independent control w/ dependent behavior (may transition or not based on other processors’ states)  MS has some ideas for ACPI 3.5
  • 16. CMG‘08 INTERNATIONAL conference ACPI Power State Dependencies  Dependency Domains for ACPI power states (assumes S0)  Logical processors in the same domain should have the same C-state, P-state, or T-state  No dependence between a processor’s C-state domain, P-state domain, or T-state domain  OS control mechanisms based on dependency relationships  Dependent control: Transitioning one processor to a new state causes other processor(s) to transition to the same state  Independent control: Transitioning one processor to a new P-state or T-state does not affect other processors’ power states  Independent control, dependent behavior: Transitioning one processor to a new P-state or T-state may or may not transition other processor(s) to the same state based on the current state of the other processor()s that share this relationship
  • 17. CMG‘08 INTERNATIONAL conference P-States  Windows processor performance states are enabled by default  Power policy allows flexible use of performance states  Values for min / max processor speed  Expressed as a percentage of maximum processor frequency  Windows will round up to the nearest available state  Processor- and workload-dependent impact  E.g., one system configuration was determined to have insignificant perf impact from capping P-states at P1, but significant power savings
  • 18. CMG‘08 INTERNATIONAL conference Power policy will always use DBS between the range defined by min / max frequency Full range or subset of available P-states Policy may be set to use only one performance state (min / max / intermediate) Will not include linear clock throttle states
  • 19. CMG‘08 INTERNATIONAL conference Example: Processor state power policy Note: This is the default policy in WS08 Intended to minimize performance hit State Freq % Type 0 2800 100 Performance 1 2520 90 Performance 2 2142 85 Performance 3 1607 75 Performance 4 964 60 Performance 5 482 50 Performance Maximum Processor State Minimum Processor State
  • 20. CMG‘08 INTERNATIONAL conference P-State Policy Settings  Example: Processor state power policy  Using a subset of available states  Can use any contiguous range  Some performance loss (may not be significant) unless P0 included (targets minimal perf hit) State Freq % Type 0 2800 100 Performance 1 2520 90 Performance 2 2142 85 Performance 3 1607 75 Performance 4 964 60 Performance 5 482 50 Performance Maximum Processor State Minimum Processor State
  • 21. CMG‘08 INTERNATIONAL conference Example: Processor state power policy Locking processor at one state Any available state may be selected Some performance loss (may not be significant) unless P0 is the state chosen (a la High Perf mode) State Freq % Type 0 2800 100 Performance 1 2520 90 Performance 2 2142 85 Performance 3 1607 75 Performance 4 964 60 Performance 5 482 50 Performance Min & Max Processor State
  • 23. CMG‘08 INTERNATIONAL conference Use Perfmon to Monitor P-State Processor Performance / % of Max Frequency
  • 24. CMG‘08 INTERNATIONAL conference Linear clock throttle states (T-states) Compared to P-states, T-states do not save energy when performing identical workloads However, throttle states may be useful for some scenarios (thermal overload) By default, WS08 uses T-states only if P- states are unavailable or in case of thermal overload No DBS: only the Maximum Processor State parameter is used
  • 25. CMG‘08 INTERNATIONAL conference Default use of linear throttle states Performance is directly affected by throttling State Freq % Type 0 2800 100 Performance 1 2520 90 Performance 2 2380 85 Performance 3 2100 75 Performance 4 1680 60 Performance 5 1400 50 Performance 6 1400 50 Throttle 7 1120 40 Throttle 8 840 30 Throttle 9 560 20 Throttle DBS Allowed No DBS Allowed
  • 26. CMG‘08 INTERNATIONAL conference Power Capping / Budgeting  Enforcing per-server power limits (static or dynamic)  Calculations based on “plate rating” are often over-configured ○ Stranded capacity  OS may not be able to respond fast enough to enforce hard limits when power spikes  Typically lower-power P-states attempted, then T-states engaged as necessary ○ OS might not get a good estimate of the resulting effective frequency ○ Monitoring applications and diagnostic tools may give incorrect data ○ Opposite strategy from OS, where P-states move towards higher performance modes when load increases  Potentially huge (and potentially unexpected) hit in performance right when it is most vital ○ Sudden hardware throttling should be last resort
  • 27. CMG‘08 INTERNATIONAL conference C-States  Although hardware may support more than 3 C-states, Windows only utilizes a maximum of 3. But that doesn’t mean Windows only uses the first three hardware C-states:  C1 = hardware C1  C2 = hardware C? ○ Lowest-power consuming c-state with _CST of type 2  C3 = hardware Cn  Wouldn’t expect P-state to affect C-state power, but it does on some processors  WS08R2 handles this by providing the capability to drop to Pn before transitioning to C-state
  • 28. CMG‘08 INTERNATIONAL conference Processor Power Management - 1  CPUs have increasing number and ranges of P-states and C-states  Ballpark expectations per socket:  A few watts per P-state  Tens of watts for lowest C-state(s)  Varying impact to server throughput and responsiveness  Mature, reliable technology  Significant deployments in mobile and desktops
  • 29. CMG‘08 INTERNATIONAL conference Processor Power Management - 2  No user intervention required  Managed by the operating system  Balances power savings with CPU utilization  Kernel selects target P-state based on processor utilization history, Windows power policies, thread scheduler, system heuristics, node/socket/HW thread hierarchy  Transition processor to “sleep” C-states when idle (i.e., no thread to run on that processor)
  • 30. CMG‘08 INTERNATIONAL conference Processor Power Management - 3  Windows’ power policy includes various parameters that influence how the kernel chooses target power states  Low voltage/power processors must be evaluated and targeted for the right scenarios  Reduces OS power management flexibility  Additional servers are required if the workload is CPU-bottlenecked
  • 31. CMG‘08 INTERNATIONAL conference Hardware Support  The correctness of all PPM tools and settings relies on accurate hardware / firmware support  Broken BIOSes found in some previous-generation servers  Reporting ○ Initialization of ACPI tables (e.g., power states, memory and I/O controller locations) ○ P-state and C-state monitoring  Controlling ○ PPM algorithm depends on correct historical information ○ HW should comply/cooperate with OS power state requests
  • 32. CMG‘08 INTERNATIONAL conference Processor Power Management Working together with OEMs/IHVs - 1  Hardware must support PPM capabilities  ACPI namespace must describe capabilities and contain processor objects  On a processor there may be multiple independently- managed power planes, potentially shared between components, such as:  Cores, Caches, Memory Controllers, and Bus/Serial interface(s) to other processors or IO components  The performance impacts of turning off various pieces of silicon must be carefully weighed and understood ○ Snooping caches must be flushed before being shut down ○ Memory or IO channels attached to a package must still be accessible by other packages ○ Bus/Serial interfaces must be running for active caches, memory, or IO ○ Different components have different power-up delays from the various power states they support
  • 33. CMG‘08 INTERNATIONAL conference Collaborative Power Budgeting  Ideal WS08R2 strategy  Platform guarantees operation within the allocated budget (HW Fail-safe)  OS scales power/perf according to workload and respects platform notifications  New R2 Beta option: OS specifies target utilization and HW selects P-states accordingly  Otherwise, if the OS and HW are fighting for power management control, both power and performance will suffer  Hardware-directed power control settings are on by default in some BIOSes
  • 34. CMG‘08 INTERNATIONAL conference Servers Defaulting to Hardware- Controlled Power Mgmt  Hardware-directed power control settings are on by default in some BIOSes  Platform alters P-states, C-states, T-states, and/or D- states without OS information ○ One alternative is to have platform dynamically restrict the available states and update the OS via ACPI (<= 2 Hz)  May take over processor performance counters! ○ Obviously this is a big concern when using performance monitoring tools that utilize the on-CPU counters
  • 35. CMG‘08 INTERNATIONAL conference Outline  Motivation  Background  ACPI Power States  Component Power  Windows Server 2003  2008  Windows Server 2008 R2  Power Diagnostics and Control  Summary
  • 36. CMG‘08 INTERNATIONAL conference Component Power Metering • Only a small set of server models provide the functionality of component power reporting • Extra HW instrumentation (or fragile probing) is needed to monitor the component power usages for most platforms • Simplest alternative is to populate and then take away any removable components and track the overall system power delta
  • 37. CMG‘08 INTERNATIONAL conference Example Component Power Distribution #1 Idle 3-Year-Old 4-Socket Single-Core Server
  • 38. CMG‘08 INTERNATIONAL conference Example Component Power Distribution #2 Idle 4-Socket Quad-Core Server
  • 39. CMG‘08 INTERNATIONAL conference Example Component Power Distribution #3 CPU (2) 46% PCI Cards (3) 17% SCSI HDD (4) 12% Mobo, 8GB RAM 18% Other 7% Processor power management represents the best opportunity today Source: Intel Server Products Power Budget Analysis Tool http://www.intel.com/support/motherboards/server/sb/cs-016976.htm
  • 40. CMG‘08 INTERNATIONAL conference Selecting Memory Components  Lots of permutations for a given capacity  Family (e.g., DDR#) ○ FB DIMMs draw more power  DIMM count ○ Especially for FB, where bus may decrease frequency if enough DIMMs  Bus frequencies  Ranks  Density  Data width  Channel count  Low power memory must be evaluated and targeted for the right scenarios  Additional servers are required if the workload is memory- bottlenecked
  • 41. CMG‘08 INTERNATIONAL conference Memory Power Savings  Select the right type and number of DIMMs for the workload  Reduce memory accesses  Overall ○ Smaller working set ○ Better cache hit ratios ○ Probably better performance, too  More memory power states  Compare server memory idle characteristics to mobile memory  Deeper self-refresh states ○ Takes memory longer to come out of deeper states
  • 42. CMG‘08 INTERNATIONAL conference “Green Memory” Tech Marking Datarate Capacity Density DQ Ranks Power/ DIMM DDR2 PC2-5300 667Mhz 1GB 256Mb x4 DR 18.1W DDR2 PC2-5300 667Mhz 1GB 256Mb x8 QR 18.6W DDR2 PC2-5300 667Mhz 1GB 512Mb x4 SR 7.6W DDR2 PC2-5300 667Mhz 1GB 512Mb x8 DR 7.8W DDR2 ECC 667Mhz 1GB 1Gb x16 DR 6.1W DDR2 No ECC 667Mhz 1GB 1Gb x16 DR 5.5W No "by 16" part with 4Gb density DDR2 PC2-5300 667Mhz 4GB 1Gb x4 DR 14.0W DDR2 PC2-5300 667Mhz 4GB 1Gb x8 QR 14.4W DDR2 PC2-5300 667Mhz 4GB 2Gb x4 SR 8.6W DDR2 PC2-5300 667Mhz 4GB 2Gb x8 DR 8.8W
  • 43. CMG‘08 INTERNATIONAL conference Networking Power  NIC idle power (examples)  100 Mb 1 W  1Gb 5 W  Quad 1Gb 5-9 W  10Gb 10-15 W  Quad 10Gb 17 W  Don’t forget network switch power  Windows Networking Optimizations  NDIS DPC timer period  Wake-on-LAN (see content in WinHEC 2008)  Low Power on Network Disconnect
  • 44. CMG‘08 INTERNATIONAL conference Hard Disk Power  Decreasing radius  Cubic power relationship (Power ~ Radius^3)  3.5” 15K RPM drive = ~12/18 W (idle/active)  2.5” 15K RPM drive = ~6/9 W (idle/active)  Decreasing rotational speed  Quintic power relationship (Power ~ RPM^5)  15K RPM = 2 ms avg rotational delay (serial workload)  10K RPM = 3 ms avg (~3-4 W idle)  7.2K RPM = 4 ms avg (may have slower seek as well)  Frequently spinning down enterprise drives not advisable (yet)
  • 45. CMG‘08 INTERNATIONAL conference Storage Controller Power  HBA / storage connection interface  E.g., PCI-X and PCI-e cards  5-8W idle  Array Controller  E.g., small SAN controller (2U) = 200/300 W (idle/active in direct attached mode)  Disk Interface  SCSI: 80 → 160 → 320 MB/s  FC: 1 → 2 → 4 → 8 Gb/s  SAS/SATA: 1.5 → 3.0 → 6.0 Gb/s
  • 46. CMG‘08 INTERNATIONAL conference PCI-Express Power Management  Support for Active State Power Management (ASPM)  a.k.a. Link State Power Management  In-box power policy for ASPM state  Requires OS control of PCI Express features  Available white paper
  • 47. CMG‘08 INTERNATIONAL conference Power Supply Efficiency  Power Factor: phase delta between input voltage and current  Active Power Factor Correction (PFC)  Efficiency: ratio of output (DC) power to input (AC) power  Entropy means 100% efficiency is unobtainable  Default supplies at 70%; new models up to 85%  Previous power supplies were often optimized for high workload levels, but most servers run at 5-20% of capacity (for now)  Decreases power without decreasing perf
  • 48. CMG‘08 INTERNATIONAL conference Power Supply Efficiency “80 Plus”  Requirement for Energy Star (July ‘08)  80% minimum efficiency at 20%, 50%, and 100% of rated output  Previous power supplies often optimized for high loads, but most servers run at 5-20%  Minimum power factor of 0.9 or greater at 100% of rated output  Decrease power without decreasing perf
  • 49. CMG‘08 INTERNATIONAL conference Power Supply Waste
Power Efficiency | Output Power | Required Input Power | Waste Power | Waste Power Cost per Annum
70% (default) | 500 W | 714 W | 214 W | $183.15
80% (near 80 Plus Bronze) | 500 W | 625 W | 125 W | $106.98
85% (80 Plus Silver) | 500 W | 588 W | 88 W | $75.31
90% (above 80 Plus Gold) | 500 W | 555 W | 55 W | $47.07
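The arithmetic behind the table is simple: required input = output / efficiency, waste = input - output, and annual cost = waste x hours per year x electricity rate. The minimal C sketch below reproduces the figures; the electricity rate (~$0.098/kWh) is an assumption inferred from the table values, not a number stated in the deck.

#include <stdio.h>

/* Illustrative sketch of the power-supply waste math above.
 * The $/kWh rate is assumed (~$0.098) so that the output roughly matches the table. */
int main(void)
{
    const double output_w      = 500.0;
    const double rate_per_kwh  = 0.0977;                 /* assumed electricity rate, $/kWh */
    const double efficiencies[] = { 0.70, 0.80, 0.85, 0.90 };

    for (int i = 0; i < 4; i++) {
        double input_w = output_w / efficiencies[i];               /* required input power */
        double waste_w = input_w - output_w;                       /* dissipated as heat   */
        double cost    = waste_w / 1000.0 * 24 * 365 * rate_per_kwh;  /* $/year           */
        printf("%.0f%% efficient: input %.0f W, waste %.0f W, ~$%.2f/yr\n",
               efficiencies[i] * 100, input_w, waste_w, cost);
    }
    return 0;
}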
  • 50. CMG‘08 INTERNATIONAL conference Fan Power  Fans in some 1U servers consume 15- 20% of overall system power  Fixed vs. variable-speed fans  Decrease power without decreasing perf
  • 51. CMG‘08 INTERNATIONAL conference Outline  Motivation  Background  Windows Server 2003  2008  Overview  Server Power Measurements  Windows Server 2008 R2  Power Diagnostics and Control  Summary
  • 52. CMG‘08 INTERNATIONAL conference Windows Server 2003  ACPI 2.0 compliant  Windows processor driver required for specific CPU make/model  Requires selecting appropriate power policy  Each system power policy includes a processor throttling policy  Highest (default), lowest, or full range of P-states  OEMs or server administrators may create additional power plans
  • 53. CMG‘08 INTERNATIONAL conference Windows Server 2008 - 1  ACPI 2.0 and 3.0 compliant  Native OS support for PPM on multiprocessor systems  Default power settings refined for each release (including WS08R2)  Windows Server 2008 & SP2  Simplified configuration model  Group Policy over power settings  Power management enabled by default (“Balanced Mode”)
  • 54. CMG‘08 INTERNATIONAL conference Power Plans
Power Plan | Min P-state | Max P-state
Balanced | 5% | 100%
Power Saver | 5% | 50%
High Performance | 100% | 100%
  • 55. CMG‘08 INTERNATIONAL conference Windows Server 2008 - 2  T-states used only when no P-states available  Power management parameterization for improved flexibility of P- and T-state algorithms  Additional tunings available for OEMs to customize to processor, chipset, platform, role, etc.  Improved C3 support  Very hard to generalize, but 2-10% improvement in power efficiency observed at mid-to-low utilization levels (vs. 2003)
  • 56. CMG‘08 INTERNATIONAL conference Processor Power Management Windows Server Releases Fully supported by WS03, WS08, and WS08R2  Feature parity with Windows client operating systems  For example, WS08 has full support for: ○ ACPI 2.0, 3.0 processor objects, Notify() events ○ Power policy for tuning Operating System (OS) target state algorithms ○ Deep idle C-states
  • 57. CMG‘08 INTERNATIONAL conference Default Power Parameters - 1 * = May not appear in Control Panel options by default PPM parameters Non-PPM parameters
  • 58. CMG‘08 INTERNATIONAL conference Default Power Parameters - 2 How frequent Change P-state or not How to change
  • 59. CMG‘08 INTERNATIONAL conference Default Power Parameters - 3 Entry idle, promote only Deep idle, demote only
  • 60. CMG‘08 INTERNATIONAL conference Idle Improvement Techniques  Shut down unnecessary services, applications, roles, devices, drivers  Avoid polling and spinning in tight loops  Avoid high-res periodic timers (<10 ms)
  • 61. CMG‘08 INTERNATIONAL conference Outline  Motivation  Background  Windows Server 2003  2008  Overview  Server Power Measurements  Windows Server 2008 R2  Power Diagnostics and Control  Summary
  • 62. CMG‘08 INTERNATIONAL conference Measuring Power  Few existing Windows servers are equipped with comprehensive power metering capabilities  In the future, servers are likely to have onboard power meters ○ AC power (into the power supply) ○ DC power (out of the power supply) ○ For individual components (CPU, RAM, IO, fans, disks, …)  The Windows Server Performance team has resorted to two strategies:  Metering at the wall (AC)  Directly probing specially manufactured server motherboards (solder and data acquisition)
  • 63. CMG‘08 INTERNATIONAL conference Measuring Power Efficiency Which Watts/Amps to measure?  Total server (wall) power  External power  Network switches/hubs  Storage (disks, array controllers, SANs)  Power distribution and conditioning  HVAC  Internal component power  Processor package ○ Threads, cores, caches, memory controllers, cross-package interconnect controllers, IO controllers (e.g., PCI-E)  Memory (controllers, DIMMs, ranks, banks)  Chipsets (north bridge, south bridge, IO controllers)  Power supplies ○ AC in, multiple DC out ○ Redundant (active/active, active/passive)  IO (network, storage, video, USB) ○ Embedded components and expansion cards  Fans and other internal misc.
  • 64. CMG‘08 INTERNATIONAL conference Measuring Power Efficiency  Traditional performance benchmarks optimize for high throughput or low response time by using all resources  The load line approach tracks power use as load varies  Pick a power point and see how much load can be handled  Pick a load point and see how much power is required  Workload breadth  Database, web server, file server, etc.  MS uses SPECpower (a la SPECjbb) and is adding customer-accepted performance benchmarks  TPC-C/E/H, SpecWeb, NetBench, SAP, SPEC, …  Semi-internal: FSCT, LCW2, Web Fundamentals, TermSrv, PerfGates, …
  • 65. CMG‘08 INTERNATIONAL conference Measuring Power Efficiency Which workloads to test?  Workload breadth  Database, web server, file server, etc. ○ Need to prioritize based on potential for power savings and for broadest customer coverage  Each has unique “work accomplished” metrics (e.g., ops per second)  Industry standard workloads, such as SPEC and TPC  Custom workloads designed to test power scenarios  Microsoft is currently using SPECpower and customer- accepted performance benchmarks to convey power efficiency  TPC-C/E/H, SpecWeb, NetBench, SAP, SPEC, …  Semi-internal: FSCT, LCW2, Web Fundamentals, TermSrv, PerfGates, …
  • 66. CMG‘08 INTERNATIONAL conference Industry Standard Workloads SPEC  SPECpower is the only standardized benchmark at this point ○ Single workload defined to date  Order processing for a wholesale supplier running typical Java business applications  Basically SPECjbb with some changes  Minimal I/O and kernel time ○ Other SPEC benchmarks could have a “power” version, and each one may or may not be modified from the “perf” version  TPC  Could add a power metric to each of their existing benchmarks, but details are still being worked out ○ What is server power vs. storage power? ○ What needs to be installed in the audited server?  I suspect they will stick to the same approach used for pricing, in that the system has to be available as a purchasable product  What about the “price” of power? ○ Etc.
  • 67. CMG‘08 INTERNATIONAL conference Measuring Power Efficiency Windows Server Performance Lab  Methodologies for obtaining power load line data for TPC-C, TPC-E, FSCT, and Web Fundamentals have been demonstrated  Benchmark loads varied by throttling the number of active users  Multiple workloads tested in a Hyper-V environment  SPECpower has been successfully tuned  Data has been gathered on 2-, 4-, and 8-socket systems with various processors  Wall-socket power measurements  Component power measurement by brute force (device extraction)
  • 68. CMG‘08 INTERNATIONAL conference Varying Load Levels
Iteration | SPECpower (Reduce load) | TPC-E (Reduce users) | FSCT (Increase users)
1 | 100% load | 100% of max users | 0 users
2 | 90% load | ~90% of max users | 10% of max users
3 | 80% load | ~80% of max users | 20% of max users
4 | 70% load | ~70% of max users | 30% of max users
5 | 60% load | ~60% of max users | 40% of max users
6 | 50% load | ~50% of max users | 50% of max users
7 | 40% load | ~40% of max users | 60% of max users
8 | 30% load | ~30% of max users | 70% of max users
9 | 20% load | ~20% of max users | 80% of max users
10 | 10% load | ~10% of max users | 90% of max users
11 | 0% load | 0 users | 100% of max users
Similar strategy used for Web Fundamentals
  • 70. CMG‘08 INTERNATIONAL conference HW and SW Test Configurations  Sample platforms  2-socket and 4-socket quad-core  8-socket dual-core  x64 (AMD and Intel); ia64  Hardware- and software-controlled power management modes  WS03, WS08, WS08SP2 (prerelease), and WS08R2 (prerelease)  Windows power schemes  Balanced, Higher Performance, Power Saver, …  P-State settings and heuristics  C-State settings and heuristics  Parameterized power management optimizations ○ E.G., core parking, tick skipping
  • 71. CMG‘08 INTERNATIONAL conference SPECpower: WS03 and WS08  [Chart: Power (% of max watts) versus workload (% of max ssj_ops) for W2K3.SP1, W2K8.RTM, and W2K8.SP2; 2 sockets, 8 cores total]
  • 72. CMG‘08 INTERNATIONAL conference SPECpower & FSCT: WS03 and WS08  [Charts: Power (% of maximum watts) versus workload (% of maximum throughput), Windows Server 2003 vs. Windows Server 2008: SPECpower on a 4-socket quad-core system and FSCT on a 2-socket dual-core system]
  • 73. CMG‘08 INTERNATIONAL conference TPC-E: WS03 and WS08  [Charts: TPC-E power usage at varying workload levels and TPC-E power efficiency (tpsE/Watt) at varying workload levels, Windows Server 2003 vs. Windows Server 2008]
  • 74. CMG‘08 INTERNATIONAL conference OOB Windows Server 2008  [Charts: SPECpower throughput (ssj_ops), power, and ssj_ops per Watt at varying workload levels; processor utilization and frequency as the SPECpower workload decreases over time]
  • 75. CMG‘08 INTERNATIONAL conference TPC-E: Windows Server 2008  [Chart: Cumulative distribution of P-states (P0 through P4) and C1 as workload decreases over time]
  • 76. CMG‘08 INTERNATIONAL conference 4 quad-core CPUs, 16 GB, RAID-5 array
Server Config | Active Clients | Avg Watts (measured) | kWh/yr (projected) | Cost (projected) | Kg of CO2 (projected)
WS03, IIS6 | 0 | 468 | 4100 | 375 | 3190
WS08, IIS7 | 0 | 457 | 4000 | 357 | 3110
WS03, IIS6 | 20 | 537 | 4700 | 430 | 3660
WS08, IIS7 | 20 | 500 | 4380 | 401 | 3410
  • 77. CMG‘08 INTERNATIONAL conference Outline  Motivation  Background  Windows Server 2003  2008  Windows Server 2008 R2  Server Energy Vision  Idle Power Optimizations  Core Parking  Hyper-V (v2)  Power Metering and Budgeting  SSD  Power Diagnostics and Control  Summary
  • 78. CMG‘08 INTERNATIONAL conference Windows Server Core Energy Vision  Dynamic Data Center  Coordination across all data center components to scale infrastructure and computing according to business needs  Scalable Node: Server power efficiency  Low idle power consumption  Power consumption should scale with load
  • 79. CMG‘08 INTERNATIONAL conference Dynamic Data Center  Holistic approach spanning all infrastructure not just the computing nodes  Reducing waste and optimizing performance  Scaling and migrating workloads  Coordination with power and cooling systems  Watch out for over-eager workload consolidation or low-power component acquisition  Building platform and management infrastructure
  • 80. CMG‘08 INTERNATIONAL conference Dynamic Data Center – The Problem  Addressing energy consumption in the data center requires a holistic approach spanning all infrastructure, not just the computing nodes  Many factors affect how a data center consumes energy  Hardware, workload, time of day/week/year, locality, etc.  Data centers are generally statically configured for peak load  Tremendous opportunities for reducing waste and optimizing performance exist  Scaling and migrating workloads across groups of machines  Coordination with power and cooling systems  Opportunities also exist for unexpected reduction in computing capacity through over-eager workload consolidation or low-power component acquisition without proper planning / testing
  • 81. CMG‘08 INTERNATIONAL conference Dynamic Data Center – The Vision  Enable the management of aggregate servers in conjunction with data center infrastructure  Deliver this through building platform and management infrastructure  Power metering and budgeting  Virtualization and workload migration  Standards-based management technologies  Coordination between in-band and out-of-band management systems
  • 82. CMG‘08 INTERNATIONAL conference Scalable Node  Today power consumption does not scale in line with server utilization  Typical commodity servers consume 50-70% of the maximum power when completely idle  Basic approaches: ○ Increase server utilization via virtualization ○ Reduce power when full performance not needed ○ Power down / put to sleep excess servers  Work with partners to provide the best power and performance by managing the system efficiently  Windows power management improvements
  • 83. CMG‘08 INTERNATIONAL conference Scalable Node – The Problem  Today power consumption does not scale in line with server utilization  Typical commodity servers consume 50-70% of the maximum power when completely idle ○ Idle servers have low efficiency due to high idle power ○ Efficiency rises with utilization due to idle power amortization  Tremendous opportunities exist for reducing energy needs ○ Reduce power when full performance is not required ○ Leverage virtualization solutions to increase server utilization ○ Power down servers when they are not needed
  • 84. CMG‘08 INTERNATIONAL conference Scalable Node – The Vision  Work with partners to provide the best power and performance by managing the system efficiently  Deliver this through improvements to Windows Power Management  Build on existing infrastructure and extend Windows value  Enhancements to processor power management  Focus on idle and low-to-medium workload levels  Support for device performance states
  • 85. CMG‘08 INTERNATIONAL conference Windows Server 2008 R2 - 1  Refined “Balanced Mode” defaults to optimize power efficiency  Takes advantage of advances in server platform hardware (e.g., powering down individual cores or sockets)  Configurable power settings for new features (e.g., core parking)  P-state and C-state selection algorithms updated  Increased support for joint OS/HW power management
  • 86. CMG‘08 INTERNATIONAL conference Windows Server 2008 R2 - 2  Simplified configuration model  Group Policy control over all power settings  Rich command line interface and refined UI elements  In-band WMI power metering and budgeting support  Remote manageability of power policy via WMI  Additional qualification logo to indicate enhanced power management support
  • 87. CMG‘08 INTERNATIONAL conference Windows Device Power Management  Extensible power policy infrastructure  Allows easy incorporation of power management-enabled devices ○ Device power settings integrate with Windows system power policy ○ Device power settings can appear in Advanced power UI ○ Rich notification support  Allows for true OEM power management innovation and value
  • 88. CMG‘08 INTERNATIONAL conference Enhanced Power Management Logo  Additional Qualification logo for “Enhanced Power Management” that indicates support for the following:  Processor power management through Windows  Power metering and budgeting  Power On/Off via WS-Management (SMASH)
  • 89. CMG‘08 INTERNATIONAL conference Windows Server 2008 P-State Parameters
Balanced Mode Settings | WS08 | R2 Pre-Beta | WS08R2
Time Check | 100 ms | 100 ms | 50 ms
Increase Time | 100 ms | 100 ms | 50 ms
Decrease Time | 300 ms | 100 ms | 50 ms
Increase Percentage | 30% | 70% | 80%
Decrease Percentage | 50% | 30% | 70%
Domain Accounting Policy | 0 (On) | Always Off | Always Off
Increase Policy | IDEAL (0) | IDEAL (0) | SINGLE (1)
Decrease Policy | SINGLE (1) | SINGLE (1) | IDEAL (0)
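As a rough editorial illustration of how parameters like these can interact (this is a much-simplified sketch, not the actual Windows kernel algorithm), the C fragment below checks measured busyness every "time check" interval, compares it against the increase/decrease percentages, and then either steps the P-state one state at a time (SINGLE) or jumps straight to a crudely computed ideal state (IDEAL). The ideal-state formula and the sample busyness trace are invented for the example.

/* Simplified, hypothetical illustration of P-state promotion/demotion driven by
 * check-interval, threshold, and step-policy parameters similar to those in the
 * table above. Not the Windows implementation. */
#include <stdio.h>

enum policy { POLICY_SINGLE, POLICY_IDEAL };

static int next_pstate(int current, int lowest, double busy_pct,
                       double inc_pct, double dec_pct,
                       enum policy inc_policy, enum policy dec_policy)
{
    /* P0 is the fastest state; larger numbers are slower, lower-power states. */
    int ideal = (int)((100.0 - busy_pct) / 100.0 * lowest + 0.5);

    if (busy_pct > inc_pct)      /* workload rising: move to a faster state */
        return (inc_policy == POLICY_SINGLE) ? (current > 0 ? current - 1 : 0) : ideal;
    if (busy_pct < dec_pct)      /* workload falling: move to a slower state */
        return (dec_policy == POLICY_SINGLE) ? (current < lowest ? current + 1 : lowest) : ideal;
    return current;              /* inside the hysteresis band: no change */
}

int main(void)
{
    int p = 4;                                   /* start in the deepest P-state */
    double busy[] = { 90, 95, 40, 20, 10, 85 };  /* sampled busy %, one per check interval */
    for (int i = 0; i < 6; i++) {
        /* WS08R2 Balanced values from the table: increase at 80%, decrease at 70% */
        p = next_pstate(p, 4, busy[i], 80.0, 70.0, POLICY_SINGLE, POLICY_IDEAL);
        printf("interval %d: busy %.0f%% -> P%d\n", i, busy[i], p);
    }
    return 0;
}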
  • 90. CMG‘08 INTERNATIONAL conference Optimized for Low-to-Medium Loads  Even though 100% utilization may have the highest power efficiency, few servers run at full capacity  Servers at maximum utilization provide fewer opportunities for power optimizations  In the short term, targeting low utilization servers will provide the most benefit  In the medium term, targeting medium utilization servers will provide increased benefit  E.g., consolidation and virtualization will increase average utilization levels
  • 91. CMG‘08 INTERNATIONAL conference Outline  Motivation  Background  Windows Server 2003  2008  Windows Server 2008 R2  Server Energy Vision  Idle Power Optimizations  Core Parking  Hyper-V (v2)  Power Metering and Budgeting  SSD  Power Diagnostics and Control  Summary
  • 92. CMG‘08 INTERNATIONAL conference Get Idle; Stay Idle  Shut down unnecessary services, applications, roles, devices, drivers  Avoid polling and spinning in tight loops  Avoid high-res periodic timers (<10 ms)  Timer Coalescing  Intelligent Timer Tick Distribution (ITTD)  Use NUMA-based affinity for threads and interrupts  Thread (via APIs and tools): soft (IdealProc), hard (affinity mask)  Interrupts (via IntPolicy.exe)  Idle improvements extend to Hyper-V  Significant reduction in platform interrupt activity  Enables power savings and greater scalability
  • 93. CMG‘08 INTERNATIONAL conference Timer Coalescing  Platform energy efficiency can be improved by extending idle periods  New timer coalescing API enables callers to specify a tolerance for due time  Enables the kernel to expire multiple timers at the same time  Extensions should integrate with WS08R2 API/DDI
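The deck does not show the API itself, but on Windows 7 / WS08R2 the user-mode side of timer coalescing is exposed through SetWaitableTimerEx, whose final parameter is a tolerable delay in milliseconds. A minimal sketch follows; the 100 ms period and 50 ms tolerance are arbitrary example values.

/* Minimal sketch of user-mode timer coalescing via SetWaitableTimerEx
 * (Windows 7 / Windows Server 2008 R2 and later). */
#define _WIN32_WINNT 0x0601
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE timer = CreateWaitableTimer(NULL, FALSE, NULL);
    if (!timer) return 1;

    LARGE_INTEGER due;
    due.QuadPart = -1000000LL;   /* first expiration in 100 ms (100-ns units, negative = relative) */

    /* Period 100 ms; TolerableDelay 50 ms lets the kernel align this timer with
     * other expirations instead of waking the CPU exactly on schedule. */
    if (!SetWaitableTimerEx(timer, &due, 100, NULL, NULL, NULL, 50)) {
        CloseHandle(timer);
        return 1;
    }

    for (int i = 0; i < 5; i++) {
        WaitForSingleObject(timer, INFINITE);
        printf("tick %d\n", i);
    }
    CloseHandle(timer);
    return 0;
}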
  • 94. CMG‘08 INTERNATIONAL conference Intelligent Timer Tick Distribution (Tick Skipping)  Extend processor sleep states by not waking the CPU unnecessarily  CPU 0 handles the periodic system timer tick; other processors are signaled as necessary  Non-timer interrupts will still wake sleeping processors  Not available on IA64  Only enabled on systems with more C-states than just C1
  • 95. CMG‘08 INTERNATIONAL conference Background Process Management  Background activity on the macro scale (minutes, hours) is also important for power  E.g., disk defragmentation, AV scans  Prevents low-power idle and sleep modes  Will collapsing multiple background activities result in a significantly heavier load during that interval and thus potentially impede concurrent foreground activity?  Unified Background Process Manager (UBPM)  New WS08R2 infrastructure  Drives scheduling of services and scheduled tasks  Transparent to users, IT pros, and existing APIs  Enables trigger-starting services  Delivers usage data and metrics to Microsoft via CEIP
  • 96. CMG‘08 INTERNATIONAL conference UBPM: Trigger-Start Services  Many services configured to Autostart and wait for rare events  UBPM enables Trigger-Start services based on environmental changes  Device arrival/removal, IP address change, domain join, etc.  Examples ○ Bluetooth service is started only if a Bluetooth radio is currently attached ○ BitLocker encryption service started only when new volumes detected  ISV Call to Action  Leverage trigger-start capability for value-add services  Validate performance impact with XPerf tools ○ Performance impact can be positive or negative
  • 97. CMG‘08 INTERNATIONAL conference Coordinated Processor Clocking Control  New processor performance state interface described via ACPI  Feature enables OS and HW platform coordination of processor power management  Platform is in direct control of T-states and P-states  OS dynamically specifies processor performance requirements on per-processor basis as a percentage of maximum frequency  Platform is responsible for delivering requested performance ○ In some cases, like a power budget condition, the platform may underdeliver, but must report this
  • 98. CMG‘08 INTERNATIONAL conference Outline  Motivation  Background  Windows Server 2003  2008  Windows Server 2008 R2  Server Energy Vision  Idle Power Optimizations  Core Parking  Hyper-V (v2)  Power Metering and Budgeting  SSD  Power Diagnostics and Control  Summary
  • 99. CMG‘08 INTERNATIONAL conference Processor Core Parking  This is a Windows scheduler optimization, not HW!  Goals  Save power on multi-processor systems by dynamically scaling number of active cores to match workload  Drop parked cores into deepest C-states  Approach  Use historical information to predict future workload  Calculate number of cores needed  Heuristically select the “unparked” cores  Monitoring  Perfmon and ETW
  • 100. CMG‘08 INTERNATIONAL conference Processor (Logical) Core Parking  Logical core = HW thread (e.g., Intel® Hyperthreading)  Extension of Windows’ processor performance state engine  Configurable via power policy settings  Parking may reduce performance, depending on the parameter settings, by reducing OS responsiveness to rising load levels  Parking could improve performance by concentrating work onto a smaller number of cores
  • 101. CMG‘08 INTERNATIONAL conference Selecting Cores to Park - 1  WS08R2 (Beta) approach:  Leave one logical core unparked per NUMA node  Other possible approaches, including customizable minimum unparked entities  Park entire packages at once  Park logical cores individually, regardless of packages  Leave one logical core unparked per socket  Leave one logical core unparked per physical core  Affinitized activity does tend to unpark logical cores that must be used (selection heuristic)  Beta tracks affinitized threads, not DPCs / Interrupts
  • 102. CMG‘08 INTERNATIONAL conference Selecting Cores to Park - 2  Parking algorithm takes many inputs. At a minimum:  Time since the last parking decision was made  Average frequencies of each core over the last time interval  Average CPU “utilization” over the last time interval  Possible additional inputs depending on parameter setting and final WS08R2 refinements: ○ Power state domains (i.e., groups of associated cores) ○ Current processor P-States ○ P-State change rate policies (SINGLE, ROCKET, IDEAL) ○ Affinitized DPCs / Interrupts ○ Time spent in affinitized activity ○ More comprehensive or longer historical information ○ More system component topology information
  • 103. CMG‘08 INTERNATIONAL conference Outline  Motivation  Background  Windows Server 2003  2008  Windows Server 2008 R2  Server Energy Vision  Idle Power Optimizations  Core Parking  Hyper-V (v2)  Power Metering and Budgeting  SSD  Power Diagnostics and Control  Summary
  • 104. CMG‘08 INTERNATIONAL conference Hyper-V Power Management  Full P-state/C-state management already integrated between Windows root partition and Hyper-V v1 (WS08)  Enlightenments added in Hyper-V v2 (WS08R2)  Hypervisor delivers child clocks without requiring root interaction, plus Intelligent Timer Tick Distribution (to children)  Core parking enabled for all partitions
  • 105. CMG‘08 INTERNATIONAL conference Web Fundamentals Dynamic: WS08  [Charts: watts and throughput (reqs/sec) versus system utilization percentage; one panel adding load to each guest, one panel adding guests]  For these experiments, the highest system utilization under WF workload is ~80%. This issue has been subsequently resolved.
  • 106. CMG‘08 INTERNATIONAL conference SPECpower: WS08  [Charts: watts and throughput (in thousands) versus system utilization percentage; one panel adding load to each guest, one panel adding guests]
  • 107. CMG‘08 INTERNATIONAL conference SPECpower + WF (WS08)  [Charts: SPECpower throughput and server power usage versus total system utilization; power usage for various throughput levels]  4 guests running WF (4940 requests/sec)  ~25% system utilization; ~35% guest virtual processor utilization  4 guests running SPECpower (similar efficiency as single workload)
  • 108. CMG‘08 INTERNATIONAL conference Outline  Motivation  Background  Windows Server 2003  2008  Windows Server 2008 R2  Server Energy Vision  Idle Power Optimizations  Core Parking  Hyper-V (v2)  Power Metering and Budgeting  SSD  Power Diagnostics and Control  Summary
  • 109. CMG‘08 INTERNATIONAL conference Power Metering and Budgeting  In the future, servers are likely to have onboard power meters  AC power (into the power supply)  DC power (out of the power supply)  For components (CPU, RAM, IO, fans, disks, …)  WS08R2 provides the capability to monitor such meters, as well as communicate with power management logic, through standard Windows and ACPI interfaces  Power budget information is reported to OS  Optional support for configuring the budget from within Windows
  • 110. CMG‘08 INTERNATIONAL conference Power Metering and Budgeting  [Architecture diagram, blocks: WMI consumers (System Center, admin scripts, hardware management tools); WMI namespace root\cimv2\power with Power Supply class, Power Meter class, and Power Meter events; user-mode Power Service and Power WMI providers; standard Windows IOCTL interface; in-box ACPI-based implementation (vendors provide ACPI code in firmware) and other vendor-specific implementations; BMC hardware. Implemented in WS08R2.]
  • 111. CMG‘08 INTERNATIONAL conference Power Metering and Budgeting WS08R2 introduces the ability to report power consumption and budgeting information  Server platform reports this in-band to the OS via ACPI  No additional drivers or HW changes are required, only platform support  Power information is exposed via WMI  Adheres to the DMTF Power Supply Profile v1.01  Power budget information is reported to the OS  Optional support for configuring the budget from within Windows  Extendable to enable per-device metering  WDM driver interface available  Design goals  Standard hardware and software interfaces  Native infrastructure, easily extendable  Leverages existing platform technology
  • 112. CMG‘08 INTERNATIONAL conference Power Metering and Budgeting – Usage  Statistical/inventory/auditing  Data center can monitor power consumption across nodes  Administrator can write scripts to control power policies and receive power condition events  Model can be extended to per-device meters  Another set of metrics for virtualization and consolidation
  • 113. CMG‘08 INTERNATIONAL conference Power Metering and Budgeting – WDM  Standard Windows driver IOCTL interface  Event model based on pending IO requests (IRPs)  Two separate device interfaces  Consumed by the WMI providers  An alternative to the ACPI implementation  Future direction – potentially consumed by the kernel power manager  Documented on MSDN
  • 114. CMG‘08 INTERNATIONAL conference Power Metering and Budgeting – ACPI  Rationale  Works as the abstraction layer to the underlying platform technology (IPMI, WSMAN, etc.)  Scales across different platforms  Does not require special drivers  Requires only firmware updates  Currently being proposed to the ACPI 4.0 specification  Delegate tasks to the BMC (e.g., rolling average calculation, polling for events, etc.)
  • 115. CMG‘08 INTERNATIONAL conference Power Metering and Budgeting – ACPI  Power supply device  Extends the current power source device  Control method to publish capabilities  Power meter device  Similar to control method for batteries  A set of control methods to get capabilities and set configuration parameters, trip points, and configure hardware enforced limits  Event notification via Notify codes
  • 116. CMG‘08 INTERNATIONAL conference Power Metering and Budgeting – ACPI  WS08R2 will provide  In-box driver to support power meter device(s) described in ACPI  In-box IPMI operation region handler as part of the Microsoft IPMI driver – allowing ACPI control methods to communicate with IPMI using the KCS protocol ○ Format similar to the SMBUS OpRegion ○ 3rd-party IPMI drivers can register OpRegion handler for other IPMI protocol(s) ○ Also proposed to ACPI 4.0 specification
  • 117. CMG‘08 INTERNATIONAL conference Outline  Motivation  Background  Windows Server 2003  2008  Windows Server 2008 R2  Server Energy Vision  Idle Power Optimizations  Core Parking  Hyper-V (v2)  Power Metering and Budgeting  SSD  Power Diagnostics and Control  Summary
  • 118. CMG‘08 INTERNATIONAL conference WS08R2 Enables Improved Endurance for SSD Technology SSD can identify itself differently from HDD in ATA as defined through ATA8-ACS Identify Word 217: Nominal media rotation rate  Reporting non-rotating media will allow WS08R2 to set Defrag off as default; improving device endurance by reducing writes
  • 119. CMG‘08 INTERNATIONAL conference WS08R2 Enables Optimization for SSD Technology  Microsoft implementation of “Trim” feature  NTFS will send down delete notification to the device supporting “trim” ○ File system operations: Format, Delete, Truncate, Compression ○ OS internal processes: e.g., Snapshot, Volume Manager  Three optimization opportunities for the device  Enhancing device wear leveling by eliminating merge operation for all deleted data blocks  Making early garbage collection possible for fast write  Keeping device’s unused storage area as high as possible; more room for device wear leveling.
  • 120. CMG‘08 INTERNATIONAL conference Outline  Motivation  Background  Windows Server 2003  2008  Windows Server 2008 R2  Power Diagnostics and Control  Perfmon/Resmon  Pwrtest  Powercfg  Summary
  • 121. CMG‘08 INTERNATIONAL conference Check Processor ACPI States: System Event ID 4
  • 122. CMG‘08 INTERNATIONAL conference Kernel Debugger !ppmperf Provides P-state and T-state information !ppmidle Provides C-state information
  • 123. CMG‘08 INTERNATIONAL conference Monitoring Power Status - 1  System Event Log: ID 4  Perfmon/Logman  Processor ○ Provide average C-state information  % C1/2/3 Time and C1/2/3 Transitions/sec  Processor Information ○ Parking status  Processor Performance ○ Only present if P-states are exposed ○ Provide current P-state information (e.g., avg freq)  Resource Monitor  CPU % Max Frequency average and graph
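For scripted collection, the same counters can also be read programmatically through the PDH API. The sketch below is illustrative; the exact counter paths ("\Processor(_Total)\% C1 Time" and "\Processor Information(_Total)\% of Maximum Frequency") are assumptions that should be verified on the target system (for example with typeperf -q), since availability varies by OS version and whether P-states are exposed.

/* Minimal PDH sketch reading idle/frequency counters mentioned above.
 * Counter paths are assumptions to verify on the target system; link with pdh.lib. */
#include <windows.h>
#include <pdh.h>
#include <stdio.h>

int main(void)
{
    PDH_HQUERY query;
    PDH_HCOUNTER c1Time, maxFreq;

    if (PdhOpenQuery(NULL, 0, &query) != ERROR_SUCCESS) return 1;
    PdhAddEnglishCounterW(query, L"\\Processor(_Total)\\% C1 Time", 0, &c1Time);
    PdhAddEnglishCounterW(query, L"\\Processor Information(_Total)\\% of Maximum Frequency", 0, &maxFreq);

    PdhCollectQueryData(query);          /* prime the counters */
    Sleep(1000);
    PdhCollectQueryData(query);          /* second sample for averaged values */

    PDH_FMT_COUNTERVALUE v;
    if (PdhGetFormattedCounterValue(c1Time, PDH_FMT_DOUBLE, NULL, &v) == ERROR_SUCCESS)
        printf("%% C1 Time: %.1f\n", v.doubleValue);
    if (PdhGetFormattedCounterValue(maxFreq, PDH_FMT_DOUBLE, NULL, &v) == ERROR_SUCCESS)
        printf("%% of Maximum Frequency: %.1f\n", v.doubleValue);

    PdhCloseQuery(query);
    return 0;
}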
  • 127. CMG‘08 INTERNATIONAL conference Monitoring Power Status - 2  ETW tracing (Windows Perf Tool Kit)  Xperf -on power  Pwrtest.exe  Logs use of P-, T-, and C-states  Pwrtest /ppm ○ Sampling P-state and C-state performance  Pwrtest /ppm /live ○ Event-driven logging for all P-state and C-state transitions
  • 128. CMG‘08 INTERNATIONAL conference Pwrtest.exe /info:ppm
C:\Program Files\Microsoft PwrTest> pwrtest /info:ppm
PROCESSOR_POWER_INFORMATION  CPU Number = 0  MaxMhz = xxxx  CurrentMhz = yyyy  MhzLimit = zzzz  MaxIdleState = M  CurrentIdleState = N  InstanceName: CPU Model X
(continued) Processor Performance States  PerfStates: Max Transition Latency: xx us  Number of States: yy
State  Speed (Mhz)   Type
0      aaaa (100%)   Perf
1      bbbb ( ss%)   Perf
2      cccc ( tt%)   Perf
3      dddd ( uu%)   Throttle
4      eeee ( vv%)   Throttle
5      ffff ( ww%)   Throttle
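The same per-CPU data that pwrtest /info:ppm prints can be retrieved in code via CallNtPowerInformation with the ProcessorInformation level. A minimal sketch follows; the structure is declared locally here for a self-contained example, following the documented PROCESSOR_POWER_INFORMATION layout, and the program must be linked with PowrProf.lib.

/* Minimal sketch: query per-CPU frequency/idle-state info, similar to pwrtest /info:ppm.
 * Link with PowrProf.lib. Struct declared locally, matching the documented layout. */
#include <windows.h>
#include <powrprof.h>
#include <stdio.h>

typedef struct _PROC_POWER_INFO {
    ULONG Number;
    ULONG MaxMhz;
    ULONG CurrentMhz;
    ULONG MhzLimit;
    ULONG MaxIdleState;
    ULONG CurrentIdleState;
} PROC_POWER_INFO;

int main(void)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);

    PROC_POWER_INFO info[64];    /* enough processors for this sketch */
    ULONG bytes = si.dwNumberOfProcessors * sizeof(PROC_POWER_INFO);

    if (CallNtPowerInformation(ProcessorInformation, NULL, 0, info, bytes) != 0)
        return 1;                /* non-zero NTSTATUS indicates failure */

    for (DWORD i = 0; i < si.dwNumberOfProcessors && i < 64; i++)
        printf("CPU %lu: %lu of %lu MHz (limit %lu), idle state %lu of %lu\n",
               info[i].Number, info[i].CurrentMhz, info[i].MaxMhz,
               info[i].MhzLimit, info[i].CurrentIdleState, info[i].MaxIdleState);
    return 0;
}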
  • 129. CMG‘08 INTERNATIONAL conference Pwrtest.exe in Logging Mode - 1
C:\Program Files\Microsoft PwrTest> pwrtest /ppm
     Elapsed  Idle  C1   C2   C3   P-     Freq  Freq   Perf/
Cpu  [ms]     [%]   [%]  [%]  [%]  State  [%]   [MHz]  Throttle
---  -------  ----  ---  ---  ---  -----  ----  -----  --------
0    5007     98    0    73   26   2      54    1000   P
1    5007     99    0    93   6    2      54    1000   P
0    10014    97    0    72   27   2      54    1000   P
1    10014    97    0    91   8    2      54    1000   P
0    15021    88    1    0    0    2      54    1000   P
1    15021    89    1    0    0    2      54    1000   P
0    20028    99    0    0    100  2      54    1000   P
  • 130. CMG‘08 INTERNATIONAL conference Pwrtest.exe in Logging Mode - 2
C:\Program Files\Microsoft PwrTest> pwrtest /ppm /live
Waiting for PPM Events. Press 'Ctrl-C' to quit...
Timestamp     Proc#  Event Information
-------------------------------------------------------------------------------
21:27:41.133  0      Idle State Demotion (Old:2, New:1, Affinity:0x1)
21:27:41.133  1      Idle State Demotion (Old:2, New:1, Affinity:0x2)
21:27:41.196  1      Perf State Change (State:0, Speed:1833 Mhz)
21:27:41.196  1      Domain Perf State Change (State:0, Speed:1833 Mhz, Affinity:0x3)
21:27:41.196  0      Idle State Demotion (Old:1, New:0, Affinity:0x1)
  • 132. CMG‘08 INTERNATIONAL conference Power Controls: Powercfg.exe  Configure power settings within a specific power scheme (WS03+)  WS08R2: Detect common energy efficiency problems (via /ENERGY flag)  USB device selective suspend  Processor Power Management (PPM)  Inefficient power policy settings  Platform timer resolution  Platform firmware problems  …and more
  • 133. CMG‘08 INTERNATIONAL conference Configure power settings within a specific power scheme  Set AC, DC values for individual settings  Every power setting belongs to a Subgroup  -setdcvalueindex used for battery scenarios
C:\> powercfg.exe -setacvalueindex <SCHEME> <SUBGROUP> <SETTING> <VALUE>
C:\> powercfg.exe -setacvalueindex SCHEME_BALANCED SUB_SLEEP STANDBYIDLE 0
  • 134. CMG‘08 INTERNATIONAL conference Power Efficiency Diagnostics  “Powercfg /ENERGY” to start tracing  Close open applications and documents first  Inbox with WS08R2 only  Leverages new inbox ETW instrumentation  Advanced users can run utility and view HTML output  Automatically executed when the system is idle [Win7]  Reports data to Microsoft via Customer Experience Improvement Program (CEIP)  Attend for demo and details
  • 137. CMG‘08 INTERNATIONAL conference Lab Issues: Processor Utilization is Based on Non-Idle Wall Time  Idle == idle loop or HALT  It doesn’t take frequency into account, so 100% CPU utilization could be at P0 or at Pn  There may actually be more performance on the table  Idle time will include the time taken to return from C-states (HALT), which could be microseconds  CPU utilization will include cache warm-up effects if the cache has been flushed to reach the deepest C-states  CPU utilization will include latencies caused by remote memory being in low-power states  In particular, AMD and future Intel processors where memory is socket-attached
  • 138. CMG‘08 INTERNATIONAL conference Lab Issues: OS vs. HW C-States  Only three C-states selected by the OS:  C1: C1 in HW  C2: lowest power “type 2” C-state reported by HW  C3: Cn in HW  Perfmon shows OS perspective of C-states
  • 139. CMG‘08 INTERNATIONAL conference Outline  Motivation  Background  Windows Server 2003  2008  Windows Server 2008 R2  Power Diagnostics and Control  Summary
  • 140. CMG‘08 INTERNATIONAL conference Summary  Windows Server 2008 and 2008 R2 deliver real energy savings for the data center  New WS08R2 features deliver enhanced power efficiency and better manageability  Improvements to idle and low-to-medium workload operating efficiency  Management of power policy via WMI  Power metering support provides energy consumption information through Windows
  • 141. CMG‘08 INTERNATIONAL conference Future Work Example: NonVolatile Memory (NVM)  Solid State Disk (current server usage)  Potential additional layer(s) in memory hierarchy  Cache (a la ReadyBoost)  DRAM complement  Very low power when idle  But low-power DRAM may narrow the gap significantly  Poor performance of random writes  Could be improved by coalescing and remapping writes  Block orientation  Difficult to use as DRAM complement  Limited lifetime of Flash cells  Future NVM technologies may improve on this
  • 142. CMG‘08 INTERNATIONAL conference Call to Action - 1  Make sure any reduction in server capabilities is a planned-for and acceptable tradeoff between power and performance (e.g., TANSTAAFL, Do More With Less)  Reduce idle activity and power consumption  Validate new platform power management using Power Efficiency Diagnostics  ISV/IHV Call to Action for Power: eliminate activity during workload idle periods in applications and drivers  Target average idle periods of at least 100 ms  Provide software with adjustable tradeoffs between power and performance when appropriate
  • 143. CMG‘08 INTERNATIONAL conference Call To Action - 2  Build power efficient platforms and solutions  Expose complete processor (and memory and device) information from BIOS  Ensure drivers and applications work with core parking enabled  Speak with Microsoft about creating ACPI-based power meter and supply devices  Get the Enhanced Power Management logo  Review microsoft.com power whitepapers and presentations
  • 144. CMG‘08 INTERNATIONAL conference The Power of WinHEC 2008! Power-Performance Benchmarks, AMD, and Scalable Windows with HP Integrity Servers, HP
  • 145. CMG‘08 INTERNATIONAL conference Additional Resources WDK available with pre-Beta  Web Resources:  White papers and presentations at www.microsoft.com (search on “power”) ○ http://www.microsoft.com/whdc (search on “power”)  Windows Hardware Developer Central – Power Management: …/whdc/system/pnppwr/  Processor Power Management in Windows Vista and Windows Server 2008: …/whdc/system/pnppwr/powermgmt/ProcPowerMgmt.mspx  ACPI / Power Management: …/whdc/system/pnppwr/powermgmt/default.mspx  Recommendations for Power Budgeting with Windows Server: …/whdc/system/pnppwr/powermgmt/Svr_PowerBudget.mspx  Active State Power Management in Windows Vista: …/whdc/connect/pci/aspm.mspx ○ Windows Server 2008 Power Savings http://download.microsoft.com/download/4/5/9/459033a1-6ee2-45b3-ae76-a2dd1da3e81b/Windows_Server_2008_Power_Savings.docx ○ Designing Efficient Background Processes for Windows (Trigger-Start Services): http://go.microsoft.com/fwlink/?LinkId=128622  ACPI Specifications: http://www.acpi.info  80 Plus Program for power supplies: http://www.80plus.org  Energy Star Power Supply Specification Draft: http://www.energystar.gov/ia/partners/prod_development/new_specs/downloads/Draft1_Server_Spec.pdf E-mail: Server Power Feedback alias srvpwrfb@microsoft.com
  • 146. CMG‘08 INTERNATIONAL conference Sources  Estimating Total Power Consumption by Servers in the U.S. and the World – Jonathan G. Koomey, Ph.D.  http://enterprise.amd.com/Downloads/svrpwrusecompletefinal.pdf  Bureau of Labor Statistics  http://data.bls.gov/cgi-bin/cpicalc.pl  US Energy Information Administration  http://www.eia.doe.gov/fuelelectric.html  AFCOM Data Center Institute’s Five Bold Predictions, 2006  http://www.afcom.com/News_Releases/Afcom_In_The_News_05010601.asp  Intel Server Products Power Budget Analysis Tool  http://www.intel.com/support/motherboards/server/sb/cs-016976.htm  Data center TCO benefits of reduced air flow -- Malone, Vinson, and Bash  Various Gartner press releases  Aperture Research Institute  EYP Mission Critical Facilities Inc.  Power In, Dollars Out: How to Stem the Flow in the Data Center  http://www.microsoft.com/whdc/system/pnppwr/powermgmt/Svr_Pwr_ITAdmin.mspx
  • 147. CMG‘08 INTERNATIONAL conference © 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
  • 150. CMG‘08 INTERNATIONAL conference Growing Energy Demand  2004 Energy Consumption = ~100 quads  2004 Energy Expenditures = ~$910 billion  [Chart: U.S. Energy Consumption 1949-2004, All Fuels (TBTU), by sector: Industrial (red), Transportation (purple), Residential (green), Commercial (blue)]
  • 156. CMG‘08 INTERNATIONAL conference Power Metering  In the future, servers are likely to have onboard power meters  AC power (into the power supply)  DC power (out of the power supply)  For individual components (CPU, RAM, IO, fans, disks, …)  WS08R2 provides the capability to monitor such meters, as well as communicate with power management logic, through standard Windows and ACPI interfaces
  • 157. CMG‘08 INTERNATIONAL conference Power Metering and Budgeting WS08R2 introduces the ability to report power consumption and budgeting information  Server platform reports this in-band to the OS via ACPI  No additional drivers are required or HW changes, only platform support  Power information is exposed via WMI  Adheres to the DMTF Power Supply Profile v1.01  Power budget information is reported to the OS  Optional support for configuring the budget from within Windows  Extendable to enable per-device metering  WDM driver interface available  Design goals  Standard hardware and software interfaces  Native infrastructure, easily extendable  Leverages existing platform technology
  • 158. CMG‘08 INTERNATIONAL conference Power Metering and Budgeting  [Architecture diagram repeated: WMI consumers; implemented in WS08R2]
  • 159. CMG‘08 INTERNATIONAL conference Based on the DMTF management profiles  New power namespace – root\cimv2\power  1) Power supply device  Inventory information  Capabilities/characteristics  Redundancy information
  • 160. CMG‘08 INTERNATIONAL conference 2) Power meter device Inventory information Capabilities/characteristics Latest meter measurements OS-Configurable trip-points Configurable platform enforced limit 3) Power supply/meter events Notification for changes in configuration and capabilities Notification for trip-points crossed and platform limit enforced
  • 161. CMG‘08 INTERNATIONAL conference Statistical/inventory/auditing Data center can monitor power consumption across nodes Administrator can write scripts to control power policies and receive power condition events Model can be extended to per-device meters Another set of metrics for virtualization and consolidation
  • 162. CMG‘08 INTERNATIONAL conference Standard Windows driver IOCTL interface Event model based on pending IO requests (IRPs) Two separate device interfaces Consumed by the WMI providers An alternative to the ACPI implementation Future direction – potentially consumed by the kernel power manager Documented on MSDN
  • 163. CMG‘08 INTERNATIONAL conference Rationale Works as the abstraction layer to the underlying platform technology (IPMI, WSMAN, etc.) Scales across different platforms Does not require special drivers Requires only firmware updates Currently being proposed to the ACPI 4.0 specification Delegate tasks to the BMC (e.g., rolling average calculation, polling for events, etc.)
  • 164. CMG‘08 INTERNATIONAL conference Power supply device  Extends the current power source device  Control method to publish capabilities  Power meter device  Similar to the control methods for batteries  A set of control methods to get capabilities and set configuration parameters, trip points, and configure hardware enforced limits  Event notification via Notify codes
  • 165. CMG‘08 INTERNATIONAL conference WS08R2 will provide In-box driver to support power meter device(s) described in ACPI In-box IPMI operation region handler as part of the Microsoft IPMI driver – allowing ACPI control methods to communicate with IPMI using the KCS protocol Format similar to the SMBUS OpRegion 3rd-party IPMI drivers can register OpRegion handler for other IPMI protocol(s) Also proposed to ACPI 4.0 specification
  • 168. CMG‘08 INTERNATIONAL conference Flash SSD versus HDD (Jun ‘08)
 | HDD | Flash SSD
Endurance (write cycles per bit) | 10^12 | 10^5 (SLC*), 10^6 (MLC*)
Cost per byte | 1x | 2.5x – 25x
Performance: small random read requests | 1x | 10 – 100x
Active Power (Watts/byte) | 10-20x | 1x
Shock Resistance (non-operating / operating) | 100g → 200g (2010) / ~10g | 1500g / 100g
Thermal (°C) | 5-55 | 0-70
* SLC – Single Level Cell; * MLC – Multi Level Cell
  • 169. CMG‘08 INTERNATIONAL conference Flash Characteristics (Jun ’08)  Chip Read 50 MB/s Write 25 MB/s Scales with number of chips  Read Latency 25 μs to start, 100 μs for 2KB “page”  Write Latency 200 to 300 μs for 2KB “page” 2,000 μs to erase  Active Power 1-2 Watts for 8 chips + controller
  • 170. CMG‘08 INTERNATIONAL conference SSD High-IOps Workload TCO  Decrease TCO for IOps-intensive systems  IOps bottleneck causes customers to buy spindles instead of capacity, driving up TCO and operational complexity (e.g., workload balancing)  SSDs provide less expensive systems for same performance targets  Smaller form factors
  • 171. CMG‘08 INTERNATIONAL conference SSD Performance Concerns - 1  Random write perf  Could be alleviated with next generation of products  New technological problems may arise with future generations (no guarantee that it will stay at same level)  Potential bottleneck on erasing/block cleaning  Mixing workloads creates unexpected performance characteristics  Read:write ratio, request sizes, sequentiality
  • 172. CMG‘08 INTERNATIONAL conference SSD Performance Concerns - 2  First-pass performance might be better than steady-state  When nearing EOL, perf may degrade as blocks are removed from the pool  Does mapping metadata have to be re-read/initialized after power failure?  Need enough onboard parallelism to keep internal serial interfaces from becoming bottlenecks  Just like disk arrays, the wrong stripe unit size can kill perf in an SSD array
  • 173. CMG‘08 INTERNATIONAL conference WS08R2 Enables Improved Endurance for SSD Technology  SSD can identify itself differently from HDD in ATA as defined by ATA8-ACS Identify Word 217: Nominal media rotation rate  Reporting non-rotating media will allow WS08R2 to set Defrag off as default; improving device endurance by reducing writes
  • 174. CMG‘08 INTERNATIONAL conference WS08R2 Enables Optimization for SSD Technology  Microsoft implementation of “Trim” feature  NTFS will send down delete notification to the device supporting “trim” ○ File system operations: Format, Delete, Truncate, Compression ○ OS internal processes: e.g., Snapshot, Volume Manager  Three optimization opportunities for the device  Enhancing device wear leveling by eliminating merge operation for all deleted data blocks  Making early garbage collection possible for fast write  Keeping device’s unused storage area as high as possible; more room for device wear leveling.
  • 175. CMG‘08 INTERNATIONAL conference Parallelism Tradeoffs  No one scheme optimal for all workloads  With faster serial connect, intra-chip ops are less important
  • 176. CMG‘08 INTERNATIONAL conference SSD Performance Trends Source: a subset of sample data from internal lab
  • 177. CMG‘08 INTERNATIONAL conference SSD Performance Trends Source: a subset of sample data from internal lab
  • 180. CMG‘08 INTERNATIONAL conference Hyper-V Power Management  Full P-state/C-state management already integrated between Windows root partition and Hyper-V v1 (WS08)  Enlightenments, such as timer assist added in Hyper-V v2 (WS08R2)  Hypervisor delivers child clocks without requiring root interaction, plus ITTD  Core parking enabled for all partitions
  • 181. CMG‘08 INTERNATIONAL conference Overview  Scheduling virtual machines on a single server for density as opposed to dispersion  This allows cores to be parked (put to sleep) by dropping them into deep C-states  Benefits  Significantly enhances Green IT by reducing the power required for CPUs  Idle improvements extend to Hyper-V  Significant reduction in platform interrupt activity  Enables power savings and greater scalability
  • 185. CMG‘08 INTERNATIONAL conference Hyper-V Power Efficiency Windows Server Performance Lab  Testbed Configurations  Single Workloads  Web Fundamentals  SPECpower  Mixed Workloads
  • 186. CMG‘08 INTERNATIONAL conference Hyper-V Power Efficiency Workload configuration - 1  Methodologies for obtaining power load line data for TPC-C, TPC-E, FSCT, and Web Fundamentals have been demonstrated  Benchmark loads varied by throttling the number of active users  Multiple workloads tested in a Hyper-V environment  SPECpower has been successfully tuned
  • 187. CMG‘08 INTERNATIONAL conference Workload Characteristics  Web Fundamentals (WF)  Dynamic scenario  CPU-bound workload  SPECpower (modified SPECjbb)  Kit version 1.0  Java version: JDK 1.6.0_02  JVM options: -Xms1024m -Xmx1024m -XXaggressive -XXlargePages -XXthroughputCompaction -XXcallprofiling -XXlazyUnlocking -Xgc:genpar -XXgcthreads:2 -XXtlasize:min=8k,preferred=1024k
  • 188. CMG‘08 INTERNATIONAL conference Hyper-V Power Efficiency Workload configuration - 2  Single workloads  All the guests run the same workload  Two scenarios: ○ Fixing the number of active guests and scaling the load in each guest ○ Fixing the load in each guest and activating more guests  Mixed workloads  Half of guests run each workload ○ Fixed load in WF guests (~35% CPU utilization each) ○ Varying load in SPECpower guests
  • 189. CMG‘08 INTERNATIONAL conference HW and SW Test Configurations Hardware  2-socket quad-core processors ○ Minimal P-States  16GB memory: 4x4GB 667MHz DIMMs  External (wall) power monitor  Software  OS: Windows Server 2008 ○ OS Power Management: Balanced mode  Hyper-V v2 (pre-release build) ○ Configured with 8 guests  Single virtual processor: 3.16GHz  1.75GB memory
  • 190. CMG‘08 INTERNATIONAL conference Web Fundamentals Dynamic  Adding Load to Each Guest  [Charts: throughput and power usage versus total system utilization; power usage for various throughput levels]  For these experiments, the highest system utilization under WF workload is ~80%. This issue has been subsequently resolved.
  • 191. CMG‘08 INTERNATIONAL conference Web Fundamentals Dynamic  Activating Guests - 1  [Charts: throughput and power usage versus total system utilization; power usage for various throughput levels]  Data points from left to right: 0 guests, 1 guest, 2 guests, …, 8 guests active  Each active guest tries to run at the maximum load
  • 192. CMG‘08 INTERNATIONAL conference Web Fundamentals Dynamic  Activating Guests - 2  [Chart: virtual processor utilization per guest for different numbers of active guests]  The maximum utilization of each guest decreases as more guests are activated. Most of this decrease has been subsequently removed.
  • 193. CMG‘08 INTERNATIONAL conference SPECpower  Adding Load to Each Guest - 1  [Charts: throughput and power usage versus total system utilization; power (% of max watts) for various throughput levels]
  • 194. CMG‘08 INTERNATIONAL conference SPECpower  Adding Load to Each Guest - 2  [Chart: average processor frequency for various workload levels]
  • 195. CMG‘08 INTERNATIONAL conference SPECpower  Adding Load to Each Guest - 3  [Charts: virtual processor utilizations and logical processor utilizations for various workload levels]  All the guests change load levels concurrently.  The VM scheduler is biased towards utilizing higher numbered processors.
  • 196. CMG‘08 INTERNATIONAL conference SPECpower  Activating Guests  [Charts: throughput and power usage versus total system utilization; power usage for various throughput levels]  Similar scalability behavior of power and throughput as when adding load.
  • 197. CMG‘08 INTERNATIONAL conference SPECpower and WF Mixed Workloads - 1  [Charts: SPECpower throughput and server power usage versus total system utilization; power usage for various throughput levels]  4 guests running WF (4940 req/sec)  ~25% system utilization; ~35% guest virtual processor utilization  4 guests running SPECpower (similar efficiency as single workload)
  • 198. CMG‘08 INTERNATIONAL conference SPECpower and WF Mixed Workloads - 2  [Chart: average processor frequency for various levels of SPECpower workload]
  • 199. CMG‘08 INTERNATIONAL conference Hyper-V Power Efficiency Future Experiments  More workloads  Different workload mix scenarios  Different combinations of fixed and varying workloads  More VM configurations  Multiple virtual processors per guest  Oversubscription
  • 201. CMG‘08 INTERNATIONAL conference Power WMI Provider  Enables power policy configuration through standard WMI interface  Change power setting values  Activate a given plan  Conforms to DMTF data model  To get started…  Change a power setting: Win32_PowerSetting  Activate a plan: Win32_Plan.Activate() method  Attend for additional details
  • 202. CMG‘08 INTERNATIONAL conference Configuration and Administration  WMI interfaces to query and set configuration settings  Configuration of systems  Global administration  Management applications  WMI interfaces to query current and hardware capabilities  3rd party applications  Diagnostics
  • 203. CMG‘08 INTERNATIONAL conference
TargetSetting = "Microsoft:PowerSetting\{3c0bc021-c8a8-4e07-a973-6b14cbcb2b7e}" 'display blank timeout
Set objWMIService = GetObject("WinMgmts:\\.\root\cimv2\power")
Set SettingIndices = objWMIService.ExecQuery("ASSOCIATORS OF {Win32_PowerSetting.InstanceID=" & chr(34) & Replace(TargetSetting, "\", "\\") & chr(34) & "} WHERE ResultClass = Win32_PowerSettingDataIndex")
For Each SettingIndex in SettingIndices
    Set Plans = objWMIService.ExecQuery("ASSOCIATORS OF {Win32_PowerSettingDataIndex.InstanceID=" & chr(34) & Replace(SettingIndex.InstanceID, "\", "\\") & chr(34) & "} WHERE ResultClass = Win32_PowerPlan")
    For Each Plan in Plans
        If Plan.IsActive Then
            SettingIndex.SettingIndexValue = 120 '120 seconds
            SettingIndex.Put_
            Plan.Activate()
        End If
    Next
Next
  • 204. CMG‘08 INTERNATIONAL conference Remote Power Manageability - 1  WS08R2 supports the configuration of power policy via WMI  Local and remote management via WMI  Adheres to DMTF conventions for setting data  Scriptable  Includes support for reading and writing of all power plan and setting data  The active power plan can be changed remotely  Power actions can be carried out (e.g., sending a node to S3)
  • 206. CMG‘08 INTERNATIONAL conference Get the Active Plan:
Set objWMIService = GetObject("WinMgmts:\\.\root\cimv2\power")
Set PowerPlans = objWMIService.InstancesOf("Win32_PowerPlan")
For Each PowerPlan in PowerPlans
    If PowerPlan.IsActive Then
        wscript.echo "Current Plan: " & PowerPlan.ElementName
    End If
Next
' To activate a plan: PowerPlan.Activate()
  • 207. CMG‘08 INTERNATIONAL conference Get all power settings in the Active Plan: (Continued with PowerPlan)
EscapedInstanceID = Replace(PowerPlan.InstanceID, "\", "\\")
Set PowerSettingIndexes = objWMIService.ExecQuery("ASSOCIATORS OF {Win32_PowerPlan.InstanceID=" & chr(34) & EscapedInstanceID & chr(34) & "}")
For Each PowerSettingIndex in PowerSettingIndexes
    EscapedInstanceID = Replace(PowerSettingIndex.InstanceID, "\", "\\")
    Set PowerSettings = objWMIService.ExecQuery("ASSOCIATORS OF {Win32_PowerSettingDataIndex.InstanceID=" & chr(34) & EscapedInstanceID & chr(34) & "} WHERE ResultClass = Win32_PowerSetting")
    For Each PowerSetting in PowerSettings
        wscript.echo "Power Setting: " & PowerSetting.InstanceID
        wscript.echo "Description: " & PowerSetting.Description
        wscript.echo "Index Value: " & PowerSettingIndex.SettingIndexValue
    Next
Next