VMworld 2013: Performance and Capacity Management of DRS Clusters
1. Performance and Capacity Management of DRS Clusters
Anne Holler, VMware
Ganesha Shanmuganathan, VMware
VSVC5821
#VSVC5821
2. 2
Disclaimer
This session may contain product features that are
currently under development.
This session/overview of the new technology represents
no commitment from VMware to deliver these features in
any generally available product.
Features are subject to change, and must not be included in
contracts, purchase orders, or sales agreements of any kind.
Technical feasibility and market demand will affect final delivery.
Pricing and packaging for any new technologies or features
discussed or presented have not been determined.
3. 3
DRS = Distributed Resource Scheduler
• The overall design goals of DRS are:
• Optimize VM performance subject to user control settings
• Provide resource isolation and sharing for subsets of VMs
• Use infrastructure and management resources efficiently
• Provide comprehensive automatic cluster management
• Mechanisms:
• Initial placement / Load balancing
• QoS enforcement: shares, reservations, limits, resource pools
• Policy enforcement: Affinity Rules, Anti-Affinity Rules
• Evacuation for host maintenance
[Diagram: a DRS cluster of ESX Server hosts, each running several VMs]
4. 4
Key elements to achieve design goals
Computing/Delivering VM CPU & memory resource entitlements
(Automatic VM Initial placement and automatic Migration)
Mapping the cluster resource pool tree onto individual hosts
Modeling vMotion remediation costs
Respecting constraints: compatibility, availability, host state, rules
Let’s look at each of these elements and examine advanced
deployment situations for each along with tips to handle them…
7. 7
Dynamic Entitlement
• A VM's dynamic entitlement (DE) is what the VM would get if the cluster were one giant host
• Takes into account VM resource controls and demand
[Diagram: 6 hosts, each with 10 GHz CPU and 64 GB memory, modeled as one “giant host” with 60 GHz CPU and 384 GB memory]
8. 8
VM Resource Controls
Reservation: Guaranteed allocation
Limit: Guaranteed upper bound
Shares: Allocation in between
Resource pools: allocation and isolation for groups of VMs
Actual allocation depends on the shares and demand
[Diagram: memory bar for a VM configured with 8 GB: reserved 1 GB, actual allocation 5 GB, limit 6 GB]
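To make the interplay of these controls concrete, here is a minimal Python sketch (an assumed simplification, not VMware's actual divvy algorithm): reservations are granted first, the remainder is split in proportion to shares, and no VM exceeds its limit or its demand.

# Minimal sketch (assumed semantics, not VMware's divvy code): grant each VM
# its reservation, then distribute the rest of the pool's capacity by shares,
# never exceeding a VM's limit or its demand.
def entitlements(vms, capacity):
    # vms: list of dicts with keys 'reservation', 'limit', 'demand', 'shares'
    ent = [vm['reservation'] for vm in vms]        # reservations are guaranteed
    remaining = capacity - sum(ent)
    active = set(range(len(vms)))                  # VMs that can still receive
    while remaining > 1e-9 and active:
        total = sum(vms[i]['shares'] for i in active)
        granted = 0.0
        for i in list(active):
            cap = min(vms[i]['limit'], vms[i]['demand'])
            give = min(remaining * vms[i]['shares'] / total,
                       max(0.0, cap - ent[i]))
            ent[i] += give
            granted += give
            if ent[i] >= cap - 1e-9:               # hit limit/demand: drop out
                active.discard(i)
        remaining -= granted
        if granted <= 1e-9:                        # nothing left to hand out
            break
    return ent

# Example: 6000 MHz pool; the higher-share VM gets more of the contended slack.
vms = [dict(reservation=1000, limit=6000, demand=5000, shares=2000),
       dict(reservation=1000, limit=6000, demand=5000, shares=1000)]
print(entitlements(vms, 6000))  # [~3666.7, ~2333.3]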
9. 9
CPU Entitlement: Close-up on CPU Demand Estimate
By ESX:
• CPU Demand = used + stolen * run / (run + sleep)
• Stolen time includes:
• ready: vCPU is runnable but target CPU is busy
• overlap: Use of CPU to handle interrupts during this vCPU execution
• hyperthreading: Impact on CPU operation due to use of partner CPU
• power management: Loss of CPU cycles due to platform frequency scaling
http://www.vmware.com/files/pdf/techpaper/VMware-vSphere-CPU-Sched-Perf.pdf
By DRS:
• CPU Demand = ESX CPU Demand over time
• load balancing: average over last 5 minutes
• cost/benefit & DPM: maximum over extended period (up to 60 minutes)
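A small Python sketch of the formula above (a simplified model; variable names are illustrative, and all quantities are in the same time units within one sample window):

# CPU demand = used + stolen * run / (run + sleep), where stolen time is the
# sum of the ready, overlap, hyperthreading, and power-management components.
def esx_cpu_demand(used, ready, overlap, ht_loss, pm_loss, run, sleep):
    stolen = ready + overlap + ht_loss + pm_loss
    if run + sleep == 0:              # idle vCPU: nothing was stolen from it
        return used
    return used + stolen * run / (run + sleep)

# Example: 600 ms used in a 1000 ms window with 200 ms stolen while the vCPU
# ran 800 ms and slept 200 ms -> demand = 600 + 200 * 0.8 = 760 ms.
print(esx_cpu_demand(used=600, ready=150, overlap=20, ht_loss=20,
                     pm_loss=10, run=800, sleep=200))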
10. 10
CPU Entitlement: Satisfies VM demand unless Contention
Contention monitoring: Ready time
• Rule of thumb: Ready <=5% per vCPU
• E.g., 4 vCPU VM <= 20% during low contention periods below
• Higher ready values do not necessarily indicate problems
• E.g., NUMA scheduling: vCPU has affinity for node containing memory
• DRS considers ready in CPU demand, but ready imbalance has been reported
[Chart: per-VM CPU usage and ready time over a low-contention period]
11. 11
CPU Entitlement Scenarios
High CPU ready; manual vMotion improved performance
• Why? DRS averaging underestimated the demand of a spiky CPU workload
• Tip/2013: Introduced the AggressiveCPUActive advanced option
• Option uses the larger of:
• 5 minute average of ESX CPU demand [matches pre-2013 computation]
• 80th percentile (2nd largest) of the last five 1-minute averages of ESX CPU demand
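A small Python sketch of that computation (semantics taken from the bullets above; input is the series of 1-minute demand averages):

def aggressive_cpu_active(one_min_averages):
    # Larger of the plain 5-minute average and the 80th percentile
    # (2nd largest) of the last five 1-minute averages.
    last5 = one_min_averages[-5:]
    five_min_avg = sum(last5) / len(last5)
    second_largest = sorted(last5)[-2]
    return max(five_min_avg, second_largest)

# Spiky workload: the plain average (890) underestimates the spikes;
# the percentile catches them.
print(aggressive_cpu_active([200, 1800, 250, 1900, 300]))  # 1800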
12. 12
CPU Entitlement Scenarios, continued
CPU ready time high, but host lightly utilized
• Why? Platform-level power management operating beneath ESX layer
• Tip: Set BIOS power mgmt setting to OS Control
http://blogs.vmware.com/vsphere/2012/01/having-a-performance-problem-hard-to-resolve-have-you-checked-your-host-bios-lately.html
13. 13
Memory Entitlement: Close-up on Memory Demand Estimate
By ESX:
• Memory Demand = max over the last 4 minutes of oneMinuteActive
• oneMinuteActive computed as follows:
• Unmap random sample of pages
• Check mapping activity in a minute
• Scale up to memSize
• Accounts for large, swapped, and ballooned pages
By DRS:
• Memory Demand = ESX Memory Demand over time + headroom:
• load balancing: average over last 5 mins + 25% (default) of consumed idle
• cost/benefit & DPM: maximum over extended period + 0% consumed idle
[Diagram: 1 GB ESX demand plus 25% of 4 GB consumed idle memory gives 2 GB DRS demand]
http://www.vmware.com/files/pdf/mem_mgmt_perf_vsphere5.pdf
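A small Python sketch combining the two estimates above (a simplified model based on the bullets; the idle-memory headroom percentage is the one later exposed as PercentIdleMBInMemDemand):

def esx_active_estimate(touched_in_sample, sample_size, mem_size_mb):
    # Unmap a random sample of pages, see what fraction is touched again
    # within a minute, and scale that fraction up to the VM's memory size.
    return (touched_in_sample / sample_size) * mem_size_mb

def drs_mem_demand(esx_demand_mb, consumed_mb, pct_idle=0.25):
    # DRS load-balancing demand = ESX demand + a percentage of consumed idle.
    idle_consumed = max(0, consumed_mb - esx_demand_mb)
    return esx_demand_mb + pct_idle * idle_consumed

# The diagram's example: 1 GB ESX demand + 25% of 4 GB consumed idle = 2 GB.
print(drs_mem_demand(esx_demand_mb=1024, consumed_mb=5120))  # 2048.0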
14. 14
Memory Entitlement: Satisfies VM demand unless Contention
Contention monitoring: Amount of reclamation
• Reclamation: Ballooning, page sharing, compression, ESX swapping
• Customers often monitor for non-zero ballooning & ESX swapping values
• Some ballooning can allow more efficient memory usage
• DRS memory demand estimate has been lower than desired in some cases
• Ballooning might cause guest-internal swapping
15. 15
Memory Entitlement Scenarios
Scenario: DRS performed undesired VM memory imbalance move
• Why? DRS demand includes only 25% of consumed idle memory
• Customer preferred 100%, i.e., wanted DRS to manage consumed as demand
• Tip/2013: Use new option PercentIdleMBInMemDemand (default 25%)
• Can be set to 100% to have both load-balancing & cost-benefit manage to consumed
• Legacy tip: Lower the IdleTax option (default 75%; demand includes (100 - IdleTax) = 25% of idle by default)
• Only influences load-balancing
16. 16
Memory Entitlement Scenarios, continued
Scenario: Customer found DPM memory consolidation too high
• Why? DPM consolidated on active mem w/o adding any idle consumed
• Customer wanted to reduce later impact of demand paging ballooned/swapped pages
• Tip/2013: Added new option PercentIdleMBInMemDemand (default 25%)
• Can be set to 100% to have DPM manage to consumed
18. 18
Resource Pools in Cluster
[Diagram: resource pool tree; node labels show [reservation (MHz/MB), limit (MHz/MB)] and shares. Root contains RP1 [500, 8000] and RP2 [200, 400]; VM 1 and VM 2 (under RP1) each have [100, 8000]; VM 3 (under RP2) has [100, 400]]
19. 19
Resource Pools in Cluster
[Diagram: the cluster-wide resource pool tree is mapped onto per-host shadow trees based on per-VM demands (D values): RP1 [500, 8000] (150 shares) is split into RP1 [200, 3000] (100 shares) on Host A, holding VM 1, and RP1 [300, 5000] (50 shares) on Host B, holding VM 2]
20. 20
Resource Pools in Cluster
[Diagram: per-host view: Host A holds RP1 [200, 3000] with VM 1 (D=1000) and RP2 [200, 400] with VM 3 (D=200); Host B holds RP1 [300, 5000] with VM 2 (D=500)]
21. 21
Mapping the cluster RP tree onto individual hosts: Scenario
DRS RP flow too slow to maintain desired VM performance
• Why? Conservative per-host RP reservations and limits capped the response to demand spikes
• Tip: Set CapRpReservationAtDemand to 0 (default: 1) to have DRS distribute all RP reserved resources, rather than just what demand needs
• Tip/2013: Set AllowUnlimitedCpuLimitForVms to 0 (default: 1) to have DRS distribute limits as much as possible
[Diagram: with these options, the per-host shadow RPs receive the full cluster RP reservation and limit rather than only what current demand requires]
23. 23
Cost-Benefit Filtering
Benefit: Higher resource availability
Cost:
• Migration cost: vMotion CPU & memory cost, VM slowdown
• Risk cost: Benefit may not be sustained due to load variation
[Chart: gain (MHz or MB) vs. time (sec) for a candidate move: a loss (migration cost) during the migration time, then a benefit over the stable time, discounted by a risk cost if the load does not stay stable; evaluated each invocation interval]
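An illustrative Python model of this trade-off (assumed structure only; DRS's real cost-benefit computation is more involved). A move passes the filter when the gain over the expected stable time outweighs the vMotion cost plus the risk that the gain is not sustained:

def net_benefit(gain_rate, stable_time, migration_cost, risk_cost):
    # gain_rate: MHz (or MB) of extra resource availability per second
    # stable_time: seconds the improved placement is expected to last
    return gain_rate * stable_time - migration_cost - risk_cost

# Example: 50 MHz/s gain over a 300 s stable window vs. a 4000-unit migration
# cost and a 1000-unit risk cost -> net benefit 10000, so the move is kept.
print(net_benefit(50, 300, 4000, 1000) > 0)  # True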
24. 24
Cost-Benefit Filtering
vMotions caused unacceptable performance degradation
• Why? DRS cost-benefit didn't capture the high sensitivity of these VMs to vMotion
• Tip: Tune cost-benefit to model vMotion costs aggressively
Set IgnoreDownTimeLessThan to 0
Assumes UseDownTime is set to 1 (default: 1)
25. 25
Modeling vMotion costs: Scenario
DRS handles VMs known to be highly sensitive to vMotion
• 2013: powered-on low-latency VMs are treated as soft-affine with their current host
• 2013: powered-on VMs with vFlash cache reservations are soft-affine with their host
[Diagram: host with SSD-backed vFlash]
26. 26
vMotion costs: Scenario
DRS left cluster severely imbalanced
• Why? VM happiness is the primary metric, so DRS filtered moves aggressively
• By default DRS becomes more aggressive when imbalance is severe:
• FixSevereImbalanceOnly = 1 (default)
• SevereImbalanceRelaxMinGoodness = 1 (default)
• SevereImbalanceRelaxCostBenefit = 1 (default)
• Tip/extreme: If the above defaults still leave more imbalance than desired, can use:
• UseDownTime = 0
• FixSevereImbalanceOnly = 0 (handle with care!)
• SevereImbalanceDropCostBenefit = 1 (handle with care!)
28. 28
Respecting constraints (e.g., availability, rules)
Customers may express business rules to influence load-balancing
• E.g.: use VM-VM anti-affinity rules for availability
• E.g.: use VM-host affinity rules for software licensing
[Diagram: VM-host affinity to a host group; VM-VM anti-affinity between VMs]
29. 29
Asymmetric Cluster Scenario
Asymmetric storage or network access cost
• E.g., if moving a VM off a set of hosts would raise network latency for storage (crossing racks or traversing an L2-over-L3 network), use a soft affinity rule to keep the VM on the hosts with lower access cost
[Diagram: DRS cluster spanning two racks, each behind its own ToR switch, connected by a router]
30. 30
Respecting constraints (e.g., availability, rules) Scenarios
Stretch cluster with VMs on hosts with primary storage
• Current solution of using soft VM/host rules to partition VMs between sites allows VMs to violate the rules if any host is over-utilized
• Tip/2013: Added support for semi-hard VM/host rules (only drop soft VM/host
rules for constraints, not for high utilization) via option
DropSoftVmHostRulesOverutilized = 1
[Diagram: stretched cluster across two sites with storage replication over a WAN]
31. 31
Respecting constraints (e.g., availability, rules) Scenarios
More VMs per host than wanted with respect to failure impact (eggs per basket)
• Why? By default, DRS allows up to the ESX-supported VM limit and balances CPU and memory, not the number of VMs
• Tip: Use the LimitVMsPerESXHost option.
Restricts the number of VMs on each host
Inflexible, requires manual scaling
E.g.: LimitVMsPerESXHost = 6
• Tip/2013: Use the LimitVMsPerESXHostPercent option.
Restricts the number of VMs based on a tolerance buffer; flexible and automatic
Number of VMs on host = Mean + (Buffer% * Mean), as sketched below
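A small Python sketch of that formula (rounding to a whole number of VMs is an assumption):

import math

def max_vms_per_host(total_vms, num_hosts, buffer_pct):
    # Cap the per-host VM count at the cluster mean plus a tolerance buffer.
    mean = total_vms / num_hosts
    return math.ceil(mean + (buffer_pct / 100.0) * mean)

# Example: 60 VMs across 10 hosts with a 50% buffer -> at most 9 VMs per host.
print(max_vms_per_host(60, 10, 50))  # 9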
33. 33
Cluster - Capacity
Sum (VM CPU reservations) < 75% of the cluster CPU capacity available for VMs
Sum (VM memory reservations + overhead) < 75% of the cluster memory capacity available for VMs
34. 34
Cluster - Capacity
For maximum performance of all VMs in the cluster:
Sum (VM demands) < 80% of the cluster capacity
DRS starts throttling less important VMs as demand gets close to, or exceeds, capacity
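A quick Python sketch of these two rules of thumb combined (thresholds from the slides; argument names are illustrative, and memory reservations should include the per-VM memory overhead):

def capacity_headroom_ok(cpu_res_mhz, mem_res_mb, cpu_demand_mhz,
                         cluster_cpu_mhz, cluster_mem_mb):
    # Rule 1: total reservations under 75% of cluster capacity.
    res_ok = (sum(cpu_res_mhz) < 0.75 * cluster_cpu_mhz and
              sum(mem_res_mb) < 0.75 * cluster_mem_mb)
    # Rule 2: total demand under 80%, else DRS starts throttling.
    demand_ok = sum(cpu_demand_mhz) < 0.80 * cluster_cpu_mhz
    return res_ok and demand_ok

# Example: 50 GHz of CPU reservations on a 60 GHz cluster exceeds 75%.
print(capacity_headroom_ok([50000], [100000], [30000], 60000, 384000))  # False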
35. 35
Capacity Management Scenarios
vMotions becoming slower in the cluster
Why? Unreserved CPU on the host is less than 30% of a core.
vMotion tries to reserve 30% of a core for the vMotion process; if that fails, the vMotion may proceed at a slower rate (see the sketch below)
Lots of VM power-on failures in a DRS cluster
Why? Not enough unreserved memory in the cluster/sub-cluster to satisfy the powering-on VM's reservation or overhead.
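A sketch of the first symptom check in Python (the 30%-of-a-core threshold is from the slide; names are illustrative):

def vmotion_may_be_slow(unreserved_cpu_mhz, core_mhz):
    # vMotion tries to reserve 30% of one core; below that, expect slowness.
    return unreserved_cpu_mhz < 0.30 * core_mhz

# Example: 600 MHz unreserved on a host with 2600 MHz cores -> slow vMotions.
print(vmotion_may_be_slow(600, 2600))  # True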
37. 37
Related material
Other DRS talks at VMworld 2013
• VSVC5280 - DRS: New Features, Best Practices and Future Directions (11 am
Monday & 11 am Tuesday)
• STO5636 - Storage DRS: Deep Dive and Best Practices to Suit Your Storage
Environments (4 pm, Monday & 12:30 pm Tuesday)
• VSVC5364 - Storage IO Control: Concepts, Configuration and Best Practices
to Tame Different Storage Architectures (8:30 am, Wed. & 11 am, Thursday)
From VMworld 2012
• VSP2825 - DRS: Advanced Concepts, Best Practices and Future Directions
VMware Technical Journal publications
• VMware Distributed Resource Management: Design, Implementation, and
Lessons Learned
• Storage DRS: Automated Management of Storage Devices In a Virtualized
Datacenter
More related publications at http://labs.vmware.com/academic/publications
44. 44
VM Creation
Important VM Parameters:
Number of vCPUs
Memory Size
CPU/Mem Reservation
Shares
RP Hierarchy
VM - Rules
45. 45
VM Sizing
Too many vCPUs waste overhead
Too few vCPUs may cause the application to perform poorly (check whether all vCPUs are near 100% used)
Too much memory can cause excessive ballooning
Too little memory may cause guest-internal swapping (check swap statistics inside the guest)
VM needs may vary with time; size for the maximum of what the VM needs
46. 46
Case Study: Stretch cluster
Use soft VM/Host rules to partition VMs between sites
New: Added optional support for semi-hard VM/host rules (only drop soft VM/host rules for constraints, not for high utilization)
[Diagram: stretched cluster over a WAN]
47. 47
Cluster - Capacity
Sum (VM reservations + overhead) < 75% of the cluster capacity available for VMs
58. 58
VMs in Resource Pool not getting same performance
[Diagram: resource pool tree; labels show [reservation (MHz/MB), limit (MHz/MB)] and shares. Root contains RP1 [400, 8000] and RP2 [200, 400]; VM 1 and VM 2 (under RP1) each have [100, 8000]; VM 3 (under RP2) has [100, 400]]
59. 59
Resource Pools - Performance
[Diagram: the cluster-wide RP1 [400, 8000] (150 shares) is mapped onto per-host shadow pools: RP1 [150, 3000] (100 shares) on Host A, holding VM 1, and RP1 [250, 5000] (50 shares) on Host B, holding VM 2]
60. 60
Resource Pools - Performance
[Diagram: per-host view: Host A holds RP1 [150, 3000] with VM 1 and RP2 [200, 400] with VM 3; Host B holds RP1 [250, 5000] with VM 2]
61. 61
Resource Pools - Performance
DRS flows resources between hosts every 5 minutes
Tuned to minimize the number of migrations by throttling this flow.
More aggressive settings to flow the resources:
Advanced Control Knobs
(1) CapRpReservationAtDemand – False (default True)
(2) AllowUnlimitedCpuLimitForVms – False (default True)
(1) Would flow reservations more aggressively
(2) Would flow limits more aggressively
63. 63
CPU Management
Demand: How CPU Demand (aka CPU Active) is estimated
• By ESX: CPU time VM would consume if there were no stolen time
• CPU Demand = used + stolen * run / (run + sleep), where stolen time includes:
• Ready: vCPU is runnable but target CPU is busy
• Overlap: Use of CPU to handle interrupts during this vCPU execution
• Hyperthreading: Impact on CPU operation due to use of partner CPU
• PowerManagement: Loss of CPU cycles due to platform frequency scaling
• By DRS load balancing: Average ESX CPU demand over last 5 minutes
• DRS Cost/Benefit & DPM power-off consider ~max over longer periods
64. 64
CPU Management, cont’d
Reservation: Impact
• Ensures via admission control VM can obtain reserved CPU when demanded
• Work-conserving; other VMs use reserved CPU when VM doesn’t demand it
Ready time metric
• General rule of thumb is that it should be 5% or less per vCPU
• Values higher than this do not necessarily indicate problems
• Check out discussion & chime in on:
• http://www.yellow-bricks.com/2013/05/09/drs-not-taking-cpu-ready-time-in-to-account-need-your-help/
• Some troubleshooting case studies follow
65. 65
CPU Management Case Studies
Case: CPU ready time metric high, but host not heavily utilized
• Explanation: NUMA scheduling favors running on CPU near local memory
• Fix: none; this scheduling gives better performance
• http://blogs.vmware.com/vsphere/2012/02/vspherenuma-loadbalancing.html
66. 66
CPU Management Case Studies
Case: CPU ready time metric high, but host lightly utilized
• Explanation: Platform-level power management enabled
• Fix: Set BIOS power mgmt setting to Maximum or OS Control
67. 67
CPU Management Case Studies
Case: CPU ready time metric high, better perf after manual move
• Explanation: DRS average underestimated demand of spiky CPU workloads
• Fix: Introduced AggressiveCPUActive advanced option, which uses larger of:
• the 5 minute average of ESX CPU demand
• the 80th percentile (2nd largest) of the last five 1-minute averages of ESX CPU demand
69. 69
Memory Management
Demand: How memory demand (aka memory active) is estimated
• By ESX
• Statistical: unmap small random sample of pages each minute, see what percentage
are referenced, assume that percentage of mapped pages are active; take max of last
4 minutes
• By DRS
• For load balancing: Average ESX memory demand over last 5 minutes +
percentage (default 25) of idle consumed memory
• DRS C/B and DPM power-off consider ~max over longer periods
70. 70
Memory Management, cont’d
Reservation: Impact
• Ensures via admission control VM can obtain reserved amount of memory
• Not work-conserving; once reserved memory is consumed, it is not reallocated
Reclamation
• Ballooning, transparent page sharing, compression, ESX swapping
Ballooning/swapping metrics
• Customers often monitor for non-zero values
• Avoiding over-commitment entirely can lead to high memory cost
71. 71
Memory Management Case Studies
Case: Undesirable VM migration for memory imbalance
• Explanation: DRS manages active memory; the customer wanted DRS to manage consumed memory
• Fix: Use IdleTax option to include more idle memory in active
• In vSphere 5.5, added new option PercentIdleMBInMemDemand (default 25%) which
can be set to 100% to manage to consumed
72. 72
Memory Management Case Studies
Case: DPM overconsolidated memory
• Explanation: DPM consolidates on active memory; the customer wanted DPM to use consumed memory
• Fix: Added new option PercentIdleMBInMemDemand (also can be used
instead of IdleTax)