VMworld 2013: Performance and Capacity Management of DRS Clusters
1. Performance and Capacity Management of DRS Clusters
Anne Holler, VMware
Ganesha Shanmuganathan, VMware
VSVC5821
#VSVC5821
2. 2
Disclaimer
This session may contain product features that are
currently under development.
This session/overview of the new technology represents
no commitment from VMware to deliver these features in
any generally available product.
Features are subject to change, and must not be included in
contracts, purchase orders, or sales agreements of any kind.
Technical feasibility and market demand will affect final delivery.
Pricing and packaging for any new technologies or features
discussed or presented have not been determined.
3. 3
DRS = Distributed Resource Scheduler
• The overall design goals of DRS are:
• Optimize VM performance subject to user control settings
• Provide resource isolation and sharing for subsets of VMs
• Use infrastructure and management resources efficiently
• Provide comprehensive automatic cluster management
• Mechanisms:
• Initial placement / Load balancing
• QoS enforcement: shares, reservations, limits, resource pools
• Policy enforcement: Affinity Rules, Anti-Affinity Rules
• Evacuation for host maintenance
[Diagram: a DRS cluster of ESX Server hosts, each running several VMs]
4. 4
Key elements to achieve design goals
Computing/Delivering VM CPU & memory resource entitlements
(Automatic VM Initial placement and automatic Migration)
Mapping the cluster resource pool tree onto individual hosts
Modeling vMotion remediation costs
Respecting constraints: compatibility, availability, host state, rules
Let’s look at each of these elements and examine advanced
deployment situations for each along with tips to handle them…
7. 7
Dynamic Entitlement
• A VM's dynamic entitlement (DE) is what the VM would get if the cluster were one giant host
• Takes into account VM resource controls and demand
[Diagram: 6 hosts, each with 10 GHz CPU and 64 GB memory, modeled as one “giant host” with 60 GHz CPU and 384 GB memory]
8. 8
VM Resource Controls
Reservation: Guaranteed allocation
Limit: Guaranteed upper bound
Shares: Allocation in between
Resource pools: allocation and isolation for groups of VMs
Actual allocation depends on the shares and demand
[Diagram: memory bar for a VM configured with 8 GB: reserved 1 GB, actual allocation 5 GB, limit 6 GB]
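To make the interplay of these controls concrete, here is a minimal Python sketch (an assumed simplification, not VMware's actual divvy algorithm): reservations are granted first, the remainder is split in proportion to shares, and no VM exceeds its limit or its demand.

# Minimal sketch (assumed semantics, not VMware's divvy code): grant each VM
# its reservation, then distribute the rest of the pool's capacity by shares,
# never exceeding a VM's limit or its demand.
def entitlements(vms, capacity):
    # vms: list of dicts with keys 'reservation', 'limit', 'demand', 'shares'
    ent = [vm['reservation'] for vm in vms]        # reservations are guaranteed
    remaining = capacity - sum(ent)
    active = set(range(len(vms)))                  # VMs that can still receive
    while remaining > 1e-9 and active:
        total = sum(vms[i]['shares'] for i in active)
        granted = 0.0
        for i in list(active):
            cap = min(vms[i]['limit'], vms[i]['demand'])
            give = min(remaining * vms[i]['shares'] / total,
                       max(0.0, cap - ent[i]))
            ent[i] += give
            granted += give
            if ent[i] >= cap - 1e-9:               # hit limit/demand: drop out
                active.discard(i)
        remaining -= granted
        if granted <= 1e-9:                        # nothing left to hand out
            break
    return ent

# Example: 6000 MHz pool; the higher-share VM gets more of the contended slack.
vms = [dict(reservation=1000, limit=6000, demand=5000, shares=2000),
       dict(reservation=1000, limit=6000, demand=5000, shares=1000)]
print(entitlements(vms, 6000))  # [~3666.7, ~2333.3]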
9. 9
CPU Entitlement: Close-up on CPU Demand Estimate
By ESX:
• CPU Demand = used + stolen * run / (run + sleep)
• Stolen time includes:
• ready: vCPU is runnable but target CPU is busy
• overlap: Use of CPU to handle interrupts during this vCPU execution
• hyperthreading: Impact on CPU operation due to use of partner CPU
• power management: Loss of CPU cycles due to platform frequency scaling
http://www.vmware.com/files/pdf/techpaper/VMware-vSphere-CPU-Sched-Perf.pdf
By DRS:
• CPU Demand = ESX CPU Demand over time
• load balancing: average over last 5 minutes
• cost/benefit & DPM: maximum over extended period (up to 60 minutes)
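A small Python sketch of the formula above (a simplified model; variable names are illustrative, and all quantities are in the same time units within one sample window):

# CPU demand = used + stolen * run / (run + sleep), where stolen time is the
# sum of the ready, overlap, hyperthreading, and power-management components.
def esx_cpu_demand(used, ready, overlap, ht_loss, pm_loss, run, sleep):
    stolen = ready + overlap + ht_loss + pm_loss
    if run + sleep == 0:              # idle vCPU: nothing was stolen from it
        return used
    return used + stolen * run / (run + sleep)

# Example: 600 ms used in a 1000 ms window with 200 ms stolen while the vCPU
# ran 800 ms and slept 200 ms -> demand = 600 + 200 * 0.8 = 760 ms.
print(esx_cpu_demand(used=600, ready=150, overlap=20, ht_loss=20,
                     pm_loss=10, run=800, sleep=200))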
10. 10
CPU Entitlement: Satisfies VM demand unless Contention
Contention monitoring: Ready time
• Rule of thumb: Ready <=5% per vCPU
• E.g., 4 vCPU VM <= 20% during low contention periods below
• Higher ready values do not necessarily indicate problems
• E.g., NUMA scheduling: vCPU has affinity for node containing memory
• DRS considers ready in CPU demand, but ready imbalance has been reported
[Chart: per-VM CPU usage and ready time over a low-contention period]
11. 11
CPU Entitlement Scenarios
High CPU ready; manual vMotion improved performance
• Why? DRS averaging underestimated the demand of a spiky CPU workload
• Tip/2013: Introduced the AggressiveCPUActive advanced option
• Option uses the larger of:
• 5 minute average of ESX CPU demand [matches pre-2013 computation]
• 80th percentile (2nd largest) of the last five 1-minute averages of ESX CPU demand
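A small Python sketch of that computation (semantics taken from the bullets above; input is the series of 1-minute demand averages):

def aggressive_cpu_active(one_min_averages):
    # Larger of the plain 5-minute average and the 80th percentile
    # (2nd largest) of the last five 1-minute averages.
    last5 = one_min_averages[-5:]
    five_min_avg = sum(last5) / len(last5)
    second_largest = sorted(last5)[-2]
    return max(five_min_avg, second_largest)

# Spiky workload: the plain average (890) underestimates the spikes;
# the percentile catches them.
print(aggressive_cpu_active([200, 1800, 250, 1900, 300]))  # 1800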
12. 12
CPU Entitlement Scenarios, continued
CPU ready time high, but host lightly utilized
• Why? Platform-level power management operating beneath ESX layer
• Tip: Set BIOS power mgmt setting to OS Control
http://blogs.vmware.com/vsphere/2012/01/having-a-performance-problem-hard-to-resolve-have-you-checked-your-host-bios-lately.html
13. 13
Memory Entitlement: Close-up on Memory Demand Estimate
By ESX:
• Memory Demand = max over the last 4 minutes of oneMinuteActive
• oneMinuteActive computed as follows:
• Unmap random sample of pages
• Check mapping activity in a minute
• Scale up to memSize
• Accounts for large, swapped, and ballooned pages
By DRS:
• Memory Demand = ESX Memory Demand over time + headroom:
• load balancing: average over last 5 mins + 25% (default) of consumed idle
• cost/benefit & DPM: maximum over extended period + 0% consumed idle
[Diagram: 1 GB ESX demand plus 25% of 4 GB consumed idle memory gives 2 GB DRS demand]
http://www.vmware.com/files/pdf/mem_mgmt_perf_vsphere5.pdf
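A small Python sketch combining the two estimates above (a simplified model based on the bullets; the idle-memory headroom percentage is the one later exposed as PercentIdleMBInMemDemand):

def esx_active_estimate(touched_in_sample, sample_size, mem_size_mb):
    # Unmap a random sample of pages, see what fraction is touched again
    # within a minute, and scale that fraction up to the VM's memory size.
    return (touched_in_sample / sample_size) * mem_size_mb

def drs_mem_demand(esx_demand_mb, consumed_mb, pct_idle=0.25):
    # DRS load-balancing demand = ESX demand + a percentage of consumed idle.
    idle_consumed = max(0, consumed_mb - esx_demand_mb)
    return esx_demand_mb + pct_idle * idle_consumed

# The diagram's example: 1 GB ESX demand + 25% of 4 GB consumed idle = 2 GB.
print(drs_mem_demand(esx_demand_mb=1024, consumed_mb=5120))  # 2048.0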
14. 14
Memory Entitlement: Satisfies VM demand unless Contention
Contention monitoring: Amount of reclamation
• Reclamation: Ballooning, page sharing, compression, ESX swapping
• Customers often monitor for non-zero ballooning & ESX swapping values
• Some ballooning can allow more efficient memory usage
• DRS memory demand estimate has been lower than desired in some cases
• Ballooning might cause guest-internal swapping
15. 15
Memory Entitlement Scenarios
Scenario: DRS performed undesired VM memory imbalance move
• Why? DRS demand includes only 25% of consumed idle memory
• Customer preferred 100%, i.e., wanted DRS to manage consumed as demand
• Tip/2013: Use new option PercentIdleMBInMemDemand (default 25%)
• Can be set to 100% to have both load-balancing & cost-benefit manage to consumed
• Legacy tip: Lower the IdleTax option (default 75%; demand includes (100 - IdleTax) = 25% of idle by default)
• Only influences load-balancing
16. 16
Memory Entitlement Scenarios, continued
Scenario: Customer found DPM memory consolidation too high
• Why? DPM consolidated on active mem w/o adding any idle consumed
• Customer wanted to reduce later impact of demand paging ballooned/swapped pages
• Tip/2013: Added new option PercentIdleMBInMemDemand (default 25%)
• Can be set to 100% to have DPM manage to consumed
18. 18
Resource Pools in Cluster
[Diagram: resource pool tree; node labels show [reservation (MHz/MB), limit (MHz/MB)] and shares. Root contains RP1 [500, 8000] and RP2 [200, 400]; VM 1 and VM 2 (under RP1) each have [100, 8000]; VM 3 (under RP2) has [100, 400]]
19. 19
Resource Pools in Cluster
[Diagram: the cluster-wide resource pool tree is mapped onto per-host shadow trees based on per-VM demands (D values): RP1 [500, 8000] (150 shares) is split into RP1 [200, 3000] (100 shares) on Host A, holding VM 1, and RP1 [300, 5000] (50 shares) on Host B, holding VM 2]
20. 20
Resource Pools in Cluster
[Diagram: per-host view: Host A holds RP1 [200, 3000] with VM 1 (D=1000) and RP2 [200, 400] with VM 3 (D=200); Host B holds RP1 [300, 5000] with VM 2 (D=500)]
21. 21
Mapping the cluster RP tree onto individual hosts: Scenario
DRS RP flow too slow to maintain desired VM performance
• Why? Conservative per-host RP reservations and limits capped the response to demand spikes
• Tip: Set CapRpReservationAtDemand to 0 (default: 1) to have DRS distribute all RP reserved resources, rather than just what demand needs
• Tip/2013: Set AllowUnlimitedCpuLimitForVms to 0 (default: 1) to have DRS distribute limits as much as possible
[Diagram: with these options, the per-host shadow RPs receive the full cluster RP reservation and limit rather than only what current demand requires]
23. 23
Cost-Benefit Filtering
Benefit: Higher resource availability
Cost:
• Migration cost: vMotion CPU & memory cost, VM slowdown
• Risk cost: Benefit may not be sustained due to load variation
[Chart: gain (MHz or MB) vs. time (sec) for a candidate move: a loss (migration cost) during the migration time, then a benefit over the stable time, discounted by a risk cost if the load does not stay stable; evaluated each invocation interval]
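An illustrative Python model of this trade-off (assumed structure only; DRS's real cost-benefit computation is more involved). A move passes the filter when the gain over the expected stable time outweighs the vMotion cost plus the risk that the gain is not sustained:

def net_benefit(gain_rate, stable_time, migration_cost, risk_cost):
    # gain_rate: MHz (or MB) of extra resource availability per second
    # stable_time: seconds the improved placement is expected to last
    return gain_rate * stable_time - migration_cost - risk_cost

# Example: 50 MHz/s gain over a 300 s stable window vs. a 4000-unit migration
# cost and a 1000-unit risk cost -> net benefit 10000, so the move is kept.
print(net_benefit(50, 300, 4000, 1000) > 0)  # True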
24. 24
Cost-Benefit Filtering
vMotions caused unacceptable performance degradation
• Why? DRS cost-benefit didn't capture the high sensitivity of these VMs to vMotion
• Tip: Tune cost-benefit to model vMotion costs aggressively
Set IgnoreDownTimeLessThan to 0
Assumes UseDownTime is set to 1 (default: 1)
25. 25
Modeling vMotion costs: Scenario
DRS handles VMs known to be highly sensitive to vMotion
• 2013: powered-on low-latency VMs are treated as soft-affine with their current host
• 2013: powered-on VMs with vFlash cache reservations are soft-affine with their host
[Diagram: host with SSD-backed vFlash]
26. 26
vMotion costs: Scenario
DRS left cluster severely imbalanced
• Why? VM happiness is the primary metric, so DRS filtered moves aggressively
• By default DRS becomes more aggressive when imbalance is severe:
• FixSevereImbalanceOnly = 1 (default)
• SevereImbalanceRelaxMinGoodness = 1 (default)
• SevereImbalanceRelaxCostBenefit = 1 (default)
• Tip/extreme: If the above defaults still leave more imbalance than desired, can use:
• UseDownTime = 0
• FixSevereImbalanceOnly = 0 (handle with care!)
• SevereImbalanceDropCostBenefit = 1 (handle with care!)
28. 28
Respecting constraints (e.g., availability, rules)
Customers may express business rules to influence load-balancing
• E.g.: use VM-VM anti-affinity rules for availability
• E.g.: use VM-host affinity rules for software licensing
[Diagram: VM-host affinity to a host group; VM-VM anti-affinity between VMs]
29. 29
Asymmetric Cluster Scenario
Asymmetric storage or network access cost
• E.g., if moving a VM off a set of hosts would raise network latency for storage (crossing racks or traversing an L2-over-L3 network), use a soft affinity rule to keep the VM on the hosts with lower access cost
[Diagram: DRS cluster spanning two racks, each behind its own ToR switch, connected by a router]
30. 30
Respecting constraints (e.g., availability, rules) Scenarios
Stretch cluster with VMs on hosts with primary storage
• Current solution of using soft VM/host rules to partition VMs between sites allows VMs to violate the rules if any host is over-utilized
• Tip/2013: Added support for semi-hard VM/host rules (only drop soft VM/host
rules for constraints, not for high utilization) via option
DropSoftVmHostRulesOverutilized = 1
[Diagram: stretched cluster across two sites with storage replication over a WAN]
31. 31
Respecting constraints (e.g., availability, rules) Scenarios
More VMs per host than wanted with respect to failure impact (eggs per basket)
• Why? By default, DRS allows up to the ESX-supported VM limit and balances CPU and memory, not the number of VMs
• Tip: Use the LimitVMsPerESXHost option.
Restricts the number of VMs on each host
Inflexible, requires manual scaling
E.g.: LimitVMsPerESXHost = 6
• Tip/2013: Use the LimitVMsPerESXHostPercent option.
Restricts the number of VMs based on a tolerance buffer; flexible and automatic
Number of VMs on host = Mean + (Buffer% * Mean), as sketched below
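A small Python sketch of that formula (rounding to a whole number of VMs is an assumption):

import math

def max_vms_per_host(total_vms, num_hosts, buffer_pct):
    # Cap the per-host VM count at the cluster mean plus a tolerance buffer.
    mean = total_vms / num_hosts
    return math.ceil(mean + (buffer_pct / 100.0) * mean)

# Example: 60 VMs across 10 hosts with a 50% buffer -> at most 9 VMs per host.
print(max_vms_per_host(60, 10, 50))  # 9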
33. 33
Cluster - Capacity
Sum (VM CPU reservations) < 75% of the cluster CPU capacity available for VMs
Sum (VM memory reservations + overhead) < 75% of the cluster memory capacity available for VMs
34. 34
Cluster - Capacity
For maximum performance of all VMs in the cluster:
Sum (VM demands) < 80% of the cluster capacity
DRS starts throttling less important VMs as demand gets close to, or exceeds, capacity
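A quick Python sketch of these two rules of thumb combined (thresholds from the slides; argument names are illustrative, and memory reservations should include the per-VM memory overhead):

def capacity_headroom_ok(cpu_res_mhz, mem_res_mb, cpu_demand_mhz,
                         cluster_cpu_mhz, cluster_mem_mb):
    # Rule 1: total reservations under 75% of cluster capacity.
    res_ok = (sum(cpu_res_mhz) < 0.75 * cluster_cpu_mhz and
              sum(mem_res_mb) < 0.75 * cluster_mem_mb)
    # Rule 2: total demand under 80%, else DRS starts throttling.
    demand_ok = sum(cpu_demand_mhz) < 0.80 * cluster_cpu_mhz
    return res_ok and demand_ok

# Example: 50 GHz of CPU reservations on a 60 GHz cluster exceeds 75%.
print(capacity_headroom_ok([50000], [100000], [30000], 60000, 384000))  # False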
35. 35
Capacity Management Scenarios
vMotions becoming slower in the cluster
Why? Unreserved CPU on the host is less than 30% of a core.
vMotion tries to reserve 30% of a core for the vMotion process; if that fails, the vMotion may proceed at a slower rate (see the sketch below)
Lots of VM power-on failures in a DRS cluster
Why? Not enough unreserved memory in the cluster/sub-cluster to satisfy the powering-on VM's reservation or overhead.
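A sketch of the first symptom check in Python (the 30%-of-a-core threshold is from the slide; names are illustrative):

def vmotion_may_be_slow(unreserved_cpu_mhz, core_mhz):
    # vMotion tries to reserve 30% of one core; below that, expect slowness.
    return unreserved_cpu_mhz < 0.30 * core_mhz

# Example: 600 MHz unreserved on a host with 2600 MHz cores -> slow vMotions.
print(vmotion_may_be_slow(600, 2600))  # True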
37. 37
Related material
Other DRS talks at VMworld 2013
• VSVC5280 - DRS: New Features, Best Practices and Future Directions (11 am
Monday & 11 am Tuesday)
• STO5636 - Storage DRS: Deep Dive and Best Practices to Suit Your Storage
Environments (4 pm, Monday & 12:30 pm Tuesday)
• VSVC5364 - Storage IO Control: Concepts, Configuration and Best Practices
to Tame Different Storage Architectures (8:30 am, Wed. & 11 am, Thursday)
From VMworld 2012
• VSP2825 - DRS: Advanced Concepts, Best Practices and Future Directions
VMware Technical Journal publications
• VMware Distributed Resource Management: Design, Implementation, and
Lessons Learned
• Storage DRS: Automated Management of Storage Devices In a Virtualized
Datacenter
More related publications at http://labs.vmware.com/academic/publications
44. 44
VM Creation
Important VM Parameters:
Number of vCPUs
Memory Size
CPU/Mem Reservation
Shares
RP Hierarchy
VM - Rules
45. 45
VM Sizing
Too many vCPUs waste overhead
Too few vCPUs may cause the application to perform poorly (check whether all vCPUs are near 100% used)
Too much memory can cause excessive ballooning
Too little memory may cause guest-internal swapping (check swap statistics inside the guest)
VM needs may vary with time; size for the maximum of what the VM needs
46. 46
Case Study: Stretch cluster
Use soft VM/Host rules to partition VMs between sites
New: Added optional support for semi-hard VM/host rules (only drop soft VM/host rules for constraints, not for high utilization)
[Diagram: stretched cluster over a WAN]
47. 47
Cluster - Capacity
Sum (VM reservations + overhead) < 75% of the cluster capacity available for VMs
58. 58
VMs in Resource Pool not getting same performance
[Diagram: resource pool tree; labels show [reservation (MHz/MB), limit (MHz/MB)] and shares. Root contains RP1 [400, 8000] and RP2 [200, 400]; VM 1 and VM 2 (under RP1) each have [100, 8000]; VM 3 (under RP2) has [100, 400]]
59. 59
Resource Pools - Performance
[Diagram: the cluster-wide RP1 [400, 8000] (150 shares) is mapped onto per-host shadow pools: RP1 [150, 3000] (100 shares) on Host A, holding VM 1, and RP1 [250, 5000] (50 shares) on Host B, holding VM 2]
60. 60
Resource Pools - Performance
[Diagram: per-host view: Host A holds RP1 [150, 3000] with VM 1 and RP2 [200, 400] with VM 3; Host B holds RP1 [250, 5000] with VM 2]
61. 61
Resource Pools - Performance
DRS flows resources between hosts every 5 minutes
Tuned to minimize the number of migrations by throttling this flow.
More aggressive settings to flow the resources:
Advanced Control Knobs
(1) CapRpReservationAtDemand – False (default True)
(2) AllowUnlimitedCpuLimitForVms – False (default True)
(1) Would flow reservations more aggressively
(2) Would flow limits more aggressively
63. 63
CPU Management
Demand: How CPU Demand (aka CPU Active) is estimated
• By ESX: CPU time VM would consume if there were no stolen time
• CPU Demand = used + stolen * run / (run + sleep), where stolen time includes:
• Ready: vCPU is runnable but target CPU is busy
• Overlap: Use of CPU to handle interrupts during this vCPU execution
• Hyperthreading: Impact on CPU operation due to use of partner CPU
• PowerManagement: Loss of CPU cycles due to platform frequency scaling
• By DRS load balancing: Average ESX CPU demand over last 5 minutes
• DRS Cost/Benefit & DPM power-off consider ~max over longer periods
64. 64
CPU Management, cont’d
Reservation: Impact
• Ensures via admission control VM can obtain reserved CPU when demanded
• Work-conserving; other VMs use reserved CPU when VM doesn’t demand it
Ready time metric
• General rule of thumb is that it should be 5% or less per vCPU
• Values higher than this do not necessarily indicate problems
• Check out discussion & chime in on:
• http://www.yellow-bricks.com/2013/05/09/drs-not-taking-cpu-ready-time-in-to-account-need-your-help/
• Some troubleshooting case studies follow
65. 65
CPU Management Case Studies
Case: CPU ready time metric high, but host not heavily utilized
• Explanation: NUMA scheduling favors running on CPU near local memory
• Fix: none; this scheduling gives better performance
• http://blogs.vmware.com/vsphere/2012/02/vspherenuma-loadbalancing.html
66. 66
CPU Management Case Studies
Case: CPU ready time metric high, but host lightly utilized
• Explanation: Platform-level power management enabled
• Fix: Set BIOS power mgmt setting to Maximum or OS Control
67. 67
CPU Management Case Studies
Case: CPU ready time metric high, better perf after manual move
• Explanation: DRS average underestimated demand of spiky CPU workloads
• Fix: Introduced AggressiveCPUActive advanced option, which uses larger of:
• the 5 minute average of ESX CPU demand
• the 80th percentile (2nd largest) of the last five 1-minute averages of ESX CPU demand
69. 69
Memory Management
Demand: How memory demand (aka memory active) is estimated
• By ESX
• Statistical: unmap small random sample of pages each minute, see what percentage
are referenced, assume that percentage of mapped pages are active; take max of last
4 minutes
• By DRS
• For load balancing: Average ESX memory demand over last 5 minutes +
percentage (default 25) of idle consumed memory
• DRS C/B and DPM power-off consider ~max over longer periods
70. 70
Memory Management, cont’d
Reservation: Impact
• Ensures via admission control VM can obtain reserved amount of memory
• Not work-conserving; once reserved memory is consumed, it is not reallocated
Reclamation
• Ballooning, transparent page sharing, compression, ESX swapping
Ballooning/swapping metrics
• Customers often monitor for non-zero values
• Avoiding over-commitment entirely can lead to high memory cost
71. 71
Memory Management Case Studies
Case: Undesirable VM migration for memory imbalance
• Explanation: DRS manages active memory; the customer wanted DRS to manage consumed memory
• Fix: Use IdleTax option to include more idle memory in active
• In vSphere 5.5, added new option PercentIdleMBInMemDemand (default 25%) which
can be set to 100% to manage to consumed
72. 72
Memory Management Case Studies
Case: DPM overconsolidated memory
• Explanation: DPM consolidates on active memory; the customer wanted DPM to use consumed memory
• Fix: Added new option PercentIdleMBInMemDemand (also can be used
instead of IdleTax)