This is a presentation on storage-related changes in VMware vSphere 4.1. I gave this presentation at the Triad VMUG meeting in Greensboro, NC on January 28, 2011.
1. Storage Changes in VMware vSphere 4.1
A quick review of changes and new features
Scott Lowe, VCDX #39
vSpecialist, EMC Corporation
Author, Mastering VMware vSphere 4
Blogger, http://blog.scottlowe.org
2. Storage Feature Summary
- 8 Gbps FC HBA and expanded support for FCoE CNAs
- New storage performance statistics
- Storage I/O Control
- vStorage APIs for Array Integration
- iSCSI enhancements
3. New Storage Performance Statistics
More comprehensive performance statistics are available both in vCenter and in esxtop/resxtop:
- Historical and real-time via the GUI in vCenter
- Real-time via esxtop/resxtop for drill-down and scripting
Latency and throughput statistics are available for:
- Datastore per host
- Storage adapter and path per host
- Datastore per VM
- VMDK per VM
Greater feature parity across all storage protocols
5. The I/O Sharing Problem
(Diagram: "What You See Now" vs. "What You Should See" for three workloads, Microsoft Exchange, an online store, and data mining, sharing a single datastore.)
6. Solution: Storage I/O Control (SIOC)
(Diagram: the Microsoft Exchange, online store, and data mining VMs running from a 32 GHz / 16 GB resource pool and sharing Datastore A, each with CPU, memory, and I/O share settings; the data mining VM carries low I/O shares while the business-critical VMs carry high I/O shares.)
10. Allocate I/O Resources
- Shares translate into ESX I/O queue slots
- VMs with more shares are allowed to send more I/Os at a time
- Slot assignment is dynamic, based on VM shares and current load
- The total number of slots available is dynamic, based on the level of congestion
(Diagram: the data mining, Microsoft Exchange, and online store VMs with their I/Os in flight to the storage array.)
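To make the idea on this slide concrete, here is a minimal Python sketch of proportional slot allocation. The VM names, share values, and queue depths are assumptions for illustration only; this is not the actual ESX scheduler.

```python
# Minimal sketch of proportional slot allocation (illustrative only, not the
# actual ESX scheduler): split the host's available device-queue slots among
# running VMs in proportion to their I/O shares. The share values and the
# shrinking total queue depth under congestion are assumptions.

def allocate_slots(vm_shares, total_slots):
    """Divide total_slots among VMs in proportion to their shares."""
    total_shares = sum(vm_shares.values())
    return {vm: max(1, total_slots * shares // total_shares)
            for vm, shares in vm_shares.items()}

vm_shares = {"exchange": 2000, "online_store": 2000, "data_mining": 500}

print(allocate_slots(vm_shares, total_slots=64))
# {'exchange': 28, 'online_store': 28, 'data_mining': 7}

# Under heavier congestion the total number of available slots shrinks,
# but the relative split still follows the shares:
print(allocate_slots(vm_shares, total_slots=32))
# {'exchange': 14, 'online_store': 14, 'data_mining': 3}
```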
11. Congestion Triggers SIOC
Congestion signal: ESX-to-array response time > threshold
- Default threshold: 30 ms
- Different defaults should be set for SSD and SATA
Changing the default threshold (not usually recommended):
- Low latency goal: set it lower if latency is critical for some VMs
- High throughput goal: set it close to the IOPS maximization point
13. Storage Integration Points
(Diagram: the ESX storage stack, from the VI Client and guest VMs down through VMFS/NFS, the VMware LVM, and the HBA and NIC drivers, out to the storage array over FC/FCoE, iSCSI, and NFS, annotated with the vendor integration points: the vStorage API for Data Protection (VDDK), the vStorage API for Multipathing (NMP), the vStorage API for SRM used by vCenter SRM, vendor-specific vCenter plug-ins such as EMC Virtual Storage Integrator that view VMware-to-storage relationships, provision datastores more easily, and leverage array features like compression/dedupe and file/filesystem/LUN snapshots, and vendor-specific VAAI SCSI command support in the array, with NFS offloads planned for the future.)
14. Current VAAI Primitives
Hardware-Accelerated Locking = 10-100x better metadata scaling
- Replaces LUN locking with extent-based locks for better granularity
- Reduces the number of "lock" operations required by using one efficient SCSI command to perform pre-lock, lock, and post-lock operations
- Increases locking efficiency by an order of magnitude
Hardware-Accelerated Zero = 2-10x fewer I/O operations
- Eliminates redundant and repetitive host-based write commands with optimized internal array commands
Hardware-Accelerated Copy = 2-10x better data movement
- Leverages the array's native copy capability to move blocks
15. Hardware-Accelerated Locking
Without the API:
- Reserves the complete LUN in order to obtain a lock
- Requires several SCSI commands
- LUN-level locks affect adjacent hosts
With the API:
- Locks occur at the block level
- One efficient SCSI command: SCSI Compare and Swap (CAS)
- Block-level locks have no effect on adjacent hosts
Use cases:
- Bigger clusters with more VMs
- View, Lab Manager, Project Redwood
- More and faster VM snapshotting
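To make the difference concrete, here is a minimal Python sketch of the idea behind the CAS primitive. It is purely illustrative: the Lun class, host names, and block values are invented for the example, and this is not the actual VMFS implementation.

```python
# Conceptual sketch (not the actual VMFS implementation): contrast a
# whole-LUN reservation with an atomic compare-and-swap on a single
# metadata block, which is the idea behind the hardware-accelerated
# locking (CAS) primitive.

class Lun:
    def __init__(self, blocks):
        self.blocks = blocks          # block index -> value
        self.reserved_by = None       # host holding a whole-LUN reservation

def update_with_reservation(lun, host, block, new_value):
    """Old behavior: reserve the entire LUN to change one block.
    Every other host is locked out of the whole LUN in the meantime."""
    assert lun.reserved_by is None, "LUN busy; adjacent hosts must wait"
    lun.reserved_by = host
    lun.blocks[block] = new_value
    lun.reserved_by = None

def update_with_cas(lun, block, expected, new_value):
    """VAAI-style behavior: one compare-and-swap on one block.
    Other blocks, and therefore other hosts, are unaffected."""
    if lun.blocks.get(block) != expected:
        return False                  # another host changed it first; retry
    lun.blocks[block] = new_value
    return True

lun = Lun({0: "free", 1: "free"})
update_with_reservation(lun, "esx01", 0, "owned-by-esx01")
print(update_with_cas(lun, 1, "free", "owned-by-esx02"))   # True
print(update_with_cas(lun, 1, "free", "owned-by-esx03"))   # False, lost the race
```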
17. Hardware-Accelerated Zero
Without the API:
- SCSI Write: many identical small blocks of zeroes are moved from host to array across MANY VMware I/O operations
- Extra zeroes can be removed by EMC arrays after the fact via "zero reclaim"
- New guest I/O to the VMDK is "pre-zeroed"
With the API:
- SCSI Write Same: one block of zeroes is moved from host to array and written repeatedly by the array
- A thin-provisioned array skips the zeroes completely (before any "zero reclaim")
Use cases:
- Reduced I/O when writing to new blocks in the VMDK for any VM
- Time to create VMs (particularly FT-enabled VMs)
(Diagram: without the API, the host issues SCSI WRITE (0000...) and SCSI WRITE (data) pairs, repeated MANY times; with the API, a single SCSI WRITE SAME of zeroes plus SCSI WRITE (data) to the VMDK.)
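As a back-of-the-envelope illustration of the reduction in commands, here is a short Python sketch. The 64 KB write size and 1 GB region are assumptions for the example, not measured vSphere behavior.

```python
# Conceptual sketch (assumed block sizes, not real SCSI plumbing): count the
# commands a host sends to zero a new 1 GB region of a VMDK with and without
# the WRITE SAME offload.

BLOCK = 64 * 1024                      # assume the host zeroes in 64 KB writes
REGION = 1 * 1024**3                   # 1 GB of never-written VMDK blocks

def zero_without_offload(region_bytes, block_bytes=BLOCK):
    """Host sends one zero-filled SCSI WRITE per block."""
    return region_bytes // block_bytes           # commands on the wire

def zero_with_write_same(region_bytes):
    """Host sends a single SCSI WRITE SAME; the array repeats the pattern
    internally (or skips it entirely if the LUN is thin provisioned)."""
    return 1

print(zero_without_offload(REGION))    # 16384 zero-filled writes
print(zero_with_write_same(REGION))    # 1 command describing the whole region
```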
19. Hardware-Accelerated Copy
"Let's Storage VMotion"
Without the API:
- SCSI Read (data moved from array to host), then SCSI Write (data moved from host to array), repeated
- Long periods of heavy VMFS-level I/O, performed via millions of small block operations
With the API:
- SCSI Extended Copy (data moved within the array), repeated
- Order-of-magnitude reduction in I/O operations
- Order-of-magnitude reduction in array IOPS
Use cases:
- Storage VMotion
- VM creation from template
(Diagram: "Give me a VM clone/deploy from template"; without the API, MANY SCSI READ and SCSI WRITE commands pass between host and array; with the API, a single SCSI EXTENDED COPY is handled inside the array.)
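Here is a similar rough Python sketch for the copy primitive. The 64 KB transfer size, 40 GB disk, and 256 MB extent descriptor size are assumptions for illustration, not real SCSI plumbing.

```python
# Conceptual sketch (assumed transfer sizes): compare host-mediated copying of
# a 40 GB VMDK with an array-offloaded extended copy, as used by Storage
# VMotion and deploy-from-template.

IO_SIZE = 64 * 1024                    # assume 64 KB per READ/WRITE pair
VMDK = 40 * 1024**3                    # 40 GB virtual disk

def copy_without_offload(vmdk_bytes, io_bytes=IO_SIZE):
    """Host reads each chunk from the array and writes it back out."""
    chunks = vmdk_bytes // io_bytes
    return {"scsi_reads": chunks, "scsi_writes": chunks,
            "bytes_over_fabric": 2 * vmdk_bytes}

def copy_with_xcopy(vmdk_bytes, extent_bytes=256 * 1024**2):
    """Host sends EXTENDED COPY descriptors; the data never leaves the array."""
    return {"xcopy_commands": vmdk_bytes // extent_bytes,
            "bytes_over_fabric": 0}

print(copy_without_offload(VMDK))      # ~655k reads + ~655k writes, 80 GB on the wire
print(copy_with_xcopy(VMDK))           # 160 descriptors, no data over the fabric
```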
22. iSCSI Enhancements in vSphere 4.1
- Boot from software iSCSI (iBFT)
- iSCSI offloading
- iSCSI session management
- Additions have also been made to the CLI
23. Steps in Booting from Software iSCSI
1. Ensure that the NIC you wish to use supports iSCSI boot.
2. Ensure that the NIC has a supported version of firmware.
3. Ensure that the iSCSI boot option is selected in the host BIOS for the NIC.
4. Configure the iSCSI boot parameters in the NIC BIOS.
5. Configure the iSCSI target to allow initiator (NIC) access.
6. Configure a LUN on the iSCSI target and present it to the initiator.
The problem Storage I/O Control addresses is the situation where less important workloads take the majority of the I/O bandwidth away from more important applications. In the case of the three applications shown here, the data mining workload is hogging the majority of the storage I/O resources, and the two applications that matter more to business operations are getting less performance than they need. <Click> What one wants to see instead is a distribution of I/O that is aligned with the importance of each virtual machine, where the most business-critical applications get the I/O bandwidth they need to remain responsive and the less critical data mining application takes less I/O bandwidth.
I/O shares can be set at the virtual machine level, and although this capability has existed for a few previous releases, it was not enforced at the cluster level until release 4.1. Prior to 4.1, I/O shares and limits were enforced only among the virtual disks of a single VM or among a number of VMs on a single ESX server. <click> With 4.1, these I/O shares are now used to distribute I/O bandwidth across all of the ESX servers that have access to the shared datastore.
The ability to set shares for I/O is found by editing the properties of the virtual machine. This screen shows two virtual disks and the ability to set share priority and a limit on I/Os per second for each.
Once the shares are set on the virtual machines in a VMware cluster, one also needs to enable the "Storage I/O Control" option on the properties screen of each datastore on which you want Storage I/O Control to work. The other requirement for Storage I/O Control to kick in is congestion, measured in the form of latency, which must persist on the datastore for a period of time before the I/O control engages. The example that comes to mind is a carpool lane, which is not typically enforced when there is little traffic on the highway; it would be of limited value if you could travel at the same speed in the regular lanes as in the carpool lane. In much the same way, Storage I/O Control will not be put into action until latency is sustained above a value of 30 msec.
One can then observe which VMs have which shares and limits set via the Virtual Machines tab for the datastore. Because datastores are now objects managed by vCenter, there are several new views in vSphere that let you see which ESX servers are connected to a datastore and which VMs are sharing it. Many of these views also allow you to customize which columns are displayed and to create specific views for reporting on usage.
The way these I/O shares affect performance is that the queue depth for each ESX server is assigned and throttled to align with the shares assigned to each VM running against the shared pool of storage. In the case of the three VMs shown earlier, the data mining VM is assigned the fewest queue slots, while the other two VMs get many more queue slots for their I/O.
It is important to understand that SIOC does not kick in until congestion on the datastore rises above the 30 ms threshold for a period of time. A weighted average is used to determine that the latency is not just a minor spike that comes and goes quickly. The threshold value can be modified, but that should be done only with great care and consideration: if it is set too low, SIOC may toggle on and off frequently, and if it is set too high, it may never kick in at all.
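Here is a minimal Python sketch of the idea described in this note. It is illustrative only, not the actual SIOC algorithm; the smoothing weight and the sample values are assumptions, while the 30 ms threshold comes from the default discussed above.

```python
# Minimal sketch (illustrative only, not the actual SIOC algorithm): smooth
# datastore latency samples with a running weighted average so that a single
# spike does not trigger throttling, and only flag congestion when the
# smoothed value stays above the threshold.

THRESHOLD_MS = 30.0      # default congestion threshold described above
ALPHA = 0.2              # assumed smoothing weight for the running average

def congested(latency_samples_ms, threshold=THRESHOLD_MS, alpha=ALPHA):
    """Return True once the smoothed datastore latency exceeds the threshold."""
    smoothed = latency_samples_ms[0]
    for sample in latency_samples_ms[1:]:
        smoothed = alpha * sample + (1 - alpha) * smoothed
    return smoothed > threshold

print(congested([5, 6, 80, 5, 6, 5]))          # False: one spike is averaged away
print(congested([35, 38, 40, 42, 41, 39]))     # True: latency is sustained above 30 ms
```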