VMFS Introduction


Bergwolf@linuxfb.org
Agenda

ESX Introduction
VMFS Design Goals
VMFS Architecture
SAN Impact
Conclusion
ESX System Setup
Guest Memory Layers


Shadow page tables (VA-MA).
Page sharing (BA-MA).
ESX IO Stack

An average IO request involves only offset remapping.
Use Case

Small number of files (30~100 per VM)
Files either very small (~a few KBs), or very
large (many GBs)
SAN storage is the underlying substrate.
All storage exported by these storage systems
is shared among all ESX servers
Design Goals

Metadata overhead should be very low
VM IO throughput and latency should be as
good as those of a directly attached raw device
A clustered lock manager for moderating
access to files among ESX servers
Help VM deterministically react to transient
and non-transient SAN events and error
conditions.
VMFS Architecture
A volume is an aggregation of resources and on-disk
locks.
A resource is either an inode, a file block, a sub-block
or an indirect block.
Each lock moderates access to a subset of resources.
Hosts negotiate access to resources by acquiring
relevant locks.
VMFS = a clustered lock manager + a resource
manager + a journaling module + a data mover + a
VM IO manager + a POSIX system call front end
VMKernel Logical Volume

VMFS volumes are by default created inside VMKernel
 logical volumes. A VMKernel logical volume can
 span multiple devices.
VMFS on disk Layout
Four Resources

  file blocks
  sub-blocks
  pointer blocks
  file descriptors

Resources are grouped together into collections called
  CLUSTERs and clusters are further grouped together
  into CLUSTER GROUPS.
Block Mapping

 Packed inside inode
 Sub block addressing
 File block addressing
 Pointer block addressing

The addressing mode can upgrade automatically.
System Files

System files are created at file system format
  time, and each manages one type of
  resource.
System Files

Use file blocks.
Same read/write method as regular files.
Checking file data consistency essentially
provides metadata consistency.
Cluster Groups
Cluster groups are repeated to create a file system.
An existing VMFS volume grows over unused space
on the disk or spans new disks by laying out new
cluster groups that refer to the newly added space.
VMFS resource manager makes hosts operate on
different and distant cluster groups within a system
file. This reduces the possibility of multiple hosts
contending on the same lock(s) and increases the
efficiency of the clustered lock manager.
On-disk Lock

A single-sector data structure.
Locking is lease-based.
Atomic disk operations (SCSI reserve, read-modify-write, SCSI release).
On-disk Lock Data Structure
HostID: This is a 128-bit unique identifier that identifies the ESX host that
owns the lock at a given point in time. All zeros means no owner.
Mode: A set of non-zero values to indicate whether a lock is free, held
exclusively, held by multiple hosts for shared read access, or held by
multiple hosts for shared read and write access.
Generation: A monotonically increasing counter, updated every time a lock
is acquired, released or broken. While the hostID field sufficiently
disambiguates operations on a lock from different hosts, this field
disambiguates multiple operations on a lock by the same host.
HBregion: For each valid hostID (if any) currently using the lock, a pointer
to the on-disk heartbeat region of the host.
HBgen: A generation number to validate the HBregion reference as being
current or stale. It disambiguates locks held by a given host before and
after a host crash and before and after a storage outage.
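
The five fields above can be pictured as a small record. The following is a minimal sketch, not the actual VMFS on-disk format: the numeric mode values, the field types, and the `acquire`/`release` helpers are illustrative assumptions, and only a single exclusive owner is modelled.

```python
from dataclasses import dataclass
from enum import Enum

NO_OWNER = bytes(16)  # 128-bit hostID of all zeros means "no owner"


class LockMode(Enum):
    # The slide only says the modes are non-zero; these numbers are made up.
    FREE = 1
    EXCLUSIVE = 2
    SHARED_READ = 3
    SHARED_READ_WRITE = 4


@dataclass
class OnDiskLock:
    host_id: bytes = NO_OWNER           # current owner of the lock
    mode: LockMode = LockMode.FREE
    generation: int = 0                 # bumped on every acquire, release or break
    hb_region: int = 0                  # hypothetical: disk offset of the owner's heartbeat
    hb_gen: int = 0                     # generation of the heartbeat the lock refers to

    def is_free(self) -> bool:
        return self.mode is LockMode.FREE and self.host_id == NO_OWNER

    def acquire(self, host_id: bytes, hb_region: int, hb_gen: int) -> None:
        """Take the lock exclusively and record where the owner's heartbeat lives."""
        if not self.is_free():
            raise RuntimeError("lock is held")
        self.host_id, self.mode = host_id, LockMode.EXCLUSIVE
        self.hb_region, self.hb_gen = hb_region, hb_gen
        self.generation += 1

    def release(self) -> None:
        """Return the lock to the free state; the generation still advances."""
        self.host_id, self.mode = NO_OWNER, LockMode.FREE
        self.hb_region = self.hb_gen = 0
        self.generation += 1
```

Because the generation advances on release as well as on acquire, two successive holds by the same host are distinguishable even though the hostID is identical.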
On-disk Heartbeat

A single-sector data structure.
Every host accessing a VMFS volume acquires
a heartbeat on disk to declare liveness to
other hosts.
Allocated from a 1MB reserved region of the
volume. Up to 2048 hosts can access the volume concurrently.
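
The 2048-host figure is consistent with one heartbeat per sector in the 1MB region. A small worked sketch, assuming 512-byte sectors (the slides do not state the sector size) and a hypothetical `hb_slot_offset` helper:

```python
HB_REGION_BYTES = 1 * 1024 * 1024  # 1MB reserved region, from the slide
SECTOR_BYTES = 512                 # assumption: 512-byte sectors


def hb_slot_count(region_bytes: int = HB_REGION_BYTES,
                  sector_bytes: int = SECTOR_BYTES) -> int:
    """Number of single-sector heartbeat slots that fit in the region."""
    return region_bytes // sector_bytes


def hb_slot_offset(slot: int, region_start: int = 0) -> int:
    """Byte offset of a heartbeat slot, relative to the start of the volume."""
    if not 0 <= slot < hb_slot_count():
        raise IndexError("heartbeat slot out of range")
    return region_start + slot * SECTOR_BYTES


print(hb_slot_count())  # 2048
```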
HB Failure Handling

Hosts are free to break locks if the heartbeat's
timestamp does not change for 20 seconds. A host should
replay the journal when taking a stale lock.
If a host fails to update its heartbeat timestamp within five HB
periods (about 15 sec and 40 HB IO tries), it will
fence itself and abort all in-flight IOs.
The lock manager tries to rejoin the cluster if the IO error is
not permanent, and reclaims its HB slot.
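
The two timeouts can be expressed as a pair of checks. This is a rough sketch under stated assumptions: the 3-second heartbeat period is inferred from "five HB periods (about 15 sec)", and both function names are hypothetical. Times are seconds on the observing host's own clock.

```python
LOCK_BREAK_TIMEOUT_S = 20.0  # from the slide: heartbeat unchanged for 20 seconds
HB_PERIOD_S = 3.0            # assumption: 15 seconds / five periods
SELF_FENCE_PERIODS = 5       # from the slide


def may_break_lock(last_seen_hb_change: float, now: float) -> bool:
    """Remote view: the owner's heartbeat has not changed for the full timeout."""
    return now - last_seen_hb_change >= LOCK_BREAK_TIMEOUT_S


def must_self_fence(last_successful_hb_write: float, now: float) -> bool:
    """Local view: this host has failed to update its own heartbeat for five periods."""
    return now - last_successful_hb_write >= SELF_FENCE_PERIODS * HB_PERIOD_S
```

With these numbers a host that loses access to storage fences itself at about 15 seconds, before any other host is allowed to break its locks at 20 seconds.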
On-disk Lock & HB

Each host can join a cluster by acquiring an on-disk
HB.
It can also hold thousands of on-disk locks.
Journaling

Each host maintains its own journal on the
volume.
HB region on disk stores journal location.
Transaction State Machine
Optimistic Locking

All hosts in a VMFS cluster generally operate on
mutually exclusive subsets of locks on the volume.
A host that is interested in acquiring a given lock will
typically find it to be free on disk.
Instead of acquiring all locks up front, a host first reads all
locks; if they are free, it modifies the in-memory metadata,
then upgrades the locks and commits.
Transaction State Machine w/ op lock
Upgrade Lock
1: reserve disk;
2: issue asynchronous (async) reads of all
required locks;
3: if any lock is acquired by remote host,
abort and fall back to normal TSM;
4: issue async writes of all required locks;
5: wait for all async writes to complete;
6: release disk;
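
The six steps above can be sketched against a toy in-memory disk. Everything here is an illustrative assumption rather than VMkernel code: `SimDisk` stands in for a shared LUN, locks are reduced to an owner name, and the async reads and writes are performed synchronously.

```python
from typing import Dict, Iterable, Optional


class SimDisk:
    """Toy shared disk: one owner per lock plus a SCSI-style reservation."""

    def __init__(self, lock_ids: Iterable[int]) -> None:
        self.locks: Dict[int, Optional[str]] = {i: None for i in lock_ids}
        self.reserved_by: Optional[str] = None

    def reserve(self, host: str) -> None:
        if self.reserved_by not in (None, host):
            raise RuntimeError("disk is reserved by another host")
        self.reserved_by = host

    def release(self, host: str) -> None:
        if self.reserved_by == host:
            self.reserved_by = None


def upgrade_locks(disk: SimDisk, host: str, needed: Iterable[int]) -> bool:
    """Return True if every lock was upgraded, False if the caller must fall back."""
    needed = list(needed)
    disk.reserve(host)                                   # step 1
    try:
        owners = [disk.locks[i] for i in needed]         # step 2: read all locks
        if any(o not in (None, host) for o in owners):   # step 3: held remotely
            return False                                 # fall back to the normal TSM
        for i in needed:                                 # steps 4-5: write all locks
            disk.locks[i] = host
        return True
    finally:
        disk.release(host)                               # step 6
```

On a conflict nothing is written, so the other host's locks are left untouched and the reservation is still released.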
Adaptive SAN-aware retries

For some SAN errors, instead of letting the guest
OS retry the IO, VMkernel retries the IO after an
optimal time.
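
A minimal sketch of the idea, assuming a hypothetical table that maps each retryable error to a delay. The error codes, the delays and the `issue_with_retries` helper are invented for illustration; the slides do not say how the optimal time is chosen.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class SanError(Exception):
    """Hypothetical SAN error carrying a short error code."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


# Assumption: per-error retry delays in seconds; real values are not in the slides.
RETRY_DELAY_S: Dict[str, float] = {"BUSY": 0.05, "PATH_FAILOVER": 0.5}


def issue_with_retries(io: Callable[[], T], max_tries: int = 5,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Run io(); on a retryable SAN error wait the per-error delay and try again."""
    for attempt in range(1, max_tries + 1):
        try:
            return io()
        except SanError as err:
            delay = RETRY_DELAY_S.get(err.code)
            if delay is None or attempt == max_tries:
                raise  # not retryable, or out of tries: surface the error to the guest
            sleep(delay)
    raise ValueError("max_tries must be at least 1")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.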
Data Mover

clone(srcFileHandle, srcFileOffset,
dstFileHandle, dstFileOffset, length, policies)
Directive SCSI CMD

operator(VMID, source_blocklist,
destination_blocklist)
Operators: zero, clone, delete
Directive SCSI CMD

atomic_test_and_set(block_number, old_image,
new_image)
For the VMFS lock manager, this enables a new lock algorithm: read a
lock image from disk and, if the lock is free, issue
an atomic_test_and_set with a new_image
containing the host-specific hostID, generation and
heartbeat information.
This cuts a lock acquisition from 4 IOs to 2 IOs.
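
The two-IO acquisition can be sketched with a toy LUN that counts IOs. `SimLun`, `FREE_IMAGE` and `try_acquire` are hypothetical names, and the compare-and-write is simulated in memory rather than issued as a real SCSI command.

```python
from typing import Dict

FREE_IMAGE = bytes(512)  # assumption: a free lock is an all-zero 512-byte sector


class SimLun:
    """Toy LUN that stores lock sectors and counts the IOs issued against it."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}
        self.io_count = 0

    def read(self, block_number: int) -> bytes:
        self.io_count += 1
        return self.blocks.get(block_number, FREE_IMAGE)

    def atomic_test_and_set(self, block_number: int,
                            old_image: bytes, new_image: bytes) -> bool:
        """Write new_image only if the block still equals old_image."""
        self.io_count += 1
        if self.blocks.get(block_number, FREE_IMAGE) != old_image:
            return False
        self.blocks[block_number] = new_image
        return True


def try_acquire(lun: SimLun, block_number: int, new_image: bytes) -> bool:
    current = lun.read(block_number)                                   # IO 1
    if current != FREE_IMAGE:
        return False                                                   # lock is held
    return lun.atomic_test_and_set(block_number, current, new_image)  # IO 2
```

Compared with the reserve, read, write, release sequence, the whole disk is never reserved, so other hosts can keep working on unrelated locks while one host acquires its own.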
Performance
