VMFS Introduction


Bergwolf@linuxfb.org
Agenda

ESX Introduction
VMFS Design Goals
VMFS Architecture
SAN Impact
Conclusion
ESX System Setup
Guest Memory Layers


               Shadow page tables (VA-MA).

               Page sharing (BA-MA).
ESX IO Stack

       An average IO request involves
          only offset remapping.
Agenda

ESX Introduction
VMFS Design Goals
VMFS Architecture
SAN Influence and Impact
Conclusion
Use Case

Small number of files (30~100 per VM)
Files either very small (~a few KBs), or very
large (many GBs)
SAN storage is the underlying substrate.
All storage exported by these storage systems
is shared among all ESX servers
Design Goals

Metadata overhead should be very low
VM IO throughput and latency should be as
good as directly attached raw device
A clustered lock manager for moderating
access to files among ESX servers
Help VMs react deterministically to transient
and non-transient SAN events and error
conditions.
Agenda

ESX Introduction
VMFS Design Goals
VMFS Architecture
SAN Influence and Impact
Conclusion
VMFS Architecture
A volume is an aggregation of resources and on-disk
locks.
A resource is either an inode, a file block, a sub-block
or an indirect block.
Each lock moderates access to a subset of resources.
Hosts negotiate access to resources by acquiring the
relevant locks.
VMFS = a clustered lock manager + a resource
manager + a journaling module + a data mover + a
VM IO manager + a POSIX system call front end
VMkernel Logical Volume

VMFS volumes are by default created inside VMkernel
 logical volumes. A VMkernel logical volume can
 span multiple devices.
VMFS on disk Layout
Four Resources

  file blocks
  sub-blocks
  pointer blocks
  file descriptors

Resources are grouped together into collections called
  CLUSTERs and clusters are further grouped together
  into CLUSTER GROUPS.
Block Mapping

 Packed inside inode
 Sub block addressing
 File block addressing
 Pointer block addressing

Can upgrade automatically.
System Files

System files are created at file system format
  time, and each manages one type of
  resources.
System Files

Use file blocks.
Same read/write method as regular files.
Checking file data consistency essentially
provides metadata consistency.
Cluster Groups
Cluster groups are repeated to create a file system.
An existing VMFS volume grows over unused space
on the disk or spans new disks by laying out new
cluster groups that refer to the newly added space.
VMFS resource manager makes hosts operate on
different and distant cluster groups within a system
file. This reduces the possibility of multiple hosts
contending on the same lock(s) and increases the
efficiency of the clustered lock manager.
On-disk Lock

A single sector data
structure.
Locking is lease-based.
Atomic disk operations (SCSI reserve,
read-modify-write, SCSI release)
On-disk Lock Data Structure
HostID: This is a 128-bit unique identifier that identifies the ESX host that
owns the lock at a given point in time. All zeros means no owner.
Mode: A set of non-zero values to indicate whether a lock is free, held
exclusively, held by multiple hosts for shared read access, or held by
multiple hosts for shared read and write access.
Generation: A monotonically increasing counter, updated every time a lock
is acquired, released or broken. While the hostID field sufficiently
disambiguates operations on a lock from different hosts, this field
disambiguates multiple operations on a lock by the same host.
HBregion: For each valid hostID (if any) currently using the lock, a pointer
to the on disk heartbeat region of the host.
HBgen: A generation number to validate the HBregion reference as being
current or stale. It disambiguates locks held by a given host before and
after a host crash and before and after a storage outage.
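
The fields above can be pictured as a small record. The following is a minimal sketch in Python, not the actual on-disk format: the class name, the field types, the mode values and the helper methods are assumptions made for illustration. Only the field names and the 128-bit, all-zeros-means-no-owner hostID come from the slide.

```python
from dataclasses import dataclass
from enum import Enum


class LockMode(Enum):
    # Hypothetical values; the slide only says the modes are non-zero.
    FREE = 1
    EXCLUSIVE = 2
    SHARED_READ = 3
    SHARED_READ_WRITE = 4


NO_OWNER = bytes(16)  # 128-bit hostID of all zeros means "no owner"


@dataclass
class OnDiskLock:
    host_id: bytes = NO_OWNER       # 128-bit ID of the owning host
    mode: LockMode = LockMode.FREE
    generation: int = 0             # bumped on acquire, release or break
    hb_region: int = 0              # owner's heartbeat slot (assumed to be an offset)
    hb_gen: int = 0                 # validates hb_region as current or stale

    def is_free(self) -> bool:
        return self.mode is LockMode.FREE and self.host_id == NO_OWNER

    def acquire_exclusive(self, host_id: bytes, hb_region: int, hb_gen: int) -> None:
        if len(host_id) != 16:
            raise ValueError("hostID must be 128 bits")
        if not self.is_free():
            raise RuntimeError("lock is not free")
        self.host_id, self.mode = host_id, LockMode.EXCLUSIVE
        self.hb_region, self.hb_gen = hb_region, hb_gen
        self.generation += 1

    def release(self) -> None:
        self.host_id, self.mode = NO_OWNER, LockMode.FREE
        self.generation += 1
```

Because the generation counter advances on both acquire and release, two successive acquisitions by the same host leave different images on disk, which is the disambiguation the slide describes.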
On-disk Heartbeat

A single-sector data structure.
Every host accessing a VMFS volume acquires
a heartbeat on disk to declare liveness to
other hosts.
Allocated from a 1MB reserved region of the
volume, which allows 2048 hosts to access it concurrently.
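
The 2048-host figure is consistent with one heartbeat per sector, if a sector is 512 bytes. The sector size is an assumption here, since the slide does not state it, and the function names are hypothetical. A minimal sketch of the arithmetic:

```python
HB_REGION_BYTES = 1 * 1024 * 1024  # 1MB reserved region (from the slide)
SECTOR_BYTES = 512                 # assumed sector size; not stated on the slide


def max_heartbeat_slots(region_bytes: int = HB_REGION_BYTES,
                        sector_bytes: int = SECTOR_BYTES) -> int:
    """One heartbeat occupies one sector, so slots = region size / sector size."""
    return region_bytes // sector_bytes


def slot_offset(slot: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Byte offset of a heartbeat slot relative to the start of the region."""
    if not 0 <= slot < max_heartbeat_slots(sector_bytes=sector_bytes):
        raise IndexError("heartbeat slot out of range")
    return slot * sector_bytes
```

With these assumptions, `max_heartbeat_slots()` evaluates to 2048.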
HB Failure Handling

Hosts are free to break a lock if the owner's heartbeat
timestamp does not change for 20 seconds. A host should
replay the journal when taking a stale lock.
If a host fails to update its heartbeat timestamp within five HB
periods (about 15 sec and 40 HB IO tries), it will
fence itself and abort all in-flight IOs.
The lock manager tries to rejoin the cluster if the IO error is
not permanent, and reclaims its HB slot.
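
The 20-second rule can be sketched as a small observer that remembers when a remote heartbeat was last seen to change. This is an illustration only: the class and method names are hypothetical, and the caller supplies the clock so the logic stays testable.

```python
LOCK_BREAK_TIMEOUT_S = 20.0  # from the slide


class HeartbeatObserver:
    """Tracks when a remote host's heartbeat timestamp was last seen to change."""

    def __init__(self, hb_timestamp: int, now: float):
        self.last_value = hb_timestamp
        self.last_change = now

    def observe(self, hb_timestamp: int, now: float) -> None:
        # Only a changed timestamp counts as a sign of life.
        if hb_timestamp != self.last_value:
            self.last_value = hb_timestamp
            self.last_change = now

    def may_break_locks(self, now: float) -> bool:
        # True once the heartbeat has been unchanged for the full timeout.
        return now - self.last_change >= LOCK_BREAK_TIMEOUT_S
```

The observer compares successive reads of the heartbeat rather than comparing the stored timestamp with its own clock, so it does not depend on the hosts' clocks agreeing.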
On-disk Lock & HB

Each host can join a cluster by acquiring an
on-disk HB.
It can also hold thousands of on-disk locks.
Journaling

Each host maintains its own journal on the
volume.
HB region on disk stores journal location.
Transaction State Machine
Optimistic Locking

All hosts in a VMFS cluster generally operate on
mutually exclusive subsets of locks on the volume.
A host that is interested in acquiring a given lock will
typically find it to be free on disk.
Instead of acquiring all locks up front, a host first reads all
locks; if they are free, it modifies metadata in memory,
and then upgrades the locks and commits.
Transaction State Machine w/ op lock
Transaction State Machine w/ op lock
            Upgrade Lock
1: reserve disk;
2: issue asynchronous (async) reads of all
required locks;
3: if any lock is acquired by remote host,
abort and fall back to normal TSM;
4: issue async writes of all required locks;
5: wait for all async writes to complete;
6: release disk;
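
The six steps can be sketched against an in-memory stand-in for the shared disk. Everything named here (`FakeDisk`, `upgrade_locks`, the owner representation) is hypothetical. The reads and writes run synchronously rather than asynchronously, and releasing the reservation on the abort path is an assumption the slide does not spell out.

```python
class FakeDisk:
    """In-memory stand-in for a shared LUN; all names here are hypothetical."""

    def __init__(self, locks):
        self.locks = dict(locks)   # lock address -> owner host ID, or None if free
        self.reserved_by = None

    def reserve(self, host):
        if self.reserved_by is not None:
            raise RuntimeError("disk already reserved")
        self.reserved_by = host

    def release(self, host):
        assert self.reserved_by == host
        self.reserved_by = None


def upgrade_locks(disk, host, addrs):
    """Return True if all locks were upgraded, False to fall back to the normal TSM."""
    disk.reserve(host)                                  # step 1
    try:
        owners = [disk.locks[a] for a in addrs]         # step 2 (synchronous here)
        if any(o is not None and o != host for o in owners):
            return False                                # step 3: held by a remote host
        for a in addrs:                                 # steps 4-5 (synchronous here)
            disk.locks[a] = host
        return True
    finally:
        disk.release(host)                              # step 6
```

The point of the scheme is that one reservation covers the whole batch of locks instead of one reservation per lock.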
Agenda

ESX Introduction
VMFS Design Goals
VMFS Architecture
SAN Influence and Impact
Conclusion
Adaptive SAN-aware retries

For some SAN errors, instead of letting the guest
OS retry the IO, the VMkernel retries it after an
optimal time.
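
A rough sketch of the idea follows, with retries handled below the guest and a delay chosen per error class. The error kinds, the delay values and all names are invented for illustration; the slides do not list which SAN errors are retried or how the delay is chosen.

```python
import time

# Hypothetical error classes and delays; the slides do not list them.
RETRY_DELAY_S = {"path_failover": 0.5, "device_busy": 0.1}


class SanError(Exception):
    def __init__(self, kind):
        super().__init__(kind)
        self.kind = kind


def issue_with_retries(issue_io, max_tries=5, sleep=time.sleep):
    """Retry retryable SAN errors in the kernel layer; surface everything else."""
    for attempt in range(1, max_tries + 1):
        try:
            return issue_io()
        except SanError as err:
            delay = RETRY_DELAY_S.get(err.kind)
            if delay is None or attempt == max_tries:
                raise                  # not retryable, or out of tries
            sleep(delay)
```

Passing `sleep` as a parameter is only a convenience for testing the retry schedule without waiting.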
Adaptive SAN-aware retries
Data Mover

clone(srcFileHandle, srcFileOffset,
dstFileHandle, dstFileOffset, length, policies)
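
To make the shape of this call concrete, here is a minimal user-space sketch that mirrors the signature using Python file-like objects. It is not the VMkernel data mover: the chunked copy loop and the `chunk_size` policy key are assumptions, since the slide does not say what `policies` contains.

```python
import io


def clone(src, src_offset, dst, dst_offset, length, policies=None):
    """Copy `length` bytes from src at src_offset to dst at dst_offset, in chunks."""
    chunk = (policies or {}).get("chunk_size", 64 * 1024)  # hypothetical policy knob
    copied = 0
    while copied < length:
        src.seek(src_offset + copied)
        data = src.read(min(chunk, length - copied))
        if not data:
            break                      # source is shorter than requested
        dst.seek(dst_offset + copied)
        dst.write(data)
        copied += len(data)
    return copied


# Usage: copy 5 bytes starting at offset 2 of src into offset 4 of dst.
src = io.BytesIO(b"0123456789")
dst = io.BytesIO(bytes(10))
copied = clone(src, 2, dst, 4, 5, {"chunk_size": 2})
```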
Data Mover
Directive SCSI CMD

operator(VMID, source_blocklist,
destination_blocklist)
Zero, clone, delete
Directive SCSI CMD

atomic_test_and_set(block_number, old_image,
new_image)
For the VMFS lock manager, this enables a new lock algorithm: read a
lock image from disk and, if the lock is free, issue
an atomic_test_and_set with a new_image
containing the host-specific hostID, generation and
heartbeat information.
Lock acquisition: 4 IOs -> 2 IOs
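
The two-IO path can be sketched with an in-memory device that offers a compare-and-write primitive. The names `AtsDisk`, `try_acquire` and `FREE` are hypothetical, and the lock image is treated as an opaque byte string rather than the real hostID, generation and heartbeat layout.

```python
FREE = b"\x00" * 16  # hypothetical image of a free lock


class AtsDisk:
    """In-memory stand-in for a device supporting atomic_test_and_set."""

    def __init__(self):
        self.blocks = {}
        self.io_count = 0

    def read(self, block_number):
        self.io_count += 1
        return self.blocks.get(block_number, FREE)

    def atomic_test_and_set(self, block_number, old_image, new_image):
        self.io_count += 1
        if self.blocks.get(block_number, FREE) != old_image:
            return False           # the lock changed since it was read
        self.blocks[block_number] = new_image
        return True


def try_acquire(disk, block_number, host_image):
    old = disk.read(block_number)                                    # IO 1
    if old != FREE:
        return False
    return disk.atomic_test_and_set(block_number, old, host_image)   # IO 2
```

Compared with the reserve, read, write, release sequence, a successful acquisition here costs one read and one atomic_test_and_set, and no whole-disk reservation is held.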
Agenda

ESX Introduction
VMFS Design Goals
VMFS Architecture
SAN Influence and Impact
Conclusion
Performance
