Liquid: A Scalable Deduplication File System for Virtual Machine Images
CONTENTS
• INTRODUCTION
• VIRTUAL MACHINE
• DEDUPLICATION
• ISSUES IN VM STORAGE
• LIQUID SYSTEM ARCHITECTURE
• COMMUNICATION AMONG COMPONENTS: HEARTBEAT PROTOCOL
• DEDUPLICATION IN LIQUID
• OPTIMIZATIONS ON FINGERPRINT CALCULATION
• STORAGE FOR DATA BLOCKS
• ADVANTAGES OF LIQUID
• CONCLUSION
INTRODUCTION
• Cloud computing means storing and accessing data and programs over the Internet instead of on your computer's hard drive.
VIRTUAL MACHINE
• Serves as a critical component in cloud computing.
• A virtual machine is a hypothetical computer.
• Emulates the functions of a real-world computer.
• Executes programs like a physical machine.
• The initial state of a virtual machine is stored in a file called a virtual machine image.
DEDUPLICATION
• Data deduplication is a data compression technique.
• Eliminates duplicate copies of repeating data.
• A redundant data block is replaced with a reference to the stored copy instead of being stored multiple times.
• Improves storage utilization.
ISSUES IN VM STORAGE
• High demand for VM storage remains a challenging problem.
• Existing systems have made efforts to reduce storage consumption.
• They use a SAN cluster.
• A SAN cluster cannot keep up with increasing demand because of cost limitations.
• Hence we propose Liquid.
LIQUID SYSTEM ARCHITECTURE
• Three components: a single meta server with a hot backup, multiple data servers, and multiple clients.
• Each runs as a user-level service process.
• VM images are split into fixed-size data blocks.
• Meta server: maintains the namespace, fingerprints, and reference counts.
• The meta server is mirrored to a hot-backup shadow meta server.
LIQUID SYSTEM ARCHITECTURE (CONT)
• Data servers: in charge of managing the data blocks of VM images.
• Data servers are organized in a distributed hash table.
• A Liquid client provides a POSIX-compatible file system.
• Client: a critical component that performs deduplication.
• Fault tolerance: the meta server is mirrored.
• Replicas of data blocks are stored.
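The distributed hash table organization of the data servers can be illustrated with a small sketch. It assumes that blocks are placed by their fingerprint and that replicas go to the next servers in the list; the slides state neither detail, and the names `place`, `fingerprint`, and `ds0` to `ds3` are hypothetical.

```python
import hashlib


def fingerprint(block: bytes) -> bytes:
    """SHA-1 digest of a block's contents (20 bytes)."""
    return hashlib.sha1(block).digest()


def place(fp: bytes, servers: list, replicas: int = 2) -> list:
    """Choose `replicas` distinct data servers for a fingerprint.

    Assumption: placement is fingerprint modulo server count, with replicas
    on the following servers. A production DHT would more likely use
    consistent hashing so that adding a server moves few blocks.
    """
    count = min(replicas, len(servers))
    start = int.from_bytes(fp[:8], "big") % len(servers)
    return [servers[(start + i) % len(servers)] for i in range(count)]


servers = ["ds0", "ds1", "ds2", "ds3"]   # hypothetical server names
block = b"\x00" * 4096
owners = place(fingerprint(block), servers)
print(owners)   # two distinct servers; identical blocks always map to the same pair
```

Because placement depends only on block contents, any client can locate a block without asking the meta server which data server holds it.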
LIQUID SYSTEM ARCHITECTURE (CONT)
Fig: Liquid architecture. [Diagram labels: meta server, shadow meta server (hot backup), data servers, heartbeat, and three clients, each with a file system (FS) and a cache.]
COMMUNICATION AMONG COMPONENTS: HEARTBEAT PROTOCOL
• The meta server manages all data servers.
• It exchanges regular heartbeat messages with each data server in a round-robin fashion.
• This approach is slow to detect failed data servers when there are many data servers.
• To speed up failure detection, data servers send an error signal to the meta server.
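A minimal sketch of the heartbeat scheme described above. The class and method names are hypothetical, `ping` stands in for a real network probe, and the reaction to an error signal (probing the suspect server immediately) is an assumption; the slides only say that the signal is sent.

```python
class MetaServer:
    def __init__(self, servers):
        self.servers = list(servers)
        self.failed = set()
        self._cursor = 0                 # round-robin position

    def poll_next(self, ping):
        """Heartbeat one server, then advance the round-robin cursor."""
        server = self.servers[self._cursor]
        self._cursor = (self._cursor + 1) % len(self.servers)
        self._record(server, ping(server))
        return server

    def report_error(self, suspect, ping):
        """Handle an error signal: probe the suspect without waiting for its turn."""
        self._record(suspect, ping(suspect))

    def _record(self, server, alive):
        if alive:
            self.failed.discard(server)
        else:
            self.failed.add(server)


status = {"ds0": True, "ds1": True, "ds2": False}   # ds2 is down
ping = status.get                                    # stand-in for a network probe
meta = MetaServer(["ds0", "ds1", "ds2"])

first = meta.poll_next(ping)          # polls ds0; ds2 is still undetected
failed_after_poll = set(meta.failed)
meta.report_error("ds2", ping)        # error signal short-circuits the wait
```

With N data servers, pure round-robin polling needs up to N heartbeats before it reaches a failed server, which is why the error signal matters as the cluster grows.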
DEDUPLICATION IN LIQUID
• Liquid chooses fixed-size chunking instead of variable-size chunking.
• Fixed-size chunking works well because all files stored in VM images are aligned on disk block boundaries.
• Its advantage is simplicity.
• Block size choice:
• Block size is a balancing factor that is hard to choose.
• It has a great impact on both deduplication and IO performance.
DEDUPLICATION IN LIQUID (CONT)
• A smaller block size causes more random seeks when accessing a VM image, which is not tolerable.
• A large block size is also not preferable, because it reduces the deduplication ratio.
• Liquid chooses different block sizes in different situations.
• It is advised to use a multiple of 4 KB between 256 KB and 1 MB to achieve a good balance between IO performance and deduplication ratio.
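The block size guidance above can be written as a short check followed by fixed-size chunking. This is a sketch only: the function names are hypothetical, and leaving the final block shorter rather than padding it is an assumption.

```python
KB = 1024


def check_block_size(size: int) -> int:
    """Accept only a multiple of 4 KB between 256 KB and 1 MB."""
    if size % (4 * KB) != 0 or not 256 * KB <= size <= 1024 * KB:
        raise ValueError(f"unsupported block size: {size}")
    return size


def split_blocks(image: bytes, block_size: int) -> list:
    """Cut an image into fixed-size blocks; the last block may be shorter."""
    check_block_size(block_size)
    return [image[i:i + block_size] for i in range(0, len(image), block_size)]


image = bytes(600 * KB)                   # stand-in for a VM image
blocks = split_blocks(image, 256 * KB)
sizes = [len(b) // KB for b in blocks]    # block sizes in KB
print(sizes)                              # [256, 256, 88]
```

Fixed-size chunking needs no content scanning to find block boundaries, which is the simplicity advantage the slides refer to.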
OPTIMIZATIONS ON FINGERPRINT CALCULATION
• Deduplication relies on comparing data block fingerprints to detect redundancy.
• A fingerprint is a collision-resistant hash value calculated from a data block's contents.
• MD5 [26] and SHA-1 [12] are frequently used for this purpose.
• The probability of a fingerprint collision is very small, orders of magnitude smaller than hardware error rates.
OPTIMIZATIONS ON FINGERPRINT CALCULATION (CONT)
• So we can safely assume that two data blocks are identical when they have the same fingerprint.
• Fingerprint calculation is expensive.
• Liquid delays fingerprint calculation for recently modified data blocks.
• It runs deduplication lazily, only when necessary.
• The client side maintains a shared cache, which contains recently accessed data blocks.
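Fingerprint-based redundancy detection can be sketched as a store keyed by fingerprint, with a reference count per block as kept by the meta server. The `BlockStore` class is hypothetical and holds everything in memory; SHA-1 is used because the slides name it as a common choice.

```python
import hashlib


class BlockStore:
    def __init__(self):
        self.blocks = {}   # fingerprint -> block contents, stored once
        self.refs = {}     # fingerprint -> number of references

    def put(self, block: bytes) -> str:
        """Store a block unless its fingerprint is already known."""
        fp = hashlib.sha1(block).hexdigest()
        if fp not in self.blocks:          # equal fingerprints are treated as equal blocks
            self.blocks[fp] = block
        self.refs[fp] = self.refs.get(fp, 0) + 1
        return fp

    def release(self, fp: str) -> None:
        """Drop one reference; free the block when none remain."""
        self.refs[fp] -= 1
        if self.refs[fp] == 0:
            del self.refs[fp]
            del self.blocks[fp]


store = BlockStore()
zeros, ones = bytes(4096), b"\x01" * 4096
image = [store.put(b) for b in (zeros, ones, zeros)]   # an image is a list of fingerprints
print(len(image), len(store.blocks))                   # 3 2
```

The image references three blocks, but only two are stored, because the repeated block is recorded as a second reference to the same fingerprint.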
OPTIMIZATIONS ON FINGERPRINT CALCULATION (CONT)
• A portion of memory is used by the Liquid client as a private cache.
• The private cache holds modified data blocks and delays fingerprint calculation on them.
• When a data block is modified, it is ejected from the shared cache and added to the private cache.
• Modified data blocks are ejected from the private cache when it becomes full.
OPTIMIZATIONS ON FINGERPRINT CALCULATION (CONT)
• Blocks are ejected based on an LRU policy.
• Only then is the modified data block's fingerprint calculated.
• Liquid uses multiple threads for fingerprint calculation.
• Multiple threads process different data blocks concurrently.
• This provides good IO performance.
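The shared and private caches described above can be sketched as follows. This is an illustration under stated assumptions, not Liquid's implementation: `ClientCache` and its methods are hypothetical names, cache capacity is counted in blocks, and what happens to a block after fingerprinting (deduplication and upload) is left out.

```python
import hashlib
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor


def _fingerprint(block_id, data):
    return block_id, hashlib.sha1(data).hexdigest()


class ClientCache:
    def __init__(self, private_capacity: int, workers: int = 4):
        self.shared = {}               # clean, recently accessed blocks
        self.private = OrderedDict()   # modified blocks, least recently used first
        self.private_capacity = private_capacity
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.pending = []              # fingerprint jobs for ejected blocks

    def write(self, block_id, data: bytes) -> None:
        self.shared.pop(block_id, None)        # eject from the shared cache
        self.private[block_id] = data          # no fingerprint yet
        self.private.move_to_end(block_id)     # mark as most recently used
        while len(self.private) > self.private_capacity:
            victim, victim_data = self.private.popitem(last=False)   # LRU block
            self.pending.append(self.pool.submit(_fingerprint, victim, victim_data))

    def drain(self) -> dict:
        """Wait for outstanding jobs; return {block_id: fingerprint}."""
        done = dict(job.result() for job in self.pending)
        self.pending.clear()
        return done


cache = ClientCache(private_capacity=2)
cache.shared["b1"] = b"old"
for block_id, data in [("b0", b"a"), ("b1", b"b"), ("b2", b"c"), ("b1", b"d"), ("b3", b"e")]:
    cache.write(block_id, data)
ejected = cache.drain()     # b0 and b2 were ejected; b1 and b3 stay unfingerprinted
cache.pool.shutdown()
```

Block `b1` is written twice but never fingerprinted, which is the saving that lazy calculation aims for: repeated writes to a hot block cost no hashing until the block leaves the private cache.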
FILE SYSTEM LAYOUT
• All file system metadata is stored on the meta server.
• It is organized in a file system tree.
• The client side can cache portions of the file system metadata for fast access.
• When a VM is stopped, modified metadata and data blocks are pushed back to the meta server and data servers.
• This ensures that modifications to the VM image are visible to other client nodes.
FILE SYSTEM LAYOUT (CONT)
Fig. Process of look-up by fingerprint.
ADVANTAGES OF LIQUID
• Fast virtual machine deployment with peer-to-peer data transfer.
• Low storage consumption by means of deduplication.
• Instant cloning of virtual machine images.
• On-demand fetching through network caching with local disks.
• Liquid files have no specific limit.
CONCLUSION
• Presented Liquid, a deduplication file system with good IO performance.
• Good IO performance is achieved by caching frequently accessed data blocks in a memory cache.
• This avoids additional disk operations.
• Deduplication of VM images proved to be effective.