This document describes Liquid, a scalable deduplication file system designed for virtual machine images. Liquid has three main components: a meta server that manages metadata, multiple data servers that store data blocks, and clients that provide a POSIX file system interface. Liquid uses fixed-size chunking and fingerprinting to identify duplicate data blocks across VM images for deduplication. It optimizes fingerprint calculation through caching and lazy evaluation. The meta server coordinates the data servers to provide fault tolerance and access to deduplicated data.
1. Liquid: A Scalable
Deduplication File System
For Virtual Machine Images
2. CONTENTS
INTRODUCTION
VIRTUAL MACHINE
DEDUPLICATION
ISSUES IN VM STORAGE
LIQUID SYSTEM ARCHITECTURE
COMMUNICATION AMONG COMPONENTS: HEART BEAT PROTOCOL
DEDUPLICATION IN LIQUID
OPTIMIZATIONS ON FINGER PRINT CALCULATION
STORAGE FOR DATA BLOCKS
ADVANTAGES OF LIQUID
CONCLUSION
3. INTRODUCTION
Cloud computing means storing and accessing data and programs
over the Internet instead of on your computer's hard drive.
4. VIRTUAL MACHINE
Virtual machines serve as a critical component in cloud computing.
Virtual machine: a hypothetical computer.
Emulates the functions of a real-world computer.
Executes programs like a physical machine.
The initial state of a virtual machine is stored in a file called a
virtual machine image.
6. DEDUPLICATION
Data deduplication: a data compression technology.
Eliminates duplicate copies of repeating data.
A redundant data block is replaced with a reference to the stored
copy instead of being stored multiple times.
Improves storage utilization.
8. ISSUES IN VM STORAGE
High demand on VM storage remains a challenging problem.
Existing systems have made efforts to reduce storage
consumption.
They use a SAN cluster.
They cannot satisfy increasing demand because of cost limitations.
Hence we propose LIQUID.
9. LIQUID SYSTEM ARCHITECTURE
Three components: a single meta server with a hot backup,
multiple data servers, and multiple clients.
Each component runs as a user-level service process.
VM images are split into fixed-size data blocks.
Meta server: keeps the namespace, fingerprints, and reference counts.
Meta server: mirrored to a hot-backup shadow meta server.
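The bookkeeping named above (namespace, fingerprints, reference counts) can be sketched as a small in-memory structure. This is only an illustration under stated assumptions: the class `MetaServerSketch`, its method names, the example paths, and the placeholder fingerprint strings are all hypothetical and are not taken from Liquid itself.

```python
from collections import Counter


class MetaServerSketch:
    """Toy in-memory stand-in for the meta server's bookkeeping (hypothetical)."""

    def __init__(self):
        self.namespace = {}         # file path -> ordered list of block fingerprints
        self.ref_count = Counter()  # fingerprint -> number of references from files

    def put_file(self, path, fingerprints):
        """Record (or overwrite) a file as a list of block fingerprints."""
        if path in self.namespace:
            self.remove_file(path)
        self.namespace[path] = list(fingerprints)
        self.ref_count.update(fingerprints)

    def remove_file(self, path):
        """Drop a file and return the fingerprints no file references any more."""
        unreferenced = []
        for fp in self.namespace.pop(path):
            self.ref_count[fp] -= 1
            if self.ref_count[fp] == 0:
                del self.ref_count[fp]
                unreferenced.append(fp)
        return unreferenced


meta = MetaServerSketch()
meta.put_file("/images/a.img", ["fp1", "fp2", "fp3"])
meta.put_file("/images/b.img", ["fp1", "fp2", "fp9"])  # shares two blocks with a.img
freed = meta.remove_file("/images/a.img")
print(freed)                  # blocks that only a.img referenced
print(meta.ref_count["fp1"])  # fp1 is still referenced by b.img
```

A block whose reference count drops to zero is a candidate for reclamation; how and when Liquid actually reclaims blocks is not stated in these slides.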
10. LIQUID SYSTEM ARCHITECTURE (CONT)
Data servers: in charge of managing the data blocks in VM images.
Data servers are organized in a distributed hash table.
A Liquid client provides a POSIX-compatible file system.
Client: a critical component (it provides deduplication).
Fault tolerance: the meta server is mirrored.
Replicas of data blocks are stored.
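One common way to organize servers in a distributed hash table is a consistent-hashing ring keyed by block fingerprint. The sketch below assumes that scheme purely for illustration; the slides do not say which DHT design Liquid uses, and `HashRing`, `servers_for`, the server names, and the replica count of 2 are hypothetical.

```python
import bisect
import hashlib


def _ring_position(key: str) -> int:
    """Map an arbitrary string to a position on the hash ring."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")


class HashRing:
    """Consistent-hashing ring over data servers (illustrative assumption)."""

    def __init__(self, servers):
        self._ring = sorted((_ring_position(s), s) for s in servers)
        self._positions = [pos for pos, _ in self._ring]

    def servers_for(self, fingerprint: str, replicas: int = 2):
        """Return `replicas` distinct servers, walking clockwise from the key."""
        start = bisect.bisect_right(self._positions, _ring_position(fingerprint))
        count = min(replicas, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = HashRing(["data-1", "data-2", "data-3", "data-4"])
placement = ring.servers_for("3f786850e387550fdab836ed7e6dc881de23001b")
print(placement)  # two distinct data servers holding replicas of this block
```

Because the placement depends only on the fingerprint and the server list, any client can compute where a block lives without asking a central directory.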
11. LIQUID SYSTEM ARCHITECTURE (CONT)
Fig: Liquid architecture. Labels shown: meta server, shadow meta server
(hot backup), data servers, three clients (each with FS and cache),
heart beat.
12. COMMUNICATION AMONG COMPONENTS
HEART BEAT PROTOCOL
Meta server: manages all data servers.
It exchanges a regular heartbeat message with each data server in
a round-robin fashion.
Round-robin detection of failed data servers becomes slow when there
are many data servers.
To speed up failure detection, data servers send an error
signal to the meta server.
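The interplay between round-robin heartbeats and out-of-turn error signals can be modeled without any networking. The sketch below is a toy simulation under assumptions: `HeartbeatMonitor`, `tick`, `error_signal`, and the `probe` callback are hypothetical names, and the choice to probe the suspect immediately on an error signal is an assumed reaction, not something the slides specify.

```python
from itertools import cycle


class HeartbeatMonitor:
    """Toy model of the meta server's failure detection (hypothetical API)."""

    def __init__(self, servers, probe):
        self._order = cycle(servers)  # round-robin iteration order
        self._probe = probe           # callable: server name -> True if it answers
        self.failed = set()

    def tick(self):
        """Send one heartbeat to the next server in round-robin order."""
        server = next(self._order)
        self._check(server)
        return server

    def error_signal(self, suspect):
        """A data server reported trouble: check the suspect out of turn."""
        self._check(suspect)

    def _check(self, server):
        if self._probe(server):
            self.failed.discard(server)
        else:
            self.failed.add(server)


alive = {"data-1": True, "data-2": True, "data-3": True}
monitor = HeartbeatMonitor(list(alive), probe=lambda s: alive[s])

monitor.tick()                  # regular heartbeat reaches data-1
alive["data-3"] = False         # data-3 fails before its turn comes up
monitor.error_signal("data-3")  # a peer reports it, so it is checked right away
print(monitor.failed)
```

Without the error signal, the failure of data-3 would go unnoticed until the round-robin cycle reached it, which takes longer as the number of data servers grows.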
13. DEDUPLICATION IN LIQUID
Liquid chooses fixed-size chunking instead of variable-size
chunking.
This works well because all files stored in VM images are aligned on
disk block boundaries.
Advantage: simplicity.
Block size choice:
Block size is a balancing factor that is hard to choose.
It has a great impact on both deduplication and IO performance.
14. DEDUPLICATION IN LIQUID(CONT)
A smaller block size causes more random seeks when accessing a VM
image, which is not tolerable.
A large block size is also not preferable because it reduces the
deduplication ratio.
Liquid chooses different block sizes in different situations.
Advised: use a multiple of 4 KB between 256 KB and 1 MB to achieve
a good balance between IO performance and deduplication ratio.
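The deduplication side of the block-size trade-off can be seen on synthetic data. The sketch below is an illustration under assumptions: the helper names are hypothetical, the "images" are random bytes rather than real VM images, and raw blocks are compared directly instead of fingerprints to keep the example short. It does not model the IO side (random seeks).

```python
import random

KB = 1024
MB = 1024 * KB


def is_recommended_block_size(size: int) -> bool:
    """True for a multiple of 4 KB in the range 256 KB to 1 MB (inclusive)."""
    return 256 * KB <= size <= 1 * MB and size % (4 * KB) == 0


def split_fixed(data: bytes, block_size: int):
    """Split data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def stored_bytes(images, block_size: int) -> int:
    """Bytes kept after deduplication: each distinct block is stored once."""
    unique = {block for image in images for block in split_fixed(image, block_size)}
    return sum(len(block) for block in unique)


# Synthetic 4 MB "image" plus a clone that differs in a single byte.
size = 4 * MB
base = random.Random(0).getrandbits(8 * size).to_bytes(size, "little")
clone = bytearray(base)
clone[123] ^= 0xFF
clone = bytes(clone)

for block_size in (256 * KB, 1 * MB):
    assert is_recommended_block_size(block_size)
    kept = stored_bytes([base, clone], block_size)
    print(f"{block_size // KB} KB blocks: {kept / MB:.2f} MB stored of {2 * size // MB} MB")
```

With 256 KB blocks the single changed byte costs one extra 256 KB block (17 blocks, 4.25 MB stored); with 1 MB blocks it costs a whole extra 1 MB block (5 blocks, 5 MB stored). Larger blocks therefore lower the deduplication ratio for the same change.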
17. OPTIMIZATIONS ON FINGER PRINT
CALCULATION
Deduplication relies on comparing data block fingerprints to detect
redundancy.
Fingerprint: a collision-resistant hash value calculated from data
block contents.
MD5 [26] and SHA-1 [12] are frequently used for this purpose.
Probability of a fingerprint collision: very small, orders of
magnitude smaller than hardware error rates.
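Computing a fingerprint with the hash functions named above takes a few lines with Python's standard `hashlib` module. This is a minimal sketch: the function name `fingerprint` and the 4096-byte sample blocks are illustrative choices, not Liquid's actual code or block size.

```python
import hashlib


def fingerprint(block: bytes, algorithm: str = "sha1") -> str:
    """Return the hex digest of a data block's contents."""
    return hashlib.new(algorithm, block).hexdigest()


block_a = b"\x00" * 4096
block_b = b"\x00" * 4096            # same contents as block_a
block_c = b"\x00" * 4095 + b"\x01"  # differs in the last byte

print(fingerprint(block_a) == fingerprint(block_b))  # identical contents
print(fingerprint(block_a) == fingerprint(block_c))  # different contents
print(len(fingerprint(block_a)), len(fingerprint(block_a, "md5")))
```

For accidental collisions among n blocks with a b-bit fingerprint, the birthday bound gives a probability of roughly n^2 / 2^(b+1), which is tiny for 128-bit MD5 and 160-bit SHA-1 at realistic block counts. Note that some restricted Python builds disable MD5.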
18. OPTIMIZATIONS ON FINGER PRINT
CALCULATION (CONT)
So we can safely assume that two data blocks are identical when they
have the same fingerprint.
Fingerprint calculation is expensive.
Liquid delays fingerprint calculation for recently modified data
blocks.
It runs deduplication lazily, only when necessary.
The client side maintains a shared cache that contains
recently accessed data blocks.
19. OPTIMIZATIONS ON FINGER PRINT
CALCULATION (CONT)
A portion of memory is used by the client side of Liquid as a
private cache.
The private cache holds modified data blocks and delays fingerprint
calculation on them.
A modified data block is ejected from the shared cache and added
to the private cache.
Modified data is ejected from the private cache when the private
cache becomes full.
20. OPTIMIZATIONS ON FINGER PRINT
CALCULATION (CONT)
Modified data blocks are ejected from the private cache based on an
LRU policy.
Only then is the modified data block's fingerprint calculated.
Liquid uses multiple threads for fingerprint calculation.
Multiple threads process different data blocks concurrently.
This provides good IO performance.
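The lazy fingerprinting scheme described on these slides (shared cache, private cache, LRU ejection, multi-threaded hashing) can be sketched as follows. Everything here is an assumption made for illustration: `ClientCacheSketch`, its methods, the unbounded shared cache, the `flush` step, and the worker count are hypothetical, and the step that would write ejected blocks out to storage is omitted.

```python
import hashlib
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor


def fingerprint(block: bytes) -> str:
    return hashlib.sha1(block).hexdigest()


class ClientCacheSketch:
    """Toy model of the client's shared and private caches (hypothetical)."""

    def __init__(self, private_capacity: int):
        self.shared = {}              # block id -> clean, recently read data
        self.private = OrderedDict()  # block id -> modified data, oldest first
        self.private_capacity = private_capacity
        self.fingerprints = {}        # block id -> fingerprint, filled lazily

    def read(self, block_id, load):
        """Return block data, preferring modified data, then the shared cache."""
        if block_id in self.private:
            self.private.move_to_end(block_id)
            return self.private[block_id]
        if block_id not in self.shared:
            self.shared[block_id] = load(block_id)
        return self.shared[block_id]

    def write(self, block_id, data: bytes):
        """Buffer a modification; no fingerprint is computed yet."""
        self.shared.pop(block_id, None)        # eject from the shared cache
        self.fingerprints.pop(block_id, None)  # any old fingerprint is stale
        self.private[block_id] = data
        self.private.move_to_end(block_id)     # mark as most recently used
        while len(self.private) > self.private_capacity:
            victim, victim_data = self.private.popitem(last=False)  # LRU entry
            self.fingerprints[victim] = fingerprint(victim_data)

    def flush(self, workers: int = 4):
        """Fingerprint all remaining modified blocks concurrently."""
        ids = list(self.private)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            digests = list(pool.map(fingerprint, (self.private[i] for i in ids)))
        self.fingerprints.update(zip(ids, digests))
        self.private.clear()


cache = ClientCacheSketch(private_capacity=2)
for i in range(3):
    cache.write(i, bytes([i]) * 4096)  # the third write ejects block 0
print(sorted(cache.fingerprints))      # only the ejected block was hashed
cache.write(1, b"\xff" * 4096)         # rewriting block 1 costs no hashing
cache.flush()
print(sorted(cache.fingerprints))
```

The point of the design is that a block rewritten many times while it sits in the private cache is hashed only once, when it finally leaves the cache.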
21. FILE SYSTEM LAY OUT
All file system metadata is stored on the meta server.
It is organized in a file system tree.
The client side can cache portions of file system metadata for
fast access.
When a VM is stopped, modified metadata and data blocks
are pushed back to the meta server.
Data servers ensure that modifications to a VM image are visible to
other client nodes.
22. FILE SYSTEM LAY OUT
Fig. Process of look-up by fingerprint.
23. ADVANTAGES OF LIQUID
Fast virtual machine deployment with peer-to-peer data
transfer.
Low storage consumption by means of deduplication.
Instant cloning of virtual machine images.
On-demand fetching through network caching with local
disks.
LIQUID files have no specific limit.
24. CONCLUSION
Presented LIQUID, a deduplication file system with
good IO performance.
Good IO performance is achieved by caching frequently accessed data
blocks in a memory cache.
This avoids additional disk operations.
Deduplication of VM images proved to be effective.