ASPLOS 2011 Workshop RESoLVE: "Effect of Disk Prefetching of Guest OS on Storage Deduplication"
1. Effect of Disk Prefetching of Guest OS
on Storage Deduplication
Kuniyasu Suzaki †, Toshiki Yagi †,
Kengo Iijima †, Cyrille Artho †,
Yoshihito Watanabe ††
† Research Center for Information Security
††
2. Motivation (1/2)
• A normal OS installed in a fully virtualized environment
assumes that it runs on real devices.
• Do the optimization techniques of an operating system
work well for virtual devices?
– Virtual devices are developed to achieve near-native performance,
but most of them have restrictions of their own
which are visible from a performance point of view.
• Should the guest OS adapt to virtual devices with
traditional optimization techniques?
3. Motivation (2/2)
• Our approach is not to develop a para-virtualized device
driver or I/O passthrough.
• Our approach:
– The guest OS recognizes the features of the virtual device and
adjusts its behavior accordingly.
• Current operating systems already offer many optimization techniques and tools.
4. Our targets
• virtual device (storage)
– CAS: Content Addressable Storage
• Manage virtual block device with deduplication.
• CAS has restrictions of its own: the occupancy problem, size
mismatch, and the alignment problem.
• Guest OS: Linux
– readahead: disk prefetch mechanism in the Linux kernel
– The system call "readahead" is a different function.
– Block reallocation of the file system
• A kind of defrag tool. We developed "ext-optimizer",
which reallocates data blocks using an access profile.
5. CAS: Content Addressable Storage
• Data is not addressed by its physical location. Data is
addressed by a unique name (usually a secure hash)
derived from its content.
• Identical contents are represented by one original content (same
hash); the others are addressed by an indirect link. (Storage
Deduplication)
– Plan 9 has Venti [USENIX FAST02]
– Data Domain (EMC) Deduplication [USENIX FAST08]
– LBCAS (Loopback Content Addressable Storage) [LinuxSymp09]
[Figure: a virtual disk is mapped onto the CAS storage archive through an
index from block address to SHA-1 digest. A new block is created only for a
new SHA-1 value; blocks with identical content share one chunk (deduplication).]

Address          SHA-1
0000000-0003FFF  4ad36ffe8…
0004000-0007FFF  974daf34a…
0008000-000BFFF  2d34ff3e1…
000C000-000FFFF  974daf34a…  (shared with 0004000-0007FFF)
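The indexing scheme above can be sketched as a minimal content-addressable store. This is an illustrative toy (class and method names are invented, not the actual LBCAS implementation): identical chunk contents hash to the same SHA-1 key and are stored only once.

```python
import hashlib

class CAStore:
    """Toy content-addressable store with deduplication (illustrative only)."""

    def __init__(self):
        self.chunks = {}   # SHA-1 hex digest -> chunk data (stored once)
        self.index = {}    # block address -> SHA-1 hex digest

    def write(self, address, data):
        key = hashlib.sha1(data).hexdigest()
        # Identical content hashes to the same key, so it is stored only once;
        # the index entry is just an indirect link to the shared chunk.
        self.chunks.setdefault(key, data)
        self.index[address] = key

    def read(self, address):
        return self.chunks[self.index[address]]

store = CAStore()
store.write(0x0000000, b"A" * 0x4000)
store.write(0x0004000, b"B" * 0x4000)
store.write(0x000C000, b"B" * 0x4000)   # duplicate content -> shared chunk
```

With three writes but only two distinct contents, the archive holds two chunks, matching the indexing table above where two addresses share one SHA-1 value.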
6. Optimization for Disk Access
• Disk prefetch "readahead"
– The Linux kernel has a disk prefetch mechanism called "readahead".
Prefetched data are stored in memory (the page cache). The
coverage size of the prefetch is changed dynamically based on the hit
rate of the page cache.
• System call "readahead"
– It is not directly related to the kernel's disk prefetch, but it achieves
a similar function from user space.
– The system call "readahead" populates the page cache with the whole
contents of a file. Thus, all data of a file is stored in the page cache.
• This is not efficient from the point of view of prefetching.
– We refer to this function as "u-readahead" in this presentation.
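As a rough user-space sketch, the whole-file prefetch of u-readahead can be approximated as below. The readahead(2) system call has no wrapper in the Python standard library, so this sketch uses posix_fadvise with POSIX_FADV_WILLNEED, which likewise asks the kernel to populate the page cache with the given range; this is an approximation, not the deck's actual implementation.

```python
import os

def u_readahead(path):
    """Ask the kernel to load a whole file into the page cache.

    Approximates the readahead(2) system call with posix_fadvise(2):
    POSIX_FADV_WILLNEED hints that the whole range will be needed soon,
    so the kernel starts prefetching it into the page cache.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        os.posix_fadvise(fd, 0, size, os.POSIX_FADV_WILLNEED)
    finally:
        os.close(fd)
```

Note that, as the slide says, this loads the entire file regardless of how much of it is actually read afterwards, which is why it can waste page cache compared to the kernel's adaptive readahead.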
7. Performance Issues on CAS
• Two types of block size mismatch:
(1) Between the file system and LBCAS (static mismatch)
• ext2/3: 4KB block size
• LBCAS: 64KB-512KB chunk size
– Occupancy (the ratio of necessary data in an LBCAS chunk) is low.
» Kitagawa [LinuxKongress2006] reported an occupancy of 30% for
KNOPPIX 3.8.2 on 256KB LBCAS.
(2) Between readahead and LBCAS (dynamic mismatch)
• readahead: 4KB-128KB coverage size
• LBCAS: 64KB-512KB chunk size
– Size mismatch
» A small readahead causes low occupancy.
» A large readahead requires many LBCAS chunks for one access.
– Alignment problem
» When a readahead crosses an LBCAS chunk boundary, a redundant
chunk is required.
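Both mismatches can be made concrete with a small model. The helper names, offsets, and sizes below are invented for illustration; the chunk size of 256KB matches the LBCAS configuration discussed in the deck.

```python
def chunks_touched(accesses, chunk_size):
    """Return the set of chunk indices covered by byte-range accesses.

    accesses: list of (offset, length) in bytes. An access that crosses a
    chunk boundary touches, and forces allocation of, every chunk it spans.
    """
    touched = set()
    for offset, length in accesses:
        first = offset // chunk_size
        last = (offset + length - 1) // chunk_size
        touched.update(range(first, last + 1))
    return touched

def occupancy(accesses, chunk_size):
    """Fraction of the allocated chunk bytes that were actually requested."""
    needed = sum(length for _, length in accesses)
    allocated = len(chunks_touched(accesses, chunk_size)) * chunk_size
    return needed / allocated

# Static mismatch: one 4KB read inside a 256KB chunk -> occupancy 4/256.
print(occupancy([(0, 4096)], 256 * 1024))   # 0.015625
# Alignment problem: a 128KB readahead straddling a chunk boundary
# allocates two 256KB chunks instead of one.
print(len(chunks_touched([(192 * 1024, 128 * 1024)], 256 * 1024)))   # 2
```

The same 128KB readahead aligned to the chunk start would touch only one chunk, which is exactly what the block reallocation in the following slides tries to arrange.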
8. Access mismatch in chunks of LBCAS
• Occupancy (necessary data in a chunk) depends on the access pattern.
• A large readahead requires many chunks.
• When an access crosses an LBCAS chunk boundary, a redundant chunk is allocated.
[Figure: 4KB access requests from the ext2/3 file system pass through
readahead (4KB-128KB coverage) to LBCAS (256KB chunks). A small readahead
leads to low occupancy; a large readahead triggers many chunk searches and
allocations for one access; an access crossing a chunk boundary allocates
a redundant chunk.]
9. Solution
1. (For static mismatch) Increase occupancy by reallocating the
necessary data within LBCAS chunks.
2. (For dynamic mismatch) Keep the coverage size of readahead
large through sequential access and a high hit rate in the page
cache.
• Increase locality of reference.
• "ext-optimizer" repacks the data blocks of an ext2/3 file
system into a contiguous layout.
– The repacking is based on a block access profile.
– As a result, ext-optimizer increases the occupancy and
maintains a constantly high cache hit rate through sequential access.
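The core of the repacking idea can be sketched in a few lines. This is a simplified model with invented names, not the actual ext-optimizer: blocks that appear in the access profile are laid out contiguously at the front of the disk in access order, and the remaining blocks keep their relative order behind them.

```python
def repack(profile, all_blocks):
    """Simplified sketch of profile-based block reallocation.

    profile: block numbers in the order they were accessed (e.g. at boot);
             repeated accesses are counted once, at first occurrence.
    all_blocks: every data block in the file system.
    Returns the new on-disk order: profiled blocks first, then the rest.
    """
    seen = []
    for block in profile:
        if block not in seen:
            seen.append(block)
    rest = [block for block in all_blocks if block not in seen]
    return seen + rest

# Blocks 40, 7, 93 were touched at boot (40 twice); block 55 was not.
print(repack([40, 7, 40, 93], [7, 40, 55, 93]))   # [40, 7, 93, 55]
```

After such a layout, a boot-time read stream walks the front of the disk sequentially, so readahead stays large and each LBCAS chunk it pulls in is densely filled with needed blocks.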
10. Ext-optimizer: Access profile and reallocation
[Figure: two I/O stacks, each App → VFS → file system driver (ext2/3) →
page cache (memory) → block driver (loopback) → device. On the left, a
profiler in the file system driver records the access profile (exported via
/proc/); blocks are scattered on the device and readahead is small and
"worm-eaten". On the right, after ext-optimizer reallocates the blocks
using that profile, the blocks are gathered and readahead access is
sequential.]
11. Block Relocation: Ext-optimizer [LinuxKongress06]
• Data blocks are rearranged into a contiguous layout. The structure of the metadata is not changed.
• The arrangement is based on the access profile.
• Features:
– The normal driver is used.
– Fragmentation occurs from the point of view of files.
– The relocation increases page cache hits, so readahead extends its coverage size.
[Figure: two ext2/3 inodes (mode, owner info, size, timestamps, direct,
indirect, double-indirect, and triple-indirect block pointers) before and
after relocation; after relocation the referenced data blocks are
contiguous, giving high occupancy.]
12. Performance Analysis
• Confirm the effect of ext-optimizer on LBCAS for guest
OS booting.
– Ubuntu 9.04 (kernel 2.6.28) installed on ext3 (8GB) with KVM-60.
• The ext3 file system was optimized by ext-optimizer using a boot profile.
• The disk image was translated to LBCAS (64KB - 256KB chunks).
• Compared configurations:
– normal
– u-readahead: user-level readahead (system call) for booting
– ext-optimizer
13. Disk Image Analyzed by DAVL
(Disk Allocation Viewer for Linux)
[Figure: DAVL views of the two disk images. Left (normal): fragmentation
0.21%, with non-contiguous blocks scattered across the image. Right
(ext2/3opt): fragmentation 1.11%, with the data blocks used during system
booting made contiguous in one place.]
14. Disk Access Trace at boot time
• Ext-optimizer relocates the data blocks that are
required at boot time to the top of the virtual disk.
[Figure: disk access trace at boot time; time (s) vs. address (GB), 0-8.0.
Red: normal; blue: ext2/3opt.]
15. Histogram of Access for readahead coverage
• Ext-optimizer reduced the number of small readahead requests.

[Figure: histogram of access frequency vs. readahead coverage size
(0-128KB) at boot.]
16. Amount of data on each processing level
                                        normal      u-readahead  ext2/3opt
Amount of files (number, average)       203MB (2,248 files, avg. 92KB)
Amount of required blocks               127MB
Amount of disk access, including        208MB       231MB        140MB
coverage of readahead
(count, average coverage size)          6,379       5,827        2,129
                                        33KB        41KB         67KB

Amount of required chunks (MB) and occupancy (127MB / amount of chunks):
LBCAS size  normal       u-readahead  ext2/3opt
64KB        247, 51.5%   272, 46.9%   144, 88.7%
128KB       290, 43.9%   315, 40.3%   149, 85.3%
256KB       358, 35.5%   386, 35.0%   159, 80.0%
512KB       474, 26.9%   508, 25.1%   176, 71.8%
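As a quick sanity check on the upper table, the average coverage sizes follow from dividing the total amount of disk access by the access count (values taken from the table; rounding to the nearest KB is an assumption about how the averages were reported):

```python
# Average readahead coverage = total disk access / access count, in KB.
rows = [("normal", 208, 6379),
        ("u-readahead", 231, 5827),
        ("ext2/3opt", 140, 2129)]
for name, total_mb, count in rows:
    avg_kb = round(total_mb * 1024 / count)
    print(f"{name}: {avg_kb}KB")   # 33KB, 41KB, 67KB as in the table
```

The ext2/3opt row shows the intended effect: fewer, larger accesses (2,129 accesses averaging 67KB versus 6,379 averaging 33KB for normal), which is what keeps chunk occupancy high in the lower table.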
17. Discussion
• In this talk, I set aside the effect of deduplication; it is
not high on a single disk image, even if the chunk is small.
– Deduplication is effective when merging updated images.
– Performance is more important here.
• Memory on a virtual machine also has deduplication
mechanisms (Difference Engine [OSDI'08], Satori [USENIX'09],
etc.). The guest OS should adjust its behavior to them.
– SLINKY [USENIX05] and our paper [HotSec10] utilize memory
deduplication for security.
18. Conclusion
• Virtual devices have restrictions of their own which are
visible from a performance point of view.
• The guest OS should recognize the features of a virtual
device and adjust its behavior to the virtual device using
traditional optimization techniques.
• We showed an example for CAS (Content Addressable
Storage) with disk prefetching and block reallocation.