Ext2/3optimizer and readahead optimization can improve the performance of LBCAS (LoopBack Content Addressable Storage). Ext2/3optimizer rearranges file system blocks based on access profiles to reduce the number of small readahead requests. This increases the readahead coverage size, reduces redundant block downloads, and improves disk access locality. Performance analysis shows these optimizations reduce I/O utilization and consumption time in LBCAS during system booting.
Linux Symposium 2009 Slide Suzaki "Effect of readahead and file system block reallocation for LBCAS (LoopBack Content Addressable Storage)"
1. Effect of readahead and file system block reallocation
for LBCAS (LoopBack Content Addressable Storage)
@ Linux Symposium 2009,
Montreal, Canada, 17/July
Paper: http://www.kernel.org/doc/ols/2009/ols2009-pages-275-286.pdf
Kuniyasu Suzaki †, Toshiki Yagi †,
Kengo Iijima †, Nguyen Anh Quynh †,
Yoshihito Watanabe ††
† Research Center for Information Security
††
2. Key words
• LBCAS: Loopback Content Addressable Storage
– Virtual block device (network transparent block device)
• readahead
– Disk prefetch mechanism in Linux kernel
• The system call “readahead” is a different function.
• file system block reallocation
– A kind of defrag tool
– We developed “ext2/3optimizer”, which reallocates the data blocks of i-nodes.
Today’s talk covers optimization methods that use them.
3. Today’s Contents
• Motivation
– What is LBCAS used for?
– Correlation among LBCAS, file system block reallocation (ext2/3optimizer),
and disk prefetch (readahead)
• LBCAS: Loopback Content Addressable Storage
• Optimization: ext2/3optimizer and readahead
• Performance Results
• Conclusions
4. Motivation
What is “LBCAS” used for?
• LBCAS was developed for OS Circular.
• OS Circular is a project to distribute bootable disk
images for virtual machines and real machines.
– OS Circular project
• http://openlab.jp/oscircular/
5. OS Circular (Big Picture)
[Diagram: OS suppliers publish timely updates as block files on an HTTP server. Over the Internet, LBCAS (Loopback Content Addressable Storage) constructs a virtual disk from the block files. The virtual disk is used by a virtual machine (KVM, QEMU) or a real machine, so users can try an OS without installation.]
6. Performance Issues (Today’s Main Topic)
• LBCAS is sensitive to access patterns.
– Performance is affected by the number and size of disk prefetches
(“readahead” in the Linux kernel).
• The number and size of readahead requests can be optimized by file
system block reallocation.
– Defrag tools are not enough. We developed “ext2/3optimizer”.
[Diagram: ext2/3optimizer reallocates the blocks of ext2/3 based on an access profile → the number of readahead requests is reduced and their size is extended → the performance of LBCAS is increased. Labels: General Technique; Presentation Order ③ ② ①.]
7. LBCAS: LoopBack Content Addressable Storage
• LBCAS= CAS + LoopBack
– CAS
• Indirect addressing by SHA-1 digest of block contents
• Benefit: identical blocks are expressed by the same SHA-1 digest, which reduces
total storage
• Mainly used for archives. Example: Venti of Plan9 [USENIX FAST’02]
– LoopBack
• Virtual block device. A file is used as a block device.
• The file abstraction makes it easy to handle.
• LBCAS saves each block to a file, which is called a “block file”. The
file is named by the SHA-1 digest of its contents.
• Block files are managed by a “mapping table” file, which is a table of
physical addresses and SHA-1 file names.
8. Block files of LBCAS
Mapping table (map01.idx) and block files:
Address              File Name
00000000-0003FFFF    4ad36ffe8…
00040000-0007FFFF    974daf34a…
00080000-000BFFFF    2d34ff3e1…
000C0000-000FFFFF    3310012a…
…                    …
[Diagram: the block device (ext2, 4KB pages) is split into 256KB pieces. Each piece is compressed by zlib and stored as a block file named by the SHA-1 digest of its contents. The block files are reconstructed as a virtual disk with LBCAS.]
9. LBCAS (1/2)
• An LBCAS image is made from an existing
normal block device.
• The original block device is split into fixed-size pieces (64KB -
512KB), and each piece is compressed by zlib.
• Block files are reconstructed into a loopback file by a FUSE
wrapper.
– FUSE is a user-land file system.
• http://fuse.sf.net
• Each block file is checked against its SHA-1 file name
when it is mapped to the loopback file.
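The image construction described above can be sketched in a few lines of Python. This is only an illustration, not the LBCAS implementation: the function names are hypothetical, the mapping table is kept in memory rather than in an index file, and it assumes the SHA-1 name is the digest of the compressed block file (the slides say only "its contents").

```python
import hashlib
import zlib

BLOCK_SIZE = 256 * 1024  # the slides use 64KB-512KB; 256KB is one of those sizes


def build_image(device: bytes, block_size: int = BLOCK_SIZE):
    """Split a device image into block files and build a mapping table.

    Returns (mapping, store): mapping is a list of (start, end, name) rows,
    store maps each SHA-1 name to the zlib-compressed block file contents.
    """
    mapping, store = [], {}
    for start in range(0, len(device), block_size):
        chunk = device[start:start + block_size]
        blob = zlib.compress(chunk)
        # Assumption: the name is the digest of the stored (compressed) file.
        name = hashlib.sha1(blob).hexdigest()
        store[name] = blob  # identical blocks collapse into one block file
        mapping.append((start, start + len(chunk) - 1, name))
    return mapping, store


def read_block(mapping, store, index: int) -> bytes:
    """Fetch one block, check it against its SHA-1 name, and uncompress it."""
    _, _, name = mapping[index]
    blob = store[name]
    if hashlib.sha1(blob).hexdigest() != name:
        raise ValueError("block file does not match its SHA-1 name")
    return zlib.decompress(blob)
```

With a 256KB block size the first mapping row covers addresses 0 to 0x3FFFF, which matches the first row of the mapping table on slide 8.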
11. Structure of LBCAS
• Storage Cache
– Suppresses downloads
• Memory Cache
– Suppresses disk access and uncompression
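The two cache levels can be pictured with the following sketch. It is a simplified model under stated assumptions, not the LBCAS driver: `BlockFetcher` and its `download` callable are hypothetical names, the storage cache is a dictionary standing in for block files on local disk, and the memory cache is unbounded.

```python
import zlib


class BlockFetcher:
    """Look up a block file in memory, then local storage, then the network."""

    def __init__(self, download):
        self.download = download  # hypothetical callable: name -> compressed bytes
        self.storage = {}         # stands in for block files kept on local disk
        self.memory = {}          # uncompressed blocks kept in RAM
        self.counts = {"download": 0, "storage": 0, "uncompress": 0, "memory": 0}

    def get(self, name: str) -> bytes:
        if name in self.memory:
            # Memory cache hit: no disk access and no uncompression.
            self.counts["memory"] += 1
            return self.memory[name]
        if name in self.storage:
            # Storage cache hit: no download, but the file must be uncompressed.
            self.counts["storage"] += 1
        else:
            self.storage[name] = self.download(name)
            self.counts["download"] += 1
        data = zlib.decompress(self.storage[name])
        self.counts["uncompress"] += 1
        self.memory[name] = data
        return data
```

In this model every uncompression is preceded by either a download or a storage cache hit, so the download and storage counts always add up to the uncompress count.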
12. LBCAS (2/2)
• When a file is updated or created on the original block device, the
relevant block files are newly created with new SHA-1 file names.
The mapping table file is also renewed.
– Old block files are reusable.
• HTTP for file delivery
– The most popular protocol, and well designed for the Internet.
• Utilize inexpensive Web hosting services, proxies, and mirror servers
for worldwide deployment.
• Block files are network/storage transparent.
– If the necessary block files are stored in local storage, a network connection is
not necessary.
13. Partial Update of LBCAS
[Diagram: before the update, map01.idx lists the 256KB block files of the ext2 block device (4KB pages), named by SHA-1: 4ad36ffe8…, 974daf34a…, 2d34ff3e1…, 3310012a…, … After an update (apt-get install …), map02.idx lists 4ad36ffe8…, dd4daf34a…, 2d34ff3e1…, 3310012a…, … Block files with the same name are reusable by the FUSE driver.]
Create Once, Use Many
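A partial update can be illustrated by comparing two mapping tables. The sketch below is an assumption-laden illustration (hypothetical function names, names computed over the compressed block as in a plain content-addressed store), showing that changing one block yields exactly one new block file.

```python
import hashlib
import zlib


def mapping_table(device: bytes, block_size: int):
    """Return the list of SHA-1 block file names for a device image."""
    names = []
    for start in range(0, len(device), block_size):
        blob = zlib.compress(device[start:start + block_size])
        names.append(hashlib.sha1(blob).hexdigest())
    return names


def new_block_files(old_names, new_names):
    """Block files that must be created and published for the new table."""
    return set(new_names) - set(old_names)


# Four distinct 64KB blocks, then a one-byte change inside the second block.
block = 64 * 1024
old = bytearray(b"".join(bytes([i]) * block for i in range(4)))
new = bytearray(old)
new[block + 10] = 0xFF

old_map = mapping_table(bytes(old), block)
new_map = mapping_table(bytes(new), block)
changed_rows = [i for i, (a, b) in enumerate(zip(old_map, new_map)) if a != b]
```

Only the row for the modified block differs between the two tables, so a client that already holds the old block files needs to fetch a single new one.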
14. Performance Issues
• LBCAS is sensitive to access patterns.
• 2 types of block size mismatch
(1) between File System and LBCAS (Static Mismatch)
• ext2/3 4KB block size
• LBCAS 64KB-512KB block size
– Occupancy (rate of necessary data in a block file) is low.
» Kitagawa [LinuxKongress2006] reported that the occupancy was 30% on
KNOPPIX 3.8.2 with 256KB LBCAS.
(2) between “readahead” (disk prefetch) and LBCAS (Dynamic
Mismatch)
• readahead 4KB-128KB coverage size
• LBCAS 64KB-512KB block size
– Many small accesses (worm-eaten access to a block file) cause
redundant downloads and unnecessary uncompression in the LBCAS driver.
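The static mismatch can be made concrete with a small occupancy calculation. This is a sketch with a hypothetical helper, assuming the 4KB file system block size from the slide and that a block file is downloaded whole whenever any of its 4KB blocks is needed.

```python
FS_BLOCK = 4 * 1024  # ext2/3 block size from the slide


def occupancy(accessed_fs_blocks, lbcas_block_size):
    """Rate of necessary data in the block files that must be downloaded.

    accessed_fs_blocks: iterable of 4KB file system block numbers that are read.
    """
    per_file = lbcas_block_size // FS_BLOCK
    needed = set(accessed_fs_blocks)
    if not needed:
        return 0.0
    touched_files = {b // per_file for b in needed}
    return len(needed) / (len(touched_files) * per_file)
```

For example, four needed blocks scattered over four 256KB block files give an occupancy of 4/256, while the same four blocks packed into one block file give 4/64.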
15. CAUTION for readahead
• Disk prefetch “readahead” and system call “readahead”
– The system call “readahead” populates the page cache with data
from a file. Thus, the whole data of a file is stored in the page cache.
The coverage is the size of the file.
– It is not directly related to the disk prefetch, but it achieves the
same function from user space.
– Some boot procedures use the system call “readahead”. The
files that are loaded into the page cache in advance at boot
time are listed in “/etc/readahead/boot,desktop”. We call
this function “u-readahead” in this presentation.
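A user-space prefetch of this kind might look like the sketch below. Note the substitution: Python's standard library does not wrap the readahead system call, so the sketch uses `os.posix_fadvise` with `POSIX_FADV_WILLNEED`, a related hint that also asks the kernel to load file data into the page cache. The function name and the idea of passing a list of paths are assumptions for illustration.

```python
import os


def prefetch_files(paths):
    """Ask the kernel to load whole files into the page cache.

    Uses posix_fadvise(POSIX_FADV_WILLNEED), a hint comparable to, but not the
    same call as, the readahead system call. Returns the total size in bytes
    of the files that could be opened.
    """
    covered = 0
    for path in paths:
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError:
            continue  # entries in the list may be missing; skip them
        try:
            size = os.fstat(fd).st_size
            if hasattr(os, "posix_fadvise"):
                try:
                    os.posix_fadvise(fd, 0, size, os.POSIX_FADV_WILLNEED)
                except OSError:
                    pass  # the hint is advisory; ignore file systems that refuse it
            covered += size
        finally:
            os.close(fd)
    return covered
```

Because the whole file is requested regardless of which parts the boot actually reads, this approach can pull in data that is never used.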
16. Block size mismatch
• Solution (increasing locality of reference)
1. (for static mismatch) Increase occupancy by reallocating the necessary
data into the same block files.
2. (for dynamic mismatch) Extend the coverage size of readahead
through sequential access and a high page cache hit rate.
• “ext2/3optimizer” repacks the data blocks of an ext2/3 file
system so that they are lined up.
– The repacking is based on the block access profile at boot time.
– As a result, ext2/3optimizer reduces the number of block files required.
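The idea of profile-based repacking can be sketched as follows. This is not how ext2/3optimizer is implemented: the function names are hypothetical, blocks are packed from block 0 upward for simplicity, and the work a real tool must do (moving the displaced blocks, updating i-node block pointers, respecting reserved metadata areas) is omitted.

```python
def reallocate(access_profile):
    """Map old block numbers to new ones in first-access order.

    access_profile: block numbers in the order they were read at boot time.
    """
    new_location = {}
    for block in access_profile:
        if block not in new_location:
            new_location[block] = len(new_location)
    return new_location


def block_files_touched(blocks, fs_blocks_per_file):
    """Number of distinct LBCAS block files covering the given blocks."""
    return len({b // fs_blocks_per_file for b in blocks})


profile = [900, 10, 4000, 10, 77]   # illustrative access order, not measured data
moved = reallocate(profile)
before = block_files_touched(profile, 64)         # 256KB block files, 4KB blocks
after = block_files_touched(moved.values(), 64)
```

In this toy profile the four distinct blocks sit in four different block files before repacking and in a single block file afterwards, and they are also laid out in the order in which they are read.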
17. Occupancy in a block file of LBCAS
• Occupancy (rate of necessary data in a block file) depends on where the necessary data is located.
• “Worm-eaten” access (readahead) causes redundant downloads of block files.
[Diagram: files are read in order ①, ②, ③ from the ext2/3 file system (4KB blocks) via readahead (4KB~128KB) onto LBCAS block files (256KB): block search, disk access via readahead, block files downloaded. Annotations: page cache hit; cache missed and the coverage is shrunk; occupancy is low; redundant block.]
18. Readahead and LBCAS 1/2
• Readahead is a disk prefetch mechanism. The data is saved to the page cache.
• The coverage size is extended or shrunk according to the page cache hit rate.
[Diagram: the readahead windows (current_window from start, ahead_window from ahead_start) against I/O. While the application reads sequentially, the window is extended up to “max_readahead”.]
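The growth and shrinkage of the coverage size can be pictured with a toy model. This is deliberately not the kernel's algorithm: it uses strict sequentiality of page reads as a stand-in for the page cache hit heuristics, doubles the window on each sequential read, and resets it otherwise. The 4KB and 128KB bounds are the coverage range quoted in the slides; the function name is hypothetical.

```python
MIN_WINDOW_KB = 4     # lower bound of the coverage range quoted in the slides
MAX_WINDOW_KB = 128   # upper bound ("max_readahead")


def coverage_trace(page_reads):
    """Return the coverage size (KB) chosen for each read in a toy model.

    The window doubles while reads are sequential and falls back to the
    minimum after a non-sequential read.
    """
    trace = []
    window = MIN_WINDOW_KB
    previous = None
    for page in page_reads:
        if previous is not None and page == previous + 1:
            window = min(window * 2, MAX_WINDOW_KB)
        else:
            window = MIN_WINDOW_KB
        trace.append(window)
        previous = page
    return trace
```

In this model eight sequential page reads reach the 128KB maximum by the sixth read, while scattered reads never leave the 4KB minimum, which is the contrast the reallocation aims to exploit.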
19. Readahead and LBCAS 2/2
• When a readahead is issued, a part of a block file is required and mapped to the virtual disk.
The size depends on the coverage size of the readahead.
– Wide readahead is effective for the LBCAS driver.
• When the same block file is required repeatedly in sequence, the block file is kept in the memory
cache of LBCAS and the uncompression is eliminated.
[Diagram: LBCAS downloads block files (D3E14…, 3B441…) and maps them to the loopback device. A small readahead window (current_window, ahead_window) causes low occupancy through I/O size mismatch. As the application reads sequentially and the window is extended to “max_readahead”, block file D3E14… is stored in the LBCAS memory cache.]
20. Readahead and Block Reallocation
• Readahead can be improved by block reallocation in the
file system, if it increases the page cache hit rate.
• Defrag tools look like they would work well …
– Unfortunately, current defrag tools are not suitable, because
they are developed from the viewpoint of file defragmentation.
• We developed “ext2/3optimizer”, which reallocates the
data blocks of ext2/3 based on an access profile.
– It also increases occupancy in a block file.
21. Access profile and reallocation
[Diagram: two copies of the I/O stack (App in user space; VFS, ext2/3 file system driver, page cache in memory, and loopback block driver in the kernel; device). Left: a profiler in the kernel exports the access profile via /proc/ while readahead is small and frequent (worm-eaten access) and the blocks on the device are scattered. ext2/3optimizer, running in user space, reallocates the blocks using the access profile. Right: the blocks are gathered and readahead becomes sequential access.]
22. Block Relocation: Ext2/3optimizer [LinuxKongress06]
• Rearrange data blocks so that they are lined up. The structure of the metadata is not changed.
• The arrangement is based on the access profile.
• Features:
– The normal driver is used.
– Fragmentation occurs from the viewpoint of files.
– The relocation increases page cache hits, so readahead extends its coverage size.
[Diagram: the i-node (Mode, Owner info, Size, Timestamps, Direct Blocks, Indirect Blocks, Double Indirect, Triple Indirect) before and after relocation. After relocation, readahead is widened and occupancy is high.]
23. Performance Analysis
• Confirm the effect of ext2/3optimizer on LBCAS for booting.
– Ubuntu 9.04 (2.6.28) installed on ext3 (8GB) with KVM-60.
• The ext3 file system was optimized by ext2/3optimizer for the boot profile.
• The disk image was converted to LBCAS (64KB - 512KB).
• Compare with
– Normal
– u-readahead: user level readahead (system call) for booting
– ext2/3optimizer
24. Static Analysis by DAVL (Disk Allocation Viewer for Linux)
[Figure: DAVL block maps. normal: Fragmentation 0.21%; ext2/3opt: Fragmentation 1.11%. Legend: system block, non-contiguous block, contiguous block.]
25. Utilization of I/O
• BootChart showed the utilization of I/O.
– u-readahead caused a spike of I/O.
[Figure: BootChart I/O utilization for normal, u-readahead, and ext2/3opt. Annotations: I/O spike; reduced I/O.]
26. Dynamic Analysis: Disk Access at boot time
• Ext2/3optimizer relocates the data blocks that are
required at boot time to the top of the virtual disk.
[Figure: disk accesses at boot time, Time (s) against Address (GB, 0 to 8.0). Red: normal. Blue: ext2/3opt.]
27. Trace of readahead coverage size
[Figure: three panels (normal, u-readahead, ext2/3opt) plotting readahead coverage size (0KB to 128KB) against Time (s), 0 to 60. Annotation on ext2/3opt: fewer small readahead.]
28. Frequency for each readahead coverage
• Ext2/3optimizer reduced small “readahead” requests.
[Figure: Frequency against request size (KB): 0, 32, 64, 128.]
29. Volume Transition on processing level
                                       normal    u-readahead   ext2/3opt
Volume of files (number, average)      203MB (2,248, Av: 92KB)
Volume of required blocks              127MB
Added by readahead coverage            +81MB     +104MB        +13MB
Volume of access which includes
coverage of readahead                  208MB     231MB         140MB
  frequency                            6,379     5,827         2,129
  average size                         33KB      41KB          67KB
Annotations: 76MB (67%); 1/2; 1/3; 2
• Volume of downloaded block files MB, (uncompressed MB),
Occupancy % (127MB / uncompressed MB)
LBCAS size   normal               u-readahead          ext2/3opt
64KB         86.1 (247), 51.5%    93.4 (272), 46.9%    55.3 (144), 88.7%
128KB        96.8 (290), 43.9%    104 (315), 40.3%     55.3 (149), 85.3%
256KB        114 (358), 35.5%     123 (386), 35.0%     55.6 (159), 80.0%
512KB        144 (474), 26.9%     153 (508), 25.1%     55.6 (176), 71.8%
30. Consumed time in LBCAS
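[Figure: two stacked bar charts of Time (s) for normal, u-readahead, and ext2/3opt, four bars per configuration. Annotation: 512KB was not efficient on each optimization.]
Chart 1, values on the bars:
               normal           u-readahead      ext2/3opt
first row      43 43 42 37      43 43 45 38      45 45 46 44
second row     13 13 13 20      14 14 12 19      7 6 6 7
Chart 2, values on the bars:
               normal               u-readahead          ext2/3opt
first row      5.0 6.5 9.0 14.0     5.2 6.7 7.3 11.4     2.5 2.8 3.5 4.8
second row     5.7 4.6 4.7 3.1      6.6 5.8 2.9 4.5      3.6 2.7 1.7 1.1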
31. Total download of LBCAS
• Ext2/3opt reduced the necessary block files (256KB).
[Figure: total download, Download (MB, 20 to 140) against Time (s). Legend: + normal, □ u-readahead, × ext2/3opt. Annotation: the system call “readahead” downloaded the required files in advance. It caused an I/O spike. It also included redundant data.]
32. Frequency of function in LBCAS
Annotation on the Requests column: I/O requests are independent of LBCAS.
Relations: R = ①+②+③, D+S = U, U+M = ①+②*2+③*3
LBCAS size  Requests (R)  Download (D)  Storage Cache (S)  Uncompress (U)  Memory Cache (M)  Files per request
normal (Av: 33KB)
64KB        6,338         3,958         1,663              5,621           3,647             ① 4,148  ② 1,450  ③ 740
128KB       6,381         2,321         1,729              4,050           3,793             ① 4,919  ② 1,462
256KB       6,379         1,435         1,748              3,183           3,908             ① 5,667  ② 717
512KB       6,395         848           1,769              2,717           4,019             ① 6,054  ② 341
u-readahead (Av: 41KB)
64KB        5,825         4,344         1,172              5,516           3,626             ① 3,537  ② 1,259  ③ 1,029
128KB       5,834         2,526         1,200              3,726           3,761             ① 4,181  ② 1,653
256KB       5,827         1,544         1,179              2,723           3,908             ① 5,023  ② 804
512KB       5,822         1,015         1,172              2,187           4,023             ① 5,434  ② 388
ext2/3opt (Av: 67KB): download is reduced, uncompress is reduced
64KB        2,165         2,296         626                2,922           1,311             ① 941    ② 380    ③ 844
128KB       2,148         1,189         593                1,782           1,398             ① 1,116  ② 1,032
256KB       2,129         634           576                1,210           1,409             ① 1,639  ② 490
512KB       2,132         353           517                870             1,520             ① 1,874  ② 258
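The three relations in the table header can be checked against individual rows. The sketch below does this for the two 64KB rows (normal and ext2/3opt). It reads ①, ②, and ③ as the number of requests that span one, two, and three block files, which is an interpretation of the "Files per request" column rather than something the slide states; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Row:
    requests: int       # R
    download: int       # D
    storage_cache: int  # S
    uncompress: int     # U
    memory_cache: int   # M
    files_per_request: tuple  # counts for ①, ②, ③


def check(row: Row) -> bool:
    """True if the row satisfies R=①+②+③, D+S=U, and U+M=①+②*2+③*3."""
    one, two, three = row.files_per_request
    return (
        row.requests == one + two + three
        and row.download + row.storage_cache == row.uncompress
        and row.uncompress + row.memory_cache == one + 2 * two + 3 * three
    )


# Values copied from the 64KB rows of the table.
normal_64k = Row(6338, 3958, 1663, 5621, 3647, (4148, 1450, 740))
opt_64k = Row(2165, 2296, 626, 2922, 1311, (941, 380, 844))
```

Both rows satisfy all three relations, and comparing them shows the drop in downloads (3,958 to 2,296) and uncompressions (5,621 to 2,922) that the slide annotates for ext2/3opt.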
33. Discussions
• Weak points of ext2/3optimizer
– The reallocation is customized for booting. Other
applications may be adversely affected.
• I guess the boot procedure is special and has no strong relation
to other applications.
– The reallocation is customized for a certain version. When a
part of the boot procedure is updated, we have to re-optimize the
image.
34. Conclusions
• “ext2/3optimizer” is a strong tool for utilizing “readahead”,
because it reallocates the data blocks that are used by the boot
procedure.
– It increased the occupancy (rate of necessary data in a block file)
of LBCAS block files.
– It doubled the coverage of readahead and reduced the
number of readahead requests to half.
• “ext2/3optimizer” is not only for LBCAS. It can also be used for
normal Linux distributions.
35. Summary
Some services are available. Just try!
http://openlab.jp/oscircular/
EXT2/3optimizer developers
http://unit.aist.go.jp/itri/knoppix/ext2optimizer/index-en.htm
DAVL developers
http://sourceforge.net/projects/davl/
BootChart
http://www.bootchart.org/