The virtual disk fragmentation problem and ways to solve it -- Dmitry Monakhov
1. Fragmentation problem in vdisk environment
Dmitry Monakhov
2015-09-19
Dmitry Monakhov Fragmentation problem in vdisk environment 2015-09-19 1 / 29
2. Outline
1 Introduction
2 FS fragmentation
3 The Era of Thin-Provisioned Environments
4 Future work
3. Basic terminology
A filesystem divides its space into blocks (usually 4 KiB)
Files consist of blocks
A file is fragmented if its blocks are not contiguous
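A minimal sketch of the definition above, assuming a file is described by the ordered list of the block numbers it occupies. The function name `count_fragments` and the sample block numbers are illustrative only and do not come from any real tool.

```python
def count_fragments(blocks):
    """Return the number of contiguous runs in a file's ordered block list.

    A file with exactly one run is not fragmented.
    """
    if not blocks:
        return 0
    fragments = 1
    for prev, cur in zip(blocks, blocks[1:]):
        # A new fragment starts whenever the next block is not prev + 1.
        if cur != prev + 1:
            fragments += 1
    return fragments


contiguous = [100, 101, 102, 103]    # one run
scattered = [100, 101, 250, 251, 7]  # three runs: 100-101, 250-251, 7

print(count_fragments(contiguous))
print(count_fragments(scattered))
```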
4. FS aging problem
Zillions of block-alloc/block-free iterations result in fs fragmentation
Most filesystems have effective and reliable techniques that prevent
fs aging
Block allocator tries to spread data across the whole disk
Block allocator tries to pack small files together
Block allocator delays allocation until close(2)/fsync(2)
Online/offline defragmentation tools [still required]
5. When defragmentation is required
There are situations when block allocator tricks are not sufficient
Filesystem is almost full (90%)
Weird falloc/unlink/fsync scenario
Special read pattern (boot speedup)
6. Fragmentation: More formal terminology
Intra-file fragmentation (IAF): fragmentation of a single file
Inter-file fragmentation (IEF): fragmentation of a group of files
(Terminology from the DFS paper)
7. Existing tools
EXT4
Ioctl EXT4_IOC_MOVE_EXT (atomic)
Swap blocks between donor and target file
Util: e4defrag(8): defrag large files (*IAF*)
XFS
Ioctl XFS_IOC_SWAPEXT (non atomic)
Swap blocks between donor and target file
Util: xfs_fsr(8): defrag large files (*IAF*)
9. Virtual Disk: Things got complicated*
New indirection layer
A thin-provisioning driver adds a second space-management layer: it divides
its space into allocation blocks, aka TPABs or buckets.
Bucket size != FS block size
A TPAB is larger than an fs block but smaller than an fs block group
1M-4M: Ploop, LVM-linear, QCOW2, Ceph (RBD)
64k-256k: dm-thin, dm-snap
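To make the size mismatch concrete, here is a small sketch of how fs blocks map onto buckets. It assumes a 4 KiB fs block and a 1 MiB TPAB (the low end of the 1M-4M range above); the helper names are hypothetical and not part of any driver API.

```python
FS_BLOCK_SIZE = 4096                          # bytes, typical fs block
TPAB_SIZE = 1024 * 1024                       # bytes, assumed 1 MiB bucket
BLOCKS_PER_TPAB = TPAB_SIZE // FS_BLOCK_SIZE  # 256 fs blocks per bucket


def tpab_index(fs_block):
    """Bucket that holds the given fs block number."""
    return fs_block // BLOCKS_PER_TPAB


def tpabs_touched(start_block, length):
    """Number of buckets an extent of `length` blocks spans."""
    first = tpab_index(start_block)
    last = tpab_index(start_block + length - 1)
    return last - first + 1


def allocated_bytes(used_blocks):
    """Space the thin-provisioning layer must back for these fs blocks."""
    return len({tpab_index(b) for b in used_blocks}) * TPAB_SIZE


# Four 4 KiB blocks, each landing in a different bucket:
used = [0, 256, 512, 768]
print(len(used) * FS_BLOCK_SIZE)   # bytes of live data
print(allocated_bytes(used))       # bytes backed by the vdisk
print(tpabs_touched(250, 10))      # a 10-block extent crossing a boundary
```

In this toy case 16 KiB of live data pins 4 MiB of backing storage, which is the effect the following slides describe.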
11. Customer's feedback
I have a container with a mail server inside that uses 10 GB of data.
Your virtual disk uses 40 GB of my super-fast SSD
WHY?
12. Virtual disk fragmentation example
root@dmlp:~# e2freefrag -c 4096 /dev/dm-1
Device: /dev/dm-1
Blocksize: 4096 bytes
Total blocks: 34126848
Free blocks: 12293324 (36.0%)
Chunksize: 4194304 bytes (1024 blocks)
Total chunks: 33328
Free chunks: 8379 (25.1%)
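The gap between the two percentages can be turned into bytes. The sketch below only does arithmetic on the numbers printed above; treating a fully free 4 MiB chunk as the unit the thin-provisioning layer can release is an assumption (it holds only if the TPAB size and alignment match the chunk size passed to e2freefrag).

```python
GiB = 1024 ** 3

block_size = 4096          # "Blocksize"
blocks_per_chunk = 1024    # "Chunksize: 4194304 bytes (1024 blocks)"
free_blocks = 12293324     # "Free blocks"
free_chunks = 8379         # "Free chunks"

# Space the filesystem reports as free.
free_bytes = free_blocks * block_size

# Space sitting in chunks that are entirely free (assumed releasable).
releasable_bytes = free_chunks * blocks_per_chunk * block_size

# Free space scattered inside partly used chunks: free for the fs,
# but still backed by the virtual disk.
stuck_bytes = free_bytes - releasable_bytes

print(f"free in fs:      {free_bytes / GiB:.1f} GiB")
print(f"in free chunks:  {releasable_bytes / GiB:.1f} GiB")
print(f"stuck in chunks: {stuck_bytes / GiB:.1f} GiB")
```

Under that assumption, roughly 14 GiB of the 47 GiB the filesystem considers free cannot be handed back to the host.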
13. ThinProvision fragmentation problem
Visible effects
Inefficient free-space usage (up to 0.4%)
Bad I/O performance
Why?
TRIM/Discard is useless
Existing FS defragmentation tools/techniques are useless
14. Who is affected?
Worst use case
Many small files
A lot of create(2)/unlink(2)
Unpredictable lifetime
Massive write(2); sync(2)/fsync(2)
Bad pattern examples
Mail server
News server
Photo server
16. New TP defragmentation API wanted
New TP-aware block allocator for FS
New TP-aware defragment tool
17. TP-aware defragmentation tool principles
Take the TP layout into account
Relocate a group of files together so that they share one TPAB
The only questions left
What to relocate?
Where to relocate?
18. TP-aware defragmentation overview
1 Sequential scan of the block bitmap tables. Collect used blocks
(build spextent tree)
2 Scan the filesystem hierarchy and collect extent ownership statistics.
3 Rescan the filesystem tree and prepare a list of candidates for IEF
defragmentation.
Fix intra-file fragmentation (IAF) issues if discovered
4 Process the IEF list and perform the actual defragmentation
19. Pass1
Sequential scan of the block bitmap tables.
Build free-space tree.
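A rough illustration of this pass, assuming the block bitmap is already available in memory as a sequence of 0/1 flags (1 = block in use). A real implementation would read the bitmaps through the filesystem tools and keep the result in a tree; a plain sorted list of (start, length) extents stands in for that here, and `free_extents` is a made-up name.

```python
def free_extents(bitmap):
    """Scan a block bitmap once and return free runs as (start, length).

    bitmap: sequence of 0/1 values, 1 meaning the block is in use.
    The result is ordered by start block because the scan is sequential.
    """
    extents = []
    run_start = None
    for block, used in enumerate(bitmap):
        if not used and run_start is None:
            run_start = block                      # a free run begins
        elif used and run_start is not None:
            extents.append((run_start, block - run_start))
            run_start = None                       # the free run ended
    if run_start is not None:                      # run reaches the end
        extents.append((run_start, len(bitmap) - run_start))
    return extents


bitmap = [1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
print(free_extents(bitmap))
```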
20. Pass2
Scan the filesystem hierarchy and collect extent ownership statistics.
21. Pass3
Rescan the filesystem tree and prepare a list of candidates for IEF
defragmentation.
Which candidates are good?
Files which belong to a partly populated cluster
Read-only files (old mtime or executable files)
Small files
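One possible way to combine these criteria, shown as a sketch only. The `FileInfo` record, the thresholds (`max_fill`, `min_age_days`, `max_blocks`), the 256-block cluster size, and the decision to require all three criteria at once are assumptions made for illustration; the slides do not say how the actual tool weighs them.

```python
from collections import defaultdict
from dataclasses import dataclass

BLOCKS_PER_CLUSTER = 256   # assumed: 1 MiB cluster of 4 KiB blocks


@dataclass
class FileInfo:
    name: str
    blocks: list            # fs block numbers owned by the file
    mtime_age_days: float   # days since last modification


def cluster_usage(files):
    """Pass 2 stand-in: count used blocks per cluster."""
    usage = defaultdict(int)
    for f in files:
        for b in f.blocks:
            usage[b // BLOCKS_PER_CLUSTER] += 1
    return usage


def ief_candidates(files, max_fill=0.5, min_age_days=30, max_blocks=64):
    """Pass 3 stand-in: pick small, old files in partly populated clusters."""
    usage = cluster_usage(files)
    picked = []
    for f in files:
        small = len(f.blocks) <= max_blocks
        old = f.mtime_age_days >= min_age_days
        sparse = any(
            usage[b // BLOCKS_PER_CLUSTER] / BLOCKS_PER_CLUSTER <= max_fill
            for b in f.blocks
        )
        if small and old and sparse:
            picked.append(f.name)
    return picked


files = [
    FileInfo("a.eml", [10, 11], 90),                # small, old, sparse cluster
    FileInfo("b.log", [300], 1),                    # recently modified
    FileInfo("big.db", list(range(512, 768)), 200), # fills its cluster, large
]
print(ief_candidates(files))
```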
26. Integration
OVZ case
call pcompact(8) nightly from cron
pcompact invokes e4defrag2 and ploop compact for each ploop
Customer's feedback
OK, the ploop image size is fine now, but...
Sometimes pcompact runs all the time.
28. [Future work] Standard bitmap scan API required
Currently, used-block info is obtained via e2fsprogs/xfsprogs
XFS: FS-wide analog of FIEMAP
XFS_IOC_FIEMAPFS
Implement ioctl for EXT4
Move userspace to this new IOCTL
Massive testing and fine tuning.
29. [Future work 2] Smart block allocator
Dave Chinner suggests a smart block allocator which encapsulates all
smart-disk internals
Hide SMR internals
Hide TP internals
Garbage collection
Smart block allocator API proposal
Place my data somewhere, and tell me location