VAST-Tree, EDBT'12
1. VAST-Tree: A Vector-Advanced and Compressed Structure for Massive Data Tree Traversal
EDBT 2012, March 27th-29th, 2012
Humboldt University, Berlin
2. Outline
• Background & Motivation
– Modern HW and HW-aware algorithms
• Prerequisite Knowledge
– Search keys with SIMD instructions
• Proposed Technique
– Branch compression for high parallelization
• Experimental Evaluation
– Twitter Public Timeline as a real data set
– Compression ratio and throughput
3. Background: Modern Hardware
• Fast and highly functional hardware
– Multi-/many-core CPUs
  Intel Ivy Bridge/Haswell/Skylake/Knights Ferry
– General-purpose GPUs
– ...
• New algorithms enabled by this hardware
– Sorts, searches, compression, and DB kernels
Today's topic: tree searches on multi-core CPUs
4. Background: Multi-core CPUs
• Highly advanced instructions
– 128/256/512-bit SIMD, Transactional Memory, ...
• Branch Prediction
– Process “if-then” paths efficiently
– High penalties for branch mispredictions
• Parallelism & Memory
– Many cores on a single processor
– Limited by memory accesses [5][14][15]
5. Background: Tree Traversal
• Search for a key in a sequence of values
• Fundamental operation
– Used everywhere, and well known
Code Snippet:
    /* one step of the descent; node is the current tree node */
    if (search_key > node->compare_key)
        node = node->right;
    else
        node = node->left;
[Figure: Ex.) binary tree with root 48, children 12 and 68, and 7 and 20 below 12; search_key enters at the root]
6. Background: Tree Traversal
• But legacy algorithms are too inefficient
– Actual execution time: 20-40%
[Figure: ratio of execution time (complete instructions, stall time, branch penalties; 0-100%) and # of instructions (0.0E+00 to 6.0E+03) against log2(# of keys) = 22 (0.161), 24 (0.167), 26 (0.206), 28 (0.319)]
7. Background: Existing Algorithms
• Cache-conscious B+Tree [4][10][11][19][20]
– Realigning, prefetching, and buffering nodes
• FAST [14]
– Cache-conscious and branch-free techniques
– SIMD instructions used for branch-free searches
• PALM [24]
– Supports incremental updates for FAST
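To make "branch-free" concrete, here is a minimal sketch of a binary search over a sorted array in which the comparison result is used as a number (0 or 1) instead of selecting an if/else path. It is not FAST's code: the function name and the flat-array setting are assumptions for illustration, and whether the compiler actually emits branch-free machine code depends on the compiler and flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first element >= key in the sorted array a[0..n),
   or n if every element is smaller. Hypothetical helper, not from FAST. */
static size_t lower_bound_branchfree(const uint32_t *a, size_t n, uint32_t key) {
    assert(a != NULL);
    const uint32_t *base = a;
    size_t len = n;
    while (len > 1) {
        size_t half = len / 2;
        /* The comparison yields 0 or 1, so base advances by 0 or half
           without an if/else on the data. */
        base += (size_t)(base[half - 1] < key) * half;
        len -= half;
    }
    /* One final comparison decides between base and the slot after it. */
    return (size_t)(base - a) + (size_t)(n > 0 && base[0] < key);
}
```

With the keys of the binary-tree example stored in sorted order (7, 12, 20, 48, 68), searching for 20 returns index 2 and searching for 69 returns 5, the array length.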
9. Prerequisite Knowledge: Searches with SIMD
• Process multiple data with SIMD instructions
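– Most x86 processors support 128-bit SIMD
– Each lane returns 1 or 0 according to an inequality comparison
• FAST compares 3 keys simultaneously
             32 bit x 4 lanes = 128 bit
Register A:  34  78  91  x   (keys in the SIMD block)
Register B:  79  79  79  x   (search key, replicated)
Register C:   1   1   0  x   (comparison result)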
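The register picture above can be sketched with SSE2 intrinsics. This is an illustration under assumptions, not the authors' code: the function name is hypothetical, keys are treated as signed 32-bit integers (the SSE2 compare is signed), and a scalar fallback is included for builds without SSE2. In hardware each lane of Register C holds all ones or all zeros; the sketch packs those lanes into a small bit mask.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Returns a 3-bit mask: bit i is 1 when key > keys[i].
   keys[3] is a padding lane and is ignored. */
static int simd_block_mask(const int32_t keys[4], int32_t key) {
#if defined(__SSE2__)
    __m128i a = _mm_loadu_si128((const __m128i *)keys); /* Register A */
    __m128i b = _mm_set1_epi32(key);                    /* Register B */
    __m128i c = _mm_cmpgt_epi32(b, a);                  /* Register C */
    /* Collect the top bit of each 32-bit lane; drop the padding lane. */
    return _mm_movemask_ps(_mm_castsi128_ps(c)) & 0x7;
#else
    int mask = 0;
    for (int i = 0; i < 3; i++)
        mask |= (key > keys[i]) << i;
    return mask;
#endif
}
```

For the block 34, 78, 91 and the search key 79, the first two lanes compare true and the third false, so the mask has bits 0 and 1 set (value 3), which corresponds to the "1 1 0" row of Register C.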
11. Logical Example: Searches with SIMD
[Figure: tree of SIMD blocks; the keys in each SIMD block are compared simultaneously. Search key: 79]
Step: compare keys with SIMD
12. Logical Example: Searches with SIMD
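[Figure: tree of SIMD blocks; the keys in each SIMD block are compared simultaneously. Search key: 79]
Step: compare keys with SIMD, then look up the result in a lookup table
Returned values    Offset blocks
...                ...
1 1 0 x            3
...                ...
Child blocks are numbered 1 2 3 4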
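The lookup step can be sketched as a small table indexed by the comparison mask. The table contents here are an assumption: they presume the three keys in a block are stored in ascending order (as the example block 34, 78, 91 suggests), that bit i of the mask belongs to the i-th key, and that children are numbered from 0 rather than from 1 as on the slide.

```c
#include <assert.h>

/* Hypothetical lookup table: index = 3-bit comparison mask,
   value = 0-based child number, -1 for masks that cannot occur
   when the keys in a block are in ascending order. */
static const int child_of_mask[8] = {
    0,   /* 000: key <= all three keys   -> child 0 */
    1,   /* 001: key > first key only    -> child 1 */
    -1,  /* 010: impossible for sorted keys         */
    2,   /* 011: key > first two keys    -> child 2 */
    -1,  /* 100: impossible                         */
    -1,  /* 101: impossible                         */
    -1,  /* 110: impossible                         */
    3    /* 111: key > all three keys    -> child 3 */
};

static int next_child(int mask) {
    assert(mask >= 0 && mask < 8);
    return child_of_mask[mask];
}
```

The result "1 1 0" from the example is mask 011, which maps to child 2 in 0-based numbering, that is, the third of the four child blocks.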
13. Logical Example: Searches with SIMD
[Figure: tree of SIMD blocks; the keys in each SIMD block are compared simultaneously. Search key: 79]
Step: move to the next SIMD block
14. Physical Example: Searches with SIMD
• Arrange SIMD blocks in breadth first order on
physically consecutive memory
[34, 78, 91], [2, 11, 23], [35, 39, 49], [80, 87, 88], ...
(toward high addresses in memory)
– Each SIMD block is 12B
– 36B offset jump!
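The offset arithmetic of a breadth-first layout can be sketched as follows. The index formula (child c of block i lives at block 4i + 1 + c) is the usual one for a complete 4-ary tree stored level by level and is assumed here rather than taken from the slides; the function names are hypothetical, and the three scalar comparisons stand in for one SIMD compare.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    KEYS_PER_BLOCK = 3,
    FANOUT = 4,
    BLOCK_BYTES = KEYS_PER_BLOCK * (int)sizeof(int32_t) /* 12B per block */
};

/* Block index of the c-th child (0-based) of block i. */
static size_t child_block(size_t i, int c) {
    return (size_t)FANOUT * i + 1 + (size_t)c;
}

/* Byte offset of block i from the start of the array. */
static size_t block_offset_bytes(size_t i) {
    return i * (size_t)BLOCK_BYTES;
}

/* Walks down `levels` levels of blocks and returns the block reached.
   keys holds the blocks back to back, 3 keys each, in breadth-first order. */
static size_t descend(const int32_t *keys, int levels, int32_t key) {
    assert(keys != NULL && levels >= 1);
    size_t i = 0;
    for (int l = 1; l < levels; l++) {
        const int32_t *blk = keys + i * KEYS_PER_BLOCK;
        int c = (key > blk[0]) + (key > blk[1]) + (key > blk[2]);
        i = child_block(i, c);
    }
    return i;
}
```

With the blocks listed above and the search key 79, the root comparison selects child 2, which is block 3, the block [80, 87, 88]. Its byte offset is 3 x 12B = 36B, which appears consistent with the 36B jump noted on the slide.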
16. Issue: Number of Comparison Keys
• More keys compared simultaneously!
– SIMD supports 1-byte and 2-byte elements
– 1 byte each: 16 elements; 2 bytes each: 8 elements
18. VAST-Tree: Designing Data Structure
• Classify branches into 3 layers
– Apply FAST to P32, and compress keys in P16 and P8
[Figure: three layers of blocks]
– SBs (SIMD blocks): H32
– CBs (compression blocks), 2-byte keys, 7 keys compared simultaneously: H16
– CBs (compression blocks), 1-byte keys, 15 keys compared simultaneously: H8
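The key counts on this slide follow a simple pattern: a 128-bit register has 128 / (key width) lanes, and one lane is left unused, giving 3, 7, and 15 keys for 32-bit, 16-bit, and 8-bit keys. A small sketch of that arithmetic, with hypothetical function names:

```c
#include <assert.h>

/* Keys compared at once: all lanes of the register except one. */
static int keys_per_block(int simd_bits, int key_bits) {
    assert(key_bits > 0 && simd_bits % key_bits == 0);
    return simd_bits / key_bits - 1;
}

/* Number of children of a block: one more than its number of keys. */
static int fanout(int simd_bits, int key_bits) {
    return keys_per_block(simd_bits, key_bits) + 1;
}
```

Narrower keys therefore raise the fanout per comparison from 4 to 8 or 16, which is the motivation for compressing keys in the lower layers.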
20. Proposed: Branch Lossy Compression
• Apply to each compression block
– Prefix and suffix bit truncation
• Transform ‘search’ keys similarly for comparison
– Extracted bit location stored in the header of CBs
[Figure: keys in a CB in ascending order; the bits with red background are extracted and the lower bits removed, giving 1-byte keys]
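A minimal sketch of prefix and suffix truncation, under assumptions: the header layout, the names, and the rule for choosing the bit window (drop the leading bits shared by the smallest and largest key of the CB, then keep the next `width` bits) are illustrative guesses, not the paper's definition. The sketch also ignores search keys whose leading bits differ from those of the CB.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CB header: which bits of a 32-bit key are kept. */
typedef struct {
    unsigned shift; /* number of low (suffix) bits dropped */
    unsigned width; /* number of bits kept, e.g. 8 for 1-byte keys */
} cb_header;

/* Chooses the kept bit window for a CB whose smallest key is lo
   and whose largest key is hi. */
static cb_header cb_make_header(uint32_t lo, uint32_t hi, unsigned width) {
    assert(lo <= hi && width >= 1 && width <= 32);
    unsigned significant = 0; /* bits below the common prefix */
    for (uint32_t diff = lo ^ hi; diff != 0; diff >>= 1)
        significant++;
    cb_header h;
    h.width = width;
    h.shift = significant > width ? significant - width : 0;
    return h;
}

/* Applies the same extraction to a stored key or to a search key. */
static uint32_t cb_compress(cb_header h, uint32_t key) {
    uint32_t mask = h.width == 32 ? UINT32_MAX
                                  : ((UINT32_C(1) << h.width) - 1);
    return (key >> h.shift) & mask;
}
```

For a CB whose keys span 0 to 4095 and a width of 8, the low 4 bits are dropped, so both 3220 and 3219 compress to 201. Distinct keys collapsing to the same value is what makes the scheme lossy.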
21. Penalty: Comparison Errors
• But some lossy keys are compared incorrectly
Example)
value1 - 3220 (1100 1001 0100₂, upper 8 bits = 201₁₀)
value2 - 3219 (1100 1001 0011₂, upper 8 bits = 201₁₀)
Original values: 3220 > 3219 --> Return 0
Compressed values: 201 ≦ 201 --> Return 1
An error happens!
• Check and correct errors after tree traversal
– Scan leaf nodes sequentially
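The correction step can be sketched as a short sequential scan around the position the lossy traversal produced. This is an assumption about how such a scan might look, not the authors' procedure; the function name is hypothetical, and the sketch scans in both directions because it does not assume on which side the error lands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Given sorted leaf keys and a possibly wrong position `guess`,
   returns the index of the first key >= key (n if there is none). */
static size_t correct_position(const uint32_t *leaves, size_t n,
                               size_t guess, uint32_t key) {
    assert(leaves != NULL && guess <= n);
    size_t i = guess;
    while (i > 0 && leaves[i - 1] >= key) /* guess was too far right */
        i--;
    while (i < n && leaves[i] < key)      /* guess was too far left */
        i++;
    return i;
}
```

If the leaves hold 3200, 3219, 3220, 3230 and a compressed comparison lands the search for 3220 on the slot of 3219, one step to the right restores the exact position.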
22. Proposed: SIMD-Aligned Layouts
• Load data efficiently to SIMD registers
• A few padding spaces between blocks
– Many blanks caused by page alignment in FAST
[Figure: SBs and CBs in memory; each block is SIMD-length aligned, with padding spaces between blocks]
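One way to get SIMD-length-aligned blocks in C11 is to pad each block to the register size and allocate the array with a matching alignment. The sketch below assumes a 16-byte (128-bit) register and a block of three 32-bit keys plus one padding lane; the type and function names are hypothetical, and the actual VAST-Tree layout may place its padding differently.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical SB: 3 x 32-bit keys plus one padding lane = 16 bytes. */
typedef struct {
    alignas(16) int32_t keys[4]; /* keys[3] is padding */
} simd_block;

/* Allocates `count` blocks so that every block starts on a 16-byte
   boundary. The caller releases the memory with free(). */
static simd_block *alloc_blocks(size_t count) {
    assert(count > 0);
    /* aligned_alloc (C11) expects the size to be a multiple of the
       alignment; count * 16 always is. */
    return aligned_alloc(alignof(simd_block), count * sizeof(simd_block));
}
```

Because every block is exactly one register wide, each block can be loaded with a single aligned load, at the cost of 4 bytes of padding per block.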
24. Proposed: Other Topics
• Linear search optimization
– Remove bottom SBs
• Apply P4Delta to leaf nodes
– A lossless compression method
– Compress a fixed number k of keys into a chunk
[Figure: keys in leaf nodes grouped into single chunks]
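To illustrate chunked, lossless leaf compression, here is a simplified delta-coding sketch. It is not P4Delta itself: it stores the first key of a chunk plus the gaps between consecutive keys and reports how many bits the largest gap needs, while the bit packing and the exception handling of the real method are omitted. The chunk size of 4 and all names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

enum { CHUNK_KEYS = 4 }; /* hypothetical k */

typedef struct {
    uint32_t base;                   /* first key, stored verbatim */
    uint32_t deltas[CHUNK_KEYS - 1]; /* gaps between consecutive keys */
    unsigned bits;                   /* bits needed for the largest gap */
} delta_chunk;

/* Encodes CHUNK_KEYS sorted keys as a base value plus gaps. */
static delta_chunk chunk_encode(const uint32_t keys[CHUNK_KEYS]) {
    delta_chunk c = { keys[0], {0}, 0 };
    for (int i = 1; i < CHUNK_KEYS; i++) {
        assert(keys[i] >= keys[i - 1]); /* keys must be sorted */
        c.deltas[i - 1] = keys[i] - keys[i - 1];
        unsigned b = 0;
        for (uint32_t d = c.deltas[i - 1]; d != 0; d >>= 1)
            b++;
        if (b > c.bits)
            c.bits = b;
    }
    return c;
}

/* Rebuilds the original keys by summing the gaps. */
static void chunk_decode(const delta_chunk *c, uint32_t out[CHUNK_KEYS]) {
    out[0] = c->base;
    for (int i = 1; i < CHUNK_KEYS; i++)
        out[i] = out[i - 1] + c->deltas[i - 1];
}
```

Decoding a chunk means summing gaps before a key can be compared, which gives some intuition for why leaf compression can cost throughput.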
32. Summary & Future Work
• Proposed lossy compression for high parallelization
– Linear search opt., leaf compression, and others
• Experimental Evaluation
– Compress branch nodes dynamically
– Improve throughput and compression ratio
– Throughput worsened by leaf compression
• Future Work
– Update support and larger numbers of keys