2. The Art of Indexing
®
Not all indexing is the same
B-tree is the basis for almost all DB systems
• Data structure invented in 1972
• Has not kept up with hardware trends
‣ Works poorly on modern rotational disks
‣ Works poorly on SSD
Fractal Tree Indexes is the basis of TokuDB
• Scales with hardware
• Fast Indexing ➔ More Indexing ➔ Faster Queries
• Great Compression
• No Fragmentation
• Reduced wear on SSDs
4. How do Fractal Tree
Indexes outperform
B-trees?
First, some facts about storage systems
5. The Art of Indexing
®
Storage is quirky
Hard disks are slow for
random I/O but fast for
sequential I/O
Difference causes problems
like fragmentation, ...
6. The Art of Indexing
®
Storage is quirky
Hard disks are slow for
random I/O but fast for
sequential I/O
SSDs are fast for
random I/O but expensive for
sequential.
Garbage collection
causes artefacts: increased
wear, write cliffs...
Difference causes problems
like fragmentation, ...
7. The Art of Indexing
®
Storage ♡ Big Reads and Writes
Big I/Os
8. The Art of Indexing
®
Storage ♡ Big Reads and Writes
Big I/Os
Hard Disks: No
Fragmentation
9. The Art of Indexing
®
Storage ♡ Big Reads and Writes
Big I/Os
Hard Disks: No
Fragmentation SSDs: Less Garbage
Collection & Wear
10. The Art of Indexing
®
Storage ♡ Big Reads and Writes
Big I/Os
Hard Disks: No
Fragmentation SSDs: Less Garbage
Collection & Wear
But you only get the goodies if
each I/O has lots of new bytes
11. The Art of Indexing
®
Storage ♡ Big Reads and Writes
Big I/Os
Hard Disks: No
Fragmentation SSDs: Less Garbage
Collection & Wear
But you only get the goodies if
each I/O has lots of new bytes
If you read a big block, change one byte, then
write it, you get terrible wear problems
12. The Art of Indexing
®
Storage ♡ Big Reads and Writes
Big I/Os
Hard Disks: No
Fragmentation SSDs: Less Garbage
Collection & Wear
Both: Better
Compression
But you only get the goodies if
each I/O has lots of new bytes
If you read a big block, change one byte, then
write it, you get terrible wear problems
13. The Art of Indexing
®
Caching is great
• But you can’t cache all your data
• For stuff not in memory, you have to go to disk
Goal: Do the best we can for the stuff on disk
Data is big, RAM is Small
Data
RAM
15. The Art of Indexing
®
What’s a B-tree?
B
B B B... ...
B B……………………………………………….………….
Search Tree
16. The Art of Indexing
®
What’s a B-tree?
B
B B B... ...
B B……………………………………………….………….
Each node has B pivots
Search Tree
17. The Art of Indexing
®
What’s a B-tree?
B
B B B... ...
B B……………………………………………….………….
Each node has B pivots
Search Tree
Fetching pivots as a group
saves I/Os
18. The Art of Indexing
®
What’s a B-tree?
B
B B B... ...
B B……………………………………………….………….
Each node has B pivots
Search Tree
Fetching pivots as a group
saves I/Os
Most leaves are on disk, not in RAM
19. The Art of Indexing
®
What’s a B-tree?
B
B B B... ...
B B……………………………………………….………….
Each node has B pivots
Search Tree
Fetching pivots as a group
saves I/Os
Most leaves are on disk, not in RAM Inserting into a leaf that’s not in memory
will require an I/O per insertion
20. The Art of Indexing
®
B-tree Delivery Service
If fast memory is like walking across a room
• Each update in a B-tree is a walking trip from
‣ New York
21. The Art of Indexing
®
B-tree Delivery Service
If fast memory is like walking across a room
• Each update in a B-tree is a walking trip from
‣ New York to St Louis
22. The Art of Indexing
®
B-tree Delivery Service
If fast memory is like walking across a room
• Each update in a B-tree is a walking trip from
‣ New York to St Louis
Each item gets its own round trip!
23. The Art of Indexing
®
Real-world delivery
Keep regional warehouses
• Only move stuff when you can move a lot
24. The Art of Indexing
®
Real-world delivery
Keep regional warehouses
• Only move stuff when you can move a lot
Each item gets moved several times
25. The Art of Indexing
®
Real-world delivery
Keep regional warehouses
• Only move stuff when you can move a lot
Each item gets moved several times
But each trip is vastly cheaper
26. The Art of Indexing
®
Fractal Tree Indexes
B
B B BB
Each node has
pivots & Buffers
27. The Art of Indexing
®
Fractal Tree Indexes
B
B B BB
Each node has
pivots & Buffers
28. The Art of Indexing
®
Fractal Tree Indexes
B
B B BB
Each node has
pivots & Buffers
29. The Art of Indexing
®
Fractal Tree Indexes
B
B B BB
Each node has
pivots & Buffers
Buffers fill as
updates arrive
30. The Art of Indexing
®
Fractal Tree Indexes
B
B B BB
Each node has
pivots & Buffers
Buffers fill as
updates arrive
31. The Art of Indexing
®
Fractal Tree Indexes
B
B B BB
Each node has
pivots & Buffers
Buffers fill as
updates arrive
32. The Art of Indexing
®
Fractal Tree Indexes
B
B B BB
Each node has
pivots & Buffers
Buffers fill as
updates arrive
33. The Art of Indexing
®
Fractal Tree Indexes
B
B B BB
Each node has
pivots & Buffers
Buffers fill as
updates arrive
34. The Art of Indexing
®
Fractal Tree Indexes
B
B B BB
Each node has
pivots & Buffers
Buffers fill as
updates arrive
35. The Art of Indexing
®
Fractal Tree Indexes
B
B B BB
Each node has
pivots & Buffers
Buffers fill as
updates arrive
36. The Art of Indexing
®
Fractal Tree Indexes
B
B B BB
Each node has
pivots & Buffers
Buffers fill as
updates arrive
37. The Art of Indexing
®
Fractal Tree Indexes
B
B B BB
Each node has
pivots & Buffers
Buffers fill as
updates arrive
38. The Art of Indexing
®
Fractal Tree Indexes
B
B B BB
Each node has
pivots & Buffers
Buffers fill as
updates arrive
39. The Art of Indexing
®
Fractal Tree Indexes
B
B B BB
Each node has
pivots & Buffers
Buffers fill as
updates arrive
40. The Art of Indexing
®
Fractal Tree Indexes
B
B B BB
Each node has
pivots & Buffers
Buffers fill as
updates arrive
41. The Art of Indexing
®
Fractal Tree Indexes
B
B B BB
Each node has
pivots & Buffers
Buffers fill as
updates arrive
42. The Art of Indexing
®
Fractal Tree Indexes
B
B B BB
Each node has
pivots & Buffers
Buffers fill as
updates arrive
43. The Art of Indexing
®
Fractal Tree Indexes
B
B B BB
Each node has
pivots & Buffers
Buffers fill as
updates arrive
44. The Art of Indexing
®
Fractal Tree Indexes
B
B B BB
Each node has
pivots & Buffers
Buffers fill as
updates arrive
45. The Art of Indexing
®
Fractal Tree Indexes
B
B B BB
Each node has
pivots & Buffers
Buffers fill as
updates arrive
Flush a buffer
when it fills
46. The Art of Indexing
®
Fractal Tree Indexes
B
B B BB
Each node has
pivots & Buffers
Buffers fill as
updates arrive
Flush a buffer
when it fills
A flush might take an I/O, but it
does lots of useful work
47. The Art of Indexing
®
Fractal Tree Indexes
B
B B BB
Each node has
pivots & Buffers
Buffers fill as
updates arrive
Flush a buffer
when it fills
A flush might take an I/O, but it
does lots of useful work
More changes per write ➔
fewer changes for same write
load ➔ less SSD wear
48. The Art of Indexing
®
Fractal Tree Indexes: Queries
B
B B BB
Lots of buffers have
messages
49. The Art of Indexing
®
Fractal Tree Indexes: Queries
B
B B BB
Lots of buffers have
messages
But query follows
root-leaf path
50. The Art of Indexing
®
Fractal Tree Indexes: Queries
B
B B BB
Lots of buffers have
messages
But query follows
root-leaf path
So every query has the
most up-to-date information
51. The Art of Indexing
®
Fractal Tree Indexes: Queries
B
B B BB
Lots of buffers have
messages
But query follows
root-leaf path
So every query has the
most up-to-date information
Messages can be
insert, update, delete
52. The Art of Indexing
®
Fractal Tree Indexes: Schema Changes
B
B B BB
Schema Changes are
broadcast messages
53. The Art of Indexing
®
Fractal Tree Indexes: Schema Changes
B
B B BB
Schema Changes are
broadcast messages
54. The Art of Indexing
®
Fractal Tree Indexes: Schema Changes
B
B B BB
Schema Changes are
broadcast messages
55. The Art of Indexing
®
Fractal Tree Indexes: Schema Changes
B
B B BB
Schema Changes are
broadcast messages
Insertion into root is fast,
changes are immediate
56. The Art of Indexing
®
Fractal Tree Indexes: Schema Changes
B
B B BB
Schema Changes are
broadcast messages
Leaves get rewritten
when broadcast message
makes it that far
Insertion into root is fast,
changes are immediate
Schema change
message still on root-
leaf path
57. The Art of Indexing
®
Analysis
Delivery system gives goodies:
• Messages get moved, but each I/O pays for a lot of
movement
• You get very fast inserts
• You get Hot Schema Changes
Each flush carries lots of useful information
• So it’s worth it to make nodes big
• No fragmentation, Better Compression
• Much better wear on SSDs