Fractal Tree Index
A Deep Dive Research Report
By Akhil Sreenath
CMPE 226

Table of Contents
Introduction
What is fractal Tree Index?
How Fractal Tree works?
Performance and analysis of Fractal Tree Index:
    Improve worst case insertion:
    Search Index Performance:
    Fragmentation:
        B-Tree:
        Fractal Tree:
    Schema changes:
    Performance in Hard Disk and SSD:
        Hard Disk:
        SSD:
How Fractal Tree Indexing works in MongoDB:
Conclusion:
References:

Introduction:
This paper covers the Fractal Tree index, which can be used in MySQL and MongoDB. A Fractal Tree
index is a data structure that enables fast retrieval of data. It supports the same operations
as a B-tree, but it replaces small, frequent writes with larger, less frequent ones, which
results in better insertion and compression performance.
What is fractal Tree Index?
Tokutek has patented Fractal Tree technology. Tokutek's experts did extensive research and
development on cache-oblivious algorithms before developing the Fractal Tree. It is a highly
write-optimized data structure that sharply decreases I/O through careful buffering. A Fractal
Tree index stores data in sorted order and allows search and sequential access just as a B-tree
does, but insertions and deletions are much faster than in a B-tree. Each node has a buffer, and
insertions, deletions and other changes are stored in these intermediate locations. The main
goal of the buffer is to schedule disk writes so that each write to disk performs a lot of
useful work. Fractal Tree indexes are optimized for writing and reading large blocks of data.
Fractal Tree indexes are based on cache-oblivious algorithms. In a cache-oblivious algorithm,
performance is measured by the number of blocks transferred between disk and cache, and the
algorithm does not depend on knowing the cache size or block size. Performance is therefore
independent of the machine architecture and of the number of cache layers.

Figure 1: Fractal Tree Index. All internal nodes have a message buffer; as buffers overflow,
messages cascade down the tree.
How Fractal Tree works?
If there are N rows, the fractal tree index has $\log_2 N$ arrays. Each array is either
completely full or completely empty, and every array is sorted [1].
For example, an array $A_j$ can hold at most $2^{j-1}$ rows:
Ex: A1 = {}, A2 = {}, A3 = {3, 7, 10, 11}. Here A3 holds its maximum of $2^{3-1} = 4$ rows.
Consider the sample fractal tree index above. To insert 15, we place it in the first array,
which is empty and holds exactly one element. To add one more element, 7, none of the existing
arrays can take it directly: A1 is now full, and the element cannot go into a larger array
because every array must be either completely full or completely empty. A temporary index is
created to accommodate the new element.
The new element is added to the temporary index as a single-element array. Now 15 and 7 occupy
the first array of the two indexes. Those two single-element arrays are merged to form a new
sorted two-element array holding 7 and 15.
In the same way, two 2-arrays are merged to form a 4-array in the original index. At that
point the third and fourth arrays are completely filled and the rest of the arrays are empty.
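The insertion procedure described above behaves like incrementing a binary counter, with each merge acting as a carry. The following is a minimal Python sketch of that idea, not Tokutek's implementation; the class name `SimpleFractalIndex` and the 0-based `levels` list (where `levels[k]` plays the role of array A(k+1)) are assumptions made for illustration.

```python
from heapq import merge


class SimpleFractalIndex:
    """Hypothetical sketch: levels[k] is either empty or a sorted list of 2**k rows."""

    def __init__(self):
        self.levels = []

    def insert(self, row):
        carry = [row]  # the temporary single-element array
        k = 0
        while True:
            if k == len(self.levels):
                self.levels.append([])  # grow the index by one larger array
            if not self.levels[k]:
                self.levels[k] = carry  # empty slot found: store and stop
                return
            # Slot is full: merge two sorted arrays of equal size and carry
            # the result to the next, twice-as-large level.
            carry = list(merge(self.levels[k], carry))
            self.levels[k] = []
            k += 1


index = SimpleFractalIndex()
index.levels = [[], [], [3, 7, 10, 11]]  # A1 = {}, A2 = {}, A3 = {3, 7, 10, 11}
index.insert(15)  # lands in the empty one-element array
index.insert(7)   # collides with 15, so the two are merged into the 2-array
print(index.levels)
```

Starting from the example state, the two inserts leave 7 and 15 together in the two-element array while the four-element array is untouched.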

Performance and analysis of Fractal Tree Index:
Time complexity in big O notation:
Operation   Average                            Worst case
Insert      $O(\log_B N / B^{\varepsilon})$    $O(\log_B N / B^{\varepsilon})$
Delete      $O(\log_B N / B^{\varepsilon})$    $O(\log_B N / B^{\varepsilon})$
B: size of a block of memory
N: size of the array
A Fractal Tree index uses a smaller branching factor such as $\sqrt{B}$ (less than $B$), so the
depth of the tree is $O(\log_{\sqrt{B}} N)$. Insertion performance of a Fractal Tree index is
better than that of a traditional B-tree index.
If we consider two arrays of size N, the cost of merging them is $O(N/B)$ block transfers.
• Merging two arrays is I/O efficient.
• The cost per element of a merge is $O(1/B)$, since $O(N)$ elements are merged.
• Each row is merged at most $O(\log_2 N)$ times.
• The average insertion cost is therefore $O\left(\frac{\log N}{B}\right)$.
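The average insertion cost follows by multiplying the per-element cost of one merge by the number of merges a row can take part in. Written out, using only the quantities stated above:

\[
\underbrace{O\!\left(\frac{N/B}{N}\right)}_{\text{block transfers per row, per merge}}
\times
\underbrace{O(\log_2 N)}_{\text{merges per row}}
= O\!\left(\frac{1}{B}\right) \cdot O(\log_2 N)
= O\!\left(\frac{\log N}{B}\right)
\]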

Improve worst case insertion:
Because merging is cheap, many arrays may be merged while inserting a single element. Separate
threads are maintained to merge the arrays, so an insertion into the fractal index returns
quickly. The merging thread will not fall behind as long as it merges $\Omega(\log N)$ arrays
for every insertion.
Now consider the cost of insertion. An insertion takes $O(\log_B N / \sqrt{B})$ block
transfers, which is faster than a B-tree by a factor of $O(\sqrt{B})$.
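To get a feel for the size of that factor, the two bounds can be compared numerically. This is only a rough illustration: the values of `N` and `B` are assumed, and the hidden constants in the big-O bounds are taken to be 1, which real systems will not match.

```python
import math


def btree_insert_cost(n: int, b: int) -> float:
    """B-tree insertion bound O(log_B N), with the constant taken as 1."""
    return math.log(n, b)


def fractal_insert_cost(n: int, b: int) -> float:
    """Fractal tree insertion bound O(log_B N / sqrt(B)), constant taken as 1."""
    return math.log(n, b) / math.sqrt(b)


N = 10**9  # assumed number of rows
B = 1024   # assumed block size, measured in rows

speedup = btree_insert_cost(N, B) / fractal_insert_cost(N, B)
print(f"estimated speedup: about {speedup:.0f}x")
```

Under these assumptions the ratio is simply the square root of the block size, so larger blocks widen the gap.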
Search Index Performance:
To search for a particular row in a fractal tree index, perform a binary search in each of the
$\log N$ arrays; the time complexity is $O(\log^2 N)$. This can be improved by keeping forward
pointers from rows in one array to rows in the next array. In the figure, 14 points to the
next greater number in the following array, 25, and 25 in turn points to 26 in the array after
that.
This reduces the search time because we already know roughly where to look in the next array.
It brings the search time complexity down to $O(\log N)$.
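The basic search, before the forward-pointer optimization, can be sketched as one binary search per array. The function name `search` and the list-of-lists layout are assumptions for illustration; the forward pointers are deliberately left out, so this version does the slower per-array work.

```python
from bisect import bisect_left


def search(levels, key):
    """Return True if key appears in any of the sorted arrays.

    Each array is searched independently with a binary search, so with
    about log N arrays the total work is about log N * log N comparisons.
    """
    for array in levels:
        i = bisect_left(array, key)  # leftmost position where key could sit
        if i < len(array) and array[i] == key:
            return True
    return False


levels = [[], [7, 15], [3, 7, 10, 11]]
print(search(levels, 10))
print(search(levels, 4))
```

With forward pointers, the position found in one array would narrow the range searched in the next array instead of restarting from scratch.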

Fragmentation:
Fragmentation reduces the performance of a system or database, because scanning through a chunk
of rows forces the disk head to move all around the hard drive to find the next row or element
in the index.
B-Tree:
Both clustering and non-clustering B-trees suffer from fragmentation. When data is inserted
into a non-clustering B-tree, the logical order of the rows is completely unrelated to their
physical placement on disk. For a range query, the scanner has to go through all chunks of data,
moving the disk head for each row, which causes a lot of overhead. A non-clustering B-tree
index is not recommended for range queries.
Fractal Tree:
A Fractal Tree is not fragmented: neither primary nor secondary indexes are fragmented. There
is also no inherent tradeoff between fragmentation and insertion speed. Fractal trees insert
much faster than B-trees and do so without fragmentation. B-trees sit on a tradeoff curve, but
not on the best achievable tradeoff curve [2].

Schema changes:
A schema change injects broadcast messages, which travel in all directions, visit every buffer
and are eventually flushed down to every leaf. To add a column to a table, for example, a
message is broadcast from the root node. The next time a query runs, it learns about the column
change from the buffers, since schema change messages are present in all of them, so the query
results follow the new schema. Performance improves greatly because queries succeed against the
changed schema even before the change is actually written to the leaf nodes.
In the figure below, the red dots are schema change broadcast messages, which are located in
all the buffers.

Performance in Hard Disk and SSD:
Hard Disk:
Performance on hard disks improves because there is no fragmentation with Fractal Tree
indexing. When a query is made, values are fetched quickly compared to other external-memory
indexes such as B-trees.
SSD:
Because SSDs are expensive, algorithms and data structures that support better compression are
preferred. Fractal Tree indexes support better compression, which significantly improves the
storage performance of SSDs. Fractal Tree indexes also issue large, infrequent writes, which
suits SSDs: this reduces SSD wear and increases the SSD's life span.
How Fractal Tree Indexing works in MongoDB:
• In MongoDB, all fields in the document are always available for indexing.
• All the leaf nodes fit in RAM, so no I/O is required.
• Messages are buffered in all the internal nodes.
• Nodes are larger than B-tree nodes (4 MB), which leads to a higher compression ratio.
• Whenever a deletion or insertion is made, there is no need to update the leaf node
  immediately.
• Better compression.
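Figure: When the buffer is full, messages are pushed down to the next level (inserted 20 and
25, deleted 7).
Before: keys 21 (root), 11 and 31 (internal); buffered messages Insert(20), Insert(25),
delete(7); leaves {1,5,7}, {12}, {24,26}, {35}; new message Insert(15).
After: keys 21 (root), 11 and 31 (internal); leaves {1,5}, {12,20}, {24,25,26}, {35}; buffered
message Insert(15).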

When insertions and deletions are made in MongoDB, they are not applied directly to the leaf
node. Each one is first stored in a buffer as a message, and when the buffer is full its
messages are pushed down to the next level.
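The buffering behaviour can be sketched in a few lines of Python. This is an illustrative model only, not MongoDB's or Tokutek's code: the `Leaf` and `Node` classes, the `capacity` parameter and the message tuples are all assumed names, and the two internal levels of this section's example (21, then 11 and 31) are flattened into a single buffered root with pivots 11, 21 and 31 to keep the sketch short.

```python
from bisect import bisect_right, insort


class Leaf:
    def __init__(self, keys):
        self.keys = sorted(keys)

    def apply(self, op, key):
        if op == "insert" and key not in self.keys:
            insort(self.keys, key)
        elif op == "delete" and key in self.keys:
            self.keys.remove(key)

    def contains(self, key):
        return key in self.keys


class Node:
    def __init__(self, pivots, children, capacity=3):
        self.pivots = pivots      # child i holds the keys below pivots[i]
        self.children = children  # len(children) == len(pivots) + 1
        self.capacity = capacity  # assumed buffer size, in messages
        self.buffer = []          # pending (op, key) messages, oldest first

    def apply(self, op, key):
        if len(self.buffer) >= self.capacity:
            self.flush()          # buffer is full: push messages down a level
        self.buffer.append((op, key))

    def flush(self):
        for op, key in self.buffer:
            self.children[bisect_right(self.pivots, key)].apply(op, key)
        self.buffer = []

    def contains(self, key):
        for op, k in reversed(self.buffer):  # newest buffered message wins
            if k == key:
                return op == "insert"
        return self.children[bisect_right(self.pivots, key)].contains(key)


root = Node([11, 21, 31], [Leaf([1, 5, 7]), Leaf([12]), Leaf([24, 26]), Leaf([35])])
for message in [("insert", 20), ("insert", 25), ("delete", 7)]:
    root.apply(*message)      # all three fit in the buffer; leaves are untouched
root.apply("insert", 15)      # buffer was full, so the older messages move down
print([leaf.keys for leaf in root.children], root.buffer)
```

Queries stay correct while messages are still buffered because `contains` checks the buffer before descending, which is why the leaf does not have to be updated immediately.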
Conclusion:
The Fractal Tree index is a write-optimized data structure suited to workloads with many
insert, delete or update operations on a table. It significantly improves SSD storage
performance because its writes are large and infrequent. The Fractal Tree index is well suited
to point queries and performs well on range queries. Schema changes also become simple and fast.
References:
[1] http://cdn.oreillystatic.com/en/assets/1/event/36/How%20TokuDB%20Fractal%20Tree%20Databases%20Work%20Presentation.pdf
[2] http://www.tokutek.com/2010/11/avoiding-fragmentation-with-fractal-trees/
[3] https://oracleus.activeevents.com/2013/connect/fileDownload/session/C7B372C894D62F395B0EB2C5E0B9AD04/CON4645_Narvaja-MySQL%20Connect%2020130921.pdf
[4] http://www.mongodb.com/presentations/mongodb-boston-2012/mongodb-and-fractal-tree-indexes
[5] http://www.odbms.org/wp-content/uploads/2013/11/OptimizingMongoDBWithFTI.pdf
