TokuDB v7.5 introduced Read Free Replication, allowing MySQL slaves to run with virtually no read IO. This presentation discusses how Fractal Tree indexes work, what they enable in TokuDB, and how they allow TokuDB to uniquely offer this replication innovation.
Introduction to TokuDB v7.5 and Read Free Replication
1. What’s New in TokuDB 7.5
Read Free Replication
and more!
Tim Callaghan, Tokutek
tim@tokutek.com
@tmcallaghan
2. Company
• Two high-performance database solutions for big data
• NoSQL: TokuMX™ for MongoDB
• NewSQL: TokuDB® for MySQL, MariaDB & Percona Server
• Radical new storage for larger-than-RAM datasets
• Fractal Tree® indexing technology
• Data science research at M.I.T., Rutgers, Stony Brook
• Open source
• Example: Red Hat (Linux) & Canonical (Ubuntu)
4. Ever seen this?
IO Utilization Graph, write performance is IO limited
5. Agenda
• What is a Fractal Tree index?
• No Read Free Replication without them!
• What Fractal Tree indexes enable in MySQL, MariaDB, and Percona Server
• TokuDB!
• What’s new in TokuDB 7.5
• Read Free Replication and more
• Q+A
12. Performance is IO limited when data > RAM,
one IO is needed for each insert/update
(actually it’s one IO for every index on the table)
B-tree Overview - performance
[Diagram: B-tree with root 22; internal nodes 10 and 99; leaf nodes (2, 3, 4), (10, 20), (22, 25), (99). Labels: RAM, RAM, DISK.]
14. Fractal Tree indexes
similar to B-trees
• store data in leaf nodes
• use index key for ordering
[Diagram: tree whose internal nodes each carry a message buffer]
All internal nodes have message buffers
As buffers overflow, they cascade down the tree
Messages are eventually applied to leaf nodes
different than B-trees
• message buffers
• big nodes (4MB vs. ~16KB)
15. Doesn’t InnoDB Have Buffers?
InnoDB Change Buffer
• It sure does!
• No buffer for the primary key index
• One buffer for each secondary index
• http://dev.mysql.com/doc/refman/5.5/en/innodb-performance-change_buffering.html
17. InnoDB Buffers Help (for a while)
• Buffering allows for IO amortization (> 1 operation per IO)
• When data gets large enough the single buffer can’t help (blue = red)
18. Fractal Tree Indexes - sample data
25
10 99
2,3,4 10,20 22,25 99
Looks a lot like a B-tree!
19. Fractal Tree Indexes - insert
insert 15;
25
10 99
insert (15)
2,3,4 10,20 22,25 99
• search operations must consider messages along the way
• messages cascade down the tree as buffers fill up
• they are eventually applied to the leaf nodes, hundreds or
thousands of operations for a single IO
• CPU and cache are conserved as important data is not ejected
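The buffering idea on this slide can be sketched in a few lines. This is a toy model, not TokuDB's implementation: it uses a single root buffer over four leaves, and the class name, buffer capacity, pivot keys, and IO counter are all illustrative assumptions.

```python
import bisect


class BufferedTree:
    """Toy two-level tree: one root message buffer over sorted leaves."""

    def __init__(self, pivots, leaves, buffer_size=4):
        self.pivots = pivots            # keys separating the leaves
        self.leaves = leaves            # list of sorted key lists
        self.buffer = []                # pending ("insert", key) messages
        self.buffer_size = buffer_size  # hypothetical buffer capacity
        self.leaf_ios = 0               # simulated leaf IOs done by flushes

    def _leaf_index(self, key):
        return bisect.bisect_right(self.pivots, key)

    def insert(self, key):
        # An insert only appends a message; no leaf is touched yet.
        self.buffer.append(("insert", key))
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        # Group messages by destination leaf so each leaf is touched once.
        by_leaf = {}
        for _op, key in self.buffer:
            by_leaf.setdefault(self._leaf_index(key), []).append(key)
        for idx, keys in by_leaf.items():
            self.leaf_ios += 1
            for key in keys:
                if key not in self.leaves[idx]:
                    bisect.insort(self.leaves[idx], key)
        self.buffer.clear()

    def contains(self, key):
        # Searches must consider buffered messages along the way.
        if any(k == key for _op, k in self.buffer):
            return True
        return key in self.leaves[self._leaf_index(key)]


tree = BufferedTree(pivots=[10, 22, 99],
                    leaves=[[2, 3, 4], [10, 20], [22, 25], [99]])
tree.insert(15)
found_before_flush = tree.contains(15)   # answered from the buffer
ios_before_flush = tree.leaf_ios
for key in (11, 12, 13):
    tree.insert(key)                     # the fourth message triggers a flush
```

In this run four inserts cost one simulated leaf IO, because all four messages land in the same leaf. A real Fractal Tree index has buffers at every internal level and far larger nodes, so the ratio of operations per IO is much higher.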
20. Fractal Tree Indexes - other operations
25
add_column(c4 bigint)
10 99
delete(99)
increment(22,+5)
...
insert (100)
delete(8)
delete(2)
insert (8)
2,3,4 10,20 22,25 99
Lots of operations can be messages!
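As a rough illustration of "operations as messages", the sketch below applies a queue of pending messages to a leaf, oldest first. The message names follow the slide; the tuple encoding, the `apply_messages` helper, the sample values, and the message order are hypothetical, and schema messages such as add_column are left out.

```python
def apply_messages(leaf, messages):
    """Apply buffered messages, oldest first, to a dict-based leaf (key -> value)."""
    for msg in messages:
        op, key = msg[0], msg[1]
        if op == "insert":
            leaf[key] = msg[2]
        elif op == "delete":
            leaf.pop(key, None)          # deleting a missing key is a no-op
        elif op == "increment":
            if key in leaf:
                leaf[key] += msg[2]
        else:
            raise ValueError(f"unknown message type: {op}")
    return leaf


leaf = {2: 0, 8: 0, 22: 10, 99: 0}
pending = [
    ("insert", 8, 1),
    ("delete", 2),
    ("delete", 8),
    ("insert", 100, 7),
    ("increment", 22, 5),
    ("delete", 99),
]
result = apply_messages(leaf, pending)
```

None of these messages needed the leaf to be read when it was issued; the work happens once, when the messages finally reach the leaf.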
22. What is TokuDB?
• Transactional MySQL Storage Engine - think InnoDB
• Available for MySQL 5.5 and MariaDB 5.5
• Percona Server 5.6 and MariaDB 10.0 too
• ACID and MVCC
• Free/OSS Community Edition
• http://github.com/Tokutek/ft-engine
• Enterprise Edition
• Commercial support + hot backup
Performance + Compression + Agility
32. Schema Changes Without Downtime?
• In TokuDB, column add/drop/expand is instant
• “it’s just a message” – Fractal Tree index
• No need for helper tools
• MySQL 5.6 or Percona Tools
• Operation is still expensive (table rewrite)
• Or, no need to change on slave, then switch
with master and repeat
34. TokuDB 7.5 – Small Stuff
• Updated MySQL and MariaDB to 5.5.39
• Allow XA transactions to skip fsync() in prepare phase
• XA means a multi-statement transaction that includes TokuDB and another XA engine (InnoDB)
• Community Contribution by Bohu TANG
• Hot backup now supports multiple directories
• datadir plus log_bin, tokudb_data_dir, tokudb_log_dir
• Additional bulk fetch – this is not small!
• Was just “select *”
• Now includes “insert into select …”, “replace into select …”, “insert ignore select …”, “insert into select … on duplicate key update …”, and “delete from select …”
36. MySQL Replication - Modes
MySQL supports three replication modes
• Statement Based
• SQL statements are logged and replayed on slaves
• "insert into foo values (1,1);"
• Good for when statement affects a lot of rows
• "insert into foo select * from bar;"
• Row Based
• Before and after images of affected rows are logged and
replayed on slaves
• foo : before (1,1) after (1,2)
• Mixed
• Statement based unless it is determined to be unsafe, at
which point row based
• "update foo set c1=5 limit 5;"
37. MySQL Replication – Read Only
Setting the slave's read_only=1
• Puts the slave in "read only" mode
• Except that users with the SUPER privilege are allowed to insert/update/delete
– This can break RFR!
38. MySQL Replication – Slave Apply
Simple (hand wavy) overview of the slave process
• Read a replication event from the binary log
• If "statement"
• Execute the statement on the slave
• If "row"
• Insert, just write the row
• Delete/Update, lookup the row
• If row doesn’t exist, stop replication
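The row-based branch of this overview can be sketched as follows. It is a simplified model under stated assumptions: the table is a plain dict keyed by primary key, and the event dictionary layout, `apply_row_event`, and `ReplicationStopped` are hypothetical names, not MySQL internals.

```python
class ReplicationStopped(Exception):
    """Raised when a row event refers to a row the slave does not have."""


def apply_row_event(table, event):
    """Apply one row event to `table` (dict of primary key -> row).

    Returns the number of row lookups (reads) the apply needed.
    """
    kind, pk = event["type"], event["pk"]
    if kind not in ("insert", "update", "delete"):
        raise ValueError(f"unknown event type: {kind}")
    if kind == "insert":
        table[pk] = event["after"]       # insert: just write the row
        return 0
    # delete/update: the row is looked up first
    if pk not in table:
        raise ReplicationStopped(f"row {pk} not found")
    if kind == "delete":
        del table[pk]
    else:
        table[pk] = event["after"]
    return 1


foo = {}
reads = apply_row_event(foo, {"type": "insert", "pk": 1, "after": (1, 1)})
reads += apply_row_event(foo, {"type": "update", "pk": 1, "after": (1, 2)})
try:
    apply_row_event(foo, {"type": "delete", "pk": 42})
    stopped = False
except ReplicationStopped:
    stopped = True
```

The lookup in the delete/update path is the read that becomes a disk IO once the data no longer fits in RAM.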
39. MySQL Replication – Slave Lag
In MySQL 5.5, replication is single threaded
• Masters support concurrency, slaves do not
• Causes slaves to "lag" behind the master
Improvements exist
• MySQL 5.6 supports multi-threaded slaves (per database)
• MariaDB 10.0 has its own parallel replication mechanism
All of these will work with TokuDB's Read Free Replication
41. Read Free Replication - Requirements
• What is required on the master?
• binlog_format=ROW
• What is required on the slave?
• read_only=1
• tokudb_rpl_unique_checks=0 and/or
tokudb_rpl_lookup_rows=0
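A small checker for the requirements listed above might look like this. It is only a sketch: the `rfr_problems` helper is hypothetical, and it assumes the settings have already been collected into dictionaries of strings, with boolean variables reported as either 1/0 or ON/OFF.

```python
def rfr_problems(master_vars, slave_vars):
    """Return a list of unmet Read Free Replication requirements."""
    problems = []
    if master_vars.get("binlog_format") != "ROW":
        problems.append("master: binlog_format must be ROW")
    if str(slave_vars.get("read_only")) not in ("1", "ON"):
        problems.append("slave: read_only must be 1")
    options = ("tokudb_rpl_unique_checks", "tokudb_rpl_lookup_rows")
    if not any(str(slave_vars.get(o)) in ("0", "OFF") for o in options):
        problems.append(
            "slave: set tokudb_rpl_unique_checks=0 and/or tokudb_rpl_lookup_rows=0"
        )
    return problems


ok = rfr_problems(
    {"binlog_format": "ROW"},
    {"read_only": "ON",
     "tokudb_rpl_unique_checks": "OFF",
     "tokudb_rpl_lookup_rows": "OFF"},
)
bad = rfr_problems(
    {"binlog_format": "STATEMENT"},
    {"read_only": "OFF",
     "tokudb_rpl_unique_checks": "ON",
     "tokudb_rpl_lookup_rows": "ON"},
)
```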
42. RFR Optimization #1 – Skip Unique Checks
• tokudb_rpl_unique_checks=0
• Why is it OK?
• The master already performed the uniqueness
check
• Why can't InnoDB skip unique checks?
• It could, but…
• InnoDB doesn't support change buffering on the PK
• So, the row must be read for maintenance
• Since it is then in memory, there is little to be gained for
skipping the check
43. RFR Optimization #2 – Skip Read/Modify/Write
• tokudb_rpl_lookup_rows=0
• Why is it OK?
• If RBR, master provided before/after row images
• Why can't InnoDB skip read/modify/write?
• InnoDB doesn't support change buffering on the PK
• So, the row must be read for maintenance
• Why can TokuDB skip read/modify/write?
• Everything necessary is in the binary log
• Simple message injection
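The "simple message injection" idea can be illustrated with a sketch that turns a row event into write messages using only the before/after images from the binary log, with no row lookup. The event layout, the `to_messages` helper, and the delete-plus-insert encoding of an update are assumptions made for illustration; TokuDB's actual message format may differ.

```python
def to_messages(event):
    """Translate a row event into blind write messages (no row lookup)."""
    kind = event["type"]
    if kind == "insert":
        return [("insert", event["after"])]
    if kind == "delete":
        return [("delete", event["before"])]
    if kind == "update":
        # The before image says what to remove, the after image what to write.
        return [("delete", event["before"]), ("insert", event["after"])]
    raise ValueError(f"unknown event type: {kind}")


# The slide's example row change on table foo: before (1,1), after (1,2).
update_event = {"type": "update", "before": (1, 1), "after": (1, 2)}
messages = to_messages(update_event)
reads_needed = 0   # nothing above consults the table
```

Because the messages are built entirely from the event, the slave can push them into the tree's buffers and move on to the next event.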
47. Read Free Replication - Ideas
#1, scale your reads
• HA is nice, but don’t we also want to scale our reads?
[Diagram: Master and Slave IO under Workload and Readers, comparing No RFR with RFR]
50. Can we do even more?
• Yes, reduce fsync() calls on slaves (writes)
• This is the current choke point for RFR slaves
• In 5.5, master.info and relay-log.info are just files
• Each needs fsync() for crash safety
• In 5.6, these files can be InnoDB tables
• 3 fsync() operations are now 1
• However, this becomes an XA transaction when
TokuDB tables are in use
• Even more fsync() calls
• Should be able to convert these to TokuDB
51. TokuDB Resources
• Website @ www.tokutek.com
• Documentation @ docs.tokutek.com/tokudb
• Community @ tokudb-user Google Group
• Tokutek Blogs @ www.tokutek.com/tokuview
52. Thank you!
Thank you for attending!
Enter questions into the chat box
Contact us: contact@tokutek.com