Introducing TokuMX:
The Performance Engine for MongoDB

Leif Walsh
Senior Engineer, Tokutek
leif@tokutek.com
@leifwalsh

What is TokuMX?

• TokuMX = MongoDB with improved storage
• Drop-in replacement for MongoDB v2.4 applications
  • Including replication and sharding
  • Same data model
  • Same query language
  • Drivers just work
  • No Full Text or Geospatial
• Open Source
  – http://github.com/Tokutek/mongo

B-tree Limitations

Performance is IO limited when the data is bigger than RAM:
try to fit all internal nodes and some leaf nodes in RAM.

[Figure: B-tree with upper nodes 22, 10, 99 labeled RAM and leaf nodes (2, 3, 4), (10, 20), (22, 25), (99) labeled DISK]

Plus, mmap.

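The IO limit above can be illustrated with a toy model. This is only a sketch under stated assumptions, not a measurement of MongoDB or TokuMX: internal nodes are assumed to always be in RAM, every insert lands on a uniformly random leaf, and RAM holds a fixed number of leaves with least-recently-used eviction. The function name and all parameters are hypothetical.

```python
import random
from collections import OrderedDict


def count_leaf_misses(num_leaves, cache_leaves, num_inserts, seed=0):
    """Count inserts that touch a leaf not currently cached in RAM.

    Toy model only: internal nodes are assumed resident, inserts pick a
    uniformly random leaf, and the cache evicts the least recently used leaf.
    """
    rng = random.Random(seed)
    cache = OrderedDict()  # leaf id -> None, ordered oldest to newest use
    misses = 0
    for _ in range(num_inserts):
        leaf = rng.randrange(num_leaves)
        if leaf in cache:
            cache.move_to_end(leaf)        # hit: mark as most recently used
        else:
            misses += 1                    # miss: this leaf must come from disk
            cache[leaf] = None
            if len(cache) > cache_leaves:
                cache.popitem(last=False)  # evict the least recently used leaf
    return misses


# Same workload, two cache sizes: all leaves fit vs. only a tenth of them fit.
fits_in_ram = count_leaf_misses(num_leaves=100, cache_leaves=100, num_inserts=10_000)
bigger_than_ram = count_leaf_misses(num_leaves=100, cache_leaves=10, num_inserts=10_000)
print(fits_in_ram, bigger_than_ram)
```

When every leaf fits, only the first touch of each leaf misses. When a tenth of the leaves fit, most random inserts go to disk.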
TokuMX : Indexed Insertion
[Two benchmark charts; images not recovered]

TokuMX : Concurrency (>RAM)
[Benchmark chart; image not recovered]

TokuMX : Concurrency (<RAM)
[Benchmark chart; image not recovered]

TokuMX : Raw Compression
BitTorrent data, size on disk, ~31 million inserts (lower is better)

TokuMX achieved 11.6:1 compression

TokuMX : Compression : Field Names
synthetic data, size on disk, 100 million inserts (lower is better)

TokuMX is substantially smaller, even without compression

TokuMX : Compression : Field Names
synthetic data, size on disk, 100 million inserts (lower is better)

MongoDB was ~10% smaller
In TokuMX, field name length has almost no impact on size due to compression

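A rough way to see why field name length matters little once data is compressed: repeated names become cheap back-references. The sketch below is an illustration, not TokuMX's storage format. It uses JSON as a stand-in for BSON and zlib as a stand-in compressor, and the synthetic values (`Tim0`, `Tim1`, ...) are assumptions. The field names are the ones from the schema slide at the end of the deck.

```python
import json
import zlib


def make_docs(names, count=1000):
    """Build synthetic documents that differ only in their field names."""
    first, last, email = names
    return [
        {first: f"Tim{i}", last: f"Callaghan{i}", email: f"tim{i}@tokutek.com"}
        for i in range(count)
    ]


def sizes(names):
    """Return (uncompressed bytes, zlib-compressed bytes) for one schema."""
    raw = json.dumps(make_docs(names)).encode("utf-8")
    return len(raw), len(zlib.compress(raw))


long_raw, long_packed = sizes(("first_name", "last_name", "email_address"))
short_raw, short_packed = sizes(("fn", "ln", "ea"))

raw_gap = long_raw - short_raw           # cost of long names, uncompressed
packed_gap = long_packed - short_packed  # cost of long names after compression
print(raw_gap, packed_gap)
```

Uncompressed, the long names cost 26 bytes per document. After compression the gap is far smaller, because each repeated name is encoded once and then referenced.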
TokuMX : ACID + MVCC
• ACID
  – In MongoDB, multi-insertion operations allow for partial success
    o Asked to store 5 documents, 3 succeeded
  – TokuMX offers “all or nothing” behavior (atomic)
• MVCC
  – In MongoDB, queries can be interrupted by writers
    o The effects of these writers are visible to the reader
  – TokuMX offers MVCC
    o Reads are consistent as of the operation start

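The two behaviors on this slide can be sketched with a toy in-memory collection. This is a minimal illustration under stated assumptions, not the MongoDB or TokuMX implementation: `ToyCollection` and its methods are hypothetical names, a duplicate `_id` stands in for any insert error, and a copied dictionary stands in for an MVCC snapshot.

```python
class ToyCollection:
    """In-memory stand-in for a collection with a unique _id index."""

    def __init__(self):
        self.docs = {}

    def insert_many_partial(self, batch):
        """Insert one document at a time and stop at the first duplicate _id.

        Documents inserted before the failure stay in place (partial success).
        """
        inserted = 0
        for doc in batch:
            if doc["_id"] in self.docs:
                break
            self.docs[doc["_id"]] = doc
            inserted += 1
        return inserted

    def insert_many_atomic(self, batch):
        """Stage the batch on a copy and commit only if every insert succeeds."""
        staged = dict(self.docs)
        for doc in batch:
            if doc["_id"] in staged:
                return 0  # abort: nothing from this batch becomes visible
            staged[doc["_id"]] = doc
        self.docs = staged
        return len(batch)

    def snapshot(self):
        """Return a view of the data as of now; later writes do not change it."""
        return dict(self.docs)


# Five documents, the fourth of which repeats an _id.
batch = [{"_id": 1}, {"_id": 2}, {"_id": 3}, {"_id": 3}, {"_id": 5}]

partial = ToyCollection()
partial_count = partial.insert_many_partial(batch)  # 3 of 5 stored

atomic = ToyCollection()
atomic_count = atomic.insert_many_atomic(batch)     # 0 stored, all or nothing

# Snapshot read: the reader keeps seeing the state from when it started.
atomic.insert_many_atomic([{"_id": 1}])
reader_view = atomic.snapshot()
atomic.insert_many_atomic([{"_id": 2}])
print(partial_count, atomic_count, sorted(reader_view), sorted(atomic.docs))
```

The reader's view still holds only `_id` 1 after the second write, which is the "consistent as of the operation start" property in miniature.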
Questions?

Leif Walsh
Senior Engineer, Tokutek
leif@tokutek.com
@leifwalsh

TokuMX : Indexed Insertion

• indexed insertion workload (iibench)
  • http://github.com/tmcallaghan/iibench-mongodb

{ dateandtime: <date-time>,
  cashregisterid: 1..1000,
  customerid: 1..100000,
  productid: 1..10000,
  price: <double> }

• insert only, 1000 documents per insert, 100 million inserts
• indexes
  • price + customerid
  • cashregisterid + price + customerid
  • price + dateandtime + customerid

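The workload shape above can be sketched as a small document generator. This is not the iibench-mongodb code. The function names, the price range, and the use of the current time for `dateandtime` are assumptions for illustration. Only the field names, integer ranges, batch size, and index key lists come from the slide.

```python
import random
from datetime import datetime, timezone

# Index key lists, written with the schema's field names.
INDEXES = [
    ("price", "customerid"),
    ("cashregisterid", "price", "customerid"),
    ("price", "dateandtime", "customerid"),
]


def make_document(rng):
    """Build one document with the slide's fields and integer ranges."""
    return {
        "dateandtime": datetime.now(timezone.utc),
        "cashregisterid": rng.randint(1, 1000),
        "customerid": rng.randint(1, 100000),
        "productid": rng.randint(1, 10000),
        "price": rng.uniform(0.0, 500.0),  # price range is an assumption
    }


def make_batch(rng, size=1000):
    """Build one insert batch; the slide uses 1000 documents per insert."""
    return [make_document(rng) for _ in range(size)]


rng = random.Random(42)
batch = make_batch(rng)
print(len(batch), sorted(batch[0]))
```

Because every index leads with or includes `price`, a random double, new index entries land all over each index rather than at one end, which is what makes the workload hard for a B-tree that is bigger than RAM.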
TokuMX : Concurrency

• Sysbench read-write workload
  • point and range queries, update, delete, insert
  • http://github.com/tmcallaghan/sysbench-mongodb

{ _id: 1..10000000,
  k: 1..10000000,
  c: <120 char random string ###-###-###>,
  pad: <60 char random string ###-###-###> }

TokuMX : Raw Compression
• BitTorrent Peer Snapshot Data (~31 million documents)
• 3 Indexes : peer_id + created, torrent_snapshot_id + created, created

{ id: 1,
  peer_id: 9222,
  torrent_snapshot_id: 4,
  upload_speed: 0.0000,
  download_speed: 0.0000,
  payload_upload_speed: 0.0000,
  payload_download_speed: 0.0000,
  total_upload: 0,
  total_download: 0,
  fail_count: 0,
  hashfail_count: 0,
  progress: 0.0000,
  created: "2008-10-28 01:57:35" }

http://cs.brown.edu/~pavlo/torrent/

TokuMX : Compression : Field Names

schema 1 - long field names (10/20/20)
{ first_name    : “Tim”,
  last_name     : “Callaghan”,
  email_address : “tim@tokutek.com” }

schema 2 - short field names (26 fewer bytes per doc)
{ fn : “Tim”,
  ln : “Callaghan”,
  ea : “tim@tokutek.com” }

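The 26-byte figure follows directly from the field name lengths. A quick check, assuming each name is stored once per document as plain bytes, so any fixed per-field overhead cancels out in the difference:

```python
long_names = ["first_name", "last_name", "email_address"]
short_names = ["fn", "ln", "ea"]


def name_bytes(names):
    """Total bytes taken by the field names alone."""
    return sum(len(name.encode("utf-8")) for name in names)


# 10 + 9 + 13 = 32 bytes of long names versus 2 + 2 + 2 = 6 bytes of short ones.
saved_per_doc = name_bytes(long_names) - name_bytes(short_names)
print(saved_per_doc)  # 26
```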

Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)
