SlideShare une entreprise Scribd logo
1  sur  16
Télécharger pour lire hors ligne
Introducing TokuMX:
The Performance Engine for
MongoDB
Leif Walsh	

Senior Engineer, Tokutek	

leif@tokutek.com	

@leifwalsh
®
What is TokuMX?
!

• TokuMX = MongoDB with improved storage
!

• Drop in replacement for MongoDB v2.4 applications
• Including replication and sharding
• Same data model
• Same query language
• Drivers just work
• No Full Text or Geospatial
!

• Open Source
– http://github.com/Tokutek/mongo

®
B-tree Limitations
Performance is IO limited when bigger than RAM:	

try to fit all internal nodes and some leaf nodes
RAM

22

10

99

RAM

DISK
2, 3, 4

10,20

22,25

99

Plus, mmap.
®
TokuMX : Indexed Insertion

4

®
TokuMX : Indexed Insertion

5

®
TokuMX : Concurrency (>RAM)

6

®
TokuMX : Concurrency (<RAM)

7

®
TokuMX : Raw Compression
bittorrent data, size on disk, ~31 million inserts (lower is better)

TokuMX achieved	

11.6:1 compression

8

®
TokuMX : Compression : Field Names
synthetic data, size on disk, 100 million inserts (lower is better)

TokuMX is substantially
smaller, even without
compression

9

®
TokuMX : Compression : Field Names
synthetic data, size on disk, 100 million inserts (lower is better)

MongoDB was ~10%
smaller
In TokuMX, field name length has
almost no impact on size due to
compression

10

®
TokuMX : ACID + MVCC
• ACID
– In MongoDB, multi-insertion operations allow for partial
success
o Asked to store 5 documents, 3 succeeded

– In TokuMX, offer “all or nothing” behavior (atomic)

• MVCC
– In MongoDB, queries can be interrupted by writers.
o The effect of these writers are visible to the reader

– We offer MVCC
o Reads are consistent as of the operation start

11

®
Questions?

Leif Walsh	

Senior Engineer, Tokutek	

leif@tokutek.com	

@leifwalsh

®
TokuMX : Indexed Insertion
!
•

indexed insertion workload (iibench)
• http://github.com/tmcallaghan/iibench-mongodb

!
{ dateandtime: <date-time>,!
cashregisterid: 1..1000,!
customerid: 1..100000,!
productid: 1..10000,!
price: <double> }!

!
•
•

insert only, 1000 documents per insert, 100 million inserts
indexes
• price + customerid
• cashregister + price + customerid
• price + dateandtime + customerid

!

13

®
TokuMX : Concurrency
!

• Sysbench read-write workload
• point and range queries, update, delete, insert
• http://github.com/tmcallaghan/sysbench-mongodb
!

{ _id: 1..10000000,!
k: 1..10000000,!
c: <120 char random string ###-###-###>,!
pad: <60 char random string ###-###-###>}

14

®
TokuMX : Raw Compression
• BitTorrent Peer Snapshot Data (~31 million documents)
• 3 Indexes : peer_id + created, torrent_snapshot_id + created, created

!
{
 
 
 
 
 
 
 
 
 
 
 
 

id: 1,!
peer_id: 9222,!
torrent_snapshot_id: 4,!
upload_speed: 0.0000,!
download_speed: 0.0000,!
payload_upload_speed: 0.0000,!
payload_download_speed: 0.0000,!
total_upload: 0,!
total_download: 0,!
fail_count: 0,!
hashfail_count: 0,!
progress: 0.0000,!
created: "2008-10-28 01:57:35" }!

!
http://cs.brown.edu/~pavlo/torrent/

15

®
TokuMX : Compression : Field Names
!

schema 1 - long field names (10/20/20)
{ first_name
: “Tim”, !
last_name
: “Callaghan”, !
email_address : “tim@tokutek.com” }
!

schema 2
{ fn :
ln :
ea :

- short field names (26 less bytes per doc)
“Tim”, !
“Callaghan”, !
“tim@tokutek.com” }

!

16

®

Contenu connexe

Tendances

Speeding up Page Load Times by Using Starling
Speeding up Page Load Times by Using StarlingSpeeding up Page Load Times by Using Starling
Speeding up Page Load Times by Using Starling
Erik Osterman
 
HTML5 Programming
HTML5 ProgrammingHTML5 Programming
HTML5 Programming
hotrannam
 

Tendances (19)

Fluentd and Distributed Logging at Kubecon
Fluentd and Distributed Logging at KubeconFluentd and Distributed Logging at Kubecon
Fluentd and Distributed Logging at Kubecon
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
 
KubeCon EU 2019 - P2P Docker Image Distribution in Hybrid Cloud Environment w...
KubeCon EU 2019 - P2P Docker Image Distribution in Hybrid Cloud Environment w...KubeCon EU 2019 - P2P Docker Image Distribution in Hybrid Cloud Environment w...
KubeCon EU 2019 - P2P Docker Image Distribution in Hybrid Cloud Environment w...
 
EVCache: Lowering Costs for a Low Latency Cache with RocksDB
EVCache: Lowering Costs for a Low Latency Cache with RocksDBEVCache: Lowering Costs for a Low Latency Cache with RocksDB
EVCache: Lowering Costs for a Low Latency Cache with RocksDB
 
Speeding up Page Load Times by Using Starling
Speeding up Page Load Times by Using StarlingSpeeding up Page Load Times by Using Starling
Speeding up Page Load Times by Using Starling
 
Using Ceph in OStack.de - Ceph Day Frankfurt
Using Ceph in OStack.de - Ceph Day Frankfurt Using Ceph in OStack.de - Ceph Day Frankfurt
Using Ceph in OStack.de - Ceph Day Frankfurt
 
Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket Cache
 
Automation m ysql_and_customer_photo
Automation m ysql_and_customer_photoAutomation m ysql_and_customer_photo
Automation m ysql_and_customer_photo
 
Node.js
Node.jsNode.js
Node.js
 
Building an Efficient AI Training Platform at bilibili with Alluxio
Building an Efficient AI Training Platform at bilibili with AlluxioBuilding an Efficient AI Training Platform at bilibili with Alluxio
Building an Efficient AI Training Platform at bilibili with Alluxio
 
Fluentd: Unified Logging Layer at CWT2014
Fluentd: Unified Logging Layer at CWT2014Fluentd: Unified Logging Layer at CWT2014
Fluentd: Unified Logging Layer at CWT2014
 
Node.js and Ruby
Node.js and RubyNode.js and Ruby
Node.js and Ruby
 
Yet another json rpc library (mole rpc)
Yet another json rpc library (mole rpc)Yet another json rpc library (mole rpc)
Yet another json rpc library (mole rpc)
 
PLNOG 4: Ela Jasińska - (Ab)Using Route Servers
PLNOG 4: Ela Jasińska -  (Ab)Using Route ServersPLNOG 4: Ela Jasińska -  (Ab)Using Route Servers
PLNOG 4: Ela Jasińska - (Ab)Using Route Servers
 
gRPC & Kubernetes
gRPC & KubernetesgRPC & Kubernetes
gRPC & Kubernetes
 
PHP at Density and Scale
PHP at Density and ScalePHP at Density and Scale
PHP at Density and Scale
 
HTML5 Programming
HTML5 ProgrammingHTML5 Programming
HTML5 Programming
 
Kafka Summit SF 2017 - MultiCluster, MultiTenant and Hierarchical Kafka Messa...
Kafka Summit SF 2017 - MultiCluster, MultiTenant and Hierarchical Kafka Messa...Kafka Summit SF 2017 - MultiCluster, MultiTenant and Hierarchical Kafka Messa...
Kafka Summit SF 2017 - MultiCluster, MultiTenant and Hierarchical Kafka Messa...
 
Rust's Journey to Async/await
Rust's Journey to Async/awaitRust's Journey to Async/await
Rust's Journey to Async/await
 

En vedette

Visualization of large FEM meshes
Visualization of large FEM meshesVisualization of large FEM meshes
Visualization of large FEM meshes
Tomáš Hnilica
 
Ch24 efficient algorithms
Ch24 efficient algorithmsCh24 efficient algorithms
Ch24 efficient algorithms
rajatmay1992
 
Some empirical evaluations of a temperature forecasting module based on Art...
Some empirical evaluations of a temperature forecasting module   based on Art...Some empirical evaluations of a temperature forecasting module   based on Art...
Some empirical evaluations of a temperature forecasting module based on Art...
Francisco Zamora-Martinez
 

En vedette (20)

Debugging Numerical Simulations on Accelerated Architectures - TotalView fo...
 Debugging Numerical Simulations on Accelerated Architectures  - TotalView fo... Debugging Numerical Simulations on Accelerated Architectures  - TotalView fo...
Debugging Numerical Simulations on Accelerated Architectures - TotalView fo...
 
Fast evaluation of Connectionist Language Models
Fast evaluation of Connectionist Language ModelsFast evaluation of Connectionist Language Models
Fast evaluation of Connectionist Language Models
 
A Connectionist approach to Part-Of-Speech Tagging
A Connectionist approach to Part-Of-Speech TaggingA Connectionist approach to Part-Of-Speech Tagging
A Connectionist approach to Part-Of-Speech Tagging
 
Algorithms : Introduction and Analysis
Algorithms : Introduction and AnalysisAlgorithms : Introduction and Analysis
Algorithms : Introduction and Analysis
 
Visualization of large FEM meshes
Visualization of large FEM meshesVisualization of large FEM meshes
Visualization of large FEM meshes
 
Efficient Viterbi algorithms for lexical tree based models
Efficient Viterbi algorithms for lexical tree based modelsEfficient Viterbi algorithms for lexical tree based models
Efficient Viterbi algorithms for lexical tree based models
 
Integration of Unsupervised and Supervised Criteria for DNNs Training
Integration of Unsupervised and Supervised Criteria for DNNs TrainingIntegration of Unsupervised and Supervised Criteria for DNNs Training
Integration of Unsupervised and Supervised Criteria for DNNs Training
 
Ch24 efficient algorithms
Ch24 efficient algorithmsCh24 efficient algorithms
Ch24 efficient algorithms
 
Write optimization in external memory data structures
Write optimization in external memory data structuresWrite optimization in external memory data structures
Write optimization in external memory data structures
 
Making Static Pivoting Scalable and Dependable
Making Static Pivoting Scalable and DependableMaking Static Pivoting Scalable and Dependable
Making Static Pivoting Scalable and Dependable
 
Buffer Trees - Utility and Applications for External Memory Data Processing
Buffer Trees - Utility and Applications for External Memory Data ProcessingBuffer Trees - Utility and Applications for External Memory Data Processing
Buffer Trees - Utility and Applications for External Memory Data Processing
 
Write-optimization in external memory data structures (Highload++ 2014)
Write-optimization in external memory data structures (Highload++ 2014)Write-optimization in external memory data structures (Highload++ 2014)
Write-optimization in external memory data structures (Highload++ 2014)
 
Write optimization in external memory data structures
Write optimization in external memory data structuresWrite optimization in external memory data structures
Write optimization in external memory data structures
 
PhD defence
PhD defencePhD defence
PhD defence
 
qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...
qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...
qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...
 
ESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction Challenge
ESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction ChallengeESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction Challenge
ESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction Challenge
 
A New MongoDB Sharding Architecture for Higher Availability and Better Resour...
A New MongoDB Sharding Architecture for Higher Availability and Better Resour...A New MongoDB Sharding Architecture for Higher Availability and Better Resour...
A New MongoDB Sharding Architecture for Higher Availability and Better Resour...
 
Mejora del reconocimiento de palabras manuscritas aisladas mediante un clasif...
Mejora del reconocimiento de palabras manuscritas aisladas mediante un clasif...Mejora del reconocimiento de palabras manuscritas aisladas mediante un clasif...
Mejora del reconocimiento de palabras manuscritas aisladas mediante un clasif...
 
Some empirical evaluations of a temperature forecasting module based on Art...
Some empirical evaluations of a temperature forecasting module   based on Art...Some empirical evaluations of a temperature forecasting module   based on Art...
Some empirical evaluations of a temperature forecasting module based on Art...
 
The Language of Compression
The Language of CompressionThe Language of Compression
The Language of Compression
 

Similaire à Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)

Meetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebServiceMeetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebService
Minsk MongoDB User Group
 
Mongo db first steps with csharp
Mongo db first steps with csharpMongo db first steps with csharp
Mongo db first steps with csharp
Serdar Buyuktemiz
 

Similaire à Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10) (20)

Get More Out of MongoDB with TokuMX
Get More Out of MongoDB with TokuMXGet More Out of MongoDB with TokuMX
Get More Out of MongoDB with TokuMX
 
5 Pitfalls to Avoid with MongoDB
5 Pitfalls to Avoid with MongoDB5 Pitfalls to Avoid with MongoDB
5 Pitfalls to Avoid with MongoDB
 
20140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp02
20140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp0220140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp02
20140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp02
 
Get More Out of MySQL with TokuDB
Get More Out of MySQL with TokuDBGet More Out of MySQL with TokuDB
Get More Out of MySQL with TokuDB
 
Is It Fast? : Measuring MongoDB Performance
Is It Fast? : Measuring MongoDB PerformanceIs It Fast? : Measuring MongoDB Performance
Is It Fast? : Measuring MongoDB Performance
 
Introduction to TokuDB v7.5 and Read Free Replication
Introduction to TokuDB v7.5 and Read Free ReplicationIntroduction to TokuDB v7.5 and Read Free Replication
Introduction to TokuDB v7.5 and Read Free Replication
 
Meetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebServiceMeetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebService
 
Fractal Tree Indexes : From Theory to Practice
Fractal Tree Indexes : From Theory to PracticeFractal Tree Indexes : From Theory to Practice
Fractal Tree Indexes : From Theory to Practice
 
[AWS Builders] Effective AWS Glue
[AWS Builders] Effective AWS Glue[AWS Builders] Effective AWS Glue
[AWS Builders] Effective AWS Glue
 
Engage 2019: Introduction to Node-Red
Engage 2019: Introduction to Node-RedEngage 2019: Introduction to Node-Red
Engage 2019: Introduction to Node-Red
 
Benchmarking, Load Testing, and Preventing Terrible Disasters
Benchmarking, Load Testing, and Preventing Terrible DisastersBenchmarking, Load Testing, and Preventing Terrible Disasters
Benchmarking, Load Testing, and Preventing Terrible Disasters
 
NoSQL and MongoDB Introdction
NoSQL and MongoDB IntrodctionNoSQL and MongoDB Introdction
NoSQL and MongoDB Introdction
 
KSQL Performance Tuning for Fun and Profit ( Nick Dearden, Confluent) Kafka S...
KSQL Performance Tuning for Fun and Profit ( Nick Dearden, Confluent) Kafka S...KSQL Performance Tuning for Fun and Profit ( Nick Dearden, Confluent) Kafka S...
KSQL Performance Tuning for Fun and Profit ( Nick Dearden, Confluent) Kafka S...
 
Making the case for write-optimized database algorithms / Mark Callaghan (Fac...
Making the case for write-optimized database algorithms / Mark Callaghan (Fac...Making the case for write-optimized database algorithms / Mark Callaghan (Fac...
Making the case for write-optimized database algorithms / Mark Callaghan (Fac...
 
Stackato v5
Stackato v5Stackato v5
Stackato v5
 
MongoDB Pros and Cons
MongoDB Pros and ConsMongoDB Pros and Cons
MongoDB Pros and Cons
 
Mongo db first steps with csharp
Mongo db first steps with csharpMongo db first steps with csharp
Mongo db first steps with csharp
 
Meteor Revolution: From DDP to Blaze Reactive Rendering
Meteor Revolution: From DDP to Blaze Reactive Rendering Meteor Revolution: From DDP to Blaze Reactive Rendering
Meteor Revolution: From DDP to Blaze Reactive Rendering
 
MongoDB Days UK: Using MongoDB to Build a Fast and Scalable Content Repositor...
MongoDB Days UK: Using MongoDB to Build a Fast and Scalable Content Repositor...MongoDB Days UK: Using MongoDB to Build a Fast and Scalable Content Repositor...
MongoDB Days UK: Using MongoDB to Build a Fast and Scalable Content Repositor...
 
Scaling with mongo db (with notes)
Scaling with mongo db (with notes)Scaling with mongo db (with notes)
Scaling with mongo db (with notes)
 

Dernier

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Dernier (20)

FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 

Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)

  • 1. Introducing TokuMX: The Performance Engine for MongoDB Leif Walsh Senior Engineer, Tokutek leif@tokutek.com @leifwalsh ®
  • 2. What is TokuMX? ! • TokuMX = MongoDB with improved storage ! • Drop in replacement for MongoDB v2.4 applications • Including replication and sharding • Same data model • Same query language • Drivers just work • No Full Text or Geospatial ! • Open Source – http://github.com/Tokutek/mongo ®
  • 3. B-tree Limitations Performance is IO limited when bigger than RAM: try to fit all internal nodes and some leaf nodes RAM 22 10 99 RAM DISK 2, 3, 4 10,20 22,25 99 Plus, mmap. ®
  • 4. TokuMX : Indexed Insertion 4 ®
  • 5. TokuMX : Indexed Insertion 5 ®
  • 6. TokuMX : Concurrency (>RAM) 6 ®
  • 7. TokuMX : Concurrency (<RAM) 7 ®
  • 8. TokuMX : Raw Compression bittorrent data, size on disk, ~31 million inserts (lower is better) TokuMX achieved 11.6:1 compression 8 ®
  • 9. TokuMX : Compression : Field Names synthetic data, size on disk, 100 million inserts (lower is better) TokuMX is substantially smaller, even without compression 9 ®
  • 10. TokuMX : Compression : Field Names synthetic data, size on disk, 100 million inserts (lower is better) MongoDB was ~10% smaller In TokuMX, field name length has almost no impact on size due to compression 10 ®
  • 11. TokuMX : ACID + MVCC • ACID – In MongoDB, multi-insertion operations allow for partial success o Asked to store 5 documents, 3 succeeded – In TokuMX, offer “all or nothing” behavior (atomic) • MVCC – In MongoDB, queries can be interrupted by writers. o The effect of these writers are visible to the reader – We offer MVCC o Reads are consistent as of the operation start 11 ®
  • 12. Questions? Leif Walsh Senior Engineer, Tokutek leif@tokutek.com @leifwalsh ®
  • 13. TokuMX : Indexed Insertion ! • indexed insertion workload (iibench) • http://github.com/tmcallaghan/iibench-mongodb ! { dateandtime: <date-time>,! cashregisterid: 1..1000,! customerid: 1..100000,! productid: 1..10000,! price: <double> }! ! • • insert only, 1000 documents per insert, 100 million inserts indexes • price + customerid • cashregister + price + customerid • price + dateandtime + customerid ! 13 ®
  • 14. TokuMX : Concurrency ! • Sysbench read-write workload • point and range queries, update, delete, insert • http://github.com/tmcallaghan/sysbench-mongodb ! { _id: 1..10000000,! k: 1..10000000,! c: <120 char random string ###-###-###>,! pad: <60 char random string ###-###-###>} 14 ®
  • 15. TokuMX : Raw Compression • BitTorrent Peer Snapshot Data (~31 million documents) • 3 Indexes : peer_id + created, torrent_snapshot_id + created, created ! {                         id: 1,! peer_id: 9222,! torrent_snapshot_id: 4,! upload_speed: 0.0000,! download_speed: 0.0000,! payload_upload_speed: 0.0000,! payload_download_speed: 0.0000,! total_upload: 0,! total_download: 0,! fail_count: 0,! hashfail_count: 0,! progress: 0.0000,! created: "2008-10-28 01:57:35" }! ! http://cs.brown.edu/~pavlo/torrent/ 15 ®
  • 16. TokuMX : Compression : Field Names ! schema 1 - long field names (10/20/20) { first_name : “Tim”, ! last_name : “Callaghan”, ! email_address : “tim@tokutek.com” } ! schema 2 { fn : ln : ea : - short field names (26 less bytes per doc) “Tim”, ! “Callaghan”, ! “tim@tokutek.com” } ! 16 ®