Acunu & OCaml:
Experience Report

      Tom Wilkie
Founder & VP Engineering

   tom@acunu.com
     @tom_wilkie
What do we do?
        1990

    Small databases
     BTree indexes
   BTree File systems
         RAID
     Old hardware
What do we do?
2010
Distributed, shared-nothing databases
[Each node, repeated across the cluster: write-optimised indexes / BTree file systems / RAID / new hardware]
What do we do?
2011
Distributed, shared-nothing databases
[Each node, repeated across the cluster: Castle / new hardware]
What does this have to do with Functional Programming?
[Figure: the Acunu stack and the languages used to build it. Components: Amazon S3 compatible Open API; Management, Deployment and Monitoring; Acunu Storage Core. Languages shown: Java, Erlang, OCaml, C, Python, Bash, Perl.]
Management Stack
Autogenerated OCaml CLI | HTML5/JavaScript User Interface | External Monitoring Tools (Munin etc)

Routerd: enumeration, routing, clustering (another Routerd runs on a different machine)

Daemons and the objects they manage:
  FSd:        Disk, Version, Collection, Filesystem    -> Castle
  Cassandrad: Keyspace, Column Family, Cassandra_Node  -> Cassandra
  S3d:        Bucket, S3_Node                          -> BigS3
  Miscd:      Base, NamedObjects
  Clusterd:   Host, Group, Service
  Statsd:     Default_Report, Report, Stat, Source
  AlertsD:    Alert_Rule, Alert
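One way to read the Management Stack diagram is as a table from object type to the daemon that owns it. The hypothetical OCaml sketch below encodes that table as variant types; the type and function names (`daemon`, `obj`, `owner`, `bridged_system`) are illustrative assumptions, not Acunu's code. It shows a property that suits this kind of table: OCaml checks that the match in `owner` is exhaustive, so a new object type that is not assigned to a daemon triggers a compiler warning.

```ocaml
(* Hypothetical sketch: object ownership as drawn on the Management Stack
   slide. Names come from the diagram; the code itself is illustrative. *)
type daemon = FSd | Cassandrad | S3d | Miscd | Clusterd | Statsd | AlertsD

type obj =
  | Disk | Version | Collection | Filesystem            (* FSd *)
  | Keyspace | Column_family | Cassandra_node            (* Cassandrad *)
  | Bucket | S3_node                                     (* S3d *)
  | Base | Named_objects                                 (* Miscd *)
  | Host | Group | Service                               (* Clusterd *)
  | Default_report | Report | Stat | Source              (* Statsd *)
  | Alert_rule | Alert                                   (* AlertsD *)

(* Which daemon a request about a given object type would be routed to.
   The compiler warns if any constructor of [obj] is left unmatched. *)
let owner = function
  | Disk | Version | Collection | Filesystem -> FSd
  | Keyspace | Column_family | Cassandra_node -> Cassandrad
  | Bucket | S3_node -> S3d
  | Base | Named_objects -> Miscd
  | Host | Group | Service -> Clusterd
  | Default_report | Report | Stat | Source -> Statsd
  | Alert_rule | Alert -> AlertsD

(* FSd, Cassandrad and S3d bridge to other systems; the rest do not. *)
let bridged_system = function
  | FSd -> Some "Castle"
  | Cassandrad -> Some "Cassandra"
  | S3d -> Some "BigS3"
  | Miscd | Clusterd | Statsd | AlertsD -> None
```

For example, `bridged_system (owner Filesystem)` evaluates to `Some "Castle"`.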
FSd, Cassandrad, S3d, Miscd: bridges to other systems
  FSd:        Disk, Version, Collection, Filesystem    -> Castle
  Cassandrad: Keyspace, Column Family, Cassandra_Node  -> Cassandra
  S3d:        Bucket, S3_Node                          -> BigS3
  Miscd:      Base, NamedObjects
Miscd, Clusterd, Statsd, AlertsD: clustering, failure detection, monitoring, alerting
  Miscd:    Base, NamedObjects
  Clusterd: Host, Group, Service
  Statsd:   Default_Report, Report, Stat, Source
  AlertsD:  Alert_Rule, Alert
[Figure: zoomed view of the Management Stack diagram around Routerd (enumeration, routing, clustering): Routing & Aggregation]
Successes / Failures
Prototype
“Filesystem”
Aim: Investigate algorithms for KV storage
• CoW BTrees
• Mod List BTrees
• LSM Trees
• Doubling Arrays
• Fractional Cascading
• Stratified DAs
• Multidimensional keys
• Z curve packing
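The list names Z curve packing without showing it. As a hedged illustration of the general technique (not Acunu's implementation), the standard Z-order, or Morton, encoding interleaves the bits of each coordinate so that a multidimensional key can live in an ordinary one-dimensional sorted index, with nearby points tending to get nearby keys. The sketch assumes two non-negative coordinates below 2^16; `z_order` and `sort_by_z` are hypothetical names.

```ocaml
(* Z-order (Morton) key for a 2-D point, assuming 0 <= x, y < 2^16.
   Bit i of x goes to bit 2i of the key, bit i of y to bit 2i+1. *)
let z_order (x : int) (y : int) : int =
  let z = ref 0 in
  for i = 0 to 15 do
    let xb = (x lsr i) land 1 and yb = (y lsr i) land 1 in
    z := !z lor (xb lsl (2 * i)) lor (yb lsl (2 * i + 1))
  done;
  !z

(* Ordering points by their Z key gives the layout a 1-D index would use. *)
let sort_by_z points =
  List.sort
    (fun (x1, y1) (x2, y2) -> compare (z_order x1 y1) (z_order x2 y2))
    points
```

Sorting the four corners of a 2x2 grid visits (0,0), (1,0), (0,1), (1,1): the "Z" shape the curve is named after.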
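Doubling Array
[Figure: animation of doubling-array inserts. Inserting 2 and 9 yields the sorted array [2 9]; inserting 8 and then 11 yields [8 11], which is merged with [2 9] into [2 8 9 11], etc.]

Similar to log-structured merge trees (LSM), cache-oblivious lookahead array (COLA), ...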
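The insert path in the figure can be sketched in a few lines of OCaml. This is a minimal in-memory sketch, assuming integer keys, no deletes or updates, and no fractional cascading; the names (`merge`, `insert`, `mem`, `to_sorted_list`) are hypothetical, and it is not Castle's on-disk implementation. Level k is either empty or a sorted array of exactly 2^k keys, and an insert behaves like incrementing a binary counter: full levels are merged into a "carry" until an empty level is found.

```ocaml
(* Level k (position k in the list) is None or Some a, where a is sorted
   and has length 2^k. *)
type t = int array option list

let empty : t = []

(* Merge two sorted arrays in one sequential pass. *)
let merge (a : int array) (b : int array) : int array =
  let la = Array.length a and lb = Array.length b in
  let out = Array.make (la + lb) 0 in
  let i = ref 0 and j = ref 0 in
  for k = 0 to la + lb - 1 do
    if !j >= lb || (!i < la && a.(!i) <= b.(!j)) then begin
      out.(k) <- a.(!i); incr i
    end else begin
      out.(k) <- b.(!j); incr j
    end
  done;
  out

(* Insert: carry a singleton upwards, merging with each full level. *)
let insert (x : int) (levels : t) : t =
  let rec go carry = function
    | [] -> [Some carry]
    | None :: rest -> Some carry :: rest
    | Some a :: rest -> None :: go (merge a carry) rest
  in
  go [| x |] levels

(* Binary search in one sorted level. *)
let mem_array x a =
  let rec search lo hi =
    if lo >= hi then false
    else
      let mid = (lo + hi) / 2 in
      if a.(mid) = x then true
      else if a.(mid) < x then search (mid + 1) hi
      else search lo mid
  in
  search 0 (Array.length a)

(* Lookup searches every non-empty level: O(log^2 N) comparisons in this
   sketch, which fractional cascading is designed to reduce. *)
let mem x (levels : t) =
  List.exists (function None -> false | Some a -> mem_array x a) levels

(* All keys in order, by merging the non-empty levels. *)
let to_sorted_list (levels : t) =
  levels
  |> List.filter_map (fun l -> l)
  |> List.fold_left merge [||]
  |> Array.to_list
```

Inserting 2, 9, 8, 11 reproduces the figure: the final state has levels 0 and 1 empty and level 2 holding [2 8 9 11]. Each key is merged O(log N) times and every merge is a sequential pass, which is the intuition behind the sequential-IO costs in the comparison that follows.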
Demo
https://acunu-videos.s3.amazonaws.com/dajs.html
                          Update                      Range Query (size Z)
Log Structured B-Tree     O(log_B N) random IOs       O(Z/B) random IOs
Doubling Array            O((log N)/B) sequential IOs O(Z/B) sequential IOs

B = "block size", say 8KB at 100 bytes/entry ≈ 100 entries; N ≈ 2^30 entries

B-Tree:          8KB @ 100MB/s, w/ 8ms seek ≈ 100 IOs/s
                 log(2^30)/log 100 ≈ 5 IOs/update
                 100 / 5 = 20 updates/s

Doubling Array:  8KB @ 100MB/s, no seeks ≈ 13k IOs/s
                 ln(2^30)/100 ≈ 0.2 IOs/update
                 13k / 0.2 = 65k updates/s
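The back-of-envelope arithmetic can be re-run without rounding. The OCaml sketch below uses only the slide's own assumptions (8KB blocks, 100MB/s transfer in decimal units, 8ms seek, N = 2^30, B = 100); the variable names are hypothetical. It uses natural logs for the doubling-array term, since that matches the slide's 0.2 IOs/update (log base 2 would give 0.3).

```ocaml
(* Inputs: the slide's assumptions. *)
let block_kb = 8.0
let bandwidth_kb_s = 100_000.0   (* 100 MB/s, decimal units *)
let seek_s = 0.008               (* 8 ms *)
let n = 2.0 ** 30.0              (* number of entries *)
let b = 100.0                    (* entries per block *)

(* Random IO pays a seek plus the block transfer; sequential IO only the transfer. *)
let random_iops = 1.0 /. (seek_s +. block_kb /. bandwidth_kb_s)
let sequential_iops = bandwidth_kb_s /. block_kb

(* B-tree: log_B N random IOs per update (the log base cancels). *)
let btree_ios_per_update = log n /. log b
(* Doubling array: (ln N) / B sequential IOs per update. *)
let da_ios_per_update = log n /. b

let btree_updates_s = random_iops /. btree_ios_per_update
let da_updates_s = sequential_iops /. da_ios_per_update
```

Unrounded, this gives about 124 random IOs/s and 4.5 IOs per B-tree update (about 27 updates/s), versus 12,500 sequential IOs/s and about 0.21 IOs per doubling-array update (about 60k updates/s): still a gap of over three orders of magnitude.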
BTree Disk Trace
[Figure: block index vs. time (s)]
Doubling Array Disk Trace
[Figure: block index vs. time (s)]
OCaml Prototype Performance
[Figure: insertion rate (kvps/s) vs. number of inserted kvps]
The Dark Side...
Java Prototype Performance
[Figure: insert rate (keys/s) vs. time (s)]
What about Castle?
Castle Performance
One more thing...
SNAPSHOTS*
* And clones!
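The slide only names the feature and defers the explanation to a later talk, so the sketch below is not how Castle implements it. It illustrates the general idea behind persistent data structures [DSST], which is one reason snapshots and clones fit a functional style: OCaml's standard `Map` is persistent, so a snapshot is simply a retained old version, and a clone is a version that keeps being updated independently, with unchanged structure shared rather than copied. The names `v1`, `snapshot`, `live` and `clone` are hypothetical.

```ocaml
(* A persistent string-keyed map from the OCaml standard library. *)
module SMap = Map.Make (String)

(* Version 1 of the store. *)
let v1 = SMap.empty |> SMap.add "a" 1 |> SMap.add "b" 2

(* A snapshot is just a reference to an existing version: no copying. *)
let snapshot = v1

(* Writing to the live version builds a new map; v1 is untouched and
   shares most of its structure with the result. *)
let live = SMap.add "a" 10 v1

(* A clone is a writable branch taken from the snapshot; its updates are
   invisible to both the live version and the snapshot. *)
let clone = SMap.add "c" 3 snapshot
```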
I’ll explain how...
“Castle: Re-inventing Storage For Big Data”
London, 27th September
http://bit.ly/rduBia
Questions?
    tom@acunu.com
      @tom_wilkie

 http://www.acunu.com
http://bitbucket.org/acunu
 http://github.com/acunu
References
[LSM] Patrick O'Neil, Edward Cheng, Dieter Gawlick, Elizabeth O'Neil. The Log-Structured Merge-Tree (LSM-Tree). http://staff.ustc.edu.cn/~jpq/paper/flash/1996-The%20Log-Structured%20Merge-Tree%20%28LSM-Tree%29.pdf
[COLA] Michael A. Bender et al. Cache-Oblivious Streaming B-trees. http://www.cs.sunysb.edu/~bender/newpub/BenderFaFi07.pdf
[DSST] J. R. Driscoll, N. Sarnak, D. D. Sleator, R. E. Tarjan. Making Data Structures Persistent. Journal of Computer and System Sciences, Vol. 38, No. 1, 1989. http://www.cs.cmu.edu/~sleator/papers/making-data-structures-persistent.pdf
[RDA] Joep Aerts, Jan Korst, Sebastian Egner. Random duplicate storage strategies for load balancing in multimedia servers, 2000. http://www.win.tue.nl/~joep/IPL.ps
Andy Twigg, Andrew Byde, Grzegorz Miłoś, Tim Moreton, John Wilkes, Tom Wilkie. Stratified B-trees and versioned dictionaries. HotStorage’11. http://www.usenix.org/event/hotstorage11/tech/final_files/Twigg.pdf

Apache, Apache Cassandra, Cassandra, Hadoop, and the eye and elephant logos are trademarks of the Apache Software Foundation.

Acunu & OCaml: Experience Report, CUFP

  • 1. Acunu & OCaml: Experience Report Tom Wilkie Founder & VP Engineering tom@acunu.com @tom_wilkie
  • 2. What do we do? 1990 Small databases BTree indexes BTree File systems RAID Old hardware
  • 3. What do we do? 2010 Distributed, shared-nothing databases Write-optimised indexes Write-optimised indexes BTree file systems BTree file systems RAID ... RAID New hardware New hardware
  • 4. What do we do? 2011 Distributed, shared-nothing databases Castle Castle ... New hardware New hardware
  • 5.
  • 6. What does this have to do with Functional Programming?
  • 7. Big Appl Java, Amazon S3 compatible Open API Erlang, OCaml Management C ... Deployment Monitoring Python, ... ... ... ... Bash, Acunu Storage Core C Perl ... ... Cros Manag
  • 9. Autogeneranted HTML5/JavaScript External Monitoring OCaml CLI User Interface Tools (Munin etc) Routerd Another Routerd enumeration, routing, clustering on a different machine FSd Cassandrad S3d Miscd Clusterd Statsd AlertsD Default_ Alert_ Disk Keyspace Bucket Base Host Report Rule Column NamedObje Version Group Report Alert Family cts Collection Stat Cassandra_ Filesystem S3_Node Service Source Node Castle Cassandra BigS3
  • 10. FSd Cassandrad S3d Miscd Disk Keyspace Bucket Base Column NamedObje Version Family cts Bridges to Collection other systems Cassandra_ Filesystem S3_Node Node Castle Cassandra BigS3
  • 11. Miscd Clusterd Statsd AlertsD Default_ Alert_ Base Host Report Rule Clustering NamedObje Group Report Alert cts Failure Detection Stat Monitoring Service Source Alerting
  • 12. neranted HTML5/JavaScript External M ml CLI User Interface Tools (Mu Routerd enumeration, routing, clustering drad S3d Miscd Clusterd St Routing & Aggregation De ace Bucket Base Host Re
  • 15. Aim: Investigate algorithms for KV storage • CoW BTrees • Fractional Cascading • Mod List BTrees • Stratified DAs • LSM Trees • Multidimensional keys • Doubling Arrays • Z curve packing
  • 17. Doubling Array Inserts 11 2 9 2 8 9 11 8 8 11 etc... Similar to log-structured merge trees (LSM), cache- oblivious lookahead array (COLA), ...
  • 19. 8KB @ 100MB/s, w/ 8ms seek 100 / 5 = 100 IOs/s = 20 updates/s ~ log (2^30)/log 100 = 5 IOs/update Range Query Update (Size Z) Log Structured O(logB N) O(Z/B) B-Tree random IOs random IOs O((log N)/B) O(Z/B) Doubling Array sequential IOs sequential IOs ~ log (2^30)/100 8KB @ 100MB/s 13k / 0.2 = 0.2 IOs/update = 13k IOs/s = 65k updates/s B = “block size”, say 8KB at 100 bytes/entry ~= 100 entries
  • 20. Block Index BTree Disk Trace Time (s)
  • 21. Block Index Doubling Array Disk Trace Time (secs)
  • 22. Insertion Rate (kvps/s) OCaml Prototype Performance # inserted kvps
  • 24. Insert Rate (keys/s) Java Prototype Performance Time (s)
  • 27.
  • 29. SH OT S* SN AP * And clones!
  • 30. I’ll explain how.... “Castle: Re-inventing Storage For Big Data” London, 27th September http://bit.ly/rduBia
  • 31. Questions? tom@acunu.com @tom_wilkie http://www.acunu.com http://bitbucket.org/acunu http://github.com/acunu
  • 32. References [LSM] The Log-Structured Merge-Tree (LSM-Tree) Patrick O'Neil, Edward Cheng, Dieter Gawlick, Elizabeth O'Neil Stratified B-trees and versioned dictionaries, - Andy http://staff.ustc.edu.cn/~jpq/paper/flash/1996-The Twigg, Andrew Byde, Grzegorz Miłoś, Tim Moreton, %20Log-Structured%20Merge-Tree%20%28LSM- John Wilkes, Tom Wilkie, HotStorage’11 Tree%29.pdf http://www.usenix.org/event/hotstorage11/tech/ final_files/Twigg.pdf [COLA] Cache-Oblivious Streaming B-trees, Michael A. Bender et al [RDA] Random duplicate storage strategies for http://www.cs.sunysb.edu/~bender/newpub/ load balancing in multimedia servers, 2000, Joep BenderFaFi07.pdf Aerts and Jan Korst and Sebastian Egner http://www.win.tue.nl/~joep/IPL.ps [DSST] Making Data Structures Persistent - J. R. Driscoll, N. Sarnak, D. D. Sleator, R. E. Tarjan, Making Apache, Apache Cassandra, Cassandra, Hadoop, and Data Structures Persistent, Journal of Computer the eye and elephant logos are trademarks of the and System Sciences,Vol. 38, No. 1, 1989 Apache Software Foundation. http://www.cs.cmu.edu/~sleator/papers/making- data-structures-persistent.pdf