SlideShare a Scribd company logo
1 of 43
Computational Research Division Lawrence Berkeley National Laboratory Dan Gunter
Introduction ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Terminology: NOSQL and “Schemaless” ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
NOSQL past and present Pre-RDBMS RDBMS era NOSQL
Pre-relational structured storage systems ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Computer Systems News , 11/28/83
The relational model ,[object Object],[object Object],[object Object],A 1 ... A n Value 1 ... Value n R Relation (Table) Relation variable (Table name) Attribute (Column) {unordered} Heading Tuple (Row) {unordered}
Recent NOSQL database products Columnar  or  Extensible record  Google BigTable HBase Cassandra HyperTable SimpleDB Document Store CouchDB MongoDB Lotus Domino Graph DB Neo4j FlockDB InfiniteGraph Key/Value Store Mnesia Memcached Redis Tokyo Cabinet Dynamo Project Voldemort Dynomite Riak
Why NOSQL? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
CAP Theorem ,[object Object],[object Object],[object Object],[object Object],[object Object],All robust distributed systems live here Forfeit partition-tolerance Forfeit availability Forfeit consistency Single-site databases, cluster databases, LDAP  Distributed databases w/pessimistic locking, majority protocols Coda, web caching, DNS,  Dynamo
CAP, ACID, and BASE ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],ACID BASE
Pioneers ,[object Object],[object Object],These implementations are  not  publicly available, but the distributed-system techniques that they integrated to build huge databases have been imitated, to a greater or lesser extent, by every implementation that followed.
Google BigTable ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
BigTable’s Data Model Google’s Bigtable is essentially a massive, distributed 3-D spreadsheet. It doesn’t do SQL, there is limited support for atomic transactions, nor does it support the full relational database model. In short, in these and other areas, the Google team made design trade-offs to enable the scalability and fault-tolerance Google apps require. - Robin Harris, StorageMojo (blog), 2006-09-08 t 6 t 5 t 3 name contents: anchor:cnnsi.com ... anchor:my.look.ca ... “ com.cnn.www” “ CNN” ... “ CNN.com” ... “ <html>...” “ <html>...” “ <html>...”
Tablets and SSTables ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Use of Bloom Filters to optimize lookups ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],w  is not in {  x, y, z  } because it hashes to one position with a 0 1 1 1 0 0 1 0 1 0 1 0 0 1 0 { x,  w y, z }
Chubby and Paxos ,[object Object],Each “DB” is a replica Each server runs on its own host Google tends to run 5 servers, with only one being the “master” at any one time Chubby server  DB Chubby server  DB Chubby server  DB Chubby server  DB Chubby server  DB Master
What about CAP? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Amazon’s Dynamo ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Dynamo data partitioning and replication Virtual node Host “node” Host “node” Virtual node Virtual node Virtual node Virtual node Virtual node Virtual node . . Hash ring using consistent hashing Host “node” Virtual node Virtual node Virtual node Virtual node 4 4 3 Item Hashes to this spot coordinator node replicas
Eventual consistency and sloppy quorum ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Replica synchronization with Merkle trees ,[object Object],[object Object],[object Object],For Dynamo, the “data” are the keys stored in a given virtual node Each node is a hash of its children If two top hashes match, then the trees are the same
Infrastructure (at scale) is fractal ,[object Object],[object Object],[object Object]
The Gold Rush Columnar  or  Extensible record  Google BigTable HBase Cassandra HyperTable SimpleDB Document Store CouchDB MongoDB Lotus Domino Graph DB Neo4j FlockDB InfiniteGraph Key/Value Store Mnesia Memcached Redis Tokyo Cabinet Dynamo Project Voldemort Dynomite Riak Hibari
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Key/Value Store Memcached Redis Tokyo Cabinet Dynamo Project Voldemort Dynomite Riak Hibari
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Project Voldemort Type Key/Value Store License Apache 2.0 Language Java Company Linked-In Web project-voldemort.com
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],result = self.client.add(bucket.get_name()).map(&quot;Riak.mapValuesJson” .reduce(&quot;Riak.reduceSum”.run() Riak Example: Map/reduce with the Python API Type Key/Value Store License Open-Source Language Erlang Company Basho Web wiki.basho.com/display/RIAK/Riak/
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Hibari Type Key/Value Store  License Open-Source Language Erlang Company Gemini Mobile Web sourceforge.net/projects/hibari/
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Columnar  or  Extensible record  Google BigTable HBase Cassandra HyperTable
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Cassandra ,[object Object],Type Extensible column store License Apache 2.0 Language Java Company Apache Software Foundation Web cassandra.apache.org
[object Object],[object Object],[object Object],[object Object],[object Object],SimpleDB Document Store CouchDB MongoDB Lotus Domino Mnesia
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],CouchDB Type Document store License Apache 2.0 Language Erlang Company Apache Software Foundation Web couchdb.org
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],MongoDB ,[object Object],http://www.slideshare.net/mongodb/mongodb-replica-sets Type Document store License GPL Language C++ Company 10gen Web mongodb.org
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Mnesia * Mozilla Public License modified to conform with laws of Sweden (more herring) Type Document store License EPL* Language Erlang Company Ericsson Web www.erlang.org Papers http://www.erlang.se/publications/mnesia_overview.pdf
Why do we care about Mnesia / OTP? ,[object Object],[object Object],females() -> F = fun() -> Q = query [E.name || E <- table(employee),     E.sex = female] end, mnemosyne:eval(Q) end, mnesia:transaction(F).  Erlang query for “all females” in company* *I know, but it’s not  my  example. This is right out of the manual.
Comparison of MongoDB and CouchDB ,[object Object],[object Object],[object Object],[object Object],[object Object],Database Inserts/sec MongoDB 16,000 CouchDB 70 CouchDB, batch 1,800
Schemaless data modeling http://labs.mudynamics.com/2010/04/01/why-nosql-is-bad-for-startups/
Example from distributed monitoring ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],All of these are data modeling “anti-patterns” for relational DBs
What’s wrong with EAV? ,[object Object],[object Object]
What about queries?
SQL vs. M/R and other models ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Conclusions ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Selected references ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

More Related Content

What's hot

Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & DeltaDatabricks
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopApache Apex
 
NoSQL databases - An introduction
NoSQL databases - An introductionNoSQL databases - An introduction
NoSQL databases - An introductionPooyan Mehrparvar
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLRamakant Soni
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake OverviewJames Serra
 
A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.Navdeep Charan
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data ArchitectureGuido Schmutz
 
ADV Slides: Strategies for Fitting a Data Lake into a Modern Data Architecture
ADV Slides: Strategies for Fitting a Data Lake into a Modern Data ArchitectureADV Slides: Strategies for Fitting a Data Lake into a Modern Data Architecture
ADV Slides: Strategies for Fitting a Data Lake into a Modern Data ArchitectureDATAVERSITY
 

What's hot (20)

NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
NoSQL databases - An introduction
NoSQL databases - An introductionNoSQL databases - An introduction
NoSQL databases - An introduction
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQL
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.
 
NOSQL vs SQL
NOSQL vs SQLNOSQL vs SQL
NOSQL vs SQL
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Key-Value NoSQL Database
Key-Value NoSQL DatabaseKey-Value NoSQL Database
Key-Value NoSQL Database
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Big Data Application Architectures - Fraud Detection
Big Data Application Architectures - Fraud DetectionBig Data Application Architectures - Fraud Detection
Big Data Application Architectures - Fraud Detection
 
Hive(ppt)
Hive(ppt)Hive(ppt)
Hive(ppt)
 
PPT on Hadoop
PPT on HadoopPPT on Hadoop
PPT on Hadoop
 
Hadoop Tutorial For Beginners
Hadoop Tutorial For BeginnersHadoop Tutorial For Beginners
Hadoop Tutorial For Beginners
 
ADV Slides: Strategies for Fitting a Data Lake into a Modern Data Architecture
ADV Slides: Strategies for Fitting a Data Lake into a Modern Data ArchitectureADV Slides: Strategies for Fitting a Data Lake into a Modern Data Architecture
ADV Slides: Strategies for Fitting a Data Lake into a Modern Data Architecture
 
Selecting best NoSQL
Selecting best NoSQL Selecting best NoSQL
Selecting best NoSQL
 

Similar to Schemaless Databases

05 No SQL Sudarshan.ppt
05 No SQL Sudarshan.ppt05 No SQL Sudarshan.ppt
05 No SQL Sudarshan.pptAnandKonj1
 
No SQL Databases.ppt
No SQL Databases.pptNo SQL Databases.ppt
No SQL Databases.pptssuser8c8fc1
 
NoSQL and MapReduce
NoSQL and MapReduceNoSQL and MapReduce
NoSQL and MapReduceJ Singh
 
Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupesh Bansal
 
NO SQL: What, Why, How
NO SQL: What, Why, HowNO SQL: What, Why, How
NO SQL: What, Why, HowIgor Moochnick
 
Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesJon Meredith
 
Is multi-model the future of NoSQL?
Is multi-model the future of NoSQL?Is multi-model the future of NoSQL?
Is multi-model the future of NoSQL?Max Neunhöffer
 
NoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, ImplementationsNoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, ImplementationsFirat Atagun
 
No sql distilled-distilled
No sql distilled-distilledNo sql distilled-distilled
No sql distilled-distilledrICh morrow
 
CS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceCS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceJ Singh
 
Nonrelational Databases
Nonrelational DatabasesNonrelational Databases
Nonrelational DatabasesUdi Bauman
 

Similar to Schemaless Databases (20)

05 No SQL Sudarshan.ppt
05 No SQL Sudarshan.ppt05 No SQL Sudarshan.ppt
05 No SQL Sudarshan.ppt
 
No SQL Databases.ppt
No SQL Databases.pptNo SQL Databases.ppt
No SQL Databases.ppt
 
NoSQL Basics - A Quick Tour
NoSQL Basics - A Quick TourNoSQL Basics - A Quick Tour
NoSQL Basics - A Quick Tour
 
NoSQL and MapReduce
NoSQL and MapReduceNoSQL and MapReduce
NoSQL and MapReduce
 
Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata
 
NO SQL: What, Why, How
NO SQL: What, Why, HowNO SQL: What, Why, How
NO SQL: What, Why, How
 
No sql
No sqlNo sql
No sql
 
Nosql seminar
Nosql seminarNosql seminar
Nosql seminar
 
NoSql Database
NoSql DatabaseNoSql Database
NoSql Database
 
Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL Databases
 
Is multi-model the future of NoSQL?
Is multi-model the future of NoSQL?Is multi-model the future of NoSQL?
Is multi-model the future of NoSQL?
 
Datastores
DatastoresDatastores
Datastores
 
No sql
No sqlNo sql
No sql
 
Oslo baksia2014
Oslo baksia2014Oslo baksia2014
Oslo baksia2014
 
NoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, ImplementationsNoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, Implementations
 
No sql distilled-distilled
No sql distilled-distilledNo sql distilled-distilled
No sql distilled-distilled
 
NOSQL
NOSQLNOSQL
NOSQL
 
CS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceCS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduce
 
Nonrelational Databases
Nonrelational DatabasesNonrelational Databases
Nonrelational Databases
 
No sq lv2
No sq lv2No sq lv2
No sq lv2
 

Recently uploaded

Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Visualising and forecasting stocks using Dash
Visualising and forecasting stocks using DashVisualising and forecasting stocks using Dash
Visualising and forecasting stocks using Dashnarutouzumaki53779
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 

Recently uploaded (20)

Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Visualising and forecasting stocks using Dash
Visualising and forecasting stocks using DashVisualising and forecasting stocks using Dash
Visualising and forecasting stocks using Dash
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 

Schemaless Databases

  • 1. Computational Research Division Lawrence Berkeley National Laboratory Dan Gunter
  • 2.
  • 3.
  • 4. NOSQL past and present Pre-RDBMS RDBMS era NOSQL
  • 5.
  • 6.
  • 7. Recent NOSQL database products Columnar or Extensible record Google BigTable HBase Cassandra HyperTable SimpleDB Document Store CouchDB MongoDB Lotus Domino Graph DB Neo4j FlockDB InfiniteGraph Key/Value Store Mnesia Memcached Redis Tokyo Cabinet Dynamo Project Voldemort Dynomite Riak
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13. BigTable’s Data Model Google’s Bigtable is essentially a massive, distributed 3-D spreadsheet. It doesn’t do SQL, there is limited support for atomic transactions, nor does it support the full relational database model. In short, in these and other areas, the Google team made design trade-offs to enable the scalability and fault-tolerance Google apps require. - Robin Harris, StorageMojo (blog), 2006-09-08 t 6 t 5 t 3 name contents: anchor:cnnsi.com ... anchor:my.look.ca ... “ com.cnn.www” “ CNN” ... “ CNN.com” ... “ <html>...” “ <html>...” “ <html>...”
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19. Dynamo data partitioning and replication Virtual node Host “node” Host “node” Virtual node Virtual node Virtual node Virtual node Virtual node Virtual node . . Hash ring using consistent hashing Host “node” Virtual node Virtual node Virtual node Virtual node 4 4 3 Item Hashes to this spot coordinator node replicas
  • 20.
  • 21.
  • 22.
  • 23. The Gold Rush Columnar or Extensible record Google BigTable HBase Cassandra HyperTable SimpleDB Document Store CouchDB MongoDB Lotus Domino Graph DB Neo4j FlockDB InfiniteGraph Key/Value Store Mnesia Memcached Redis Tokyo Cabinet Dynamo Project Voldemort Dynomite Riak Hibari
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36. Schemaless data modeling http://labs.mudynamics.com/2010/04/01/why-nosql-is-bad-for-startups/
  • 37.
  • 38.
  • 40.
  • 41.
  • 42.
  • 43.

Editor's Notes

  1. This talk really comes out of my attempt to orient myself in this space. Background is in monitoring distributed systems, concerned with scalable collection and data analysis. But also want to know what I can use for semi-structured data “in the small”.
  2. Where it applies, the distinction between relatively fixed schemas and dynamic ones is more technically significant than what query syntax is used to access the data, as has been shown by a number of products that provided a dialect of SQL as an alternative query language either alongside or on top of their native syntax.
  3. PICK -- MultiValue (aka PICK) databases are developed at TRW in 1965. M[umps] -- According to comment from Scott Jones M[umps] is developed at Mass General Hospital in 1966. It is a programming language that incorporates a hierarchical database with B+ tree storage. IBM IMS -- IBM IMS, a hierarchical database, is developed with Rockwell and Caterpillar for the Apollo space program in 1966. ISM -- InterSystems develops the ISM product family succeeded by the Open M product, all M[umps] implementations. See comment from Scott Jones below. ANSI M -- M[umps] is approved as a ANSI standard language in 1977. AT&amp;T DBM -- in 1979 Ken Thompson creates DBM which is released by AT&amp;T. At it&apos;s core it is a file-based hash. TDBM -- TDBM supporting atomic transactions NDBM -- NDBM was the Berkeley version of DBM supporting having multiple databases open at the same time. SDBM -- SDBM - another clone of DBM mainly for licensing reasons. GT.M -- GT.M is the first version of a key-value store with focus on high performance transaction processing. It is open sourced in 2000. BerkeleyDB -- BerkeleyDB is created at Berkeley in the transition from 4.3BSD to 4.4BSD. Sleepycat software is started as a company in 1996 when Netscape needed new features for BerkeleyDB. Later acquired by Oracle which still sell and maintain BerkeleyDB. Lotus Domino -- Lotus Notes or rather the server part, Lotus Domino, which really is a document database has it&apos;s initial release in 1989, now sold by IBM. It has evolved a lot from the early versions and is now a full office and collaboration suite. GDBM -- GDBM is the Gnu project clone of DBM Mnesia -- Mnesia is developed by Ericsson as a soft real-time database to be used in telecom. It is relational in nature but does not use SQL as query language but rather Erlang itself. Cache -- InterSystems CachÈ launched in 1997 and is a hybrid so-called post-relational database. It has object interfaces, SQL, PICK/MultiValue and direct manipulation of data structures. It is a M[umps] implementation. See Scott Jones comment below for more on the history of InterSystems Metakit -- Metakit is started in 1997 and is probably the first document oriented database. Supports smaller datasets than the ones in vogue nowadays. Neo4j -- Graph database Neo4j is started in 2000. db4o -- db4o an object database for java and .net is started in 2000 QDBM -- QDBM is a re-implementation of DBM with better performance by Mikio Hirabayashi. Memcached -- Memcached is started in 2003 by Danga to power Livejournal. Memcached isn&apos;t really a database since it&apos;s memory-only but there is soon a version with file storage called memcachedb. Infogrid graph DB -- Infogrid graph database is started as closed source in 2005, open sourced in 2008 CouchDB -- CouchDB is started in 2005 and provides a document database inspired by Lotus Notes. The project moves to the Apache Foundation in 2008. Google BigTable -- Google BigTable is started in 2004 and the research paper is released in 2006. JackRabbit -- JackRabbit is started in 2006 as an implementation of JSR 170 and 283. Tokyo Cabinet -- Tokyo Cabinet is a successor to QDBM by (Mikio Hirabayashi) started in 2006 Dynamo -- The research paper on Amazon Dynamo is released in 2007. MongoDB -- The document database MongoDB is started in 2007 as a part of a open source cloud computing stack and first standalone release in 2009. Cassandra -- Facebooks open sources the Cassandra project in 2008 Voldemort -- Project Voldemort is a replicated database with no single point-of-failure. Started in 2008. Dynomite -- Dynomite is a Dynamo clone written in Erlang. Terrastore -- Terrastore is a scalable elastic document store started in 2009 Redis -- Redis is persistent key-value store started in 2009 Riak -- Riak Another dynamo-inspired database started in 2009. HBase -- HBase is a BigTable clone for the Hadoop project while Hypertable is another BigTable type database also from 2009. Vertexdb -- Vertexdb another graph database is started in 2009 Term: NOSQL -- Eric Evans of Rackspace, a committer on the Cassandra project, introduces the term NoSQL often used in the sense of Not only SQL to describe the surge of new projects and products.
  4. Both of these systems are still used. An open-source version of M, called GT.M, is available (since 2000). M is still used by the US Dept of Veterans Affairs, and also by Ameritrade (Cache’: 12B transactions a day), ING Direct, and others in the financial industry. The IBM IMS system is still very actively used today, in particular for the US Federal Reserve. According to Wikipedia, odds are good your ATM transaction hits an IMS database. Chinese banks have purchased IMS technology. IMS includes a separate “transaction management” (TM) system.
  5. E. F. Codd’s seminal 1970 paper, “ A Relational Model of Data for Large Shared Data Banks” laid out a solid mathematical basis for databases in contrast to the hierarchical and network models of the time, relational algebra, an offshoot of first-order logic, provided a declarative means of reasoning about the data that did not depend on the implementation SQL is “loosely based” on relational algebra
  6. This taxonomy will be explored in more detail later, the point for now is that there are several different types of datastores and a number of examples of each and, referring back to the timeline, most of these implementations have occurred in the past few years..
  7. Corporations (once again) found themselves at the forefront of systems research. But what was that research? (Read on..)
  8. If nothing else, being able to refer to the “CAP theorem” the next time your networked demo breaks..
  9. In his talk, Brewer said “there is almost no work in this area”. I think that the existence of scalable (schemaless) database systems is proof that this has changed.
  10. Pictured is Parliament, pioneers of funk!
  11. Trivia: what major movie was about producing a script called “Chubby Rain”?
  12. Example of a BIgTable that stores web pages (directly out of the paper). The row names are reversed URLs (so sorted rows tend to group things by the same domain) There are two column families, “contents” and “anchor” In this example, each anchor cell has one version, and the contents column has 3
  13. Paxos is an old and well-known algorithm. The Chubby “Database” is really a set of directories with small “lockfiles”. Each tablet server gets one Chubby directory, and each of its tablets is a lockfile.
  14. These core services included the Amazon e-commerce shopping cart.
  15. Each virtual node is responsible for keys between itself and its predecessor on the ring. The mapping of a single node to a variable number of virtual nodes on the hash ring accounts for heterogeneity (host “power”) in the system.
  16. The quorum is “sloppy” because R and W refer to the number of healthy nodes, which may change between the write and subsequent read of the key.
  17. (Who knows what this is?) The picture is a close-up of a vegetable: the “ Chou Romanesco&amp;quot; cauliflower
  18. Particularly appropriate analogy because of the industry’s tendency to rush towards shiny new technologies! Following sections will examine each of these categories and walk through one publicly available product (or more) for each. With the exception of graph databases, which I simply haven’t taken the time to grok yet.
  19. Both Voldemort and the next database, Riak, claim they were “inspired” by the early Dynamo paper
  20. In the diagram, the green nodes are head; orange middle; red are tails. The white arrows are write requests, grey read requests, and red are (all) replies.
  21. Developed by former engineers from BigTable and Dynamo projects, in heavy use at Facebook. For consistency level, zero = totally async.; Any= 1 node, including hinted handoff; Quorum = R/2+1 where R = #replicas Reads of 0 or Any don’t make sense. 0=no data, Any=wrong node; can’t do read-repairs, just the handed-off version
  22. Has a nice Web UI called “Futon”. Yes, everything is a reclining furniture pun.
  23. Obviously, this is at best a micro-benchmark. YCSB stands for Yahoo! Cloud Serving Benchmark
  24. I won’t attempt to actually cover Map/Reduce, and don’t know Erlang. Instead: what impact do these databases have on data modeling efforts?