Is NoSQL the Future of Data Storage?
By Gary Short
Developer Express
Introduction
•   Gary Short
•   Technical Evangelist for Developer Express
•   C# MVP
•   garys@devexpress.com
•   www.garyshort.org
•   @garyshort.



                                                 2
What About You Guys?




                       3
Breadth First Look @ NoSQL




                             4
Be Doing 3 Things
1. Define NoSQL databases
2. Look at scenarios where you can use NoSQL
3. Drill into a specific use case.




                                               5
6
Where Does NoSQL Originate?
• 1998
  – Open-source relational database
     •   Created by Carlo Strozzi
     •   Didn’t expose an SQL interface
     •   Called NoSQL
     •   The author said:
     •   “departs from the relational model altogether...”
     •   “...should have been called ‘NoREL’”.



                                                             7
More Recently...
• Eric Evans reintroduced the term in 2009
  – For an event organised by Johan Oskarsson (Last.fm)
     • Held to discuss open-source distributed databases
• The term now labels a growing number of datastores
  – Open source
  – Non-relational
  – Distributed
  – (often) don’t guarantee ACID.

                                                   8
Atlanta 2009
• No:sql(east) conference
• Billed as “conference of no-rel datastores”
• Worst tag line ever
  – SELECT fun, profit FROM real_world WHERE rel=false.




                                                          9
Not Anti-RDBMS




                10
Let’s Talk a Bit About What NoSQL DBs Look Like...




                                    11
Key Attributes of NoSQL Databases
•   Don’t require fixed table schemas
•   Non-relational
•   (Usually) avoid join operations
•   Scale horizontally
    – Adding more nodes to a storage system.




                                               12
What Does the Taxonomy Look Like?




                                    13
Document Store
•   RavenDB
•   Apache Jackrabbit
•   CouchDB
•   MongoDB
•   SimpleDB
•   XML Databases
    – MarkLogic Server
    – eXist.

                                14
Document What?




                 15
Graph Storage
•   Trinity
•   AllegroGraph
•   Core Data
•   Neo4j
•   DEX
•   FlockDB.



                               16
Which Means?
• A graph consists of
  – Nodes (the ‘stations’ of the graph)
  – Edges (the lines between them)
• FlockDB
  – Created by the Twitter folks
  – Nodes = Users
  – Edges = Nature of relationship between nodes.
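
The node and edge idea above can be sketched in a few lines of Python. This is only an illustration: the `Edge` record and the relationship labels are assumptions made for the example, not FlockDB's actual schema.

```python
from collections import namedtuple

# Hypothetical edge record: source user, destination user, kind of relationship.
Edge = namedtuple("Edge", ["source", "destination", "kind"])

edges = {
    Edge("alice", "bob", "follows"),
    Edge("bob", "alice", "follows"),
    Edge("alice", "carol", "follows"),
    Edge("carol", "alice", "blocks"),
}

def neighbours(edge_set, user, kind):
    """Return the users that `user` points at with the given kind of edge."""
    return sorted(e.destination for e in edge_set
                  if e.source == user and e.kind == kind)

following = neighbours(edges, "alice", "follows")
```

Here the users are the nodes and each `Edge` carries the nature of the relationship, which is the split described on the slide.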


                                                    17
Social Graph




               18
Key/Value Stores
• On disk
• Cache in RAM
• Eventually Consistent
   – Weak Definition
      • “If no updates occur for a period, eventually all updates will
        propagate through the system and all replicas will be consistent”
   – Strong Definition
      • “for a given update and a given replica eventually either the
        update reaches the replica or the replica retires”
• Ordered
   – Distributed Hash Table allows lexicographical processing.
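
A toy Python sketch of the weak definition of eventual consistency. The `Replica` class, its last-writer-wins rule and the integer timestamps are all assumptions chosen for brevity; real stores resolve conflicts in a variety of ways.

```python
class Replica:
    """Toy replica holding key -> (timestamp, value); all names are hypothetical."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, timestamp):
        # Keep the write only if it is newer than what we already hold.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def sync_from(self, other):
        # Anti-entropy pass: pull every entry the other replica knows about.
        for key, (timestamp, value) in other.data.items():
            self.write(key, value, timestamp)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


a, b = Replica(), Replica()
a.write("colour", "red", timestamp=1)
b.write("colour", "blue", timestamp=2)
stale_read = a.read("colour")      # replica a has not seen the newer write yet

# Once updates stop and the replicas exchange state, they agree.
a.sync_from(b)
b.sync_from(a)
```

Before the sync a reader can see the stale value; after updates stop and propagate, both replicas return the same answer.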

                                                                            19
Object Databases
•   Db4o
•   GemStone/S
•   InterSystems Caché
•   Objectivity/DB
•   ZODB.




                                20
How the &*$% do You Index That?!




                            21
Okay got it, Now Let’s Compare Some Real World Scenarios




                                  22
You Need Constant Consistency
•   You’re dealing with financial transactions
•   You’re dealing with medical records
•   You’re dealing with bonded goods
•   Best you use an RDBMS ☺.




                                                 23
You Need Horizontal Scalability
• You’re working across defined geographic regions
• You’re working with large quantities of data
• Game server sharding
• Use NoSQL
   – Something like Cassandra.




                                                     24
Up in the Clouds Baby




                        25
26
Frequently Written Rarely Read
•   Think web counters and the like
•   Every time a user comes to a page = ctr++
•   But it’s only read when the report is run
•   Use NoSQL (key-value storage/memcache).
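
A minimal sketch of the write-often, read-rarely pattern. A `collections.Counter` stands in for the key-value store here, which is an assumption for the sake of a runnable example; the function names are hypothetical.

```python
from collections import Counter

# Stand-in for a key-value store; a real deployment would use a networked store.
page_hits = Counter()

def record_visit(path):
    """Cheap write path: bump one counter, no reads, no joins."""
    page_hits[path] += 1

def run_report(top_n=3):
    """Rare read path: only here do we look at the accumulated counts."""
    return page_hits.most_common(top_n)

for path in ["/home", "/pricing", "/home", "/docs", "/home", "/docs"]:
    record_visit(path)

report = run_report()
```

Every page view costs a single increment keyed by path, and the counts are only scanned when the report runs.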




                                                27
I Got Big Data!




                  28
Binary Baby!
•   If you are YouTube
•   Flickr
•   Twitpic
•   Spotify
•   NoSQL (Amazon S3).




                              29
Here Today Gone Tomorrow
• Transient data like...
  – Web Sessions
  – Locks
  – Short Term Stats
     • Shopping cart contents
• Use NoSQL (Memcache).
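
Transient data usually comes with an expiry time. The sketch below assumes a hypothetical `ExpiringStore` class with a time-to-live per key; it illustrates the idea and is not the Memcache API.

```python
import time

class ExpiringStore:
    """Toy in-memory store whose entries vanish after a time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so the example is deterministic
        self._items = {}             # key -> (expires_at, value)

    def set(self, key, value, ttl_seconds):
        self._items[key] = (self._clock() + ttl_seconds, value)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]     # lazily evict expired data
            return default
        return value


now = [0.0]                          # fake clock for the demonstration
store = ExpiringStore(clock=lambda: now[0])
store.set("session:42", {"cart": ["book"]}, ttl_seconds=30)
fresh = store.get("session:42")
now[0] = 31.0                        # 31 seconds later the session has expired
expired = store.get("session:42")
```

Losing such an entry is acceptable by design, which is why a store without durability guarantees can be a reasonable fit.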



                                30
Data Replication
• Same data in two or more locations
  – Music Library
     • Web browser
     • iPhone App
• NoSQL (CouchDB).




                                       31
Hit me Baby One More Time!
• High Availability
  – High number of important transactions
     • Online gambling
     • Pay-per-view
        – Ahem!
     • Online Auction
• NoSQL (Cassandra – automatic clustering).



                                              32
Give me a Real World Example
• Twitter
  – The challenges
     • Needs to store many graphs
        – Who you are following
        – Who’s following you
        – Who you receive phone notifications from, etc.
     • To deliver a tweet requires rapid paging of followers
     • Heavy write load as followers are added and removed
     • Set arithmetic for @mentions (intersection of users).


                                                               33
What Did They Try?
• Relational Databases
• Key-Value storage of denormalized lists




                                            34
Did it Work?




               35
What Did They Need?
• Simplest possible thing that would work
• Allow for horizontal partitioning
• Allow write operations to
  – Arrive out of order
  – Or be processed more than once
• Failures should result in redundant work
  – Not lost work!


                                             36
The Result was FlockDB
• Stores graph data
• Not optimised for graph traversal operations
• Optimised for large adjacency lists
  – List of all edges in a graph
     • Each entry is a set of end points (or tuple if directed)
• Optimised for fast read and write
• Optimised for page-able set arithmetic.
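
One way to picture page-able set arithmetic is a cursor-based intersection over two sorted adjacency lists. The sketch below assumes sorted, duplicate-free lists of user ids and a hypothetical `intersect_page` helper; it is not taken from FlockDB.

```python
from bisect import bisect_right

def intersect_page(list_a, list_b, cursor=None, page_size=2):
    """Return one page of the intersection of two sorted id lists.

    `cursor` is the last id returned by the previous page (None for the first).
    Returns (page, next_cursor); next_cursor is None when the page came back
    short, meaning nothing is left.
    """
    i = 0 if cursor is None else bisect_right(list_a, cursor)
    j = 0 if cursor is None else bisect_right(list_b, cursor)
    page = []
    while i < len(list_a) and j < len(list_b) and len(page) < page_size:
        if list_a[i] == list_b[j]:
            page.append(list_a[i])
            i += 1
            j += 1
        elif list_a[i] < list_b[j]:
            i += 1
        else:
            j += 1
    next_cursor = page[-1] if len(page) == page_size else None
    return page, next_cursor


followers_of_x = [1, 3, 5, 7, 9, 11]
followers_of_y = [3, 4, 5, 9, 10, 11]
first_page, cursor = intersect_page(followers_of_x, followers_of_y)
second_page, cursor2 = intersect_page(followers_of_x, followers_of_y, cursor)
```

Because each call resumes from the cursor, a client can walk a very large intersection one page at a time without the server materialising the whole result.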


                                                                  37
How Does it Work?
• Stores graphs as sets of edges between nodes
• Data is partitioned by node
  – All queries can be answered by a single partition
• Write operations are idempotent
  – Can be applied multiple times without changing
    the result
• And commutative
  – Changing the order of operands doesn’t change
    the result.
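
A rough sketch of how partitioning by node and order-independent writes can fit together. Everything here is an assumption for illustration: the partition count, the CRC-based hash, the timestamped "newest wins" rule and the function names are not FlockDB's implementation.

```python
import zlib

NUM_PARTITIONS = 4   # hypothetical cluster size

def partition_for(node_id):
    """Pick a partition from the source node, so one node's edges live together."""
    return zlib.crc32(str(node_id).encode("utf-8")) % NUM_PARTITIONS

# Each partition maps (source, destination) -> (timestamp, state).
partitions = [dict() for _ in range(NUM_PARTITIONS)]

def apply_write(source, destination, state, timestamp):
    """Apply a follow/unfollow write; the newest (timestamp, state) wins."""
    shard = partitions[partition_for(source)]
    key = (source, destination)
    current = shard.get(key)
    if current is None or (timestamp, state) > current:
        shard[key] = (timestamp, state)

def following(source):
    """Answer the query from the single partition that owns `source`."""
    shard = partitions[partition_for(source)]
    return sorted(dst for (src, dst), (_, state) in shard.items()
                  if src == source and state == "follow")


apply_write("alice", "bob", "follow", 1)
apply_write("alice", "bob", "unfollow", 2)
apply_write("alice", "bob", "follow", 1)    # late duplicate delivery, ignored
apply_write("alice", "carol", "follow", 3)
result = following("alice")
```

Replaying a write or delivering it late leaves the stored state unchanged, so a failure costs redundant work rather than lost work.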

                                                        38
A Little More About Idempotency
• Applied several times with no change to the
  result
• An operation ‘O’ on a set S is called idempotent
  if, for all x in S, x O x = x.
• Set union
  – A ∪ B = {x : x ∈ A or x ∈ B}
• Set intersection
  – A ∩ B = {x : x ∈ A and x ∈ B}
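
The idempotency of union and intersection can be checked directly with Python's built-in sets. The names in the sets are made up for the example.

```python
a = {"alice", "bob"}
b = {"bob", "carol"}

# Idempotent: combining a set with itself changes nothing.
union_is_idempotent = (a | a) == a
intersection_is_idempotent = (a & a) == a

# Consequence for writes: re-applying the same union is harmless.
once = a | b
twice = (a | b) | b
```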

                                                  39
A Little More About Commutative
• Changing the order of operands doesn’t
  change the result.
  3 + 2 = 2 + 3 = 5
• Can be combined with idempotency
• Let’s look at the follow command in Twitter
   • Let X = follow person X
   • Let Y = follow person Y
   • Then 3X + 2Y = 2Y + 3X
   • And, because follows are also idempotent, 2X + 3Y = 3X + 2Y
• Note: it’s only true for the same operation.
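
A small experiment that illustrates the follow example, modelling "follow" as adding to a set. This is an assumption made for the demonstration and says nothing about how Twitter stores follows.

```python
from itertools import permutations

def apply_follows(commands):
    """Apply follow commands to an empty follow set and return the result."""
    followed = set()
    for person in commands:
        followed.add(person)       # adding twice is the same as adding once
    return frozenset(followed)

# Three "follow X" and two "follow Y" commands, delivered in every possible order.
commands = ["X", "X", "X", "Y", "Y"]
outcomes = {apply_follows(order) for order in permutations(commands)}

# Two X and three Y instead: idempotency makes the counts irrelevant.
other_counts = apply_follows(["X", "X", "Y", "Y", "Y"])
```

Every ordering and every repeat count collapses to the same final set, which is what lets writes arrive out of order or more than once.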
                                                 40
Commutative Writes Help Bring up Partitions
• Partition can receive write traffic immediately
• Receive dump of data in the background
• Live for read as soon as the dump is complete.
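
To see why commutativity helps here, consider a sketch in which a new partition accepts live writes before its background dump arrives. The timestamped merge rule and the sample data are assumptions for illustration only.

```python
def merge(store, key, timestamp, value):
    """Keep the entry with the highest (timestamp, value); call order is irrelevant."""
    current = store.get(key)
    if current is None or (timestamp, value) > current:
        store[key] = (timestamp, value)

# Snapshot taken from an existing partition (hypothetical data).
dump = [(("alice", "bob"), 1, "follow"), (("alice", "carol"), 2, "follow")]
# Writes that arrive while the dump is still being copied.
live = [(("alice", "bob"), 5, "unfollow"), (("alice", "dave"), 6, "follow")]

def build(events):
    """Replay events into a fresh store in the order given."""
    store = {}
    for key, timestamp, value in events:
        merge(store, key, timestamp, value)
    return store

dump_first = build(dump + live)
live_first = build(live + dump)   # new partition took writes before the dump landed
```

Both orders end in the same state, so the partition does not need to block writes while it waits for the dump.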




                                                41
Performance?
• Currently store 13 billion edges
• 20K writes / second
• 100K reads / second.




                                     42
Punchline?
• Under all the bells and whistles...
  – It’s MySQL ☺.




                                        43
So is this the Future?
• Yes!
• And No!




                                 44
What?! How Can That be?!




                           45
