SlideShare a Scribd company logo
1 of 22
NoSQL databases and MapReduce J Singh Early Stage IT
What’s so fun about databases? Traditional database discussions talked about Employee records Bank records Now we talk about Web search Data mining The collective intelligence of tweets Scientific and medical databases
How much data can a database hold? The biggest OLTP databases 2001: 1.1 – 10.3 TB. 2003: 9.1 – 29.2 TB. 2005: 17.7 – 100.4 TB. 2010: ~2.5 PB. The trend will continue Very large databases bring new unique challenges
Historical Context Late 1990’s. The web scales out. Suddenly, databases not adequate for holding the data being accumulated Scale out vs. Scale up
Brewer’s Conjecture (p1) Source: Eric Brewer’s July 2000 PODC Keynote Main points: Classic “Distributed Systems” don’t work They focus on computation, not data Distributing computation is easy, distributing data is hard DBMS research is about ACID (mostly) Atomicity, Consistency, Isolation and Durability But we forfeit “C” and “I” for availability, graceful degradation and performance – this tradeoff is fundamental BASE Basically Available Soft-state Eventual Consistency
Brewer’s Conjecture (p2) BASE Weak consistency stale data OK Availability first Best effort Approximate answers OK Aggressive (optimistic) Simpler! Faster Easier evolution ACID Strong consistency Isolation Focus on “commit” Nested transactions Availability? Conservative (pessimistic) Difficult evolution (e.g. schema) But I think it’s a spectrum Eric Brewer
CAP Theorem Since then, Brewer’s conjecture formally proved: Gilbert & Lynch, 2002 Thus Brewer’s conjecture became the CAP theorem… …and contributed to the birth of the NoSQL movement But the theory is not settled While http://nosql-database.org/ lists 122 NoSQL databases
What is NoSQL? Stands for Not Only SQL Class of non-relational data storage systems Usually do not require a fixed table schema nor do they use the concept of joins All NoSQL offerings relax one or more of the ACID properties
Forces at Work Three major papers were the seeds of the NoSQL movement CAP Theorem (discussed above) BigTable(Google) Dynamo (Amazon) Some types of data could not be modeled well in RDBMS Document Storage and Indexing Recursive Data and Graphs Time Series Data Genomics Data
NoSQL Databases Key-Value Stores A storage system that stores values, indexed by a key. Example: Voldemort, Dynomite, Tokyo Cabinet BigTable Clones (aka "ColumnFamily") A tabular model where each row (at least in theory) can have an individual configuration of columns. Example: HBase, Hypertable, Cassandra, Amazon SimpleDB
NoSQL Databases Document Databases Collections of documents, which contain key-value collections (called "documents") Example: CouchDB, MongoDB, Riak Graph Databases Nodes & relationships, both of which can hold key-value pairs Example: AllegroGraph, InfoGrid, Neo4j
Amazon SimpleDB Key-value store Written in Erlang, (as is CouchDB) Data is modeled in terms of Domain, a container of entities, Item, an entity and  Attribute and Value, a property of an Item Eventually Consistent, except when ReadConsistent flag specified Impressive performance numbers,  e.g., .7 sec to store 1 million records SQL-like SELECT select output_list from domain_name [where expression] [sort_instructions] [limit limit]
Google Datastore Part of App Engine; also used for internal applications Used for all storage Incorporates a transaction model to ensure high consistency Optimistic locking Transactions can fail CAP implications Datastore isn’t just “eventually consistent” They offer two commercial options (with different prices) Master/Slave  Low latency but also lower availability Asynchronous replication High Replication Strong availability at the cost of higher latency
[object Object]
For more info, see video of Ryan Barrett’s talk at Google I/ODatastore Application at Google
Databases and Key-Value Stores http://browsertoolkit.com/fault-tolerance.png
MapReduce Conceptual Underpinnings Programming model from Lisp and other functional languages (map square '(1 2 3 4))  (1 4 9 16) (reduce + '(1 4 9 16)) 30  Easy to distribute Nice failure/retry semantics
MapReduce Flow
HadoopMapReduce An Open Source project of the Apache Foundation Other Hadoop-related projects at Apache include: Cassandra™: A scalable multi-master database with no single points of failure. HBase™: A scalable, distributed database that supports structured data storage for large tables. Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying. Pig™: A high-level data-flow language and execution framework for parallel computation. See the Apache Hadoop website for more.
Hadoop Availability Run on your laptop Run on your server Run on Amazon Cloud Introduction at IBM DeveloperWorks Run on Google App Engine It’s not Hadoop, it’s Google’s implementation of MapReduce
MapReduce Statistics @ GOOG Take-away message: MapReduce is not a “new-fangled technology of the future” It is here, it is proven, use it!
End of an Era? The Relational Model is not necessarily the answer It was excellent for data processing Not a natural fit for Data Warehouses Web-oriented search Real-time analytics, and Semi-structured data i.e., Semantic Web SQL is not the answer Coupling between modern programming languages and SQL are “ugly beyond belief” Programming languages have evolved while SQL has remained static Pascal C/C++ Java The little languages: Python, Perl, PHP, Ruby ,[object Object],A critique of the “one size fits all” assumption in DBMS

More Related Content

What's hot

NoSQL databases - An introduction
NoSQL databases - An introductionNoSQL databases - An introduction
NoSQL databases - An introductionPooyan Mehrparvar
 
Advanced SQL For Data Scientists
Advanced SQL For Data ScientistsAdvanced SQL For Data Scientists
Advanced SQL For Data ScientistsDatabricks
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...Simplilearn
 
Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache HiveAvkash Chauhan
 
Non relational databases-no sql
Non relational databases-no sqlNon relational databases-no sql
Non relational databases-no sqlRam kumar
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQLYousun Jeong
 
Introduction to Spark Streaming
Introduction to Spark StreamingIntroduction to Spark Streaming
Introduction to Spark Streamingdatamantra
 
Apache Arrow and Python: The latest
Apache Arrow and Python: The latestApache Arrow and Python: The latest
Apache Arrow and Python: The latestWes McKinney
 
Introduction to apache spark
Introduction to apache spark Introduction to apache spark
Introduction to apache spark Aakashdata
 
Learn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideLearn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideWhizlabs
 
The Basics of MongoDB
The Basics of MongoDBThe Basics of MongoDB
The Basics of MongoDBvaluebound
 

What's hot (20)

NoSQL databases - An introduction
NoSQL databases - An introductionNoSQL databases - An introduction
NoSQL databases - An introduction
 
Cassandra Database
Cassandra DatabaseCassandra Database
Cassandra Database
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 
Advanced SQL For Data Scientists
Advanced SQL For Data ScientistsAdvanced SQL For Data Scientists
Advanced SQL For Data Scientists
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
 
Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache Hive
 
Non relational databases-no sql
Non relational databases-no sqlNon relational databases-no sql
Non relational databases-no sql
 
SQL & NoSQL
SQL & NoSQLSQL & NoSQL
SQL & NoSQL
 
Sqoop
SqoopSqoop
Sqoop
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQL
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Key-Value NoSQL Database
Key-Value NoSQL DatabaseKey-Value NoSQL Database
Key-Value NoSQL Database
 
Introduction to Spark Streaming
Introduction to Spark StreamingIntroduction to Spark Streaming
Introduction to Spark Streaming
 
Apache Arrow and Python: The latest
Apache Arrow and Python: The latestApache Arrow and Python: The latest
Apache Arrow and Python: The latest
 
Unit 5-apache hive
Unit 5-apache hiveUnit 5-apache hive
Unit 5-apache hive
 
Introduction to apache spark
Introduction to apache spark Introduction to apache spark
Introduction to apache spark
 
SQL - RDBMS Concepts
SQL - RDBMS ConceptsSQL - RDBMS Concepts
SQL - RDBMS Concepts
 
Learn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideLearn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive Guide
 
The Basics of MongoDB
The Basics of MongoDBThe Basics of MongoDB
The Basics of MongoDB
 

Viewers also liked

Query mechanisms for NoSQL databases
Query mechanisms for NoSQL databasesQuery mechanisms for NoSQL databases
Query mechanisms for NoSQL databasesArangoDB Database
 
Query Languages for Document Stores
Query Languages for Document StoresQuery Languages for Document Stores
Query Languages for Document StoresInteractiveCologne
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL DatabasesDerek Stainer
 
Amadeus big data
Amadeus big dataAmadeus big data
Amadeus big dataMayank Shah
 
Chaordic - BigData e MapReduce - Robson Motta
Chaordic - BigData e MapReduce - Robson Motta Chaordic - BigData e MapReduce - Robson Motta
Chaordic - BigData e MapReduce - Robson Motta Chaordic
 
(SEC402) Intrusion Detection in the Cloud | AWS re:Invent 2014
(SEC402) Intrusion Detection in the Cloud | AWS re:Invent 2014(SEC402) Intrusion Detection in the Cloud | AWS re:Invent 2014
(SEC402) Intrusion Detection in the Cloud | AWS re:Invent 2014Amazon Web Services
 
NoSQL Slideshare Presentation
NoSQL Slideshare Presentation NoSQL Slideshare Presentation
NoSQL Slideshare Presentation Ericsson Labs
 
Leveraging SAP HANA with Apache Hadoop and SAP Analytics
Leveraging SAP HANA with Apache Hadoop and SAP AnalyticsLeveraging SAP HANA with Apache Hadoop and SAP Analytics
Leveraging SAP HANA with Apache Hadoop and SAP AnalyticsMethod360
 
Leveraging SAP, Hadoop, and Big Data to Redefine Business
Leveraging SAP, Hadoop, and Big Data to Redefine BusinessLeveraging SAP, Hadoop, and Big Data to Redefine Business
Leveraging SAP, Hadoop, and Big Data to Redefine BusinessDataWorks Summit
 
Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012
Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012
Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012Alexandre Morgaut
 
Leveraging SAP, Hadoop, and Big Data to Redefine Business
Leveraging SAP, Hadoop, and Big Data to Redefine BusinessLeveraging SAP, Hadoop, and Big Data to Redefine Business
Leveraging SAP, Hadoop, and Big Data to Redefine BusinessDataWorks Summit
 
The NoSQL Way in Postgres
The NoSQL Way in PostgresThe NoSQL Way in Postgres
The NoSQL Way in PostgresEDB
 
A testbeszéd
A testbeszédA testbeszéd
A testbeszédnovigi
 

Viewers also liked (20)

Query mechanisms for NoSQL databases
Query mechanisms for NoSQL databasesQuery mechanisms for NoSQL databases
Query mechanisms for NoSQL databases
 
CouchDB Map/Reduce
CouchDB Map/ReduceCouchDB Map/Reduce
CouchDB Map/Reduce
 
Query Languages for Document Stores
Query Languages for Document StoresQuery Languages for Document Stores
Query Languages for Document Stores
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL Databases
 
Amadeus big data
Amadeus big dataAmadeus big data
Amadeus big data
 
Chaordic - BigData e MapReduce - Robson Motta
Chaordic - BigData e MapReduce - Robson Motta Chaordic - BigData e MapReduce - Robson Motta
Chaordic - BigData e MapReduce - Robson Motta
 
Drupal 6 Database layer
Drupal 6 Database layerDrupal 6 Database layer
Drupal 6 Database layer
 
Ejemplos acid
Ejemplos acidEjemplos acid
Ejemplos acid
 
Spark and MongoDB
Spark and MongoDBSpark and MongoDB
Spark and MongoDB
 
(SEC402) Intrusion Detection in the Cloud | AWS re:Invent 2014
(SEC402) Intrusion Detection in the Cloud | AWS re:Invent 2014(SEC402) Intrusion Detection in the Cloud | AWS re:Invent 2014
(SEC402) Intrusion Detection in the Cloud | AWS re:Invent 2014
 
NoSQL Slideshare Presentation
NoSQL Slideshare Presentation NoSQL Slideshare Presentation
NoSQL Slideshare Presentation
 
Leveraging SAP HANA with Apache Hadoop and SAP Analytics
Leveraging SAP HANA with Apache Hadoop and SAP AnalyticsLeveraging SAP HANA with Apache Hadoop and SAP Analytics
Leveraging SAP HANA with Apache Hadoop and SAP Analytics
 
Leveraging SAP, Hadoop, and Big Data to Redefine Business
Leveraging SAP, Hadoop, and Big Data to Redefine BusinessLeveraging SAP, Hadoop, and Big Data to Redefine Business
Leveraging SAP, Hadoop, and Big Data to Redefine Business
 
Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012
Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012
Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012
 
Leveraging SAP, Hadoop, and Big Data to Redefine Business
Leveraging SAP, Hadoop, and Big Data to Redefine BusinessLeveraging SAP, Hadoop, and Big Data to Redefine Business
Leveraging SAP, Hadoop, and Big Data to Redefine Business
 
The NoSQL Way in Postgres
The NoSQL Way in PostgresThe NoSQL Way in Postgres
The NoSQL Way in Postgres
 
A testbeszéd
A testbeszédA testbeszéd
A testbeszéd
 
Arckifejezések
ArckifejezésekArckifejezések
Arckifejezések
 
Nosql data models
Nosql data modelsNosql data models
Nosql data models
 
MapReduce入門
MapReduce入門MapReduce入門
MapReduce入門
 

Similar to NoSQL databases and MapReduce: From Big Data to distributed computing models

Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless DatabasesDan Gunter
 
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...Felix Gessert
 
NO SQL: What, Why, How
NO SQL: What, Why, HowNO SQL: What, Why, How
NO SQL: What, Why, HowIgor Moochnick
 
Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12mark madsen
 
Database Revolution - Exploratory Webcast
Database Revolution - Exploratory WebcastDatabase Revolution - Exploratory Webcast
Database Revolution - Exploratory WebcastInside Analysis
 
Implementation of nosql for robotics
Implementation of nosql for roboticsImplementation of nosql for robotics
Implementation of nosql for roboticsJoão Gabriel Lima
 
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookHow Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookAmr Awadallah
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟datastack
 
Big Data Fundamentals in the Emerging New Data World
Big Data Fundamentals in the Emerging New Data WorldBig Data Fundamentals in the Emerging New Data World
Big Data Fundamentals in the Emerging New Data WorldJongwook Woo
 
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...South London Geek Nights
 
No sql distilled-distilled
No sql distilled-distilledNo sql distilled-distilled
No sql distilled-distilledrICh morrow
 
NoSQL: Why, When, and How
NoSQL: Why, When, and HowNoSQL: Why, When, and How
NoSQL: Why, When, and HowBigBlueHat
 
NoSQL powerpoint presentation difference with rdbms
NoSQL powerpoint presentation difference with rdbmsNoSQL powerpoint presentation difference with rdbms
NoSQL powerpoint presentation difference with rdbmsAtulKabbur
 
The World of Structured Storage System
The World of Structured Storage SystemThe World of Structured Storage System
The World of Structured Storage SystemSchubert Zhang
 
Next Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon ThomasNext Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon ThomasThoughtworks
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irdatastack
 

Similar to NoSQL databases and MapReduce: From Big Data to distributed computing models (20)

Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless Databases
 
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
 
NO SQL: What, Why, How
NO SQL: What, Why, HowNO SQL: What, Why, How
NO SQL: What, Why, How
 
Nosql seminar
Nosql seminarNosql seminar
Nosql seminar
 
Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12
 
Database Revolution - Exploratory Webcast
Database Revolution - Exploratory WebcastDatabase Revolution - Exploratory Webcast
Database Revolution - Exploratory Webcast
 
Implementation of nosql for robotics
Implementation of nosql for roboticsImplementation of nosql for robotics
Implementation of nosql for robotics
 
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookHow Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
 
The future of Big Data tooling
The future of Big Data toolingThe future of Big Data tooling
The future of Big Data tooling
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟
 
Big Data Fundamentals in the Emerging New Data World
Big Data Fundamentals in the Emerging New Data WorldBig Data Fundamentals in the Emerging New Data World
Big Data Fundamentals in the Emerging New Data World
 
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
 
Nosql
NosqlNosql
Nosql
 
No sql distilled-distilled
No sql distilled-distilledNo sql distilled-distilled
No sql distilled-distilled
 
NoSQL: Why, When, and How
NoSQL: Why, When, and HowNoSQL: Why, When, and How
NoSQL: Why, When, and How
 
NoSQL powerpoint presentation difference with rdbms
NoSQL powerpoint presentation difference with rdbmsNoSQL powerpoint presentation difference with rdbms
NoSQL powerpoint presentation difference with rdbms
 
The World of Structured Storage System
The World of Structured Storage SystemThe World of Structured Storage System
The World of Structured Storage System
 
Next Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon ThomasNext Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon Thomas
 
NoSQL Basics - A Quick Tour
NoSQL Basics - A Quick TourNoSQL Basics - A Quick Tour
NoSQL Basics - A Quick Tour
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.ir
 

More from J Singh

OpenLSH - a framework for locality sensitive hashing
OpenLSH  - a framework for locality sensitive hashingOpenLSH  - a framework for locality sensitive hashing
OpenLSH - a framework for locality sensitive hashingJ Singh
 
Designing analytics for big data
Designing analytics for big dataDesigning analytics for big data
Designing analytics for big dataJ Singh
 
Open LSH - september 2014 update
Open LSH  - september 2014 updateOpen LSH  - september 2014 update
Open LSH - september 2014 updateJ Singh
 
PaaS - google app engine
PaaS  - google app enginePaaS  - google app engine
PaaS - google app engineJ Singh
 
Mining of massive datasets using locality sensitive hashing (LSH)
Mining of massive datasets using locality sensitive hashing (LSH)Mining of massive datasets using locality sensitive hashing (LSH)
Mining of massive datasets using locality sensitive hashing (LSH)J Singh
 
Data Analytic Technology Platforms: Options and Tradeoffs
Data Analytic Technology Platforms: Options and TradeoffsData Analytic Technology Platforms: Options and Tradeoffs
Data Analytic Technology Platforms: Options and TradeoffsJ Singh
 
Facebook Analytics with Elastic Map/Reduce
Facebook Analytics with Elastic Map/ReduceFacebook Analytics with Elastic Map/Reduce
Facebook Analytics with Elastic Map/ReduceJ Singh
 
Big Data Laboratory
Big Data LaboratoryBig Data Laboratory
Big Data LaboratoryJ Singh
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop EcosystemJ Singh
 
Social Media Mining using GAE Map Reduce
Social Media Mining using GAE Map ReduceSocial Media Mining using GAE Map Reduce
Social Media Mining using GAE Map ReduceJ Singh
 
High Throughput Data Analysis
High Throughput Data AnalysisHigh Throughput Data Analysis
High Throughput Data AnalysisJ Singh
 
CS 542 -- Concurrency Control, Distributed Commit
CS 542 -- Concurrency Control, Distributed CommitCS 542 -- Concurrency Control, Distributed Commit
CS 542 -- Concurrency Control, Distributed CommitJ Singh
 
CS 542 -- Failure Recovery, Concurrency Control
CS 542 -- Failure Recovery, Concurrency ControlCS 542 -- Failure Recovery, Concurrency Control
CS 542 -- Failure Recovery, Concurrency ControlJ Singh
 
CS 542 -- Query Optimization
CS 542 -- Query OptimizationCS 542 -- Query Optimization
CS 542 -- Query OptimizationJ Singh
 
CS 542 -- Query Execution
CS 542 -- Query ExecutionCS 542 -- Query Execution
CS 542 -- Query ExecutionJ Singh
 
CS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage ManagementCS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage ManagementJ Singh
 
CS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceCS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceJ Singh
 
CS 542 Database Index Structures
CS 542 Database Index StructuresCS 542 Database Index Structures
CS 542 Database Index StructuresJ Singh
 
CS 542 Controlling Database Integrity and Performance
CS 542 Controlling Database Integrity and PerformanceCS 542 Controlling Database Integrity and Performance
CS 542 Controlling Database Integrity and PerformanceJ Singh
 
CS 542 Overview of query processing
CS 542 Overview of query processingCS 542 Overview of query processing
CS 542 Overview of query processingJ Singh
 

More from J Singh (20)

OpenLSH - a framework for locality sensitive hashing
OpenLSH  - a framework for locality sensitive hashingOpenLSH  - a framework for locality sensitive hashing
OpenLSH - a framework for locality sensitive hashing
 
Designing analytics for big data
Designing analytics for big dataDesigning analytics for big data
Designing analytics for big data
 
Open LSH - september 2014 update
Open LSH  - september 2014 updateOpen LSH  - september 2014 update
Open LSH - september 2014 update
 
PaaS - google app engine
PaaS  - google app enginePaaS  - google app engine
PaaS - google app engine
 
Mining of massive datasets using locality sensitive hashing (LSH)
Mining of massive datasets using locality sensitive hashing (LSH)Mining of massive datasets using locality sensitive hashing (LSH)
Mining of massive datasets using locality sensitive hashing (LSH)
 
Data Analytic Technology Platforms: Options and Tradeoffs
Data Analytic Technology Platforms: Options and TradeoffsData Analytic Technology Platforms: Options and Tradeoffs
Data Analytic Technology Platforms: Options and Tradeoffs
 
Facebook Analytics with Elastic Map/Reduce
Facebook Analytics with Elastic Map/ReduceFacebook Analytics with Elastic Map/Reduce
Facebook Analytics with Elastic Map/Reduce
 
Big Data Laboratory
Big Data LaboratoryBig Data Laboratory
Big Data Laboratory
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop Ecosystem
 
Social Media Mining using GAE Map Reduce
Social Media Mining using GAE Map ReduceSocial Media Mining using GAE Map Reduce
Social Media Mining using GAE Map Reduce
 
High Throughput Data Analysis
High Throughput Data AnalysisHigh Throughput Data Analysis
High Throughput Data Analysis
 
CS 542 -- Concurrency Control, Distributed Commit
CS 542 -- Concurrency Control, Distributed CommitCS 542 -- Concurrency Control, Distributed Commit
CS 542 -- Concurrency Control, Distributed Commit
 
CS 542 -- Failure Recovery, Concurrency Control
CS 542 -- Failure Recovery, Concurrency ControlCS 542 -- Failure Recovery, Concurrency Control
CS 542 -- Failure Recovery, Concurrency Control
 
CS 542 -- Query Optimization
CS 542 -- Query OptimizationCS 542 -- Query Optimization
CS 542 -- Query Optimization
 
CS 542 -- Query Execution
CS 542 -- Query ExecutionCS 542 -- Query Execution
CS 542 -- Query Execution
 
CS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage ManagementCS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage Management
 
CS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceCS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduce
 
CS 542 Database Index Structures
CS 542 Database Index StructuresCS 542 Database Index Structures
CS 542 Database Index Structures
 
CS 542 Controlling Database Integrity and Performance
CS 542 Controlling Database Integrity and PerformanceCS 542 Controlling Database Integrity and Performance
CS 542 Controlling Database Integrity and Performance
 
CS 542 Overview of query processing
CS 542 Overview of query processingCS 542 Overview of query processing
CS 542 Overview of query processing
 

Recently uploaded

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 

Recently uploaded (20)

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 

NoSQL databases and MapReduce: From Big Data to distributed computing models

  • 1. NoSQL databases and MapReduce J Singh Early Stage IT
  • 2. What’s so fun about databases? Traditional database discussions talked about Employee records Bank records Now we talk about Web search Data mining The collective intelligence of tweets Scientific and medical databases
  • 3. How much data can a database hold? The biggest OLTP databases 2001: 1.1 – 10.3 TB. 2003: 9.1 – 29.2 TB. 2005: 17.7 – 100.4 TB. 2010: ~2.5 PB. The trend will continue Very large databases bring new unique challenges
  • 4. Historical Context Late 1990’s. The web scales out. Suddenly, databases not adequate for holding the data being accumulated Scale out vs. Scale up
  • 5. Brewer’s Conjecture (p1) Source: Eric Brewer’s July 2000 PODC Keynote Main points: Classic “Distributed Systems” don’t work They focus on computation, not data Distributing computation is easy, distributing data is hard DBMS research is about ACID (mostly) Atomicity, Consistency, Isolation and Durability But we forfeit “C” and “I” for availability, graceful degradation and performance – this tradeoff is fundamental BASE Basically Available Soft-state Eventual Consistency
  • 6. Brewer’s Conjecture (p2) BASE Weak consistency stale data OK Availability first Best effort Approximate answers OK Aggressive (optimistic) Simpler! Faster Easier evolution ACID Strong consistency Isolation Focus on “commit” Nested transactions Availability? Conservative (pessimistic) Difficult evolution (e.g. schema) But I think it’s a spectrum Eric Brewer
  • 7. CAP Theorem Since then, Brewer’s conjecture formally proved: Gilbert & Lynch, 2002 Thus Brewer’s conjecture became the CAP theorem… …and contributed to the birth of the NoSQL movement But the theory is not settled While http://nosql-database.org/ lists 122 NoSQL databases
  • 8. What is NoSQL? Stands for Not Only SQL Class of non-relational data storage systems Usually do not require a fixed table schema nor do they use the concept of joins All NoSQL offerings relax one or more of the ACID properties
  • 9. Forces at Work Three major papers were the seeds of the NoSQL movement CAP Theorem (discussed above) BigTable(Google) Dynamo (Amazon) Some types of data could not be modeled well in RDBMS Document Storage and Indexing Recursive Data and Graphs Time Series Data Genomics Data
  • 10. NoSQL Databases Key-Value Stores A storage system that stores values, indexed by a key. Example: Voldemort, Dynomite, Tokyo Cabinet BigTable Clones (aka "ColumnFamily") A tabular model where each row (at least in theory) can have an individual configuration of columns. Example: HBase, Hypertable, Cassandra, Amazon SimpleDB
  • 11. NoSQL Databases Document Databases Collections of documents, which contain key-value collections (called "documents") Example: CouchDB, MongoDB, Riak Graph Databases Nodes & relationships, both of which can hold key-value pairs Example: AllegroGraph, InfoGrid, Neo4j
  • 12. Amazon SimpleDB Key-value store Written in Erlang, (as is CouchDB) Data is modeled in terms of Domain, a container of entities, Item, an entity and Attribute and Value, a property of an Item Eventually Consistent, except when ReadConsistent flag specified Impressive performance numbers, e.g., .7 sec to store 1 million records SQL-like SELECT select output_list from domain_name [where expression] [sort_instructions] [limit limit]
  • 13. Google Datastore Part of App Engine; also used for internal applications Used for all storage Incorporates a transaction model to ensure high consistency Optimistic locking Transactions can fail CAP implications Datastore isn’t just “eventually consistent” They offer two commercial options (with different prices) Master/Slave Low latency but also lower availability Asynchronous replication High Replication Strong availability at the cost of higher latency
  • 14.
  • 15. For more info, see video of Ryan Barrett’s talk at Google I/ODatastore Application at Google
  • 16. Databases and Key-Value Stores http://browsertoolkit.com/fault-tolerance.png
  • 17. MapReduce Conceptual Underpinnings Programming model from Lisp and other functional languages (map square '(1 2 3 4))  (1 4 9 16) (reduce + '(1 4 9 16)) 30 Easy to distribute Nice failure/retry semantics
  • 19. HadoopMapReduce An Open Source project of the Apache Foundation Other Hadoop-related projects at Apache include: Cassandra™: A scalable multi-master database with no single points of failure. HBase™: A scalable, distributed database that supports structured data storage for large tables. Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying. Pig™: A high-level data-flow language and execution framework for parallel computation. See the Apache Hadoop website for more.
  • 20. Hadoop Availability Run on your laptop Run on your server Run on Amazon Cloud Introduction at IBM DeveloperWorks Run on Google App Engine It’s not Hadoop, it’s Google’s implementation of MapReduce
  • 21. MapReduce Statistics @ GOOG Take-away message: MapReduce is not a “new-fangled technology of the future” It is here, it is proven, use it!
  • 22.
  • 23. Take Aways NoSQL databases are a solution to web-scale problems A lot of data lives outside relational databases With SQLnix.org, we are starting a local resource for NoSQL database knowledge Taking on projects to apply the technology, not just read about it. If you want to work on it, please contact us. Thanks