NoSQL Data Modeling
Concepts and Cases


Shashank Tiwari
blog: shanky.org | twitter: @tshanky
st@treasuryofideas.com
NoSQL?
NoSQL : Various Shapes and Sizes

• Document Databases


• Column-family Oriented Stores


• Key/value Data stores


• XML Databases


• Object Databases


• Graph Databases
Key Questions

• How do I model data for my application?


• How do I determine which one is right for me?


• Can I easily shift from one database to the other?


• Is there a standard way of storing, accessing, and querying data?
Agenda for this session

• Explore some of the main NoSQL products


• Understand how they are similar and different


• How best to use these products in the stack


Document Databases




• also GenieDB, SimpleDB
What is a document db?

• One that stores documents


• Popular options:


  • MongoDB -- C++


  • CouchDB -- Erlang


  • Also Amazon’s SimpleDB


• ...what exactly is a document?
In the real world




• (Source: http://guide.couchdb.org/draft/why.html)
In terms of JSON

• {"name": "John Doe", "zip": 10001}
What about db schema?

• Schema-less


• Different documents could be stored in a single collection
Data types: MongoDB

• Essential JSON types:


• string


• integer


• boolean


• double
Data types: MongoDB (...cont)

• Additional JSON types


• null, array and object


• BSON types -- BSON is a binary-encoded serialization of JSON-like documents


   • date, binary data, object id, regular expression and code


   • (Reference: bsonspec.org)
A BSON example: object id
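As a rough illustration of what an object id carries, the sketch below decodes the creation time embedded in one. It assumes the commonly documented layout in which a 12-byte ObjectId starts with a 4-byte big-endian Unix timestamp; the remaining bytes are treated as opaque here. The helper name `objectid_timestamp` is hypothetical, and the sample id is the one that appears in this deck's read example.

```python
from datetime import datetime, timezone

def objectid_timestamp(oid_hex: str) -> datetime:
    """Return the creation time stored in the first 4 bytes of an ObjectId."""
    raw = bytes.fromhex(oid_hex)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    # Assumption: bytes 0..3 are seconds since the Unix epoch, big-endian.
    seconds = int.from_bytes(raw[:4], byteorder="big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

created = objectid_timestamp("4c97053abe67000000003857")
print(created.isoformat())
```

Because the timestamp leads the id, sorting by `_id` roughly sorts documents by creation time, which is one reason the type exists alongside plain strings.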
Data types: CouchDB

• Everything JSON


• Large objects: attachments
CRUD operations for documents

• Create


• Read


• Update


• Delete
MongoDB: Create Document

• use mydb


• w = {name: "John Doe", zip: 10001};


• db.location.save(w);
Create db and collection

• Lazily created


• Implicitly created


• use mydb


• db.collection.save(w)
MongoDB: Read Document

• db.location.find({zip: 10001});


• { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe",
  "zip" : 10001 }
MongoDB: Read Document (...cont)

• db.location.find({name: "John Doe"});


• { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe",
  "zip" : 10001 }
MongoDB: Update Document

• Atomic operations on single documents


• db.location.update( { name:"John Doe" }, { $set: { name: "Jane Doe" } } );
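To make the create, read and update steps concrete without a running server, here is a minimal in-memory stand-in. The `Collection` class and its `save`, `find` and `update` methods are hypothetical and only mimic exact-match queries and the `$set` operator; this is not the MongoDB driver API.

```python
import itertools

class Collection:
    """Toy in-memory collection; NOT the MongoDB driver API."""

    def __init__(self):
        self._docs = []
        self._ids = itertools.count(1)

    def save(self, doc):
        # Simplification: always appends, assigning an _id when none is given.
        doc = dict(doc)
        doc.setdefault("_id", next(self._ids))
        self._docs.append(doc)
        return doc["_id"]

    def find(self, query):
        # Query by example: every key in the query must match exactly.
        return [d for d in self._docs
                if all(d.get(k) == v for k, v in query.items())]

    def update(self, query, change):
        # Only $set is sketched; the first matching document is modified.
        for doc in self.find(query):
            doc.update(change.get("$set", {}))
            return True
        return False

location = Collection()
location.save({"name": "John Doe", "zip": 10001})
# A differently shaped document can live in the same collection.
location.save({"name": "Acme HQ", "zip": 94301, "floors": 3})
location.update({"name": "John Doe"}, {"$set": {"name": "Jane Doe"}})
print(location.find({"zip": 10001}))
```

The second `save` call shows the schema-less point from earlier slides: nothing forces both documents to share the same fields.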
CouchDB: RESTful

• Supports REST verbs: GET, HEAD, PUT, POST, DELETE


• Supports Replication


• Supports the notion of attachments


• Can work offline and supports small-footprint deployments
Sorted Ordered Column-family Datastores

• Sorted


• Ordered


• Distributed


• Map
Essential schema
Multi-dimensional View
A Map/Hash View

• {
    "row_key_1" : {
      "name" : { "first_name" : "Jolly", "last_name" : "Goodfellow" },
      "location" : { "zip" : "94301" }
    }
  }
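The map view can be mirrored with nested dictionaries, which also makes the "sorted" part visible: rows are scanned in row-key order. This is a sketch only; the second row, the `get` and `scan` helpers, and the omission of timestamps are assumptions made for illustration, not HBase or Bigtable API.

```python
from bisect import bisect_left

# row key -> column family -> column -> value (timestamps omitted for brevity)
table = {
    "row_key_2": {"name": {"first_name": "Lucky", "last_name": "Luke"}},  # hypothetical extra row
    "row_key_1": {
        "name": {"first_name": "Jolly", "last_name": "Goodfellow"},
        "location": {"zip": "94301"},
    },
}

def get(row_key, family, column):
    """Point lookup: descend the nested maps, returning None when a level is absent."""
    return table.get(row_key, {}).get(family, {}).get(column)

def scan(start_key, stop_key):
    """Range scan over row keys in sorted order; start inclusive, stop exclusive."""
    keys = sorted(table)
    lo, hi = bisect_left(keys, start_key), bisect_left(keys, stop_key)
    return [(k, table[k]) for k in keys[lo:hi]]

print(get("row_key_1", "location", "zip"))
print([key for key, _ in scan("row_key_1", "row_key_3")])
```

Rows need not carry the same column families or columns, so the structure is sparse: a missing cell simply does not exist rather than being stored as null.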
Architectural View (HBase)
The Persistence Mechanism
Model Wrappers (The GAE Way)

• Python


  • Model, Expando, PolyModel


• Java


  • JDO, JPA
HBase Data Access

• Thrift + Avro


• Java API -- HTable, HBaseAdmin


• Hive (SQL like)


• MapReduce -- sink and/or source
Transactions

• Atomic row level


• GAE Entity Groups
Indexes

• Row ordered


• Secondary indexes


• GAE style multiple indexes


  • thinking from output to query
Use cases

• Many of Google’s products


• Facebook Messaging


• StumbleUpon


  • Open TSDB


• Mahalo, Ning, Meetup, Twitter, Yahoo!


• Lily -- open source CMS built on HBase & Solr
Brewer’s CAP Theorem




• http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf


• http://theory.lcs.mit.edu/tds/papers/Gilbert/Brewer6.ps
Distributed Systems & Consistency (case: success)
Distributed Systems & Consistency (case: failure)
Binding by Transactions
Consistency Spectrum
Inconsistency Window
RWN Math

• R – Number of nodes that are read from.


• W – Number of nodes that are written to.


• N – Total number of nodes in the cluster.




• In general: R < N and W < N for higher availability
R+W>N

• Easy to determine consistent state


• R + W = 2N (that is, R = N and W = N)


  • absolutely consistent, can provide an ACID guarantee


• In all cases when R + W > N there is some overlap between read and write
  nodes.
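The overlap claim can be checked by brute force for small clusters. The sketch below enumerates every possible read set and write set and compares the result with the closed-form rule R + W > N; the function names are hypothetical.

```python
from itertools import combinations

def quorums_overlap(r: int, w: int, n: int) -> bool:
    """True when every possible read set shares a node with every write set."""
    nodes = range(n)
    return all(set(reads) & set(writes)
               for reads in combinations(nodes, r)
               for writes in combinations(nodes, w))

def overlap_rule(r: int, w: int, n: int) -> bool:
    """The closed-form rule: R + W > N."""
    return r + w > n

# Exhaustive comparison for clusters of up to 5 nodes.
for n in range(1, 6):
    for r in range(1, n + 1):
        for w in range(1, n + 1):
            assert quorums_overlap(r, w, n) == overlap_rule(r, w, n)
print("R + W > N matches guaranteed overlap for N = 1..5")
```

The overlapping node is what lets a reader find the latest write: at least one node it contacts has already accepted that write.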
R = 1, W = N

• suits workloads with more reads than writes


• W = N


  • 1 node failure = no write can succeed, so the system is unavailable for updates
R = N, W = 1

• W = 1


 • Chance of data inconsistency quite high


• R = N


 • Reads are possible only when all nodes in the cluster are available
R = W = ceiling ((N + 1)/2)
Effective quorum for eventual consistency
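A quick calculation shows what the formula yields for common cluster sizes. This is a sketch; `quorum_size` is a hypothetical helper, and "tolerated failures" is taken here to mean the number of nodes that can be down while a quorum is still reachable.

```python
import math

def quorum_size(n: int) -> int:
    """R = W = ceiling((N + 1) / 2), the majority quorum from the slide."""
    return math.ceil((n + 1) / 2)

for n in (1, 2, 3, 4, 5, 7):
    q = quorum_size(n)
    # With R = W = q, R + W always exceeds N, so read and write sets overlap.
    print(f"N={n}: R=W={q}, R+W={2 * q}, tolerated failures={n - q}")
```

Note that going from 3 to 4 nodes raises the quorum from 2 to 3 without tolerating more failures, which is why odd cluster sizes are the usual choice.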
Eventual consistency variants

• Causal consistency -- if A writes and then informs B, B always sees the
  updated value


• Read-your-writes consistency -- after A writes a new value, A never sees the
  old one


• Session consistency -- read-your-writes consistency within a client session


• Monotonic read consistency -- once a process has seen a new value, it never
  sees a previous value again


• Monotonic write consistency -- writes by the same process are serialized
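One way to picture the session-level guarantees is a client that remembers the newest version it has written or read and refuses answers from replicas that lag behind it. The sketch below is heavily simplified and assumes a single writer, integer versions, and hypothetical `Replica` and `Session` classes; real systems track this with tokens or vector clocks.

```python
class Replica:
    def __init__(self):
        self.version, self.value = 0, None

    def apply(self, version, value):
        # Keep only the newest version; replication may arrive late.
        if version > self.version:
            self.version, self.value = version, value

class Session:
    """Client session that rejects answers older than what it has already seen."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.seen = 0  # highest version this session has written or read

    def write(self, value):
        # Simplification: the write lands on the last replica; others catch up later.
        self.seen += 1
        self.replicas[-1].apply(self.seen, value)

    def read(self):
        for replica in self.replicas:
            if replica.version >= self.seen:  # fresh enough for this session
                self.seen = replica.version
                return replica.value
        raise RuntimeError("no replica is fresh enough yet")

replicas = [Replica(), Replica()]
session = Session(replicas)
session.write("v1")
print(session.read())     # skips the stale first replica
print(replicas[0].value)  # the first replica has not received the write yet
```

Tracking `seen` on writes gives read-your-writes, and updating it on reads gives monotonic reads; a different session with no history could still be served by the stale replica, which is what keeps the system eventually rather than strongly consistent.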
Dynamo Techniques

• Consistent Hashing (Incremental scalability)


• Vector clocks (high availability for writes)


• Sloppy quorum and hinted handoff (recover from temporary failure)


• Gossip-based membership protocol (periodic, pairwise, inter-process
  interactions; low reliability; random peer selection)


• Anti-entropy using Merkle trees


• (source: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf)
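Of the techniques listed, vector clocks are the easiest to show in a few lines. The sketch below represents a clock as a dictionary from node name to counter and decides whether one version supersedes another or whether the two are concurrent; the function names and node names are hypothetical, and details such as clock truncation are left out.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

def descends(a: dict, b: dict) -> bool:
    """True when version a has seen everything version b has."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a: dict, b: dict) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "concurrent"  # siblings: the application must reconcile them

base = increment({}, "node_x")      # first write, coordinated by node_x
left = increment(base, "node_y")    # later write via node_y
right = increment(base, "node_z")   # write via node_z, unaware of left
print(compare(left, base))
print(compare(left, right))
```

Accepting both `left` and `right` instead of rejecting one is what keeps writes available during partitions; the cost is that a later read may return both versions for reconciliation.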
Consistent Hashing
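A minimal ring can be sketched as a sorted list of hashed positions, with each key assigned to the first node found clockwise from its own hash. The `HashRing` class, the use of MD5 for placement, and the 64 virtual nodes per server are illustrative assumptions, not Dynamo's actual parameters.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # MD5 is used only to spread keys evenly, not for security.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each server is placed at several positions (virtual nodes).
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def node_for(self, key):
        # A 1-tuple sorts before any (position, node) pair with the same position,
        # so this finds the first virtual node at or after the key's position.
        idx = bisect.bisect(self._ring, (_hash(key),))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring

ring = HashRing(["node_a", "node_b", "node_c"])
keys = [f"key{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.remove("node_c")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all of them from node_c:",
      all(before[k] == "node_c" for k in moved))
```

Only keys owned by the departing node are reassigned, which is the incremental-scalability property: with plain `hash(key) % N`, changing N would remap most keys.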
CouchDB MVCC Style




• (Source: http://guide.couchdb.org/draft/consistency.html)
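The MVCC idea can be sketched as optimistic concurrency: a writer must present the revision it read, and the update is refused if the document has moved on in the meantime (CouchDB reports this as an HTTP 409 conflict). The `Store` class and the integer revisions below are simplifications for illustration; CouchDB's real `_rev` values are strings and old revisions are kept until compaction.

```python
class Conflict(Exception):
    """Raised when a writer's revision is stale."""

class Store:
    def __init__(self):
        self._docs = {}  # doc id -> (revision number, body)

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        return rev, dict(body)

    def put(self, doc_id, body, rev=None):
        # The caller must name the revision it is replacing (None for a new doc).
        current = self._docs.get(doc_id, (None, None))[0]
        if rev != current:
            raise Conflict(f"expected rev {current}, got {rev}")
        new_rev = (current or 0) + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev

store = Store()
rev = store.put("john", {"name": "John Doe", "zip": 10001})
store.put("john", {"name": "Jane Doe", "zip": 10001}, rev=rev)   # succeeds
try:
    store.put("john", {"name": "J. Doe", "zip": 10001}, rev=rev)  # stale revision
except Conflict as err:
    print("conflict:", err)
```

No reader is ever blocked by a writer in this scheme; the losing writer re-reads the document and retries, which pushes conflict handling to the client instead of to locks.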
Key/value Stores

• Memcached


• Membase


• Redis


• Tokyo Cabinet


• Kyoto Cabinet


• Berkeley DB
Questions?




• blog: shanky.org | twitter: @tshanky


• st@treasuryofideas.com

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 

SDEC2011 NoSQL Data modelling

  • 1. NoSQL Data Modeling Concepts and Cases Shashank Tiwari blog: shanky.org | twitter: @tshanky st@treasuryofideas.com
  • 3. NoSQL : Various Shapes and Sizes • Document Databases • Column-family Oriented Stores • Key/value Data stores • XML Databases • Object Databases • Graph Databases
  • 4. Key Questions • How do I model data for my application? • How do I determine which one is right for me? • Can I easily shift from one database to the other? • Is there a standard way of storing, accessing, and querying data?
  • 5. Agenda for this session • Explore some of the main NoSQL products • Understand how they are similar and different • How best to use these products in the stack
  • 6. Document Databases • also GenieDB, SimpleDB
  • 7. What is a document db? • One that stores documents • Popular options: • MongoDB -- C++ • CouchDB -- Erlang • Also Amazon’s SimpleDB • ...what exactly is a document?
  • 8. In the real world • (Source: http://guide.couchdb.org/draft/why.html)
  • 9. In terms of JSON • {name: "John Doe", zip: 10001}
  • 10. What about db schema? • Schema-less • Different documents could be stored in a single collection
  • 11. Data types: MongoDB • Essential JSON types: • string • integer • boolean • double
  • 12. Data types: MongoDB (...cont) • Additional JSON types • null, array and object • BSON types -- binary encoded serialization of JSON like documents • date, binary data, object id, regular expression and code • (Reference: bsonspec.org)
  • 13. A BSON example: object id
  • 14. Data types: CouchDB • Everything JSON • Large objects: attachments
  • 15. CRUD operations for documents • Create • Read • Update • Delete
  • 16. MongoDB: Create Document • use mydb • w = {name: "John Doe", zip: 10001}; • db.location.save(w);
  • 17. Create db and collection • Lazily created • Implicitly created • use mydb • db.collection.save(w)
  • 18. MongoDB: Read Document • db.location.find({zip: 10001}); • { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe", "zip" : 10001 }
  • 19. MongoDB: Read Document (...cont) • db.location.find({name: "John Doe"}); • { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe", "zip" : 10001 }
  • 20. MongoDB: Update Document • Atomic operations on single documents • db.location.update( { name:"John Doe" }, { $set: { name: "Jane Doe" } } );
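The create, read, and update calls above can be mimicked with a small in-memory sketch. This is not MongoDB and does not use its driver; `ToyCollection` and its methods are hypothetical names chosen to mirror the shell calls on the slides. It only illustrates three ideas: documents of different shapes sharing one collection, query by example, and a `$set`-style update applied to a single document.

```python
import itertools


class ToyCollection:
    """In-memory stand-in for a schema-less document collection (not MongoDB)."""

    def __init__(self):
        self._docs = []
        self._ids = itertools.count(1)

    def save(self, doc):
        # Store a copy and assign an _id if the caller did not supply one.
        stored = dict(doc)
        if "_id" not in stored:
            stored["_id"] = next(self._ids)
        self._docs.append(stored)
        return stored["_id"]

    def find(self, query):
        # Query by example: a document matches when every queried field is equal.
        return [dict(d) for d in self._docs
                if all(d.get(k) == v for k, v in query.items())]

    def update(self, query, change):
        # Apply the "$set" fields to the first matching document only.
        for d in self._docs:
            if all(d.get(k) == v for k, v in query.items()):
                d.update(change.get("$set", {}))
                return 1
        return 0


location = ToyCollection()
location.save({"name": "John Doe", "zip": 10001})
location.save({"city": "Palo Alto"})  # different shape, same collection
location.update({"name": "John Doe"}, {"$set": {"name": "Jane Doe"}})
```

After the update, `location.find({"zip": 10001})` returns the same document with the name changed, and a search for the old name returns nothing.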
  • 21. CouchDB: RESTful • Supports REST verbs: GET, HEAD, PUT, POST, DELETE • Supports Replication • Supports the notion of attachments • Could work in offline modes and supports small footprint profiles
  • 22. Sorted Ordered Column-family Datastores • Sorted • Ordered • Distributed • Map
  • 25. A Map/Hash View • { "row_key_1" : { "name" : { "first_name" : "Jolly", "last_name" : "Goodfellow" }, "location" : { "zip" : "94301" } } }
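A minimal sketch of the map view, assuming a plain nested dictionary as the storage: row key, then column family, then column. Only `row_key_1` comes from the slide; the other rows and the helper names `get` and `scan` are made up for illustration. A real column-family store keeps rows physically sorted, whereas this sketch sorts the keys on every scan.

```python
from bisect import bisect_left

# Row key -> column family -> column -> value.
# "row_key_1" is from the slide; the other rows are made-up sample data.
table = {
    "row_key_2": {"name": {"first_name": "Ada"}, "location": {"zip": "10001"}},
    "row_key_1": {
        "name": {"first_name": "Jolly", "last_name": "Goodfellow"},
        "location": {"zip": "94301"},
    },
    "row_key_3": {"name": {"first_name": "Lin"}},
}


def get(table, row_key, family, column):
    """Look up one cell by its three-part coordinate."""
    return table[row_key][family][column]


def scan(table, start_key, stop_key):
    """Return rows with start_key <= row key < stop_key, in sorted key order."""
    keys = sorted(table)
    lo = bisect_left(keys, start_key)
    hi = bisect_left(keys, stop_key)
    return [(key, table[key]) for key in keys[lo:hi]]
```

Because row keys are ordered, a range scan such as `scan(table, "row_key_1", "row_key_3")` touches only a contiguous slice of keys, which is why row-key design matters so much in these stores.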
  • 28. Model Wrappers (The GAE Way) • Python • Model, Expando, PolyModel • Java • JDO, JPA
  • 29. HBase Data Access • Thrift + Avro • Java API -- HTable, HBaseAdmin • Hive (SQL like) • MapReduce -- sink and/or source
  • 30. Transactions • Atomic row level • GAE Entity Groups
  • 31. Indexes • Row ordered • Secondary indexes • GAE style multiple indexes • thinking from output to query
  • 32. Use cases • Many of Google’s products • Facebook Messaging • StumbleUpon • Open TSDB • Mahalo, Ning, Meetup, Twitter, Yahoo! • Lily -- open source CMS built on HBase & Solr
  • 33. Brewer’s CAP Theorem • http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf • http://theory.lcs.mit.edu/tds/papers/Gilbert/Brewer6.ps
  • 34. Distributed Systems & Consistency (case: success)
  • 35. Distributed Systems & Consistency (case: failure)
  • 39. RWN Math • R – Number of nodes that are read from. • W – Number of nodes that are written to. • N – Total number of nodes in the cluster. • In general: R < N and W < N for higher availability
  • 40. R + W > N • Easy to determine consistent state • R + W = 2N • absolutely consistent, can provide ACID guarantee • In all cases when R + W > N there is some overlap between read and write nodes.
  • 41. R = 1, W = N • more reads than writes • W = N • 1 node failure = entire system unavailable
  • 42. R = N, W = 1 • W = 1 • Chance of data inconsistency quite high • R = N • Read only possible when all nodes in the cluster are available
  • 43. R = W = ceiling((N + 1)/2) • Effective quorum for eventual consistency
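The RWN rules above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and `always_intersect` brute-forces every possible read set and write set for a small cluster to confirm that they share at least one node exactly when R + W > N.

```python
import math
from itertools import combinations


def quorum(n):
    """Quorum size from the slide: ceiling((N + 1) / 2)."""
    return math.ceil((n + 1) / 2)


def read_write_overlap(r, w, n):
    """The slide's rule: read and write sets overlap when R + W > N."""
    return r + w > n


def always_intersect(r, w, n):
    """Brute force: does every R-node read set share a node with every W-node write set?"""
    nodes = range(n)
    return all(set(read_set) & set(write_set)
               for read_set in combinations(nodes, r)
               for write_set in combinations(nodes, w))
```

For N = 3 the quorum is 2, so R = W = 2 always overlaps, while R = W = 1 does not.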
  • 44. Eventual consistency variants • Causal consistency -- A writes and informs B, then B always sees the updated value • Read-your-writes consistency -- A writes a new value and never sees the old one • Session consistency -- read-your-writes consistency within a client session • Monotonic read consistency -- once a new value has been seen, a previous value is never returned • Monotonic write consistency -- serialize writes by the same process
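One way to picture session and monotonic read consistency is a client that remembers the newest version it has seen and refuses anything older. This is a toy sketch under that assumption, not how any particular product implements it; `Replica`, `Session`, and the integer version counter are all hypothetical.

```python
class Replica:
    """A replica holding one value tagged with a version number."""

    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        # Keep only the newest version; older updates are ignored.
        if version > self.version:
            self.version, self.value = version, value


class Session:
    """Client session that never reads anything older than it has already seen."""

    def __init__(self):
        self.last_seen = 0

    def write(self, replica, value):
        version = max(self.last_seen, replica.version) + 1
        replica.apply(version, value)
        self.last_seen = version
        return version

    def read(self, replica):
        if replica.version < self.last_seen:
            return None  # replica is stale for this session; caller retries elsewhere
        self.last_seen = replica.version
        return replica.value


fresh, stale = Replica(), Replica()
session = Session()
session.write(fresh, "v1")  # replication to `stale` has not happened yet
```

Here `session.read(stale)` is rejected until the stale replica catches up, so the session reads its own write and never moves backwards.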
  • 45. Dynamo Techniques • Consistent Hashing (Incremental scalability) • Vector clocks (high availability for writes) • Sloppy quorum and hinted handoff (recover from temporary failure) • Gossip based membership protocol (periodic, pairwise, inter-process interactions, low reliability, random peer selection) • Anti-entropy using Merkle trees • (source: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf)
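Of the techniques listed, consistent hashing is the easiest to sketch. The version below is a simplified illustration, not Dynamo's implementation: the class name `HashRing`, the use of MD5, and the choice of 64 virtual nodes per physical node are all assumptions. It shows the "incremental scalability" property: when a node joins, the only keys that change owner are the ones the new node takes over.

```python
import bisect
import hashlib


def ring_hash(key):
    """Map a string to a position on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._hashes = []   # sorted ring positions
        self._owner = {}    # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Place several virtual nodes per physical node to even out the load.
        for i in range(self.vnodes):
            h = ring_hash(f"{node}#{i}")
            bisect.insort(self._hashes, h)
            self._owner[h] = node

    def node_for(self, key):
        # Walk clockwise to the first ring position after the key's hash.
        i = bisect.bisect(self._hashes, ring_hash(key)) % len(self._hashes)
        return self._owner[self._hashes[i]]


ring = HashRing(["a", "b", "c"])
keys = [f"user{i}" for i in range(200)]
before = {k: ring.node_for(k) for k in keys}
ring.add("d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` now belongs to the new node `d`; keys owned by `a`, `b`, or `c` are never shuffled among the old nodes.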
  • 47. CouchDB MVCC Style • (Source: http://guide.couchdb.org/draft/consistency.html)
  • 48. Key/value Stores • Memcached • Membase • Redis • Tokyo Cabinet • Kyoto Cabinet • Berkeley DB
  • 49. Questions? • blog: shanky.org | twitter: @tshanky • st@treasuryofideas.com