SQL vs. NoSQL 
Making the right choice 
18 September 2014
The contenders 
SQL NoSQL 
© Copyright Dimension Data 18 September 2014 2
SQL Databases 
• RDBMS 
• Standardized 
• Mature 
• Reliable 
• Well understood 
• Queryable 
• ACID 
NoSQL scalability argument 
• Scale-Up vs Scale-Out 
• Use of commodity hardware 
• Locking / Latching 
• Consistency over partitions 
• Availability of partitions 
• Referential integrity 
[Figure: cost of scaling, SQL vs. NoSQL]
Other RDBMS / SQL Database drawbacks 
• One-solution-fits-all 
• Slow for certain tasks 
• ACID is not always needed 
• ORM required 
• Lack of flexibility 
• Rigid schema 
• Management complexity 
• Add-on solutions 
• XML-fields, Filestreams 
• Full-text indexes 
CAP theorem (Brewer's theorem) 
NoSQL Use Cases 
• Bigness / Avoid hitting the wall 
• Massive write performance 
• Write availability 
• Fast key-value access 
• Flexible schema and flexible datatypes 
• Schema migration 
• No single point of failure 
• Generally available parallel computing 
• Easier maintainability, administration and operations 
• Programmer ease of use 
• Use the right data model for the right problem 
• Tunable CAP tradeoffs 
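One common way to read "tunable CAP tradeoffs" in Dynamo-style stores is in terms of replica counts: a value is kept on N replicas, a write waits for W acknowledgements, and a read consults R replicas. If R + W > N, every read set overlaps every write set, so a read reaches at least one replica that holds the latest acknowledged write. The sketch below is illustrative only; the function names are hypothetical and it does not model any particular product.

```python
from itertools import combinations


def quorums_overlap(n, r, w):
    """Rule of thumb: read and write sets always intersect when r + w > n."""
    return r + w > n


def always_overlap_brute_force(n, r, w):
    """Check every possible write set against every possible read set."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# With 3 replicas, quorum reads and writes (2 + 2 > 3) always overlap,
# while reading and writing a single replica (1 + 1 <= 3) may miss each other.
strong = quorums_overlap(3, 2, 2)
weak = quorums_overlap(3, 1, 1)
```

Lowering R or W makes the corresponding operation faster and more available at the price of possibly stale reads, which is the tradeoff being tuned.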
ACID Transactions 
Atomicity 
Consistency 
Isolation 
Durability 
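A minimal sketch of atomicity, using Python's built-in sqlite3 module. The accounts table, the account names and the amounts are hypothetical and not taken from the slides; the point is only that a transfer which fails halfway leaves both balances untouched.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return {name: balance} for all accounts."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds and is committed
failed = transfer(conn, "alice", "bob", 500)  # both updates are rolled back
```

After the failed transfer, neither the debit nor the credit is visible, even though both UPDATE statements had already run inside the transaction.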
NoSQL ACID Trade-offs 
• Dropping Atomicity lets you shorten the time that tables (sets of data) are locked. 
Examples: MongoDB, CouchDB. 
• Dropping Consistency lets you scale up writes across cluster nodes. 
Examples: Riak, Cassandra. 
• Dropping Durability lets you respond to write commands without flushing to disk. 
Examples: Memcache, Redis. 
NoSQL Database Main Types 
• Key-Value Store 
• A basic dictionary design storing values under unique keys 
• The database does not care about the structure of the value 
• Examples: 
• Memcache 
• Riak 
• Azure Blob Storage 
• Good at: 
• Handles size well 
• Processing a constant stream of small reads and writes 
• Fast 
• Programmer friendly 
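The key-value model described above can be sketched as a thin wrapper around a dictionary. This is a toy in-memory illustration, not the API of Memcache, Riak or Azure Blob Storage; the class and key names are hypothetical. Values are stored as opaque bytes, so the store neither knows nor cares whether they hold JSON, an image or anything else.

```python
import json


class KeyValueStore:
    """Toy in-memory key-value store; values are opaque bytes."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        if not isinstance(value, bytes):
            raise TypeError("values must be bytes; the store does not interpret them")
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        """Remove a key; return True if it existed."""
        return self._data.pop(key, None) is not None


store = KeyValueStore()
# The caller, not the store, decides how a value is encoded.
store.put("user:42", json.dumps({"name": "Ada"}).encode("utf-8"))
store.put("avatar:42", b"\x89PNG")

profile = json.loads(store.get("user:42"))
```

Because the only access path is the key, there is no way to ask for "all users named Ada" without maintaining a separate key for that lookup.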
NoSQL Database Main Types 
• Column Store 
• A column is a tuple of 3 elements: a unique name, a typed value, and a timestamp 
• Columns may be part of column families 
• Columns need not appear in every record 
• Examples: 
• HBase 
• Hypertable 
• Cassandra 
• Azure Table Storage 
• Good at: 
• Handles size well 
• Stream massive write loads 
• High availability 
• Multiple data centers 
• MapReduce 
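The column model above (a name, a value and a timestamp, with columns that need not appear in every record) can be sketched as follows. The class is a hypothetical in-memory toy, not the HBase or Cassandra API, and resolving conflicting writes by keeping the newest timestamp is an assumption made for this illustration.

```python
from collections import defaultdict, namedtuple

# A column is a (name, value, timestamp) triple, as on the slide.
Column = namedtuple("Column", ["name", "value", "timestamp"])


class ColumnFamily:
    """Toy column family: each row holds only the columns written to it."""

    def __init__(self):
        self._rows = defaultdict(dict)  # row key -> {column name -> Column}

    def insert(self, row_key, name, value, timestamp):
        current = self._rows[row_key].get(name)
        # Assumption for this sketch: the write with the newest timestamp wins.
        if current is None or timestamp >= current.timestamp:
            self._rows[row_key][name] = Column(name, value, timestamp)

    def get(self, row_key):
        """Return {column name: value} for one row (empty if the row is unknown)."""
        return {c.name: c.value for c in self._rows.get(row_key, {}).values()}


users = ColumnFamily()
users.insert("u1", "name", "Ada", timestamp=1)
users.insert("u1", "email", "ada@example.com", timestamp=1)
users.insert("u2", "name", "Bob", timestamp=1)   # u2 has no email column at all
users.insert("u1", "name", "Ada L.", timestamp=3)
users.insert("u1", "name", "stale", timestamp=2)  # older than the current value, ignored
```

Rows are sparse: u2 simply lacks an email column rather than storing a NULL for it.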
NoSQL Database Main Types 
• Document Store 
• Use a unique key to store and retrieve a JSON document 
• Documents are schemaless 
• Metadata is added to the document to aid querying 
• Indexing of documents and metadata speeds up retrieval 
• Examples: 
• CouchDB 
• MongoDB 
• RavenDB 
• Azure DocumentDB service (Preview) 
• Good at: 
• Natural data modeling 
• Programmer friendly 
• Rapid development 
• Web friendly 
• CRUD 
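A rough sketch of the document model: JSON documents stored under a unique key, no fixed schema, and an index on selected fields to speed up retrieval. The class, field names and data are hypothetical; real document stores such as CouchDB or MongoDB expose very different and much richer APIs.

```python
import json
from collections import defaultdict


class DocumentStore:
    """Toy document store: JSON documents by key, plus simple field indexes."""

    def __init__(self, indexed_fields=()):
        self._docs = {}  # key -> JSON text
        # field -> {field value -> set of document keys}; values must be hashable
        self._indexes = {field: defaultdict(set) for field in indexed_fields}

    def put(self, key, document):
        self.delete(key)  # drop stale index entries when overwriting
        self._docs[key] = json.dumps(document)
        for field, index in self._indexes.items():
            if field in document:
                index[document[field]].add(key)

    def get(self, key):
        raw = self._docs.get(key)
        return None if raw is None else json.loads(raw)

    def delete(self, key):
        old = self.get(key)
        if old is None:
            return
        for field, index in self._indexes.items():
            if field in old:
                index[old[field]].discard(key)
        del self._docs[key]

    def find(self, field, value):
        """Look up documents through an index (only indexed fields are supported)."""
        keys = self._indexes[field].get(value, set())
        return [self.get(key) for key in sorted(keys)]


db = DocumentStore(indexed_fields=["city"])
db.put("p1", {"name": "Ada", "city": "London"})
db.put("p2", {"name": "Bob", "city": "Paris", "tags": ["admin"]})  # extra field, no schema change
db.put("p3", {"name": "Cy"})                                        # no city at all
db.put("p1", {"name": "Ada", "city": "Paris"})                      # update moves p1 in the index
```

Documents with different shapes live side by side; only the indexed field needs to be present for a document to show up in `find`.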
NoSQL Database Main Types 
• Graph Database 
• Uses graph structures with nodes, edges, and properties to represent and store data 
• Every element contains a direct pointer to its adjacent elements 
• Examples: 
• AllegroGraph 
• InfoGrid 
• Neo4j 
• Good at: 
• Complicated graph problems 
• Topographical data 
• Fast 
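The idea that every element holds a direct pointer to its adjacent elements can be sketched with plain object references. The Node class, the KNOWS label and the sample data are hypothetical and unrelated to the Neo4j, AllegroGraph or InfoGrid APIs; the traversal is an ordinary breadth-first search.

```python
from collections import deque


class Node:
    """A graph node with properties and direct references to its neighbours."""

    def __init__(self, name, **properties):
        self.name = name
        self.properties = properties
        self.edges = []  # list of (label, neighbour Node) pairs

    def connect(self, label, other):
        """Add a directed, labelled edge from this node to `other`."""
        self.edges.append((label, other))


def shortest_path(start, goal):
    """Breadth-first search along node references; returns names or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node is goal:
            return [n.name for n in path]
        for _label, neighbour in node.edges:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


ada, bob, cy, dee = Node("Ada"), Node("Bob"), Node("Cy"), Node("Dee", role="admin")
ada.connect("KNOWS", bob)
bob.connect("KNOWS", cy)
cy.connect("KNOWS", dee)
ada.connect("KNOWS", cy)

path = shortest_path(ada, dee)
```

Each hop follows a reference already held by the current node, so the cost of a traversal depends on the part of the graph visited rather than on the total amount of stored data.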
NoSQL Database Type Comparison 
Data Model               Performance  Scalability      Flexibility  Complexity  Functionality 
Key-Value Store          high         high             high         none        variable (none) 
Column-Oriented Store    high         high             moderate     low         minimal 
Document-Oriented Store  high         variable (high)  high         low         variable (low) 
Graph Database           variable     variable         high         high        graph theory 
Relational Database      variable     variable         low          moderate    relational algebra 
Things to consider when choosing 
• Where are you starting from? 
• What are you trying to accomplish? 
• Things to Consider... 
• Your Problem 
• Access pattern, scalability, consistency, durability 
• Money 
• Scaling, admins, license, operating cost 
• Programming 
• Flexible schema, JSON, REST, language, graphs 
• Performance 
• Reads, writes, consistency, workload, eventual consistency 
• Features 
• Cross datacenter, upgrades, indexes, persistence, tunability 
• The vendor 
• Viability, future direction, responsiveness, partnerships 
Big Data – Petabyte range 
Microsoft HDInsight = Hadoop as a service on Azure (+ .NET) 
Hadoop components 
Using Hadoop 
Hadoop cluster size 
Yahoo! wins with a massive 42,000-node cluster 
Questions 
USE [Euricom] 
SELECT [Question] 
FROM [dbo].[FAQ] 
WHERE [Answer] IS NULL 
(0 row(s) affected) 


Editor's notes

  1. The three properties of the CAP theorem:
     • Consistency (all nodes see the same data at the same time)
     • Availability (a guarantee that every request receives a response about whether it was successful or failed)
     • Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
  2. Atomicity requires that each transaction is "all or nothing": if one part of the transaction fails, the entire transaction fails, and the database state is left unchanged. An atomic system must guarantee atomicity in each and every situation, including power failures, errors, and crashes. To the outside world, a committed transaction appears (by its effects on the database) to be indivisible ("atomic"), and an aborted transaction does not happen.
     The consistency property ensures that any transaction will bring the database from one valid state to another. Any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof. This does not guarantee correctness of the transaction in all ways the application programmer might have wanted (that is the responsibility of application-level code) but merely that any programming errors do not violate any defined rules.
     The isolation property ensures that the concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially, i.e. one after the other. Providing isolation is the main goal of concurrency control. Depending on concurrency control method, the effects of an incomplete transaction might not even be visible to another transaction.
     Durability means that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors. In a relational database, for instance, once a group of SQL statements execute, the results need to be stored permanently (even if the database crashes immediately thereafter). To defend against power loss, transactions (or their effects) must be recorded in a non-volatile memory.
  3. Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of machines. At its core, Apache Hadoop consists of 2 sub-projects: Hadoop MapReduce and the Hadoop Distributed File System (HDFS). Hadoop MapReduce is a programming model and software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes. HDFS is the primary storage system used by Hadoop applications. HDFS creates multiple replicas of data blocks and distributes them on compute nodes throughout a cluster to enable reliable, extremely rapid computations. Other Hadoop-related projects at Apache include Chukwa, Hive, HBase, Mahout, Sqoop and ZooKeeper.
     • HDFS - Filesystems that manage the storage across a network of machines are called distributed filesystems. HDFS is designed for storing very large files with write-once-read-many-times patterns, running on clusters of commodity hardware.
     • MapReduce - MapReduce is a framework for processing highly distributable problems across huge datasets using a large number of computers (nodes), collectively referred to as a cluster. The framework is inspired by the map and reduce functions commonly used in functional programming.
     • Chukwa - Chukwa is a Hadoop subproject devoted to large-scale log collection and analysis. Chukwa is built on top of HDFS and the MapReduce framework and inherits Hadoop’s scalability and robustness.
     • Hive - Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query and analysis. HiveServer provides a Thrift interface and a JDBC / ODBC server.
     • HBase - HBase is the Hadoop application to use when you require real-time read/write random access to very large datasets. It is a distributed column-oriented database built on top of HDFS.
     • Mahout - Mahout is a highly scalable open source machine learning library from Apache. Mahout aims to be the machine learning tool of choice when the collection of data to be processed is very large, perhaps far too large for a single machine.
     • Sqoop/Flume - Sqoop allows easy import and export of data from structured data stores such as relational databases, enterprise data warehouses, and NoSQL systems. The dataset being transferred is sliced up into different partitions and a map-only job is launched with individual mappers responsible for transferring a slice of this dataset.
     • ZooKeeper - ZooKeeper is a distributed, open-source coordination service for distributed applications. It exposes a simple set of primitives that distributed applications can build upon to implement higher level services for synchronization, configuration maintenance, and groups and naming.
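
The map and reduce functions mentioned in the Hadoop note can be illustrated with a single-process word count. This is only a sketch of the programming model in plain Python; it does not use Hadoop, runs nothing in parallel, and the function names and sample sentences are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


counts = word_count(["SQL or NoSQL", "NoSQL scales out", "SQL scales up"])
```

In a real cluster each document (or block of a file) would be mapped on a different node and the grouped keys would be spread across reducers; the three phases themselves stay the same.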