NoSQL

  1. 1. "Data, data, data. I cannot make bricks without clay." Sherlock Holmes, in Sherlock Holmes [2009]
  2. 2. Data: qualitative or quantitative attributes of a variable or set of variables; the lowest level of abstraction from which information and then knowledge are derived; the representation of a fact, figure, or idea.
  3. 3. A well-organized newspaper or a clumsy, cluttered one?
  4. 4. Data explosion: from gigabytes to terabytes to petabytes to perhaps (I’m out of nomenclature)-bytes.
  5. 5. NoSQL = Not Only SQL; NoSQL != No to SQL; NoSQL != Never SQL
  6. 6. Open Source: an abridged version of this presentation and notes will be available for everyone. Distributed under no license. FREE AS IN SPEECH AND BEER.
  7. 7. Necessity is the mother of invention: Web 2.0, DDBMS, RDBMS performance, OODB, R&D, cloud computing, multiple solutions.
  8. 8. SQL databases, the ‘Hammer’: it’s a wonderful tool.
  9. 9. Commercial SQL databases: even gods use them. Design, power, ergonomics, ease of use, features, warranty, upgrades. Apart from: a hole in the pocket.
  10. 10. A nail is a nail, a screw is a screw: hammering a screw or screwdriving a nail is FOOLISHNESS!
  11. 11. What? Non-relational, next-generation operational data stores and databases. NoSQL is a new look at data to deliver: high performance; unlimited horizontal scalability; economical, common, unreliable hardware; auto-sharding; support for a wide range of data (recursive, hierarchical, non-rigid); high availability.
  18. 18. What? (Continued…) Partly or completely independent of RDBMS concepts; no specific implementation; breakthrough approaches. Key: a non-relational approach and non-ACIDness. A STEP BACKWARDS, THEN MANY STEPS FORWARD.
  19. 19. NoSQL, the ‘screwdriver’: yet another tool in our toolbox to go along with the hammer.
  20. 20. NoSQL is about choice: not all problems are nails, and not all screws are the same. GOOD PROGRAMMING PRACTICE: know your tools and use them appropriately.
  21. 21. SQL databases. Data: relational, tabular (rows/columns). Interface: SQL. Basic design inspiration: set theory. ACID design. Scale-up design. Examples: Oracle, MySQL, Teradata, SQLite, SQL Server, and many more.
  26. 26. Why? Is all data really relational? If consistency is already ensured, do we have to enforce or check it again at the database level? Are RDBMSs ready for the challenges of the future, such as dynamic schema/metadata; huge amounts of data, handled through horizontal auto-scaling; and the ability to handle complex data types (images, video, audio, and much more)? Not really!
  34. 34. Why? (Continued…) RDBMS drawbacks: scalability; CRUD; performance; write overhead; limited by single-disk architecture; lack of in-memory design; rigid schema design; and more…
  35. 35. HAMMERS are under some hammering.
  36. 36. DRAWBACKS: DEEP DIVE
  37. 37. Scalability. True scalability: horizontal scaling, transparency to the application, no single point of failure. Problems with SQL databases: vertical scaling, partitioning (aka sharding), read slaves. Anti-patterns: normalized data, joins, ACID transactions.
  38. 38. No breadcrumbs: CRUD is crude. The delete/update strategy is improper. CRA (Create, Read, Archive) is the way to go: audit information is lost in CRUD but not in CRA.
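The Create, Read, Archive idea can be sketched as an append-only store. This is a minimal illustration, not something taken from the slides: the `ArchiveStore` class, its method names, and the use of `None` as an archive marker are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ArchiveStore:
    """Hypothetical Create/Read/Archive store: versions are appended, never overwritten."""

    _history: Dict[str, List[Any]] = field(default_factory=dict)

    def create(self, key: str, value: Any) -> int:
        """Append a new version for the key and return its 0-based version number."""
        versions = self._history.setdefault(key, [])
        versions.append(value)
        return len(versions) - 1

    def read(self, key: str) -> Optional[Any]:
        """Return the latest version, or None if the key is unknown or archived."""
        versions = self._history.get(key)
        return versions[-1] if versions else None

    def archive(self, key: str) -> None:
        """Retire the key by appending a None marker; older versions are kept."""
        self._history.setdefault(key, []).append(None)

    def audit(self, key: str) -> List[Any]:
        """Return every version ever written, oldest first."""
        return list(self._history.get(key, []))


store = ArchiveStore()
store.create("user:1", {"email": "old@example.com"})
store.create("user:1", {"email": "new@example.com"})
store.archive("user:1")
```

Nothing is ever updated in place or deleted, so `store.audit("user:1")` still shows both earlier e-mail addresses after the record has been archived.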
  39. 39. Naive data support: not designed for complex data structures (recursive, hierarchical, ordered list, circular) or dynamic metadata.
  40. 40. Logical/physical separation concerns: the relational model is a logical model, but RDBMSs implement it at the physical level using multiple indices. This creates artificial overhead in managing the database, such as frequently dropping and recreating indexes to make the DB perform.
  41. 41. Spinning disk storage: a design flaw for most RDBMSs. With cheaper memory, a memory-based approach should also be included in the design. Defiance of Moore’s law: disk reads grew only 12.5 times in about 50 years, and disk writes much less. Disk writes are expensive, and RDBMSs make things worse by writing more. ACID rains are UNHEALTHY.
  42. 42. Think ‘Out of the ROM’
  43. 43. At a snail’s pace: RDBMS engine growth is SLOW. Optimizations have been minor since the initial days; the majority of growth is due to Moore’s law (faster hardware, slightly faster storage, faster memory). What happens when Moore’s law diminishes, thanks to external factors like the heat generated?
  44. 44. Database size limits: RDBMSs are too slow over multi-terabyte and petabyte databases. Purpose-designed parallel processing would be needed to handle such volumes of data in an RDBMS.
  45. 45. RDBMS has been around for years and is proven technology. What about NoSQL?
  46. 46. RDBMS grew fast, but its growth slowed down over time and might eventually stagnate. NoSQL, undeniably a new and immature tool, has been growing faster than RDBMS ever did and is being supported by the Big Players.
  47. 47. Did you say BIG PLAYERS? WHO?
  48. 48. NoSQL real-world implementations: Google (BigTable), Facebook (HBase), Digg (Cassandra), Amazon (Dynamo), Trend Micro (HBase), Netflix (Amazon SimpleDB), Shutterfly (MongoDB), LinkedIn (Voldemort), and more. Microsoft is considering NoSQL for Azure services as well, and so is Twitter. Are we next? Major IT companies have implemented or, even better, created their own NoSQL systems to manage huge data stores that couldn’t be managed by SQL databases.
  56. 56. We are used to SQL and relatedness; why can’t they just fix RDBMS to handle Big Data? STORAGE SEEK RATES: large writes and ACID are a huge limitation. Big Data can be handled via scale-out/partitionability across multiple nodes.
  57. 57. CAP Theorem: applies to distributed shared-data systems.
  58. 58. CAP THEOREM
  59. 59. A deeper look. Consistency: the system is in a consistent state after an operation; all clients see the same data; strong consistency (ACID) vs. eventual consistency (BASE). Availability: ‘always on’ mode, no downtime; all clients can find some available replica; tolerance of software/hardware upgrades. Partition tolerance: the system continues to function even when split into disconnected subsets (by a network disruption), for reads and writes combined.
  60. 60. CP: some data may be inaccessible, but the rest is accurate/consistent; sharded databases; TERADATA comes here. CA: single-site clusters. AP: the system is still available under partitioning, but some of the data returned may be inaccurate. Other diagram labels: RDBMS, Paxos, NoSQL.
  62. 62. ACID. Atomicity: all of the operations in the transaction will complete, or none will. Consistency: the database will be in a consistent state when the transaction begins and ends. Isolation: the transaction will behave as if it is the only operation being performed upon the database. Durability: upon completion of the transaction, the operation will not be reversed.
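The "all operations complete, or none will" guarantee can be seen with Python's standard `sqlite3` module. This is a small sketch rather than anything from the deck: the `accounts` table, the names `alice` and `bob`, and the `transfer` helper are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(amount):
    """Move `amount` from alice to bob; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = 'bob'",
                (amount,),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = 'alice'",
                (amount,),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; bob's credit is undone too.
        return False


def balances():
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


overdraft_ok = transfer(500)    # fails as a whole: alice cannot go below zero
after_failed = balances()
small_ok = transfer(30)         # succeeds as a whole
after_success = balances()
```

The failed transfer leaves both rows untouched even though the first `UPDATE` had already run, which is the atomicity property in miniature.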
  63. 63. BASE: Basically Available, Soft state, Eventually consistent. When availability and partitionability are prioritized over consistency, think in terms of BASE.
  64. 64. Eventual consistency: if no new updates are made to the object, eventually all accesses will return the last updated value. Example: the Domain Name System (DNS).
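A toy simulation may make the definition concrete. It assumes three in-memory replicas, a last-writer-wins rule keyed on a version number, and a `pending` queue standing in for the network; none of these details come from the slides, and real systems resolve conflicts in more elaborate ways.

```python
class Replica:
    """One copy of a single object, tagged with the version it last accepted."""

    def __init__(self):
        self.value = None
        self.version = 0

    def apply(self, value, version):
        # Last-writer-wins: ignore any update older than what we already hold.
        if version > self.version:
            self.value, self.version = value, version


replicas = [Replica() for _ in range(3)]
pending = []  # undelivered updates as (replica_index, value, version)


def write(value, version):
    """Acknowledge the write on replica 0 and queue it for the other replicas."""
    replicas[0].apply(value, version)
    pending.extend((i, value, version) for i in (1, 2))


def propagate():
    """Deliver queued updates, deliberately in reverse (out-of-order) sequence."""
    while pending:
        i, value, version = pending.pop()
        replicas[i].apply(value, version)


write("10.0.0.1", 1)
write("10.0.0.2", 2)
stale = [r.value for r in replicas]      # replicas 1 and 2 have seen nothing yet
propagate()                              # no new writes arrive; updates drain
converged = [r.value for r in replicas]
```

Before `propagate()` runs, a read may return old data depending on which replica answers; once the updates stop and the queue drains, every replica returns the last written value.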
  65. 65. Types of eventual consistency: read-your-writes consistency, session consistency, monotonic read consistency, monotonic write consistency, causal consistency. In practice, read-your-writes consistency and monotonic read consistency are desirable in an eventually consistent system.
  66. 66. Hash(): different apps, different CAP requirements. Prioritize among: consistency and availability; availability and partitionability; consistency and partitionability.
  67. 67. WHERE? So will NoSQL eventually replace RDBMSs everywhere? No, RDBMSs are here to stay. NoSQL is here to help.
  68. 68. Wherever you want to take advantage of NoSQL
  69. 69. Big Data: denormalize, shard, scale out, and look no further than NoSQL.
  70. 70. Write-intensive applications: IOPS of the best storage device <<< n × IOPS of relatively cheaper storage devices. In simple terms: ‘HARNESS THE POWER OF YOUR CLOUD’.
  71. 71. Fast key-value access. NoSQL: ‘User, you are looking for $value’. RDBMS: ‘Query executing…’. An O(1) hash operation versus O(log n) B/B+ tree traversals.
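The O(1) versus O(log n) contrast can be illustrated by counting steps. In this sketch a Python `dict` plays the hash index, and a binary search over a sorted list stands in for a B-tree traversal; that substitution is a simplification (a real B-tree has a much larger fan-out and is shallower), and the key range is arbitrary.

```python
keys = list(range(0, 200_000, 2))                 # 100,000 sorted keys
hash_index = {k: f"value-{k}" for k in keys}      # one hash probe per lookup


def tree_style_lookup(sorted_keys, key):
    """Binary search over sorted keys; returns (found, number_of_steps)."""
    steps, lo, hi = 0, 0, len(sorted_keys)
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_keys) and sorted_keys[lo] == key
    return found, steps


value = hash_index[123_456]                       # direct lookup by key
found, steps = tree_style_lookup(keys, 123_456)   # steps grows with log2(len(keys))
missing, _ = tree_style_lookup(keys, 123_457)     # odd keys were never inserted
```

Doubling the number of keys adds roughly one more step to the search, while the hash lookup stays a single probe on average.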
  72. 72. Flexible schema and data types. Field: ‘I once was an integer, then a string, then a date; what am I?’ RDBMS: ‘WTH! Whatever you are, you are beyond my scope.’
  73. 73. Transient data. Data: ‘I’m here only for a while and want to get my work done fast.’ RDBMS: ‘You are data and you shall be treated like the rest.’ NoSQL: ‘Okay, I’ll allot you space in RAM using Memcached if available; otherwise you still have my cloud.’
  74. 74. High write availability. Warning: incoming data… NoSQL: ‘Anytime you like, user.’ RDBMS: ‘This is insane, I’m already busy with other things.’
  75. 75. ECONOMICS. RDBMS: ‘I’m powered by a wonderful, beautiful rabbit.’ NoSQL: ‘I’m powered by many cute little hamsters.’
  76. 76. No single point of failure: designed to run over economical, commonly available, unreliable hardware.
  77. 77. Full table scan operations. MapReduce. Map: divide your problem into sub-problems that can be computed in parallel and reduced later. Reduce: merge the partial solutions into the result. Divide and conquer your way to victory. Powered by MapReduce! Or something similar.
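The map and reduce steps can be sketched with a word count, the usual introductory example. Everything here runs in a single process, so it only shows the shape of the computation; the function names and sample text are invented for the illustration, and a real framework would run the map calls on different machines.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn one chunk of text into independent (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the intermediate pairs by key so each key is reduced once."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: merge the partial results for each key into a final count."""
    return {key: sum(values) for key, values in groups.items()}


chunks = ["NoSQL is about choice", "SQL is a hammer", "NoSQL is a screwdriver"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]  # parallelizable
counts = reduce_phase(shuffle(mapped))
```

Each call to `map_phase` touches only its own chunk, which is what lets a full scan be split across many nodes.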
  78. 78. Ability to restore, maintain, and repair itself: a ‘no DBA required’ design.
  79. 79. HOW? Let us welcome keys, values, collections, data structures, objects, documents, and graphs.
  80. 80. NoSQL view. The basic approach to data: key/value store; run on multiple machines; partition and replicate across these machines; relax consistency; aim at eventual consistency; asynchronous replication. But not all NoSQL systems take the same path.
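Partitioning and replication across machines can be sketched by hashing each key to a primary node and copying it to the next node in the list. The node names and the replication factor are hypothetical, and plain modulo placement is used for brevity; production systems typically prefer consistent hashing so that adding a node moves fewer keys.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]   # hypothetical machine names
REPLICAS = 2                                        # copies kept of every key


def owners(key):
    """Return the nodes that hold `key`: a hashed primary plus its successors."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    first = int(digest, 16) % len(NODES)
    return [NODES[(first + i) % len(NODES)] for i in range(REPLICAS)]


placement = {key: owners(key) for key in ("user:1", "user:2", "cart:9")}
```

Any client can compute `owners(key)` on its own, so no central lookup table is needed to find the data.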
  81. 81. NoSQL: document store, key-value store, object, multivalue, graph stores, BigTable clones, tuple store.
  82. 82. Key-value stores: one key, one value, no duplicates, and crazy fast. Distributed hash tables. The value is stored as a binary object (BLOB); the DB doesn’t understand it and doesn’t want to. Examples: Amazon Dynamo, MemcacheDB.
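A minimal in-memory sketch shows what "the DB doesn't understand the value" means in practice. The `KeyValueStore` class is hypothetical and not the API of Dynamo or MemcacheDB; the point is only that encoding and decoding happen in the application.

```python
import json


class KeyValueStore:
    """Toy key-value store: values are opaque bytes the store never inspects."""

    def __init__(self):
        self._data = {}

    def put(self, key, blob):
        if not isinstance(blob, bytes):
            raise TypeError("values must be bytes; serialization is the caller's job")
        self._data[key] = blob      # one key, one value: a second put overwrites

    def get(self, key):
        return self._data.get(key)


store = KeyValueStore()
# The application chooses the encoding (JSON here); the store only sees bytes.
store.put("session:9", json.dumps({"user": "matt", "cart": [1, 2]}).encode("utf-8"))
session = json.loads(store.get("session:9").decode("utf-8"))
```

Because the store cannot look inside the blob, the only possible query is a lookup by key.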
  83. 83. Key1, Key2, Key3, Key4: the key/value store doesn’t know what is in here.
  84. 84. Document store: a key-value store, but the value is structured and understood by the DB. Querying data is possible, and not just on the key. Examples: MongoDB, CouchDB, Riak, etc.
  85. 85. Each database has collections; each collection has a set of documents. They are well designed for access through applications and suitable for web applications. A few document databases now provide a SQL-like query interface.
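The difference from a plain key-value store is that documents can be queried by their fields. The sketch below is a toy, in-memory stand-in; `Collection`, `insert`, and `find` are illustrative names and not the client API of MongoDB, CouchDB, or Riak.

```python
class Collection:
    """Toy document collection: the store understands fields, so it can filter on them."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, doc):
        self._docs[doc_id] = doc

    def find(self, **criteria):
        """Return documents whose fields equal every given criterion."""
        return [
            doc
            for doc in self._docs.values()
            if all(doc.get(name) == wanted for name, wanted in criteria.items())
        ]


posts = Collection()
posts.insert(1, {"title": "Hammers", "author": "matt", "tags": ["sql"]})
posts.insert(2, {"title": "Screwdrivers", "author": "april"})   # no "tags" field
posts.insert(3, {"title": "Nails", "author": "matt"})

by_matt = posts.find(author="matt")
```

Documents in the same collection do not need the same fields; a missing field simply fails to match.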
  86. 86. Key1, Key2, Key3, Key4. Name: $Name; Value: $Value; Version: $Version; Type: $Type; Emb Object1; Emb Object2. Objects inside objects. CRAZY!
  87. 87. BigTable and its clones: database, tables, rows, columns, and ‘SuperColumn’. A row consists of columns and SuperColumns. A few SuperColumns can be made mandatory. Each SuperColumn is an arbitrary set of columns. Rows are typically versioned by a system-assigned timestamp.
  88. 88. Intended for tables with a huge number of columns; millions can be supported very easily. ‘A sparse, distributed multi-dimensional sorted map’. Also referred to as wide-column stores. Examples: Google BigTable, Cassandra, HBase, Voldemort, Azure Tables.
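The "sparse, multi-dimensional sorted map" can be pictured as a dictionary keyed by row and column, with each cell holding timestamped versions. This is a single-machine sketch with invented names (`SparseTable`, `put`, `get`, `scan_row`) and integer timestamps chosen for readability; it leaves out distribution and SuperColumns entirely.

```python
class SparseTable:
    """Toy wide-column table: only cells that were written take up space."""

    def __init__(self):
        self._cells = {}    # (row, column) -> {timestamp: value}

    def put(self, row, column, value, timestamp):
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value, or the newest one at or before `timestamp`."""
        versions = self._cells.get((row, column), {})
        eligible = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(eligible)] if eligible else None

    def scan_row(self, row):
        """List the columns present in one row, in sorted order."""
        return sorted(col for (r, col) in self._cells if r == row)


table = SparseTable()
table.put("com.example", "anchor:home", "Home v1", timestamp=1)
table.put("com.example", "anchor:home", "Home v2", timestamp=2)
table.put("com.example", "contents:html", "<html>", timestamp=1)
```

A row with two populated columns stores two cells, no matter how many columns other rows use, which is why millions of columns are affordable.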
  89. 89. Key1, Key2, Key3
  90. 90. Graph databases: nodes, edges, and properties replace traditional tables, columns, and rows. A graph database can be implemented in different ways: as a key/value store, columnar store, BigTable clone, or even a combination of these. Fields are used to directly store the id of another entity, forming the edge.
  91. 91. A graph database is a multi-relational graph. No need for secondary indexes. Relationships in RDBMS are ‘weak’; relationships in graphs are ‘strong’. The rest don’t really care about relations at the DB level.
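A small adjacency-list sketch shows edges stored directly as ids of other nodes, with several relationship types in one graph. The `Graph` class is hypothetical; the node names and edge labels echo the deck's example diagram, but which property belongs to which node is an assumption.

```python
from collections import defaultdict


class Graph:
    """Toy property graph: nodes carry properties, edges carry a label."""

    def __init__(self):
        self.properties = {}
        self._out = defaultdict(list)    # source id -> [(label, target id)]

    def add_node(self, node_id, **props):
        self.properties[node_id] = props

    def add_edge(self, source, label, target):
        # The edge is just the target's id stored next to the source node.
        self._out[source].append((label, target))

    def neighbours(self, source, label):
        """Follow every edge with the given label that leaves `source`."""
        return [target for (lbl, target) in self._out.get(source, []) if lbl == label]


g = Graph()
g.add_node("matt", age=32)           # assumed: the age in the diagram is Matt's
g.add_node("april")
g.add_node("honda", kind="car")
g.add_edge("matt", "spouse", "april")
g.add_edge("matt", "owns", "honda")
g.add_edge("matt", "drives", "honda")
```

Following a relationship is a direct list lookup from the source node, with no join and no secondary index involved.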
  92. 92. [Graph diagram. Node and property labels: Matt, April, Honda, Address, Age: 32, Mobile, SSN, City, Model, registration. Edge labels: is related to, spouse, owns, drives.]
  93. 93. [Chart: size versus complexity for key-value stores, document stores, BigTable clones, and graph databases.]
  94. 94. Too many cooks and recipes: no specific recipe! Major implementations: graph, document store, tabular, key-value store, eventually consistent, hierarchical, ordered. Other known recipes: multivalue, object, tuple store.
  95. 95. The menu. On disk: BigTable, Membase, Tokyo Cabinet. In RAM: Memcached, Velocity. Eventually consistent: Cassandra, Dynamo, Riak. Hierarchical: GT.M. Ordered: Berkeley DB, NMDB, C-ISAM. Multivalue: eXe, OpenQM. Document store: CouchDB, Lotus Notes, MongoDB. Graph: AllegroGraph, Neo4j, DEX. Tabular: BigTable, HBase, HyperTable. The list isn’t even a quarter of the whole.
  96. 96. _theOpenSourceIssue: most of them are open source and thus forkable, like Linux. The first of the lot: Google’s BigTable and Amazon’s Dynamo. All in all, there are about 10 roots, with 4 major ones.
  97. 97. No single database to rule them all
  98. 98. Real-world implementations: Digg’s 3 TB for Green Badges [CASSANDRA]; Facebook’s 50 TB for Inbox Search [HBASE]; eBay’s 2 PB of overall data; Google’s
  99. 99. Naïve Recipe
  100. 100. MongoDB: document store; JSON storage; REST… not out of the box; map/reduce; master-slave replication; strong suite of query APIs; good support for SQL. Work in progress: auto-sharding-based scalability, failover support. Open source, non-relational, scalable, schemaless, queryable.
  101. 101. Document-oriented: Mongo stores documents in collections. Documents are slightly enhanced JSON objects. Complex data structures are very much possible. Data modelling is a more natural process.
  102. 102. Embeddable objects. Complexity.begin(): embed objects within a single document. A document is an enhanced form of object, as mentioned earlier. The same thing in an RDBMS can be achieved using multiple tables and joining them together. Consider a requirement to store a blog post with this information: post content, post title, post author, and comments (comment order, comment content, comment author).
  103. 103. RDBMS solution
  104. 104. MongoDB solution. Documents… each one of them is a post: { Name: $name, Author: $author, Comment: [ { Author: $author1, Comment: $comment1 }, { Author: $author2, Comment: $comment2, Replies: [ { Author: $author3, Comment: $comment3 } ] } ] }
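The nested post above can be written as a plain Python dictionary and serialized to JSON, which is close to what a document store persists. The field names follow the slide; the concrete strings and the `count_comments` helper are placeholders added for illustration, and no MongoDB driver is involved.

```python
import json

# One post is one document; comments and replies are embedded, not joined.
post = {
    "Name": "Hammers and screwdrivers",
    "Author": "author-0",
    "Comment": [
        {"Author": "author-1", "Comment": "First comment"},
        {
            "Author": "author-2",
            "Comment": "Second comment",
            "Replies": [{"Author": "author-3", "Comment": "A reply"}],
        },
    ],
}


def count_comments(comments):
    """Count comments together with all nested replies."""
    return sum(1 + count_comments(c.get("Replies", [])) for c in comments)


total = count_comments(post["Comment"])
serialized = json.dumps(post)            # the whole post travels as one value
round_trip = json.loads(serialized)
```

List order preserves the comment order, so the separate "comment order" column a relational design would need disappears.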
  105. 105. RDBMS viewpoint
  106. 106. ODF, Mongodb’ed
  107. 107.
  108. 108. Schema-less: no database-enforced schema. Adding and deleting columns is simple. It’s about how the application uses the APIs. The data definition need not be specified up front.
  109. 109. Other features: data tagging, caching, real-time analytics, image storage, dynamic queries, binary storage.
  110. 110. MongoDB: why not? Lacks transactions. Doesn’t completely support SQL. Lacks a built-in revisioning system like CouchDB’s. Lacks full-text search features.
  111. 111. Try MongoDB @ http://try.mongodb.org/
  112. 112. EOL
  113. 113. Calm down! Eventually Answered System: all your questions will be answered eventually.

Editor's notes

  • SQL databases approach data in the form of sets and tables. Incidentally, this strength soon becomes their weakness. Assumptions made: data is represented in the form of tables (rows and columns); data in each table can be related to data in another; data can and must be searchable through all columns. Strengths: data manipulation through set theory; relational constraints enforced by the management system. Weaknesses: relatedness becomes an overhead once the data becomes really huge; a large volume of writes in a SQL database puts a heavy burden on the DBMS as well as on the storage disk.
  • NoSQL is a collection of databases that avoid the drawbacks of RDBMSs without completely giving up on relational models. They are not stringent about certain core RDBMS concepts such as ACID compliance and other integrity constraints. The priority is to support high levels of scalability through easy partitioning across multiple cheap, naïve machines, by giving up the consistency that SQL databases aim to deliver, along with some amount of relatedness in the data.
  • The CAP theorem states that any shared-data system can achieve only two of these three: consistency (all database clients see the same data, even with concurrent updates), availability (all database clients are able to access some version of the data), and partition tolerance (the database can be split over multiple servers). See: http://www.julianbrowne.com/article/viewer/brewers-cap-theorem http://devblog.streamy.com/2009/08/24/cap-theorem/ http://www.royans.net/arch/brewers-cap-theorem-on-distributed-systems/
