No sql databases

  1. NoSQL DATABASES
  2. What is NoSQL?
     • NoSQL is not a standard.
     • NoSQL does not mean "No SQL"; rather, it means "Not Only SQL".
     • It is also not a replacement for RDBMSs.
     • CAP theorem (Consistency, Availability, Partition tolerance)
     • BASE (Basically Available, Soft state, Eventual consistency) vs. ACID
  3. Characteristics of a NoSQL Database
     • Flexible schema / schema-less
     • Non-relational
     • Often distributed (partitioned)
     • Often replicated
     • Horizontally scalable
     • Eventually consistent
     • Cheaper than big-name RDBMS products
     • Simpler API than SQL (but not standard across products or even versions)
  4. NoSQL pros/cons
     Advantages
       – Massive scalability
       – High availability
       – Lower cost (than competing solutions at that scale)
       – (Usually) predictable elasticity
       – Schema flexibility; handles sparse and semi-structured data
  5. Disadvantages
       – Limited query capabilities (so far)
       – Eventual consistency is not intuitive to program for
         • Makes client applications more complicated
       – No standardization
         • Portability might be an issue
       – Insufficient access control
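Why eventual consistency complicates client code can be seen in a few lines. The Python sketch below is a hypothetical model (the `EventuallyConsistentStore` class and its method names are illustrative, not any product's API): a write is acknowledged by a primary copy before it reaches a replica, so a client that reads from the replica briefly sees stale data.

```python
# Minimal sketch of eventual consistency, assuming a hypothetical store with one
# primary and one lagging replica that receives updates asynchronously.


class Replica:
    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    def __init__(self):
        self.primary = Replica()
        self.replica = Replica()
        self.pending = []  # updates not yet shipped to the replica

    def write(self, key, value):
        # The write is acknowledged as soon as the primary applies it.
        self.primary.data[key] = value
        self.pending.append((key, value))

    def read_from_replica(self, key):
        # Reads served by the replica may be stale until replication catches up.
        return self.replica.data.get(key)

    def replicate(self):
        # Ship queued updates in order; afterwards both copies agree.
        for key, value in self.pending:
            self.replica.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("cart:42", ["book"])
stale = store.read_from_replica("cart:42")  # None: update not yet replicated
store.replicate()
fresh = store.read_from_replica("cart:42")  # ['book']: the copies have converged
print(stale, fresh)
```

An application that must never show the stale `None` has to handle that case itself, for example by reading from the primary or retrying, which is the extra client-side complexity the slide refers to.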
  6. Different types of NoSQL Databases
     • NoSQL databases are classified into four major data models:
       1. Key-value
       2. Document
       3. Column family
       4. Graph
  7. 1. Key-value data model
     • The main idea is the use of a hash table
     • Data (values) is accessed by strings called keys
     • Data has no required format; values may have any format
     • Data model: (key, value) pairs
     • Basic operations: Insert(key, value), Fetch(key), Update(key, value), Delete(key)
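The four basic operations map directly onto a hash table. Here is a minimal Python sketch, assuming a hypothetical in-memory `KeyValueStore` class whose dict plays the role of the hash table; it is not the API of any of the products listed on the next slide.

```python
from typing import Any, Dict


class KeyValueStore:
    """Hypothetical in-memory key-value store backed by a Python dict (a hash table)."""

    def __init__(self) -> None:
        self._table: Dict[str, Any] = {}

    def insert(self, key: str, value: Any) -> None:
        # Insert refuses to overwrite; use update() for existing keys.
        if key in self._table:
            raise KeyError(f"key already exists: {key}")
        self._table[key] = value

    def fetch(self, key: str) -> Any:
        return self._table[key]  # raises KeyError if the key is absent

    def update(self, key: str, value: Any) -> None:
        if key not in self._table:
            raise KeyError(f"no such key: {key}")
        self._table[key] = value

    def delete(self, key: str) -> None:
        del self._table[key]


store = KeyValueStore()
store.insert("user:1", {"name": "Ada"})     # the value's format is up to the client
store.update("user:1", "plain string now")  # no schema: any type is accepted
print(store.fetch("user:1"))                # plain string now
store.delete("user:1")
```

The store never inspects values, which is why it can accept any format but also why it cannot answer queries about what is inside them.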
  8. Contd.
     • Key/value store
     • Can be in-memory only, or backed by disk persistence
     • Supports versioning
     • e.g. Voldemort (LinkedIn), Amazon SimpleDB, Memcached, BerkeleyDB, Oracle NoSQL
  9. 1.1 Voldemort
     • Distributed key-value store
       – Based on Dynamo
     • Originally developed by LinkedIn, now open source
     • Features
       – Simple data model (no joins or complex queries, no referential integrity, …)
       – P2P
       – Scale-out / elastic
         • Consistent hashing of the keyspace
         • Fixed partitions (no splits, but the owner may change when re-balancing)
       – Eventual consistency / high availability
       – Replication
       – Failure handling
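Consistent hashing with fixed partitions can be sketched as follows. This is a hedged illustration, not Voldemort's code: the partition count, the use of MD5, and the round-robin assignment of partitions to nodes `A`, `B`, `C` are all assumptions. A key always hashes to the same partition; re-balancing only changes which node owns a partition.

```python
import hashlib
from typing import Dict, List

NUM_PARTITIONS = 8  # fixed when the cluster is created (assumed value)
RING_SIZE = 2 ** 32


def partition_for(key: str) -> int:
    # Hash the key to a point on the ring; the ring is cut into NUM_PARTITIONS
    # equal arcs, and the arc containing the point is the key's partition.
    point = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")
    return point * NUM_PARTITIONS // RING_SIZE


def preference_list(key: str, owners: Dict[int, str], replicas: int) -> List[str]:
    """Walk the ring clockwise from the key's partition, collecting distinct nodes."""
    start = partition_for(key)
    nodes: List[str] = []
    for step in range(NUM_PARTITIONS):
        node = owners[(start + step) % NUM_PARTITIONS]
        if node not in nodes:
            nodes.append(node)
            if len(nodes) == replicas:
                break
    return nodes


# Hypothetical 3-node cluster: partitions are dealt round-robin to nodes A, B, C.
owners = {p: "ABC"[p % 3] for p in range(NUM_PARTITIONS)}
print(preference_list("member:1234", owners, replicas=2))

# Re-balancing hands a whole partition to a new node D: the key's partition
# does not change, only which node owns it.
p = partition_for("member:1234")
owners[p] = "D"
print(preference_list("member:1234", owners, replicas=2)[0])  # D
```

Because partitions are never split, moving data during re-balancing means copying whole partitions between nodes rather than rehashing individual keys.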
  10. 1.2 Riak
      • Like Voldemort, Riak is based on Amazon's Dynamo design
      • Offers a key/value interface
      • Designed to run on large distributed clusters
      • Uses consistent hashing, avoiding the need for a centralized index server
      • Querying is handled using MapReduce functions written in JavaScript (Erlang is also supported)
      • Open source, with a commercial enterprise edition
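Riak runs MapReduce with JavaScript or Erlang functions on the nodes that hold the data. To show only the shape of the computation, here is a single-process Python sketch; the bucket contents and the `map_phase`, `reduce_phase` and `map_reduce` names are hypothetical, and none of this is Riak's API. It counts words across the values stored under each key.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical bucket contents: key -> value.
bucket: Dict[str, str] = {
    "doc1": "the quick brown fox",
    "doc2": "the lazy dog",
    "doc3": "the fox jumps",
}


def map_phase(key: str, value: str) -> Iterable[Tuple[str, int]]:
    # Emit (word, 1) for every word in the object's value.
    for word in value.split():
        yield word, 1


def reduce_phase(word: str, counts: List[int]) -> int:
    # Combine all values emitted for the same word.
    return sum(counts)


def map_reduce(data: Dict[str, str], mapper: Callable, reducer: Callable) -> Dict[str, int]:
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in data.items():
        for out_key, out_value in mapper(key, value):
            grouped[out_key].append(out_value)  # shuffle: group by emitted key
    return {k: reducer(k, vs) for k, vs in grouped.items()}


counts = map_reduce(bucket, map_phase, reduce_phase)
print(counts["the"], counts["fox"])  # 3 2
```

In a real cluster the map phase runs in parallel where each object lives and only the emitted pairs travel over the network, which is what makes the approach suitable for data spread across many nodes.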
  11. 2. Document-based data model
      • Similar to the key-value model, except that the value is a document
      • Usually a JSON-like interchange model
      • Query model: JavaScript-like or custom
      • Aggregations: Map/Reduce
      • Indexes are implemented with B-trees
      • Unlike in simple key-value stores, both keys and values are fully searchable in document databases
      • e.g. Couchbase, MongoDB, RavenDB, ArangoDB, MarkLogic, OrientDB, RethinkDB
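What "values are searchable" buys can be sketched with a secondary index on one document field. In this hypothetical Python `DocumentStore`, a sorted list searched with `bisect` stands in for a B-tree (both support ordered lookups and range scans); the class, field names and sample documents are assumptions for illustration only.

```python
import bisect
import json
from typing import Any, Dict, List, Tuple


class DocumentStore:
    """Hypothetical document store: JSON documents plus one sorted secondary index."""

    def __init__(self, indexed_field: str) -> None:
        self.docs: Dict[str, Dict[str, Any]] = {}
        self.indexed_field = indexed_field
        self.index: List[Tuple[Any, str]] = []  # sorted (field value, doc id) pairs

    def put(self, doc_id: str, json_text: str) -> None:
        doc = json.loads(json_text)
        old = self.docs.get(doc_id)
        if old is not None and self.indexed_field in old:
            self.index.remove((old[self.indexed_field], doc_id))  # drop stale entry
        self.docs[doc_id] = doc
        if self.indexed_field in doc:
            bisect.insort(self.index, (doc[self.indexed_field], doc_id))

    def range_query(self, low: Any, high: Any) -> List[str]:
        # Return ids of documents whose indexed value v satisfies low <= v <= high.
        start = bisect.bisect_left(self.index, (low, ""))
        result = []
        for value, doc_id in self.index[start:]:
            if value > high:
                break
            result.append(doc_id)
        return result


store = DocumentStore(indexed_field="age")
store.put("u1", '{"name": "Ada", "age": 36}')
store.put("u2", '{"name": "Alan", "age": 41, "tags": ["math"]}')  # schema-free: extra field
store.put("u3", '{"name": "Grace", "age": 85}')
print(store.range_query(30, 50))  # ['u1', 'u2']
```

The index keeps the range query from scanning every document; a pure key-value store, which treats values as opaque, has nothing comparable to build such an index on.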
  12. 2.1 CouchDB
      • Schema-free, document-oriented database
        – Documents stored in JSON format (XML in old versions)
        – B-tree storage engine
        – MVCC model, no locking
        – No joins, no PK/FK (UUIDs are auto-assigned)
        – Implemented in Erlang
          • 1st version in C++, 2nd in Erlang and 500 times more scalable (source: "Erlang Programming" by Cesarini & Thompson)
        – Replication (incremental)
      • Documents
        – UUID, version
        – Old versions retained (until compaction)
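The "MVCC, no locking" point means a writer must name the revision it last read, and a write based on an outdated revision is rejected rather than made to wait for a lock. The Python sketch below is a hypothetical in-memory model of that rule; the revision strings are loosely modeled on CouchDB's `N-hash` style, but the hashing details and the `RevisionedStore` API are assumptions, not CouchDB's implementation.

```python
import hashlib
import json
import uuid
from typing import Any, Dict, List, Optional


class Conflict(Exception):
    """Raised when a writer cites a revision that is no longer current."""


class RevisionedStore:
    def __init__(self) -> None:
        self.history: Dict[str, List[Dict[str, Any]]] = {}  # doc id -> all revisions

    def save(self, doc: Dict[str, Any], doc_id: Optional[str] = None,
             rev: Optional[str] = None) -> Dict[str, str]:
        doc_id = doc_id or uuid.uuid4().hex  # ids are auto-assigned UUIDs
        revisions = self.history.get(doc_id, [])
        current = revisions[-1]["_rev"] if revisions else None
        if rev != current:
            # No lock was taken: the stale writer simply loses and must re-read.
            raise Conflict(f"{doc_id}: expected rev {current}, got {rev}")
        digest = hashlib.md5(json.dumps(doc, sort_keys=True).encode("utf-8")).hexdigest()
        new_rev = f"{len(revisions) + 1}-{digest}"
        revisions.append({**doc, "_id": doc_id, "_rev": new_rev})  # old versions kept
        self.history[doc_id] = revisions
        return {"id": doc_id, "rev": new_rev}


db = RevisionedStore()
first = db.save({"title": "NoSQL"})
second = db.save({"title": "NoSQL databases"}, doc_id=first["id"], rev=first["rev"])
try:
    db.save({"title": "lost update"}, doc_id=first["id"], rev=first["rev"])  # stale rev
except Conflict as err:
    print("rejected:", err)
print(second["rev"].startswith("2-"), len(db.history[first["id"]]))  # True 2
```

Readers are never blocked because every revision stays readable; the cost is that writers must handle conflicts and retry with the current revision.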
  13. 2.2 MongoDB
      • Another popular document database
      • Data is stored on disk but cached in memory for speed
      • Supports replication and partitioning (sharding)
      • Very popular in web applications
      • Data is stored internally as BSON (binary JSON) and presented to applications as JSON-like documents
      • Very easy to set up and get started with
      • Licensing: the server was released under the AGPL until October 2018 and under the Server Side Public License (SSPL) since then; it is free to use (even commercially), with commercial support and licensing options
  14. A sample MongoDB query (figure: the same query written in MySQL and in MongoDB)
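The side-by-side query images on this slide were not preserved. As a stand-in illustration (not the slide's original example), `SELECT * FROM users WHERE age > 25 AND status = 'A'` is written in the MongoDB shell as `db.users.find({ age: { $gt: 25 }, status: "A" })`. The Python sketch below evaluates filter documents of that shape against in-memory documents to show their semantics; the `matches` and `find` helpers are hypothetical, support only equality and four comparison operators, and are not MongoDB's query engine.

```python
import operator
from typing import Any, Dict, List

# Subset of MongoDB comparison operators, mapped to Python comparisons.
OPERATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(doc: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """True if every field condition in the filter holds (conditions are ANDed)."""
    for field, condition in query.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(condition, dict):  # operator form, e.g. {"$gt": 25}
            for op, operand in condition.items():
                if not OPERATORS[op](value, operand):
                    return False
        elif value != condition:  # a plain value means equality
            return False
    return True


def find(collection: List[Dict[str, Any]], query: Dict[str, Any]) -> List[Dict[str, Any]]:
    return [doc for doc in collection if matches(doc, query)]


users = [
    {"name": "Ada", "age": 36, "status": "A"},
    {"name": "Alan", "age": 22, "status": "A"},
    {"name": "Grace", "age": 85, "status": "D"},
]
# SQL:   SELECT * FROM users WHERE age > 25 AND status = 'A'
# Mongo: db.users.find({ age: { $gt: 25 }, status: "A" })
print([u["name"] for u in find(users, {"age": {"$gt": 25}, "status": "A"})])  # ['Ada']
```

The main contrast with SQL is that the filter is itself a document, so queries can be built and passed around as ordinary data structures.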
  15. 2.3 Redis
      • Often referred to as a data structure server (strictly an in-memory key-value store rather than a document database)
      • Supports storing strings, hashes, lists, sets, sorted sets, bitmaps and HyperLogLogs
      • Data is kept in memory (with optional persistence to disk)
      • Extremely popular for short-lived data (sessions, caches)
      • Can be used as a push/pull message queue
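Short-lived data such as sessions is typically stored with an expiry, for example with Redis's `SET key value EX seconds`. Because a runnable example cannot assume a Redis server, the sketch below imitates that behavior in plain Python; the `TTLCache` class is hypothetical and only lazily expires keys on read, whereas real Redis also expires keys in the background.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process imitation of Redis-style expiring keys (not a Redis client)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[Any, Optional[float]]] = {}

    def set(self, key: str, value: Any, ex: Optional[float] = None) -> None:
        # ex is a lifetime in seconds, mirroring the EX option of SET.
        expires_at = self._clock() + ex if ex is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key: str) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazy expiry on access
            return None
        return value


# A fake clock makes expiry deterministic for the example.
now = [1000.0]
cache = TTLCache(clock=lambda: now[0])
cache.set("session:abc", {"user": "ada"}, ex=30)  # session lives 30 seconds
print(cache.get("session:abc"))                   # {'user': 'ada'}
now[0] += 31
print(cache.get("session:abc"))                   # None (expired)
```

Letting the store expire entries removes the need for cleanup jobs in the application, which is a large part of why in-memory key-value stores are popular for sessions and caches.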
  16. 3. Column family data model
      • The column is the smallest unit of data
      • It is a tuple that contains a name, a value and a timestamp
      • Multiple columns (values) per key
      • e.g. Cassandra, HBase, BigTable, Hypertable (Amazon Redshift, HP Vertica and Teradata, also often listed here, are relational SQL databases with columnar storage rather than column-family stores)
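The (name, value, timestamp) tuple can be modeled directly. In the hypothetical Python `ColumnFamily` below, every write appends a new column version, and a read returns the version with the highest timestamp. Cassandra resolves competing writes the same way (last write wins by timestamp); the class, method names and sample data are illustrative assumptions.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # e.g. microseconds since the epoch, supplied by the writer


class ColumnFamily:
    """Hypothetical column family: row key -> column name -> list of Column versions."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, List[Column]]] = {}

    def insert(self, row_key: str, name: str, value: str,
               timestamp: Optional[int] = None) -> None:
        ts = timestamp if timestamp is not None else time.time_ns() // 1000
        self.rows.setdefault(row_key, {}).setdefault(name, []).append(Column(name, value, ts))

    def get(self, row_key: str, name: str) -> Optional[str]:
        # The version with the highest timestamp wins, regardless of arrival order.
        versions = self.rows.get(row_key, {}).get(name)
        if not versions:
            return None
        return max(versions, key=lambda c: c.timestamp).value

    def columns(self, row_key: str) -> List[str]:
        return sorted(self.rows.get(row_key, {}))


users = ColumnFamily()
users.insert("ada", "email", "ada@old.example", timestamp=100)
users.insert("ada", "email", "ada@new.example", timestamp=200)
users.insert("ada", "city", "London", timestamp=150)
users.insert("alan", "phone", "555-0100", timestamp=120)  # rows need not share columns
print(users.get("ada", "email"))  # ada@new.example
print(users.columns("ada"))       # ['city', 'email']
```

Because each row holds its own set of columns, sparse data costs nothing for the columns a row does not have.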
  17. 3.1 Cassandra
      • Data is organized into column families of wide rows (each row can hold a different set of columns) rather than fixed relational rows
      • Supports partitioning (sharding) and replication, even across data centers
      • Can be used to store petabytes of data
      • Offers CQL (Cassandra Query Language), a SQL-like interface
      • Open source (Apache), commercially supported by DataStax
  18. 3.1 Cassandra: data model, partitioning
      • Data model
        – Same as BigTable
        – Super columns (nested columns) and super column families
        – Column order in a column family (CF) can be specified (by name or by time)
      • Dynamic partitioning
        – Consistent hashing
        – Ring of nodes
        – Nodes can be "moved" on the ring for load balancing
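Moving a node on the ring shifts only the key range next to it. In this hypothetical Python sketch, each node owns the keys whose token falls between the previous node's token and its own; moving node `B` hands part of its range to its neighbor `C` and leaves `A` untouched. MD5 tokens are an assumption of the sketch (recent Cassandra versions default to the Murmur3 partitioner), and virtual nodes are not modeled.

```python
import bisect
import hashlib
from typing import Dict

RING_SIZE = 2 ** 32  # assumed token space for the sketch


def token(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


class Ring:
    """Hypothetical token ring: a node owns tokens in (previous node's token, its own token]."""

    def __init__(self, node_tokens: Dict[str, int]) -> None:
        self.node_tokens = dict(node_tokens)

    def owner(self, key: str) -> str:
        ordered = sorted((t, n) for n, t in self.node_tokens.items())
        tokens = [t for t, _ in ordered]
        i = bisect.bisect_left(tokens, token(key))  # first node token >= key token
        return ordered[i % len(ordered)][1]         # wrap past the highest token

    def move(self, node: str, new_token: int) -> None:
        # Load balancing: change a node's position; only neighbouring ranges change.
        self.node_tokens[node] = new_token % RING_SIZE


ring = Ring({"A": RING_SIZE // 4, "B": RING_SIZE // 2, "C": 3 * RING_SIZE // 4})
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.owner(k) for k in keys}
ring.move("B", RING_SIZE // 3)  # B now covers a smaller range
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved) > 0, all(before[k] == "B" and after[k] == "C" for k in moved))  # True True
```

Only keys between B's old and new tokens change owner, which is why rebalancing a consistent-hashing ring moves a fraction of the data instead of reshuffling everything.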
  19. 3.2 BigTable
      • Sparse, distributed, persistent, multidimensional sorted map
      • (row, column, timestamp) dimensions; the value is an uninterpreted string (array of bytes)
      • Key features
        – Hybrid row/column store
        – Single master (stand-by replica)
        – Versioning
        – Compression
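"Sparse, multidimensional sorted map" can be made concrete. The hypothetical Python `SortedMap` below keys each cell by (row, column, timestamp) and keeps keys sorted, so a contiguous range of rows can be scanned in order; newer versions of a cell sort first, matching BigTable's decreasing-timestamp ordering. The sample rows imitate the reversed-domain style of BigTable's web-table example, but the data and class are illustrative assumptions.

```python
import bisect
from typing import Dict, Iterator, List, Tuple

SortKey = Tuple[str, str, int]  # (row, column, negated timestamp)


class SortedMap:
    """Hypothetical in-memory model of the BigTable data model."""

    def __init__(self) -> None:
        self._keys: List[SortKey] = []
        self._values: Dict[SortKey, bytes] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        # Negating the timestamp makes newer versions of a cell sort first.
        key = (row, column, -timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start_row: str, end_row: str) -> Iterator[Tuple[str, str, int, bytes]]:
        """Yield cells with start_row <= row < end_row, in sorted order."""
        i = bisect.bisect_left(self._keys, (start_row, "", -(2 ** 63)))
        while i < len(self._keys) and self._keys[i][0] < end_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._values[self._keys[i]]
            i += 1


table = SortedMap()
table.put("com.cnn.www", "contents:", 3, b"<html>v3")
table.put("com.cnn.www", "contents:", 5, b"<html>v5")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
table.put("org.example", "contents:", 1, b"<html>")
# Prints the anchor cell first, then contents: at t=5, then contents: at t=3.
for cell in table.scan("com.", "com.~"):
    print(cell)
```

Sorted row keys are what let a scan over one domain's pages touch a single contiguous range instead of the whole table.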
  20. BigTable: architecture
      • Master server
        – Assigns tablets to tablet servers
        – Balances tablet-server load
        – Garbage collection
        – Schema management
        – Client data does not move through the master (clients talk directly to tablet servers)
        – Tablet location is not handled by the master
      • Tablet servers (many)
        – Typically ten to a thousand tablets per tablet server
        – Each manages reads, writes and splits of its own tablets
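Because tablets are contiguous row ranges, a client can find the right tablet server by searching the range boundaries, without routing data through the master. The Python sketch below is a simplified, hypothetical `TabletDirectory`: server names such as `ts1` are made up, split placement is decided inline rather than by a master, and BigTable's real multi-level location hierarchy is not modeled.

```python
import bisect
from typing import List


class TabletDirectory:
    """Hypothetical map from row ranges (tablets) to tablet servers."""

    def __init__(self) -> None:
        self.end_rows: List[str] = []     # exclusive upper bounds of all but the last tablet
        self.servers: List[str] = ["ts1"]  # one tablet covering every row at the start

    def locate(self, row: str) -> str:
        # Tablet i covers [end_rows[i-1], end_rows[i]); bisect_right finds i.
        return self.servers[bisect.bisect_right(self.end_rows, row)]

    def split(self, split_row: str, new_server: str) -> None:
        # A tablet that grew too large is split at split_row: rows >= split_row
        # form a new tablet, placed here on new_server for simplicity.
        i = bisect.bisect_right(self.end_rows, split_row)
        self.end_rows.insert(i, split_row)
        self.servers.insert(i + 1, new_server)


directory = TabletDirectory()
directory.split("m", "ts2")  # the single tablet grew too large
directory.split("t", "ts3")
print(directory.locate("apple"), directory.locate("orange"), directory.locate("zebra"))  # ts1 ts2 ts3
```

Lookups cost a binary search over boundaries, and a split only inserts one boundary, so neither operation needs to touch the data of other tablets.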
  21. 3.3 HBase
      • Developed by Powerset, now an Apache project
      • Based on BigTable (BigTable counterparts in parentheses)
        – HDFS (GFS), ZooKeeper (Chubby)
        – Master node (master server), region servers (tablet servers)
        – HStore (tablet), memcache (memtable), MapFile (SSTable)
      • Features
        – Data is stored sorted (no real indexes)
        – Automatic partitioning
        – Automatic re-balancing / re-partitioning
        – Fault tolerance (HDFS, 3 replicas)
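The memcache/MapFile pairing (memtable/SSTable in BigTable terms) is a log-structured write path: writes land in a sorted in-memory table, which is flushed as an immutable sorted file when it fills up. The Python sketch below is a hypothetical `LSMStore` showing only that idea; the write-ahead log, deletes and compaction are deliberately omitted, and the flush threshold is an arbitrary assumption.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class LSMStore:
    """Hypothetical memtable + immutable sorted runs (no WAL, deletes or compaction)."""

    def __init__(self, memtable_limit: int = 3) -> None:
        self.memtable: Dict[str, str] = {}
        self.runs: List[Tuple[List[str], List[str]]] = []  # (sorted keys, values) per flush
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Write the memtable out as a new immutable run, sorted by key.
        items = sorted(self.memtable.items())
        self.runs.append(([k for k, _ in items], [v for _, v in items]))
        self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for keys, values in reversed(self.runs):  # the newest run wins
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None


store = LSMStore(memtable_limit=2)
store.put("row1", "v1")
store.put("row2", "v1")  # memtable full -> flushed to the first run
store.put("row1", "v2")  # the newer value sits in the memtable
print(store.get("row1"), store.get("row2"), len(store.runs))  # v2 v1 1
```

Writes only ever append, which keeps them fast; reads pay for it by checking several sorted structures, which is why real systems periodically compact runs together.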
  22. HBase: architecture (figure)
  23. 3.4 Hypertable
      • An open-source clone of BigTable
      • Written in C++
      • Aims for higher performance than Java-based clones such as HBase
  24. 4. Graph data model
      • Based on graph theory
      • Typically scales vertically rather than through clustering
      • Graph algorithms are easy to apply
      • ACID transactions
      • Suited to modeling the structure of data (how entities are connected)
      • Uses the property graph data model (nodes, relationships, properties)
      • e.g. Neo4j, InfiniteGraph, OrientDB, Titan
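In the property graph model both nodes and relationships carry properties, and relationships have a type and a direction, which makes traversals such as "shortest chain of acquaintances" natural. Graph databases run these traversals natively (Neo4j, for instance, exposes them through its Cypher query language); the Python `PropertyGraph` class below is only a hypothetical in-memory sketch, with made-up node names and relationship types, using breadth-first search.

```python
from collections import deque
from typing import Any, Dict, List, Optional, Tuple


class PropertyGraph:
    """Hypothetical in-memory property graph with typed, directed relationships."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, Any]] = {}
        # src -> list of (relationship type, destination, relationship properties)
        self.edges: Dict[str, List[Tuple[str, str, Dict[str, Any]]]] = {}

    def add_node(self, node_id: str, **props: Any) -> None:
        self.nodes[node_id] = props
        self.edges.setdefault(node_id, [])

    def add_relationship(self, src: str, rel_type: str, dst: str, **props: Any) -> None:
        self.edges[src].append((rel_type, dst, props))

    def shortest_path(self, start: str, goal: str, rel_type: str) -> Optional[List[str]]:
        """Breadth-first search following only relationships of rel_type."""
        previous: Dict[str, Optional[str]] = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                path: List[str] = []
                step: Optional[str] = node
                while step is not None:  # walk back to the start
                    path.append(step)
                    step = previous[step]
                return path[::-1]
            for kind, neighbour, _ in self.edges.get(node, []):
                if kind == rel_type and neighbour not in previous:
                    previous[neighbour] = node
                    queue.append(neighbour)
        return None


g = PropertyGraph()
for name in ["ada", "alan", "grace", "linus"]:
    g.add_node(name, label="Person")
g.add_relationship("ada", "KNOWS", "alan", since=1936)
g.add_relationship("alan", "KNOWS", "grace")
g.add_relationship("grace", "KNOWS", "linus")
g.add_relationship("ada", "WORKS_WITH", "linus")
print(g.shortest_path("ada", "linus", "KNOWS"))  # ['ada', 'alan', 'grace', 'linus']
```

In a relational schema the same question needs one self-join per hop; here each hop is a direct lookup of a node's relationships.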
  25. Other types / special purpose
      • Search databases: Solr, Elasticsearch
      • Object databases
      • XML databases
