Lviv EDGE 2 - NoSQL


  1. NoSQL
     By Zenyk Matchyshyn, Staff Engineer, Lohika
  2. Agenda
     • History
     • Architecture vs Technology
     • Classification
     • Pros and Cons of usage
     • Trends
     • Q/A
  3. HISTORY
  4. (no text on this slide)
  5. History
     • NoSQL technologies are not new
     • Many ideas originate from distributed computing, grid computing and parallel computing
     • Main drivers:
       • Scalability
       • Parallelization
       • Costs
  6. Google
     • In the beginning… there was Google!
     • Google shared scientific papers:
       • “The Google File System”, October 2003
       • “MapReduce: Simplified Data Processing on Large Clusters”, December 2004
       • “Bigtable: A Distributed Storage System for Structured Data”, November 2006
       • “The Chubby Lock Service for Loosely-Coupled Distributed Systems”, November 2006
  7. Amazon
     • … and Amazon!
     • “Dynamo: Amazon’s Highly Available Key-value Store”, October 2007
  8. New technologies!
     • The creators of Lucene wanted to create a full search solution
     • They ended up with Hadoop and the Hadoop Distributed File System (HDFS)
     • Success helped adoption, and new solutions emerged
  9. ARCHITECTURE VS TECHNOLOGY
  10. Architecture vs Technology
      • SQL is not bad, it’s just different
      • You can use an SQL database in a NoSQL way, e.g. MySQL as a key-value database
      • You can run SQL queries on Hadoop data
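The point about using an SQL database in a NoSQL way can be sketched as follows. This is only an illustration: it uses Python's built-in `sqlite3` module as a stand-in for MySQL, and the `KeyValueStore` class, the `kv` table and its column names are hypothetical names chosen for the example.

```python
import sqlite3


class KeyValueStore:
    """Key-value access layered on one two-column SQL table (illustrative)."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)"
        )

    def put(self, key, value):
        # INSERT OR REPLACE is SQLite syntax; MySQL would need its own upsert form.
        self.db.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
        )
        self.db.commit()

    def get(self, key, default=None):
        row = self.db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else default

    def delete(self, key):
        self.db.execute("DELETE FROM kv WHERE k = ?", (key,))
        self.db.commit()


store = KeyValueStore()
store.put("user:1", '{"name": "Ann"}')
print(store.get("user:1"))
```

The application only ever looks up by primary key, so joins, secondary indexes and the rest of SQL go unused; the architecture is key-value even though the technology is relational.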
  11. Architecture
      • The way you store data
      • The way you query data
      • Technology environment
  12. CLASSIFICATION
  13. Terms
      • ACID – Atomicity, Consistency, Isolation, Durability
      • CAP Theorem – Consistency, Availability, Partition tolerance
      • Eventual consistency
      • Hashing
      • Schema
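The slide lists hashing only as a term. Assuming it refers to hash-based placement of keys on nodes, one common variant is consistent hashing, sketched below. The `HashRing` class, the node names and the choice of MD5 are illustrative assumptions, not something the slides specify.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Any stable hash works here; MD5 is used only for its even spread.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, replicas=100):
        # Each node is placed on the ring at `replicas` pseudo-random points.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # The owner is the first ring point clockwise from the key's hash.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))
```

The useful property is that adding a node moves only the keys that the new node takes over; every other key keeps its owner.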
  14. Classification
      • Column oriented stores
      • Key/Value stores
      • Key/Value stores with configurable consistency
      • Document stores
      • Graph stores
  15. Chart plotting Scalability & Performance against Depth of Functionality for memcached, key/value stores, column oriented stores, document stores and RDBMS
  16. Column oriented
      • Based on Google Bigtable
      • Column oriented is the reverse of row oriented
      • Assumption is that datacenters are transcontinental and connected over the standard Internet
      • C and P from the CAP Theorem
      • Data is consistent and partitioned, but availability can suffer
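The "reverse of row oriented" idea can be illustrated with a toy in-memory layout. This is a simplified sketch: the sample data and the `to_columns` helper are hypothetical, every row is assumed to have the same fields, and real Bigtable-style stores add column families, timestamps and on-disk formats that are not modeled here.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "name": "Ann", "age": 31},
    {"id": 2, "name": "Bob", "age": 25},
    {"id": 3, "name": "Cid", "age": 40},
]


def to_columns(rows):
    """Pivot rows into a column-oriented layout: one list per field."""
    columns = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one field touches a single list instead of every record.
average_age = sum(columns["age"]) / len(columns["age"])
print(columns["name"], average_age)
```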
  17. HBase
      • Spin-off from the Hadoop project - http://hbase.apache.org/
      • Written in Java
      • A lot of interfaces – Thrift, REST, JRuby, etc.
      • SQL-like access through Hive - http://hive.apache.org/
      • HBase ORM – Surus - https://github.com/mushkevych/surus
      • Used by Facebook, Hulu, Yahoo!, Ning, etc.
  18. Hypertable
      • Developed by Zvents, open sourced
      • Written in C++
      • Runs on top of a distributed file system
      • Used by Baidu
  19. Key/Value
      • Key/Value Store – Oracle Berkeley DB (Oracle NoSQL), Redis, Kyoto Cabinet
      • Can store strings, arrays, hashes
  20. Oracle NoSQL
      • Sign of things to come!
      • http://www.oracle.com/technetwork/database/nosqldb/overview/index.html
      • Written in Java
      • Configurable consistency
      • Berkeley DB as a backend
      • No single point of failure
      • Transactions
  21. Redis
      • http://redis.io/
      • Lots of bindings
      • Written in C
      • In-memory, with optional durability
      • Also a document store
  22. Key/Value – eventual consistency
      • Key/value stores that favor availability over consistency
      • Inspired by Amazon Dynamo
      • Dynamo is based on the assumption of high-speed network links between datacenters that are close to each other
      • A and P from the CAP Theorem
      • Achieve eventual consistency through replication and verification
      • Consistency is eventual
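A minimal sketch of "eventual consistency through replication and verification" follows. It is a deliberate simplification: a plain integer version with newest-wins stands in for the vector clocks and Merkle-tree comparison described in the Dynamo paper, and the `Replica` class and `anti_entropy` function are hypothetical names.

```python
class Replica:
    """One copy of the data; each key maps to a (version, value) pair."""

    def __init__(self):
        self.data = {}

    def write(self, key, version, value):
        # Keep the incoming value only if it is newer than what is stored.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        return self.data.get(key)


def anti_entropy(replicas):
    """Verification pass: copy the newest version of every key to all replicas."""
    keys = set().union(*(r.data.keys() for r in replicas))
    for key in keys:
        newest = max(
            (r.data[key] for r in replicas if key in r.data),
            key=lambda entry: entry[0],
        )
        for replica in replicas:
            replica.write(key, *newest)


a, b, c = Replica(), Replica(), Replica()
a.write("cart", 1, ["book"])          # the write reaches one replica only
stale = b.read("cart")                # another replica has not seen it yet
anti_entropy([a, b, c])               # background sync converges the copies
print(stale, b.read("cart"))
```

Between the write and the sync, readers may see stale or missing data; that window is what "consistency is eventual" refers to.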
  23. Cassandra
      • http://cassandra.apache.org/
      • Multidimensional map indexed by key
      • No single point of failure
      • Decentralized
      • Tunable consistency
      • Used by Facebook, Cisco, IBM, Rackspace
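Tunable consistency is often explained with the quorum rule from Dynamo-style systems: with N replicas, a read that contacts R of them is guaranteed to overlap a write acknowledged by W of them whenever R + W > N. The helper names below are hypothetical and this is arithmetic only, not Cassandra's API.

```python
def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1


def reads_see_latest_write(n: int, r: int, w: int) -> bool:
    """True when every read set of size r overlaps every write set of size w."""
    return r + w > n


n = 3
print(quorum(n))                          # majority size for three replicas
print(reads_see_latest_write(n, 2, 2))    # quorum reads and quorum writes
print(reads_see_latest_write(n, 1, 1))    # fastest setting, may return stale data
```

Lowering R or W trades consistency for latency and availability, which is the dial the slide refers to.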
  24. Voldemort
      • http://project-voldemort.com/
      • Developed by LinkedIn
      • Written in Java
      • Developer oriented: many modules are pluggable
      • Strictly key/value
  25. Document stores
      • Document databases
      • Document oriented stores are semi-structured
      • Mostly JSON oriented
      • Also called schema-free rows
      • Can query by field
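The semi-structured, query-by-field model can be sketched with plain Python dictionaries standing in for JSON documents. The `find` helper and the sample documents are hypothetical and do not reproduce the query API of any particular document store.

```python
# Documents in one collection need not share the same fields.
people = [
    {"name": "Ann", "city": "Lviv", "languages": ["Java", "Python"]},
    {"name": "Bob", "city": "Kyiv"},
    {"name": "Cid", "city": "Lviv", "age": 40},
]


def find(collection, **criteria):
    """Return documents whose fields equal every given criterion."""
    return [
        doc
        for doc in collection
        if all(
            field in doc and doc[field] == value
            for field, value in criteria.items()
        )
    ]


print([doc["name"] for doc in find(people, city="Lviv")])
print(find(people, city="Lviv", age=40))
```

A document that lacks a queried field simply does not match; no schema change is needed to add `age` to some documents and not others.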
  26. MongoDB
      • http://www.mongodb.org/
      • Schema-free, document-oriented
      • Written in C++
      • Lots of interfaces
      • JSON documents
      • Query language, supports indexing
      • Map/Reduce
  27. CouchDB
      • http://couchdb.apache.org/
      • RESTful API
      • JSON documents
      • Written in Erlang
      • Supports ACID
      • Map/Reduce
      • Eventual consistency
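Both stores above list Map/Reduce. The pattern itself can be sketched in plain, single-process Python: a map function emits key/value pairs per document, the pairs are grouped by key, and a reduce function folds each group. The `map_reduce`, `emit_tags` and `count` names and the sample posts are hypothetical; neither MongoDB's nor CouchDB's actual interface is shown.

```python
from collections import defaultdict


def map_reduce(documents, map_fn, reduce_fn):
    """Run map over every document, group emitted pairs by key, then reduce."""
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def emit_tags(doc):
    # Map step: emit (tag, 1) once for each tag on the document.
    for tag in doc.get("tags", []):
        yield tag, 1


def count(key, values):
    # Reduce step: add up the emitted ones for a single tag.
    return sum(values)


posts = [
    {"title": "Intro", "tags": ["nosql", "history"]},
    {"title": "CAP", "tags": ["nosql"]},
    {"title": "Untagged"},
]

tag_counts = map_reduce(posts, emit_tags, count)
print(tag_counts)
```

Because each map call looks at one document and each reduce call at one key, a real store can spread both steps across many machines.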
  28. Graph
      • Provide ways to store graphs
      • Provide traversal
      • Graph oriented functionality
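Storing a graph and traversing it can be sketched with an adjacency list and a breadth-first search. The `traverse` function and the `follows` data are hypothetical; a graph database offers far richer storage and query features than this in-memory example.

```python
from collections import deque


def traverse(graph, start):
    """Breadth-first traversal; returns nodes in the order they are reached."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order


# Directed "follows" relationships stored as an adjacency list.
follows = {
    "ann": ["bob", "cid"],
    "bob": ["dan"],
    "cid": ["dan"],
    "dan": [],
}

print(traverse(follows, "ann"))
```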
  29. Neo4j
      • http://neo4j.org/
      • Written in Java
      • Stores and navigates graphs
      • Stable and proven
      • Commercial and free licenses
  30. PROS AND CONS OF USAGE
  31. Pros and Cons
      • Scalability
      • Transactional Integrity and Consistency
      • Data Modeling
      • Query Support
      • Access and Interface Availability
  32. Typical Usage
      • Large amount of data
      • Read/Write balanced?
      • Read Heavy
      • Write Heavy
      • Scan
      • Geospatial
      • Map/Reduce
      • Social data
  33. Is it for you?
      • Technology is still developing
      • Be ready to patch
      • SQL is easier
      • Not all startups will end up being Facebooks
      • Some problems can be solved only with NoSQL
  34. TRENDS
  35. Trends
      • Oracle released Oracle NoSQL!
      • Adoption of Hadoop soars
      • SQL-like access to NoSQL stores is taking form – UnQL - http://www.unqlspec.org/display/UnQL/Home
      • You can participate!
  36. Opportunities
      • Spring Data - http://www.springsource.org/spring-data
      • Cloud Foundry PaaS - http://www.cloudfoundry.com/
      • ORM/Simplification
  37. Q/A
