1. Lecture1_NOSQL_Introduction.pdf


  1. Introduction to NoSQL Databases: Lecture 1, Dr. Shaimaa Galal
  2. Course Resources
     • Textbook: Next Generation Databases: NoSQL, NewSQL, and Big Data (copyright 2015), Part I (Chapters 1-7)
     • Lab tool: Apache Cassandra on https://www.datastax.com/
  3. Additional Resources
     • YouTube: NoSQL Database Tutorial – Full Course for Beginners
     • Coursera: NoSQL Systems https://www.coursera.org/learn/nosql-databases
  4. Background
     • Relational databases → the mainstay of business
     • Web-based applications caused spikes in load:
       • explosion of social media sites (Facebook, Twitter) with large data needs and various data types such as images, videos, and documents
       • rise of cloud-based solutions such as Amazon S3 (Simple Storage Service)
     • Hooking an RDBMS up to a web-based application becomes troublesome because of big data.
  5. Current day trends
     • Big Data
     • Cloud Computing
     • Solid State Disk
     The solution is to scale up
  6. Issues with scaling up
     • Scaling up (vertical scaling: making a “single” machine more powerful) has limits → the dataset is just too big!
  7. Issues with scaling out
     • Scaling out (horizontal scaling: adding more smaller/cheaper servers) is a better choice
     • Different horizontal scaling approaches (multi-node database):
       • Master/Slave
       • Sharding (partitioning)
  8. Scaling out RDBMS: Master/Slave
     • Master/Slave
       • All writes go to the master
       • All reads are performed against the replicated slave databases
       • Critical reads may be incorrect because writes may not yet have propagated down to the slaves
       • Large datasets can pose problems because the master needs to duplicate data to the slaves
     [Diagram labels: Writes, Reads]
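The stale-read problem on the Master/Slave slide can be illustrated with a small in-memory simulation. This is only a sketch: `MasterSlaveDB`, `replicate`, and the `balance` key are hypothetical names invented for the example, and real replication is asynchronous and network-bound rather than an explicit method call.

```python
class MasterSlaveDB:
    """Toy master/slave setup (hypothetical, in-memory only)."""

    def __init__(self, num_slaves=2):
        self.master = {}
        self.slaves = [{} for _ in range(num_slaves)]
        self.pending = []      # writes not yet propagated to the slaves
        self._next_slave = 0   # round-robin pointer for reads

    def write(self, key, value):
        # All writes go to the master and are queued for replication.
        self.master[key] = value
        self.pending.append((key, value))

    def read(self, key):
        # All reads go to a slave, chosen round-robin.
        slave = self.slaves[self._next_slave % len(self.slaves)]
        self._next_slave += 1
        return slave.get(key)

    def replicate(self):
        # Stand-in for the replication stream: copy queued writes to every slave.
        for key, value in self.pending:
            for slave in self.slaves:
                slave[key] = value
        self.pending.clear()


db = MasterSlaveDB()
db.write("balance", 100)
stale = db.read("balance")   # the write has not reached the slaves yet
db.replicate()
fresh = db.read("balance")   # now the slaves have caught up
print(stale, fresh)
```

Until `replicate()` runs, a read served by a slave does not see the write, which is why critical reads are often sent to the master instead.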
  9. Multi-Master replication
     • INSERT only, not UPDATEs/DELETEs
     • No JOINs, thereby reducing query time (this involves de-normalizing data)
  10. Scaling out RDBMS: Sharding
      • Sharding (partitioning) splits the data across multiple databases based on a key attribute.
      • Scales well for both reads and writes
      • Not transparent: the application needs to be partition-aware
      • Can no longer have relationships/joins across partitions (loss of referential integrity across shards).
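A minimal sketch of key-based sharding, assuming a simple hash-modulo placement rule (one of several possible schemes; range-based and directory-based sharding are alternatives). `ShardedStore` and the user IDs are hypothetical, and each "database" is just a Python dict.

```python
import hashlib


class ShardedStore:
    """Toy partition-aware client: it decides which shard owns each key."""

    def __init__(self, num_shards=3):
        self.shards = [{} for _ in range(num_shards)]

    def shard_for(self, key):
        # Use a stable hash: Python's built-in hash() of a str changes
        # between processes, which would move keys to different shards.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key, row):
        self.shards[self.shard_for(key)][key] = row

    def get(self, key):
        # A single-key lookup touches exactly one shard.
        return self.shards[self.shard_for(key)].get(key)

    def scan_all(self):
        # A cross-shard query has to visit every shard; there is no
        # database-level join or constraint spanning the partitions.
        for shard in self.shards:
            yield from shard.values()


store = ShardedStore(num_shards=3)
for user_id in ["u1", "u2", "u3", "u4"]:
    store.put(user_id, {"id": user_id})

print(store.get("u3"))
print(sorted(row["id"] for row in store.scan_all()))
```

The routing logic lives in the client, which is what "not transparent" means in practice: changing the number of shards changes `shard_for` and forces data to be moved.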
  11. Scaling out RDBMS: Sharding
  12. Other ways to scale out RDBMS
      • In-memory databases
        • Rely primarily on main memory for data storage.
        • Faster than disk-optimized databases because disk access is slower than memory access, and the internal optimization algorithms are simpler and execute fewer CPU instructions.
        • Accessing data in memory eliminates seek time when querying the data.
  13. Other ways to scale out RDBMS
      • NewSQL
        • A class of databases that retain the key characteristics of an RDBMS but differ from the common architecture exhibited by Oracle and SQL Server.
        • There are two designs:
          1. H-Store: a pure distributed in-memory database.
          2. C-Store: a columnar database design.
  14. Three waves of database technology
  15. What is NoSQL?
      • The name:
        • Stands for Not Only SQL
        • The term NoSQL was introduced by Carlo Strozzi in 1998 to name his file-based database
        • It was reintroduced by Eric Evans when an event was organized to discuss open source distributed databases
        • Eric states that “… but the whole point of seeking alternatives is that you need to solve a problem that relational databases are a bad fit for. …”
  16. Who is using them?
  17. What is NoSQL?
      • Key features (advantages):
        • Non-relational; does not require a schema
        • Data are replicated to multiple nodes (fault-tolerant):
          • down nodes are easily replaced
          • no single point of failure
        • Horizontally scalable
          • cheap, easy to implement
        • Massive write performance
        • Fast key-value access
  19. Recap: ACID Properties
  20. What is NoSQL?
      • Disadvantages:
        • Do not fully support relational features
          • no join, group by, or order by operations (except within partitions)
          • no referential integrity constraints across partitions
        • No declarative query language (e.g., SQL) → more programming
        • Relaxed ACID (see CAP theorem) → fewer guarantees
        • No easy integration with other applications that support SQL
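To make "more programming" concrete, the sketch below performs by hand what a single SQL statement with JOIN, GROUP BY, and ORDER BY would express declaratively. The `customers` and `orders` collections and the function name are hypothetical sample data, not taken from the lecture.

```python
# Two collections as they might come back from a store with no joins.
customers = {
    "c1": {"name": "Alice"},
    "c2": {"name": "Bob"},
}
orders = [
    {"order_id": 1, "customer_id": "c1", "total": 30},
    {"order_id": 2, "customer_id": "c2", "total": 15},
    {"order_id": 3, "customer_id": "c1", "total": 20},
]


def totals_by_customer(customers, orders):
    """Application-side join, grouping, and ordering."""
    totals = {}
    for order in orders:
        # The "join": look up the customer for each order.
        name = customers[order["customer_id"]]["name"]
        # The "group by": accumulate per customer name.
        totals[name] = totals.get(name, 0) + order["total"]
    # The "order by": sort the result by customer name.
    return sorted(totals.items())


result = totals_by_customer(customers, orders)
print(result)
```

In a relational database the same result would come from one query, and the optimizer would choose the join strategy; here the application owns that logic and its performance.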
  21. Discussion: Are NoSQL databases better than relational databases?
  22. CAP Theorem
      • Consistency: all clients always have the same view of the data
  23. CAP Theorem
      • Availability: each client can always read and write
  24. CAP Theorem
      • Partition tolerance: the system can continue to operate in the presence of network partitions
  25. CAP Theorem (Consistency)
      • Client A writes row X; Client B reads row X.
      • Is it the last value of X?
      • Answer: maybe; however, eventual consistency occurs.
  26. CAP Theorem
      • Brewer’s CAP Theorem:
        • For any system sharing data, it is “impossible” to guarantee all three of these properties simultaneously
        • You can have at most two of these three properties for any shared-data system
      • Very large systems will “partition” at some point:
        • That leaves either C or A to choose from (traditional DBMSs prefer C over A and P)
        • In almost all cases, you would choose A over C (except in specific applications such as order processing)
      • Example: Apache Cassandra focuses on AP, while MongoDB and Kafka focus on CP.
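The choice between C and A once a partition occurs can be sketched with a toy two-replica cluster. Everything here (`TwoNodeCluster`, the `mode` flag, the `partitioned` switch) is a hypothetical simplification; real systems such as Cassandra or MongoDB expose this trade-off through tunable consistency settings rather than a single mode.

```python
class TwoNodeCluster:
    """Two replicas that normally stay in sync; a partition cuts the link."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.nodes = [{}, {}]
        self.partitioned = False

    def write(self, node_index, key, value):
        if not self.partitioned:
            # Healthy network: the write reaches both replicas.
            for node in self.nodes:
                node[key] = value
            return
        if self.mode == "CP":
            # Prefer consistency: refuse the write rather than diverge.
            raise RuntimeError("unavailable: other replica unreachable")
        # Prefer availability: accept locally, reconcile later.
        self.nodes[node_index][key] = value

    def read(self, node_index, key):
        return self.nodes[node_index].get(key)


ap = TwoNodeCluster("AP")
ap.write(0, "x", 1)
ap.partitioned = True
ap.write(0, "x", 2)                      # accepted by node 0 only
ap_views = (ap.read(0, "x"), ap.read(1, "x"))

cp = TwoNodeCluster("CP")
cp.write(0, "x", 1)
cp.partitioned = True
try:
    cp.write(0, "x", 2)
    cp_write_accepted = True
except RuntimeError:
    cp_write_accepted = False
cp_views = (cp.read(0, "x"), cp.read(1, "x"))

print(ap_views, cp_write_accepted, cp_views)
```

The AP cluster stays writable but its two clients now see different values of `x`; the CP cluster keeps both views identical by rejecting the write.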
  27. CAP Theorem summary
      • ACID
        • A DBMS is expected to support “ACID transactions,” processes that have:
          • Atomicity: either the whole process is done or none of it is
          • Consistency: only valid data are written
          • Isolation: one operation at a time
          • Durability: once committed, it stays that way
      • CAP
        • Consistency: all nodes in the cluster hold the same copy of the data
        • Availability: the cluster always accepts reads and writes
        • Partition tolerance: guaranteed properties are maintained even when network failures prevent some machines from communicating with others
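Atomicity and consistency from the ACID list can be demonstrated with Python's standard `sqlite3` module. This is a minimal sketch: the `accounts` table, the names, and the amounts are invented for illustration, and the `CHECK` constraint stands in for "only valid data are written".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money in one transaction: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit was undone too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


failed = transfer(conn, "alice", "bob", 500)   # alice cannot cover 500
after_failed = balances(conn)
succeeded = transfer(conn, "alice", "bob", 30)
after_ok = balances(conn)
print(failed, after_failed, succeeded, after_ok)
```

In the failed transfer the credit to `bob` is executed first and then rolled back together with the rejected debit, so no partial result is ever visible after the transaction ends.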
  28. NoSQL categories
      1. Key-value
         • Examples: DynamoDB, Voldemort, Scalaris
      2. Document-based
         • Examples: MongoDB, CouchDB
      3. Column-based
         • Examples: BigTable, Cassandra, HBase
      4. Graph-based
         • Examples: Neo4J, InfoGrid
      • “No-schema” is a common characteristic of most NoSQL storage systems
      • They provide “flexible” data types
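One way to picture the four categories is to model the same small dataset in each shape using plain Python structures. This is an illustrative sketch only: the users, fields, and the `followers_of` helper are hypothetical, and real products add indexing, distribution, and their own query APIs on top of these shapes.

```python
import json

# 1. Key-value: the store sees an opaque value, looked up only by key.
key_value = {"user:42": json.dumps({"name": "Alice", "city": "Cairo"})}

# 2. Document-based: nested, self-describing records; fields may differ
#    from one document to the next ("no-schema").
documents = [
    {"_id": 42, "name": "Alice", "address": {"city": "Cairo"}, "tags": ["admin"]},
    {"_id": 43, "name": "Bob"},
]

# 3. Column-based: row key -> column family -> columns; rows may be sparse.
wide_column = {
    "42": {"profile": {"name": "Alice", "city": "Cairo"},
           "activity": {"last_login": "2024-01-01"}},
    "43": {"profile": {"name": "Bob"}},
}

# 4. Graph-based: nodes plus typed edges between them.
nodes = {42: {"name": "Alice"}, 43: {"name": "Bob"}}
edges = [(42, "FOLLOWS", 43)]


def followers_of(node_id):
    """Traverse FOLLOWS edges that point at node_id."""
    return [nodes[src]["name"]
            for src, rel, dst in edges
            if rel == "FOLLOWS" and dst == node_id]


alice_from_kv = json.loads(key_value["user:42"])
print(alice_from_kv["name"], followers_of(43))
```

Note how the key-value value must be decoded by the application, while the document and column shapes expose their internal fields to the store itself.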
  29. Next Lecture: Key-value Database
