SQL vs. NoSQL

  1. SQL vs. NoSQL: Making the right choice. 18 September 2014
  2. The contenders: SQL and NoSQL. © Copyright Dimension Data 18 September 2014
  3. SQL Databases
     • RDBMS
     • Standardized
     • Mature
     • Reliable
     • Well understood
     • Queryable
     • ACID
  4. NoSQL scalability argument
     • Scale-up vs. scale-out
     • Use of commodity hardware
     • Locking / latching
     • Consistency over partitions
     • Availability of partitions
     • Referential integrity
     [Chart: cost of scaling, SQL vs. NoSQL]
  5. Other RDBMS / SQL Database drawbacks
     • One-solution-fits-all
     • Slow for certain tasks
     • ACID is not always needed
     • ORM required
     • Lack of flexibility
     • Rigid schema
     • Management complexity
     • Add-on solutions
     • XML fields, Filestreams
     • Full-text indexes
  6. CAP theorem (Brewer's theorem)
  7. NoSQL Use Cases
     • Bigness / avoid hitting the wall
     • Massive write performance
     • Write availability
     • Fast key-value access
     • Flexible schema and flexible datatypes
     • Schema migration
     • No single point of failure
     • Generally available parallel computing
     • Easier maintainability, administration and operations
     • Programmer ease of use
     • Use the right data model for the right problem
     • Tunable CAP tradeoffs
  8. ACID Transactions: Atomicity, Consistency, Isolation, Durability
  9. NoSQL ACID Trade-offs
     • Dropping Atomicity lets you shorten the time that tables (sets of data) are locked. Examples: MongoDB, CouchDB.
     • Dropping Consistency lets you scale up writes across cluster nodes. Examples: Riak, Cassandra.
     • Dropping Durability lets you respond to write commands without flushing to disk. Examples: Memcache, Redis.
  10. NoSQL Database Main Types: Key-Value Store
      • A basic dictionary design storing values under unique keys
      • The database does not care about the structure of the value
      • Examples: Memcache, Riak, Azure Blob Storage
      • Good at:
        • Handles size well
        • Processing a constant stream of small reads and writes
        • Fast
        • Programmer friendly
  11. NoSQL Database Main Types: Column Store
      • A column is a tuple of 3 elements: unique name of value, a typed value, timestamp
      • Columns may be part of column families
      • Columns need not appear in every record
      • Examples: HBase, Hypertable, Cassandra, Azure Table Storage
      • Good at:
        • Handles size well
        • Stream massive write loads
        • High availability
        • Multiple data centers
        • MapReduce
  12. NoSQL Database Main Types: Document Store
      • Use a unique key to store and retrieve a JSON document
      • Documents are schemaless
      • Metadata is added to the document to aid querying
      • Indexing of documents and metadata speeds up retrieval
      • Examples: CouchDB, MongoDB, RavenDB, Azure DocumentDB service (Preview)
      • Good at:
        • Natural data modeling
        • Programmer friendly
        • Rapid development
        • Web friendly
        • CRUD
  13. NoSQL Database Main Types: Graph Database
      • Uses graph structures with nodes, edges, and properties to represent and store data
      • Every element contains a direct pointer to its adjacent elements
      • Examples: AllegroGraph, InfoGrid, Neo4j
      • Good at:
        • Complicated graph problems
        • Topographical data
        • Fast
  14. NoSQL Database Type Comparison

      | Data Model              | Performance | Scalability     | Flexibility | Complexity | Functionality      |
      |-------------------------|-------------|-----------------|-------------|------------|--------------------|
      | Key-Value Store         | high        | high            | high        | none       | variable (none)    |
      | Column-Oriented Store   | high        | high            | moderate    | low        | minimal            |
      | Document-Oriented Store | high        | variable (high) | high        | low        | variable (low)     |
      | Graph Database          | variable    | variable        | high        | high       | graph theory       |
      | Relational Database     | variable    | variable        | low         | moderate   | relational algebra |
  15. Things to consider when choosing
      • Where are you starting from?
      • What are you trying to accomplish?
      • Things to consider:
        • Your problem: access pattern, scalability, consistency, durability
        • Money: scaling, admins, license, operating cost
        • Programming: flexible schema, JSON, REST, language, graphs
        • Performance: reads, writes, consistency, workload, eventual consistency
        • Features: cross datacenter, upgrades, indexes, persistence, tunability
        • The vendor: viability, future direction, responsiveness, partnerships
  16. Big Data – Petabyte range. Microsoft HDInsight = Hadoop as a service on Azure (+ .NET)
  17. Hadoop components
  18. Using Hadoop
  19. Hadoop cluster size: Yahoo! wins with a massive 42,000-node cluster
  20. Questions
      USE [Euricom]
      SELECT [Question] FROM [dbo].[FAQ] WHERE [Answer] IS NULL
      (0 row(s) affected)
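
The atomicity property named on the ACID slide can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for any relational database; the `accounts` table, the account names, and the `transfer` helper are hypothetical and do not come from the slides. The credit is deliberately applied before the debit, so that a failing debit shows the earlier credit being rolled back.

```python
import sqlite3

def make_db():
    """Create an in-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit
            # above is undone as well.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling `transfer(conn, "alice", "bob", 500)` on a fresh database returns `False` and leaves both balances unchanged, because the overdraft aborts the whole transaction rather than only its second statement.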

Editor's notes

  • Consistency (all nodes see the same data at the same time)
    Availability (a guarantee that every request receives a response about whether it was successful or failed)
    Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
  • Atomicity requires that each transaction is "all or nothing": if one part of the transaction fails, the entire transaction fails, and the database state is left unchanged. An atomic system must guarantee atomicity in each and every situation, including power failures, errors, and crashes. To the outside world, a committed transaction appears (by its effects on the database) to be indivisible ("atomic"), and an aborted transaction does not happen.

    The consistency property ensures that any transaction will bring the database from one valid state to another. Any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof. This does not guarantee correctness of the transaction in all ways the application programmer might have wanted (that is the responsibility of application-level code) but merely that any programming errors do not violate any defined rules.

    The isolation property ensures that the concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially, i.e. one after the other. Providing isolation is the main goal of concurrency control. Depending on concurrency control method, the effects of an incomplete transaction might not even be visible to another transaction.

    Durability means that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors. In a relational database, for instance, once a group of SQL statements executes, the results need to be stored permanently (even if the database crashes immediately thereafter). To defend against power loss, transactions (or their effects) must be recorded in non-volatile memory.
  • Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of machines.

    Apache Hadoop, at its core, consists of two sub-projects: Hadoop MapReduce and the Hadoop Distributed File System (HDFS). Hadoop MapReduce is a programming model and software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes. HDFS is the primary storage system used by Hadoop applications. HDFS creates multiple replicas of data blocks and distributes them on compute nodes throughout a cluster to enable reliable, extremely rapid computations. Other Hadoop-related projects at Apache include Chukwa, Hive, HBase, Mahout, Sqoop and ZooKeeper.

    HDFS - Filesystems that manage the storage across a network of machines are called distributed filesystems. HDFS is designed for storing very large files with write-once, read-many-times access patterns, running on clusters of commodity hardware.

    MapReduce - MapReduce is a framework for processing highly distributable problems across huge datasets using a large number of computers (nodes), collectively referred to as a cluster. The framework is inspired by the map and reduce functions commonly used in functional programming.

    Chukwa - Chukwa is a Hadoop subproject devoted to large-scale log collection and analysis. Chukwa is built on top of HDFS and the MapReduce framework and inherits Hadoop’s scalability and robustness.

    Hive - Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query and analysis. HiveServer provides a Thrift interface and a JDBC / ODBC server.

    HBase - HBase is the Hadoop application to use when you require real-time read/write random-access to very large datasets. It is a distributed column-oriented database built on top of HDFS.

    Mahout - Mahout is an open source machine learning library from Apache. It’s highly scalable. Mahout aims to be the machine learning tool of choice when the collection of data to be processed is very large, perhaps far too large for a single machine.

    Sqoop/Flume - Sqoop allows easy import and export of data from structured data stores such as relational databases, enterprise data warehouses, and NoSQL systems. The dataset being transferred is sliced up into different partitions and a map-only job is launched with individual mappers responsible for transferring a slice of this dataset.

    ZooKeeper - ZooKeeper is a distributed, open-source coordination service for distributed applications. It exposes a simple set of primitives that distributed applications can build upon to implement higher level services for synchronization, configuration maintenance, and groups and naming.
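
The map and reduce functions mentioned in the MapReduce note can be sketched in a few lines of plain Python. This is a single-process illustration of the programming model only, not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the word-count task are assumptions chosen for the example, and a real cluster would run the map and reduce steps in parallel on many nodes.

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)

def word_count(documents):
    """Run map, shuffle, and reduce over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Because each call to `map_phase` depends only on its own document and each call to `reduce_phase` depends only on its own key, both steps can be distributed across nodes without coordination, which is the property the framework relies on.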