Introduction to NoSQL

Agenda
• Overview of NoSQL
• Why NoSQL?
• NoSQL Market Overview
• Categories of NoSQL databases
• Hadoop – Overview

Overview of NoSQL
NoSQL is a term commonly read as "Not Only SQL".

Overview of NoSQL (Contd…)
• NoSQL does not mean that SQL is abandoned or will never be used.
• The term refers to databases that differ from relational databases; put
simply, non-relational databases.
• A NoSQL database is a non-relational database management system that differs
from a traditional relational database management system in some significant ways.
• It is designed for distributed data stores with very large storage needs
(for example Google or Facebook, which collect terabytes of data every
day for their users). Such data stores may not require a fixed schema,
avoid join operations, and typically scale horizontally.

Many NoSQL databases favor eventual consistency, as framed by the CAP theorem,
over strict ACID guarantees.
CAP theorem: a distributed system can guarantee at most two of the following
three properties at the same time.
• Consistency - The data in the database remains consistent after the
execution of an operation. For example, after an update operation all
clients see the same data.
• Availability - The system is always on (guaranteed service availability),
with no downtime. Node failures do not prevent the surviving nodes from
continuing to operate.
• Partition Tolerance - The system continues to function even when
communication among the servers is unreliable, i.e. the servers may be
partitioned into multiple groups that cannot communicate with one another.
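
The trade-off between availability and consistency during a partition can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Replica` class, the `sync` function, and the last-writer-wins version numbers are hypothetical and do not reproduce any particular database's replication protocol.

```python
class Replica:
    """Hypothetical in-memory replica storing key -> (version, value)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, version):
        # Last-writer-wins: keep the entry with the highest version number.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(a, b):
    """Exchange entries in both directions so the replicas converge."""
    for key, (version, value) in list(a.data.items()):
        b.write(key, value, version)
    for key, (version, value) in list(b.data.items()):
        a.write(key, value, version)


r1, r2 = Replica("r1"), Replica("r2")
r1.write("user:1", "alice", version=1)
r2.write("user:1", "alice", version=1)

# During a partition only r1 receives the update; r2 stays available
# but serves the older value.
r1.write("user:1", "alice-updated", version=2)
stale = r2.read("user:1")

# Once the partition heals, a sync pass brings both replicas to the same state.
sync(r1, r2)
fresh = r2.read("user:1")
print(stale, fresh)
```

Both replicas keep answering reads throughout, which is the availability side of the trade-off; the price is that `r2` returns an outdated value until the replicas are synchronized.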

NoSQL Features:
1. Scalability
The ability to maintain performance as load grows.
• Horizontal scalability:
Increasing the number of machines while maintaining proportional
performance.
• Vertical scalability:
Adding more resources to a single machine to improve
performance.
2. Open Source
Most NoSQL projects are open source, so anyone can use and modify
them, for example:
• Cassandra, originally developed at Facebook.
• Bigtable by Google, which by contrast is proprietary and was built for
Google's own applications.

3. Schema Freeness
• NoSQL databases do not use a fixed schema (internal schema, external
schema, etc.) the way relational databases do.
• NoSQL was originally intended for modern web-scale databases.
A large number of companies use NoSQL. To name a few:
• Google
• Facebook
• Mozilla
• Adobe
• Foursquare
• LinkedIn
• Digg
• McGraw-Hill Education

WHY NOSQL?
Benefits of NoSQL:
1. Scaling
Relational databases were not easy to scale out.
NoSQL databases, on the other hand, are specifically designed to scale out.
2. Big data
A single RDBMS can hardly handle today's huge volumes of data and
the transactions on that data, whereas non-relational databases are
specifically designed to handle big data.
Data is becoming easier to capture and access through third parties such as
Facebook, D&B, and others. Personal user information, geolocation data,
social graphs, user-generated content, machine logging data, and
sensor-generated data are just a few examples of the ever-expanding array
of data being captured.
3. Less need for expert DBAs
Although RDBMS vendors claim that their products provide management
facilities, an expert DBA is still needed to operate them.
In contrast, NoSQL databases reduce the need for expert DBAs: they provide
automatic repair, data distribution, and simpler data models, which lead to
lower administration effort.

WHY NOSQL? (CONTD…)
4. Economics
An RDBMS requires expensive components to provide efficient service.
NoSQL uses cheap commodity servers to manage the same amount of
data for which an RDBMS needs an expensive server, so NoSQL is also
economical.
5. Flexibility of data models
An organization's requirements can change over time. Changing an RDBMS
schema after deployment can cause many problems and disrupt its services,
and is sometimes close to impossible. A NoSQL database can be changed at
any time, i.e. existing columns can be altered and new ones can be added.
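
The flexibility described in point 5 can be sketched with plain Python dictionaries standing in for schema-free records. The `customers` list, the field names, and the `find` helper are hypothetical illustrations, not the API of any specific NoSQL product.

```python
# A hypothetical "collection": a plain list of dict documents.
customers = [
    {"id": 1, "name": "Asha", "email": "asha@example.com"},
    {"id": 2, "name": "Ben"},  # this record simply has no email field
]

# A new requirement arrives: store a loyalty tier. New records carry the
# extra field; existing records are left untouched, with no migration step.
customers.append({"id": 3, "name": "Chen", "loyalty_tier": "gold"})


def find(collection, field, value):
    """Return documents that contain `field` with exactly `value`."""
    return [doc for doc in collection if field in doc and doc[field] == value]


gold = find(customers, "loyalty_tier", "gold")
print([doc["name"] for doc in gold])
```

Records without the new field are skipped by the query rather than causing an error, which is what lets old and new record shapes coexist.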

Scale up with relational technology: limitations at the database tier
Source: http://www.couchbase.com/why-nosql/nosql-database

Scale out with NoSQL technology at the database tier
Source: http://www.couchbase.com/why-nosql/nosql-database
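
Scaling out at the database tier generally means spreading keys across several machines. A minimal sketch, assuming simple hash-modulo placement: the node names, `shard_for`, and `place` are hypothetical, and production systems often use consistent hashing instead so that adding a node relocates fewer keys.

```python
import hashlib


def shard_for(key, nodes):
    """Pick a node for `key` by hashing it (simple modulo placement)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def place(keys, nodes):
    """Group keys by the node that would store them."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[shard_for(key, nodes)].append(key)
    return placement


keys = [f"user:{i}" for i in range(1000)]
three = place(keys, ["node-a", "node-b", "node-c"])
four = place(keys, ["node-a", "node-b", "node-c", "node-d"])

# Adding a fourth commodity server spreads the same keys over more machines.
print({node: len(stored) for node, stored in three.items()})
print({node: len(stored) for node, stored in four.items()})
```

Each node holds only a share of the keys, so capacity grows by adding servers rather than by buying a larger one.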

NOSQL MARKET OVERVIEW
Source: Wikibon 2013 (http://wikibon.org/wiki/v/Hadoop-NoSQL_Software_and_Services_Market_Forecast_2012-2017)
Hadoop/NoSQL Software and Services Marketshare, 2012

NOSQL MARKET OVERVIEW (CONTD…)
Hadoop/NoSQL Software and Services Market Forecast, 2012-2017
Source: Wikibon 2013 (http://wikibon.org/wiki/v/Hadoop-NoSQL_Software_and_Services_Market_Forecast_2012-2017)

CATEGORIES OF NOSQL DATABASES
There are several types:
• Column Store – each storage block contains data from only one column
• Document Store – stores documents made up of tagged elements
• Key-Value Store – hash table of keys
1. Column Store
• Each storage block contains data from only one column
• Example: Hadoop/HBase
  - http://hadoop.apache.org/
  - Clients: Yahoo, Facebook
• Example: Ingres VectorWise
  - Column store integrated with an SQL database
• More efficient than a row (or document) store if:
  - Multiple rows/records/documents are inserted at the same time, so updates
    of column blocks can be aggregated
  - Retrievals access only some of the columns in a row/record/document
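
The difference between row and column layouts can be sketched in a few lines of Python. This is an illustration only: the sample records and the `to_columns` helper are hypothetical, and the sketch assumes every record has the same fields.

```python
rows = [
    {"id": 1, "name": "Asha", "age": 31},
    {"id": 2, "name": "Ben", "age": 45},
    {"id": 3, "name": "Chen", "age": 28},
]


def to_columns(records):
    """Pivot row records into one list (a "block") per column."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every full record is visited to read a single field.
avg_age_rows = sum(row["age"] for row in rows) / len(rows)

# Column layout: only the "age" block is read; "id" and "name" are untouched.
avg_age_cols = sum(columns["age"]) / len(columns["age"])

print(columns["age"], avg_age_rows == avg_age_cols)
```

Both layouts give the same answer; the column layout simply reads less data when a query touches only some of the columns.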

CATEGORIES OF NOSQL DATABASES (CONTD…)
2. Document Store:
• It stores documents made up of tagged elements.
• Example: CouchDB
  - http://couchdb.apache.org/
  - Clients: BBC
• Example: MongoDB
  - http://www.mongodb.org/
  - Clients: Foursquare, Shutterfly

CATEGORIES OF NOSQL DATABASES (CONTD…)
3. Key-Value Store:
• Hash table of keys
• Values stored with Keys
• Fast access to small data values
• Example: Project Voldemort
  - http://www.project-voldemort.com/
  - Clients: LinkedIn
• Example: MemcacheDB
  - http://memcachedb.org/

HADOOP - OVERVIEW
• The Apache Hadoop software library is a framework that allows for the distributed
processing of large data sets across clusters of computers using simple
programming models.
• It is designed to scale up from single servers to thousands of machines, each
offering local computation and storage.
• Rather than rely on hardware to deliver high availability, the library itself is designed
to detect and handle failures at the application layer, thereby delivering a highly
available service on top of a cluster of computers, each of which may be prone to failures.
The Apache Hadoop framework is composed of the following modules:
• Hadoop Common - contains libraries and utilities needed by other Hadoop modules.
• Hadoop Distributed File System (HDFS) - a distributed file system that stores data
on commodity machines, providing very high aggregate bandwidth across the
cluster.
• Hadoop YARN - a resource-management platform responsible for managing
compute resources in clusters and using them to schedule users' applications.
• Hadoop MapReduce - a programming model for large-scale data processing.
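
The MapReduce programming model can be illustrated with a word count written in plain Python. This sketch runs in a single process and does not use the Hadoop API; the function names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical labels for the three stages.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


lines = ["big data needs big clusters", "data moves to clusters"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(counts)
```

In a real cluster the map and reduce calls would run on many machines in parallel, with the framework handling the shuffle and any node failures; here everything runs sequentially to show only the data flow.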
