3. • Flat Files
• Hierarchical Databases
• Object-Oriented Databases
• Relational Databases
have long been used to store and retrieve data.
4. Problems associated with RDBMS
• Struggle to handle very large volumes of data
• Hard to adapt to agile sprints, quick iteration, and frequent code pushes
• Expensive, monolithic architecture
6. • The machines in these large clusters are individually unreliable.
• But the overall cluster keeps working even as machines die, so the cluster as a whole is reliable.
• The “cloud” is exactly this kind of cluster, which means relational databases, typically designed to run on a single server, don’t play well with the cloud.
8. • Web services provide an alternative to shared databases for application integration.
• They make it easier for different applications to choose their own data storage, including non-relational databases.
• Google → Bigtable
• Amazon → Dynamo
11. What is NoSQL?
• NoSQL databases are the leading alternative to relational databases when scalability, availability, and fault tolerance are the key deciding factors.
• They go well beyond the more widely understood relational databases, such as Oracle and SQL Server, in satisfying the needs of today’s business applications.
12. Why NoSQL?
• Big Users
• Big Data
• The Internet of Things
• Cloud computing
14. Big Users
• NoSQL offers applications with very large user bases the dynamic scalability and level of scale they need, while maintaining the performance users demand.
15. Big Data
• NoSQL provides a much more flexible, schema-less data model that better maps to an application’s data organization.
• It simplifies the interaction between the application and the database, resulting in less code to write, debug, and maintain.
16. The Internet of Things
• NoSQL can
– scale concurrent data access to millions of connected devices and systems
– store billions of data points
– meet the performance requirements of mission-critical infrastructure and operations
17. Cloud Computing
• NoSQL databases are built from the ground up as distributed, scale-out technologies.
• This makes them a better fit for the highly distributed nature of the three-tier internet architecture.
18. Reasons to choose NoSQL databases for future development work
• To improve programmer productivity by using a database that better matches an application's needs.
• To improve data access performance via some combination of
– handling larger data volumes,
– reducing latency,
– improving throughput.
19. Prominent NoSQL database users
• Google
• Facebook
• Mozilla
• Adobe
• Foursquare
• LinkedIn
• Digg
• McGraw-Hill Education
• Vermont Public Radio
22. Common Characteristics
• Not a relational data model
– Typically no SQL query language
• Tends to be designed to run on clusters of multiple nodes
• Tends to be open source
• No fixed schema, allowing you to store any data in any record
• Designed for web-scale data sets
• Makes explicit CAP theorem trade-offs, often relaxing consistency in favor of availability
23. Scale-up Database Tier with RDBMS
• To support more concurrent users and store more data, relational databases require a bigger and more expensive server with more CPUs, memory, and disk storage.
• At some point, the capacity of even the biggest server can be outstripped, and the relational database cannot scale further!
24. Scale-out Database Tier with NoSQL
• NoSQL databases provide an easier, linear, and cost-effective approach to database scaling.
• As the number of concurrent users grows, simply add more low-cost commodity servers to your cluster.
• There’s no need to modify the application, since the application always sees a single (distributed) database.
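To make the scale-out idea concrete: one common way to let an application keep seeing a single (distributed) database while servers are added is consistent hashing, the partitioning scheme described in Amazon's Dynamo paper. The sketch below is a minimal illustration under stated assumptions: MD5 ring positions, 100 virtual points per server, and hypothetical names (`HashRing`, `add_node`, `node_for`) that do not belong to any product's client library.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def ring_position(value: str) -> int:
    """Place a string on the hash ring (MD5 gives the same result on every run)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Routes each key to one node with consistent hashing (hypothetical sketch)."""

    def __init__(self, nodes: Iterable[str] = (), replicas: int = 100) -> None:
        self.replicas = replicas  # virtual points per server, to spread keys evenly
        self._ring: List[Tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Scale out: the new server takes over only the keys that now land on it."""
        for i in range(self.replicas):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        """Return the first node at or clockwise after the key's ring position."""
        if not self._ring:
            raise LookupError("no nodes in the ring")
        index = bisect.bisect_left(self._ring, (ring_position(key), ""))
        return self._ring[index % len(self._ring)][1]  # wrap around past the end


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    keys = [f"user:{i}" for i in range(1000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("node-d")  # the lookup code below does not change
    moved = [k for k in keys if ring.node_for(k) != before[k]]
    print(f"{len(moved)} of {len(keys)} keys moved to the new server")
```

Adding a server moves only the keys whose ring position now falls to one of its points, roughly a quarter of them when going from three to four servers; every other key stays where it was, and callers of `node_for` need no changes.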
25. Performing Queries?
• RESTful interfaces (HTTP as an access API)
• Query languages other than SQL
– GQL: an SQL-like query language for Google Bigtable
– SPARQL: the query language for the Semantic Web
– Gremlin: the graph traversal language
– Sones Graph Query Language
• Query APIs
– The Google Bigtable Datastore API
– The Neo4j Traversal API
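For the first bullet, "HTTP as an access API" typically means each stored item is an HTTP resource and the basic create, read, update, and delete operations map onto HTTP methods. The sketch below only builds the requests with Python's standard `urllib.request`. The base URL and the `/collections/<collection>/<key>` path layout are hypothetical, since each HTTP-based store defines its own paths.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint layout: <BASE_URL>/collections/<collection>/<key>.
# Real HTTP-based stores each define their own paths; check their documentation.
BASE_URL = "http://localhost:8080"


def resource_url(collection: str, key: str) -> str:
    """Each stored item is addressed as its own HTTP resource (path parts escaped)."""
    c = urllib.parse.quote(collection, safe="")
    k = urllib.parse.quote(key, safe="")
    return f"{BASE_URL}/collections/{c}/{k}"


def put_request(collection: str, key: str, document: dict) -> urllib.request.Request:
    """Create or replace an item: PUT a JSON body to the item's URL."""
    return urllib.request.Request(
        resource_url(collection, key),
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def get_request(collection: str, key: str) -> urllib.request.Request:
    """Read an item: GET the item's URL."""
    return urllib.request.Request(resource_url(collection, key), method="GET")


def delete_request(collection: str, key: str) -> urllib.request.Request:
    """Remove an item: DELETE the item's URL."""
    return urllib.request.Request(resource_url(collection, key), method="DELETE")


if __name__ == "__main__":
    req = put_request("users", "42", {"name": "Ada"})
    print(req.get_method(), req.full_url)
    # Against a running server, the request would be sent with urllib.request.urlopen(req).
```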
28. • Because of the variety of approaches and overlaps, it is difficult to maintain an overview of non-relational databases.
• A basic classification is based on data model.
30. Key-Value databases
• Simplest NoSQL data store
• Handles large amounts of data.
• Several (e.g., Riak, Project Voldemort) are based on Amazon’s Dynamo paper.
• Key-value stores allow developers to store schema-less data as a hash table, where each key is unique and the value can be a string, JSON, a BLOB (binary large object), etc.
• Values may themselves be strings, hashes, lists, sets, or sorted sets (as in Redis), stored against their keys.
• Key-value stores can be used as collections, dictionaries, associative arrays, etc.
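A minimal sketch of the hash-table model described above, assuming an in-memory Python `dict` stands in for the store (`KeyValueStore` and its methods are hypothetical names, not a real client API). The store never looks inside a value, so reading a field out of the JSON is the application's job. This is why key-value stores are a poor fit when you need to query by the data itself.

```python
import json
from typing import Dict, Optional, Union

Value = Union[str, bytes]  # opaque to the store: text, JSON text, or binary BLOBs


class KeyValueStore:
    """Minimal in-memory key-value store backed by a Python dict (a hash table)."""

    def __init__(self) -> None:
        self._data: Dict[str, Value] = {}

    def put(self, key: str, value: Value) -> None:
        self._data[key] = value  # keys are unique, so a second put overwrites

    def get(self, key: str) -> Optional[Value]:
        return self._data.get(key)  # lookup by key only; values are never inspected

    def delete(self, key: str) -> None:
        self._data.pop(key, None)  # deleting a missing key is a no-op


store = KeyValueStore()
store.put("session:abc123", json.dumps({"user": "ada", "cart": [17, 42]}))
store.put("avatar:ada", b"\x89PNG\r\n\x1a\n")  # binary value (BLOB)
cart = json.loads(store.get("session:abc123"))["cart"]  # the app decodes the JSON
```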
31. • Examples of key-value databases:
– Riak
– Redis
– Memcached
– Berkeley DB
– HamsterDB (especially suited for embedded use)
– Amazon DynamoDB (not open source)
– Project Voldemort
– Couchbase
32. Document databases
• A collection of documents
• Data in this model is stored inside documents.
• A document is a key-value collection where the key allows access to its value.
• Documents are not typically forced to have a schema and are therefore flexible and easy to change.
• Documents are stored in collections in order to group different kinds of data.
• Documents can contain many different key-value pairs, key-array pairs, or even nested documents.
34. Column family stores
• Column-oriented databases primarily work on columns, and every column is treated individually.
• They store data in column-specific files, and query processors work on columns too.
• All data within each column data file has the same type, which makes it ideal for compression.
• Column stores can improve query performance because they can read only the specific column data a query needs.
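A small sketch of the column-oriented layout described above, assuming in-memory lists stand in for per-column files. It pivots row records into columns, answers a sum by reading only the `amount` column, and compresses the `country` column with run-length encoding. Run-length encoding is one simple scheme, chosen here for illustration, that benefits from a column holding values of a single type that often repeat. The sample data and function names are hypothetical.

```python
from itertools import groupby
from typing import Any, Dict, List, Tuple

# Row-oriented input: one record per sale (hypothetical sample data).
rows = [
    {"id": 1, "country": "DE", "amount": 10},
    {"id": 2, "country": "DE", "amount": 20},
    {"id": 3, "country": "DE", "amount": 5},
    {"id": 4, "country": "FR", "amount": 7},
    {"id": 5, "country": "FR", "amount": 3},
    {"id": 6, "country": "US", "amount": 12},
]


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot rows into one list per column (a stand-in for one file per column).

    Assumes every record has the same fields as the first one.
    """
    columns: Dict[str, List[Any]] = {name: [] for name in records[0]}
    for record in records:
        for name in columns:
            columns[name].append(record[name])
    return columns


def run_length_encode(values: List[Any]) -> List[Tuple[Any, int]]:
    """Compress a column into (value, run length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


columns = to_columns(rows)
# A query over one column touches only that column's data.
total_amount = sum(columns["amount"])
country_rle = run_length_encode(columns["country"])
```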
36. Graph databases
• A graph database stores data in a graph.
• It is capable of elegantly representing any kind of data in a highly accessible way.
• Each node represents an entity (such as a student or business), and each edge represents a connection or relationship between two nodes.
• Every node and edge is defined by a unique identifier.
• Each node knows its adjacent nodes.
• As the number of nodes increases, the cost of a local step (or hop) remains the same.
• Indexes are used for lookups.
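The node and edge model above can be sketched as a small property graph, assuming in-memory dicts and hypothetical names (`Graph`, `add_edge`, `within_hops`). Every node and edge gets a unique identifier, and each node keeps its own set of neighbours, so one hop reads only that node's set; the cost of a hop does not grow with the total number of nodes. The `within_hops` query is a standard breadth-first search.

```python
from collections import deque
from typing import Any, Dict, Set, Tuple


class Graph:
    """Property graph where each node keeps its own set of adjacent nodes."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, Any]] = {}        # node id -> properties
        self.edges: Dict[str, Tuple[str, str, str]] = {}  # edge id -> (from, label, to)
        self.adjacent: Dict[str, Set[str]] = {}           # node id -> neighbour ids

    def add_node(self, node_id: str, **properties: Any) -> None:
        self.nodes[node_id] = properties
        self.adjacent.setdefault(node_id, set())

    def add_edge(self, edge_id: str, a: str, label: str, b: str) -> None:
        """Both endpoints must already exist; the relationship is walkable both ways."""
        self.edges[edge_id] = (a, label, b)
        self.adjacent[a].add(b)
        self.adjacent[b].add(a)

    def within_hops(self, start: str, max_hops: int) -> Set[str]:
        """Breadth-first search: nodes reachable from start in at most max_hops hops."""
        depth = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if depth[node] == max_hops:
                continue
            for neighbour in self.adjacent[node]:  # one hop: read this node's own set
                if neighbour not in depth:
                    depth[neighbour] = depth[node] + 1
                    queue.append(neighbour)
        return set(depth) - {start}


g = Graph()
g.add_node("alice", kind="student")
g.add_node("bob", kind="student")
g.add_node("carol", kind="student")
g.add_node("acme", kind="business")
g.add_edge("e1", "alice", "FRIEND_OF", "bob")
g.add_edge("e2", "bob", "FRIEND_OF", "carol")
g.add_edge("e3", "carol", "WORKS_AT", "acme")
```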
38. Performance
Data model: Performance | Scalability | Flexibility | Complexity | Functionality
• Key-Value: High | High | High | None | Variable (none)
• Column-Oriented: High | High | Moderate | Low | Minimal
• Document-Oriented: High | Variable (High) | High | Low | Variable (low)
• Graph: Variable | Variable | High | High | Graph Theory
• Relational: Variable | Variable | Low | Moderate | Relational Algebra
39. How to select your NoSQL database?
Key-value databases
• For storing session information, user profiles, preferences, and shopping cart data.
• Avoid when you need to query by the data in the values or to operate on multiple keys at the same time.
Document databases
• For content management systems, blogging platforms, web analytics, real-time analytics, and e-commerce applications.
• Avoid for systems that need complex transactions spanning multiple operations, or queries against varying aggregate structures.
Column family databases
• For content management systems, blogging platforms, maintaining counters, expiring usage data, and heavy write volumes such as log aggregation.
• Avoid for systems in early development whose query patterns are still changing.
Graph databases
• For connected data such as social networks, spatial data, routing information for goods and money, and recommendation engines
41. There are now more than 50 vendors in the NoSQL database software and services space!
42. Even the most popular RDBMS vendors are pragmatic about the future of databases!
• Oracle: Berkeley DB (open source)
• IBM: Hadoop, MongoDB
• Microsoft: NoSQL solutions on its Windows Azure cloud-based storage solution
43. Job Market
• There is a huge opportunity for those with expertise in NoSQL databases.
44. The share of the job market for MySQL has been more or less flat, while the share for MongoDB has been increasing exponentially.
45. This trend is sure to grow as NoSQL databases become more mature.
47. Summary
• Selecting the correct database for your goal is very important.
• NoSQL offers better solutions for handling Big Data.
• Most NoSQL databases are also open source.
• Organizations often begin with a small-scale trial of a NoSQL database, which makes it possible to develop an understanding of the technology.
• Compared with other NoSQL databases, Cassandra, HBase, and MongoDB are more popular among enterprise developers because they require little overhead and can be up and running quickly for prototyping new kinds of apps or data analysis.