The document discusses factors to consider when selecting a NoSQL database management system (DBMS). It provides an overview of different NoSQL database types, including document databases, key-value databases, column databases, and graph databases. For each type, popular open-source options are described, such as MongoDB for document databases, Redis for key-value, Cassandra for columnar, and Neo4j for graph databases. The document emphasizes choosing a NoSQL solution based on application needs and recommends commercial support for production systems.
2. Topics
Why choose NoSQL database
Overview
Brief on the different types of NoSQL databases
3. Why choose NoSQL database
To improve programmer productivity by using a database that better matches
an application's needs.
To improve data access performance via some combination of handling larger
data volumes, reducing latency, and improving throughput.
Since most of the NoSQL databases are open source, testing them is a simple
matter of downloading these products and setting up a test environment.
Separating parts of applications into services also allows you to introduce
NoSQL into an existing application.
4. Overview
NoSQL means that when designing a software solution there is more than
one storage mechanism that could be used, depending on the needs.
Due to increasing needs for scalability and performance, alternative systems
have emerged, namely NoSQL technology.
There are hundreds of readily available NoSQL databases, and each has
different use case scenarios.
NoSQL databases can be divided into four main categories:
Document Database
Key-value Database
Column Based Database
Graph Database
5. Overview
Before going down the NoSQL path, it's a good idea to recheck whether your
existing DBMS software can meet the current requirement.
Using NoSQL databases allows developers to develop without having to
convert in-memory structures to relational structures.
NoSQL does not have a prescriptive definition but we can make a set of
common observations, such as:
Not using the relational model
Running well on clusters
Mostly open-source
Built for the 21st century web estates
Schema-less
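The point about not converting in-memory structures to relational structures
can be illustrated with a minimal sketch using Python's standard sqlite3 and
json modules. The order structure, table names and key format are hypothetical,
and a plain dict stands in for a NoSQL store; this is an illustration of the
idea, not any product's API.

```python
import json
import sqlite3

# Hypothetical in-memory structure: one order with nested line items.
order = {
    "order_id": 1,
    "customer": "Ann",
    "items": [
        {"sku": "A-100", "qty": 2},
        {"sku": "B-200", "qty": 1},
    ],
}

conn = sqlite3.connect(":memory:")

# Relational approach: the nested structure is split across two tables.
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER)")
conn.execute("INSERT INTO orders VALUES (?, ?)", (order["order_id"], order["customer"]))
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(order["order_id"], item["sku"], item["qty"]) for item in order["items"]],
)

# Reading it back means reassembling the structure from rows.
row = conn.execute("SELECT order_id, customer FROM orders WHERE order_id = 1").fetchone()
item_rows = conn.execute(
    "SELECT sku, qty FROM order_items WHERE order_id = 1 ORDER BY sku"
).fetchall()
rebuilt = {
    "order_id": row[0],
    "customer": row[1],
    "items": [{"sku": sku, "qty": qty} for sku, qty in item_rows],
}

# Aggregate-style approach: the whole structure is saved and loaded as one unit.
store = {}  # stand-in for a NoSQL store (assumption for illustration)
store["order:1"] = json.dumps(order)
loaded = json.loads(store["order:1"])
```

Both paths recover the same order, but the relational path needs explicit
mapping code in each direction, while the aggregate path stores the structure
as it already exists in memory.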
7. Document Database
The document store DBMS stores data at the document level using a
self-describing format such as JavaScript Object Notation (JSON) or XML.
The document data model makes it easy for developers to store and combine
data of any structure, without giving up data access and indexing
functionality.
Database administrators (DBAs) can dynamically modify the schema without
downtime.
Document databases work well for event logging, online shopping, content
management and in-depth analytical processing.
The schema flexibility of document databases can also be useful for projects
that require rapid prototyping.
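As a rough illustration of the document model described on this slide, the
sketch below keeps documents of different shapes in one collection and still
queries them by field. The DocumentCollection class and its insert/find methods
are hypothetical names invented for this example; they do not correspond to any
specific product's API.

```python
import json


class DocumentCollection:
    """Toy in-memory document collection (illustration only)."""

    def __init__(self):
        self._docs = {}
        self._next_id = 1

    def insert(self, doc):
        # Round-trip through JSON to keep an independent, JSON-compatible copy.
        stored = json.loads(json.dumps(doc))
        stored["_id"] = self._next_id
        self._docs[self._next_id] = stored
        self._next_id += 1
        return stored["_id"]

    def find(self, **criteria):
        # Return documents whose top-level fields match every criterion.
        return [
            doc
            for doc in self._docs.values()
            if all(doc.get(key) == value for key, value in criteria.items())
        ]


products = DocumentCollection()
# No schema is declared up front; each document carries its own fields.
products.insert({"type": "book", "title": "NoSQL Basics", "pages": 210})
products.insert({"type": "shirt", "title": "Logo Tee", "sizes": ["S", "M", "L"]})
products.insert({"type": "book", "title": "Graph Thinking", "authors": ["A. Author"]})

books = products.find(type="book")
```

The two book documents have different fields, yet both are returned by the
same query. A real document database would add indexes and persistence on top
of this idea.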
8. Document Database
One of the leading NoSQL DBMSs is MongoDB, an open source document store
DBMS.
It's designed to make it easy to develop and run modern applications that rely
on structured and unstructured data while delivering scalability and high
availability, and supporting rapidly changing data.
There are probably more technicians familiar with it than any other NoSQL
DBMS, making it somewhat easier to staff MongoDB projects.
MongoDB stores data as documents in a binary representation called
BSON (Binary JSON).
MongoDB is specifically designed for rapidly building applications that scale
globally and are inexpensive to operate.
9. Document Database
Another option is Couchbase Server, a JSON-based document store derived
from CouchDB, which is an Apache open source project.
Couchbase Server delivers eventual consistency for transactions, as opposed
to ACID (atomicity, consistency, isolation, and durability).
Many NoSQL offerings rely on command line interface (CLI) administration,
but Couchbase Server administration tasks can be performed using the Web,
CLI or RESTful API.
Another option is MarkLogic Server, which handles JSON, XML and Resource
Description Framework (RDF) data natively and offers critical enterprise
features such as ACID transactions, automated failover and security.
11. Key-Value Database
The key-value approach is somewhat similar to the document approach. Both
offer flexible schemata, but the data in a key-value store isn't structured
using a format like JSON; the store treats each value as opaque.
Key-value databases excel at session management, serving ad content and
managing user or product profiles. When data is encoded in many different
ways without a rigorous schema, using a key-value database can make sense.
One of the leading key-value DBMSs is Redis, an open source, BSD-licensed,
key-value data store.
Redis is a key-value store, but it also supports different kinds of data
structures. Whereas with traditional key-value stores you associate string keys
to string values, in Redis the value isn't limited to a simple string but can also
hold more complex data structures.
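The difference between plain string values and richer value types can be
sketched with a toy in-memory store. TinyKV is a hypothetical class written for
this example; its method names loosely echo Redis command names, but this is
not the Redis client API and nothing here talks to a Redis server.

```python
class TinyKV:
    """Toy key-value store whose values may be strings, lists, sets or hashes."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # Classic key-value usage: a string key mapped to a string value.
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def rpush(self, key, *values):
        # Append to a list value, creating the list on first use.
        items = self._data.setdefault(key, [])
        items.extend(values)
        return len(items)

    def sadd(self, key, *members):
        # Add to a set value; returns how many members were actually new.
        members_set = self._data.setdefault(key, set())
        before = len(members_set)
        members_set.update(members)
        return len(members_set) - before

    def hset(self, key, field, value):
        # Set one field inside a hash (field -> value mapping).
        self._data.setdefault(key, {})[field] = value

    def hget(self, key, field):
        return self._data.get(key, {}).get(field)


kv = TinyKV()
kv.set("session:42", "logged-in")            # plain string value
kv.rpush("recent:42", "home", "cart")        # list value
added = kv.sadd("tags:42", "vip", "beta", "vip")  # set value, duplicates ignored
kv.hset("profile:42", "name", "Ann")         # hash value
```

The key naming scheme ("session:42" and so on) is only a convention chosen for
the example.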
12. Key-Value Database
Another NoSQL key-value DBMS option is Riak from Basho Technologies.
Riak is a fault-tolerant, highly available, scalable, distributed multi-model
DBMS.
Riak open source is free under the Apache 2 license whereas Riak Enterprise
requires a commercial license agreement, sold by Basho Technologies.
Riak is more accurately termed a multi-model platform, supporting key-value,
object store and search capabilities all from the same platform.
Riak is an open source, distributed DBMS that's implemented across multiple
servers. Any server can respond to read or write requests, and if one server
fails, the other servers continue to act upon client requests.
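The behavior described above (any server can answer, and the cluster keeps
serving when one server fails) can be sketched with a toy cluster. The Node and
Cluster classes are hypothetical, and the sketch simplifies heavily by copying
every write to every live node; it does not model how Riak or any other product
actually places, replicates or repairs data.

```python
class Node:
    """One server in the toy cluster."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True


class Cluster:
    """Toy cluster with full replication (a deliberate simplification)."""

    def __init__(self, names):
        self.nodes = [Node(name) for name in names]

    def _live_nodes(self):
        live = [node for node in self.nodes if node.up]
        if not live:
            raise RuntimeError("no live nodes")
        return live

    def put(self, key, value):
        # Write to every node that is currently up.
        for node in self._live_nodes():
            node.data[key] = value

    def get(self, key):
        # Any live node that holds the key can answer the read.
        for node in self._live_nodes():
            if key in node.data:
                return node.data[key]
        return None


cluster = Cluster(["a", "b", "c"])
cluster.put("user:1", "Ann")

cluster.nodes[0].up = False          # simulate a server failure
value = cluster.get("user:1")        # still answered by the remaining nodes
cluster.put("user:2", "Bob")         # writes are still accepted
```

Note that the failed node misses the second write. A production system needs a
mechanism to bring a recovered node back up to date, which is omitted here.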
14. Column Database
A column store NoSQL DBMS allows you to store data with keys mapped to
values and the values grouped into families that are often accessed together.
A column database is well-suited for data where writes are uncommon and
applications need to access a few columns of many rows all at once.
Column stores work well for event logging, content management and
counting/categorizing for analytics.
Column stores are also useful when you have expiring data because you can
set up a column to automatically expire.
Apache Cassandra is one of the top NoSQL column family DBMSs. It was
originally developed at Facebook and later released as an open source
project, and is therefore freely available to download and use.
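The ideas on this slide (row keys mapped to named columns, reading one column
across many rows, and columns that expire) can be sketched as follows. The
ColumnFamily class is hypothetical and models a single column family in memory;
time is passed in explicitly as a number so the example is deterministic. It is
not Cassandra's data model or API.

```python
class ColumnFamily:
    """Toy column family: row key -> column name -> (value, expiry time)."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, column, value, now, ttl=None):
        # ttl is a lifetime in the same units as `now`; None means no expiry.
        expires_at = None if ttl is None else now + ttl
        self._rows.setdefault(row_key, {})[column] = (value, expires_at)

    def get_row(self, row_key, now):
        # Return only the columns of this row that have not expired.
        row = self._rows.get(row_key, {})
        return {
            column: value
            for column, (value, expires_at) in row.items()
            if expires_at is None or now < expires_at
        }

    def get_column(self, column, now):
        # Read a single column across all rows that have it.
        result = {}
        for row_key in self._rows:
            row = self.get_row(row_key, now)
            if column in row:
                result[row_key] = row[column]
        return result


users = ColumnFamily()
users.put("u1", "name", "Ann", now=0)
users.put("u1", "session", "tok-1", now=0, ttl=30)  # expires at time 30
users.put("u2", "name", "Bob", now=0)

early = users.get_row("u1", now=10)
late = users.get_row("u1", now=30)
names = users.get_column("name", now=10)
```

Rows need not share the same columns: "u2" has no "session" column at all, and
the "session" column of "u1" disappears from reads once its lifetime has passed.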
15. Column Database
Apache Cassandra is designed to be used by online applications that require
fast performance with no downtime. It was engineered to handle very large
amounts of data spread out across commodity servers to deliver high
availability without a single point of failure.
DataStax, a commercial vendor, has created an enterprise-level version of
Cassandra with support, called DataStax Enterprise.
DataStax Enterprise is free to use in development environments; use in
production requires the purchase of a license (or enrollment in the startup
program).
DataStax offers subscriptions for both production and non-production
environments that include certified software and support.
16. Column Database
Apache HBase is another leading open source NoSQL column store.
Designed to deliver random, real-time, read/write access to large amounts of
data using commodity hardware, HBase is modeled after Google's Bigtable
storage system.
It's built on top of Hadoop and the Hadoop Distributed File System (HDFS).
Although Hadoop and HBase are open source projects, there are commercial
providers such as Cloudera, which offers Cloudera Enterprise, a product that
combines Apache Hadoop and other open source projects into a single, highly
scalable system for analytical processing. Of course, Cloudera isn't the only
commercial provider; for example, Hortonworks and MapR Technologies are
other leading providers of Hadoop distributions that include HBase.
17. Graph Database
The graph database NoSQL category focuses on relationships between values
and stores data using graph structures with nodes, edges and properties.
In a graph database every element contains a direct pointer to its adjacent
elements, so no index lookups are necessary to follow a relationship.
Graph databases are used in social media (relationship management), search,
network and IT operations, fraud detection, real-time recommendations, digital
asset management and master data management: essentially any application that
benefits from harnessing the power of data relationships using graphs.
The leading graph database is Neo4j. Neo4j is a native graph database
system, in which things are stored as nodes and the relationships between
those things form the structure of the database.
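The idea of elements holding direct pointers to their neighbors can be sketched
in a few lines. GraphNode and within_hops are hypothetical names for this
example; the sketch follows object references from node to node rather than
looking nodes up in a central index, and it is not how Neo4j or any other
product is implemented internally.

```python
from collections import deque


class GraphNode:
    """A node that holds direct references to its adjacent nodes."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []

    def connect(self, other):
        # Undirected edge: each node points straight at the other.
        self.neighbors.append(other)
        other.neighbors.append(self)


def within_hops(start, max_hops):
    """Names of nodes reachable from start in at most max_hops hops."""
    seen = {start.name}
    found = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in node.neighbors:  # follow pointers, no index lookup
            if neighbor.name not in seen:
                seen.add(neighbor.name)
                found.append(neighbor.name)
                queue.append((neighbor, depth + 1))
    return found


ann, bob, cat, dan = (GraphNode(n) for n in ("ann", "bob", "cat", "dan"))
ann.connect(bob)
bob.connect(cat)
cat.connect(dan)

friends = within_hops(ann, 1)
friends_of_friends = within_hops(ann, 2)
```

The cost of each step depends on the number of neighbors of the current node,
not on the total size of the graph, which is the property the slide alludes to.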
19. Graph Database
Graph databases allow you to store entities and relationships between these
entities. Entities are also known as nodes, which have properties.
Nodes can have different types of relationships between them, allowing you
to both represent relationships between the domain entities and to have
secondary relationships for things like category, path, time-trees, quad-trees
for spatial indexing, or linked lists for sorted access.
Since most of the power from the graph databases comes from the
relationships and their properties, a lot of thought and design work is needed
to model the relationships in the domain that we are trying to work with.
Relationships are first-class citizens in graph databases; most of the value of
graph databases is derived from the relationships.
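To make "relationships as first-class citizens" concrete, the sketch below
gives each relationship its own type and properties, just as nodes have
properties. PropertyGraph and its methods are hypothetical names for this
example, and the relationship types (FRIEND, WORKS_AT) are invented sample data.
For brevity it scans a flat list of relationships, which a real graph database
would not do.

```python
class PropertyGraph:
    """Toy property graph: nodes and relationships both carry properties."""

    def __init__(self):
        self.nodes = {}  # node id -> properties
        self.rels = []   # (start id, relationship type, end id, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def relate(self, start, rel_type, end, **props):
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both nodes must exist before relating them")
        self.rels.append((start, rel_type, end, props))

    def outgoing(self, start, rel_type):
        # Follow only relationships of one type from a given node.
        return [
            (end, props)
            for s, t, end, props in self.rels
            if s == start and t == rel_type
        ]

    def incoming(self, end, rel_type):
        return [
            (s, props)
            for s, t, e, props in self.rels
            if e == end and t == rel_type
        ]


g = PropertyGraph()
g.add_node("ann", kind="person")
g.add_node("bob", kind="person")
g.add_node("acme", kind="company")

# Different relationship types between the same kinds of entities.
g.relate("ann", "FRIEND", "bob", since=2015)
g.relate("ann", "WORKS_AT", "acme", role="engineer")
g.relate("bob", "WORKS_AT", "acme", role="analyst")

employer = g.outgoing("ann", "WORKS_AT")
staff = [person for person, _ in g.incoming("acme", "WORKS_AT")]
```

Deciding which facts become relationship types and which become properties is
the modeling work the slide refers to.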
20. Graph Database
There are many graph databases available, such as Neo4j, InfiniteGraph,
OrientDB, or FlockDB (which is a special case: a graph database that
only supports single-depth relationships or adjacency lists, where you cannot
traverse more than one level deep for relationships).
Neo4j offers ACID transactions, high-availability clustering for enterprise
deployments, and comes with a Web-based administration tool.
Neo4j isn't new technology; the company has been in business for more than a
decade.
Another option is Titan, which is optimized for storing and querying graphs
distributed across a cluster of machines.
21. Graph Database
Titan has a pluggable storage architecture that allows it to build on proven
database technology such as Apache Cassandra, Apache HBase or Oracle
Berkeley DB.
Choosing a multi-model approach can make sense for applications needing
several different NoSQL approaches (such as key/value for some data and
graph for others).
Most NoSQL DBMS offerings are open source and can be licensed for free
under an open source license or via a commercial license from a vendor that
offers support and upgrades.
The commercial option is recommended for organizations intending to use
NoSQL databases in production applications and systems.
22. The multi-model DBMS
Another choice in the NoSQL market is the multi-model DBMS. A growing
number of vendors have delivered DBMS products that support more than one
(or all) of the NoSQL models (and in some cases, relational too). Examples of
multi-model NoSQL offerings include DataStax Enterprise, FoundationDB,
CortexDB and OrientDB.
Your existing relational DBMS may also be an option. The relational vendors
are working to expand their DBMSs to embrace NoSQL, and some have
already started to introduce NoSQL capabilities.
One example is IBM DB2. DB2 for Linux, Unix and Windows offers a column
store capability (albeit a relational column store) and can store RDF graph
triples and JSON documents, which may obviate the need for DB2 users to
acquire a graph or document database.