Big data stores

Introduction to Big Data stores:
Key Value stores:
Cassandra:
• First developed at Facebook (powered the Inbox Search)
• Uses decentralized clustered nodes
• Considered one of the most scalable NoSQL systems
• Very high availability – no single point of failure
• Flexible data storage (structured/un-structured)
• Relatively easy to configure
• Designed for high transaction rates
• Java based; available under the Apache License 2.0

Key Value NOSQL databases:
DynamoDB:
• Amazon DynamoDB stores data on Solid State Drives (SSDs)
• DynamoDB implements cryptographic methods to authenticate
users and prevent unauthorized data access.
• Optional strongly consistent reads return the latest written values;
atomic counters are also supported.
• Relieves developers of the overhead of scaling and replication.
• Synchronous replication across multiple AWS Availability Zones in
a single Region.
• Combined with other AWS services such as Amazon EMR and AWS Data
Pipeline, DynamoDB supports complex analytics and data movement
respectively.

Riak:
• Riak adopts a masterless peer-to-peer architecture
• Written in Erlang & C, some JavaScript.
• Distributes data and performs replication across nodes with consistent
hashing.
• Riak uses HTTP/REST or a custom binary protocol to exchange data with
clusters/nodes.
• Riak has two modes of operation, i.e. fullsync (synchronization occurs
every 6 hours) and real-time (requires a synchronization trigger).
• When new nodes are added to cluster, data is rebalanced across nodes
with no downtime.
• Reportedly used by 25% of Fortune 50 companies; users include AT&T, AOL,
Ask.com, Best Buy, Boeing and Comcast.
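
Consistent hashing and rebalancing, as mentioned in the Riak bullets, can be sketched in a few lines of Python. This is a generic toy ring, not Riak's actual implementation: the node names, the `HashRing` class and the vnode count are all assumptions made for illustration.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string onto the ring using SHA-1 (a 160-bit integer)."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring; node names and vnode count are illustrative."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node claims several points ("virtual nodes") on the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def preference_list(self, key, n=3):
        """Return the first n distinct nodes clockwise from the key's position."""
        start = bisect.bisect(self._ring, (ring_position(key), ""))
        owners = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == n:
                break
        return owners


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.preference_list(k, n=1)[0] for k in keys}
ring.add_node("node-d")  # a new node joins the cluster
after = {k: ring.preference_list(k, n=1)[0] for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Only the keys that now fall on the new node's ring segments change owner (every key in `moved` ends up on `node-d`); the rest stay where they were, which is why a cluster can rebalance without downtime.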

Redis:
• Redis adopts Master-Slave architecture
• Slaves are allowed to communicate with each other.
• Redis is written in ANSI C and is best suited for rapidly changing
data with a predictable size, e.g. stock analysis.
• By default, latency monitoring is disabled; the user can enable it by
setting a threshold value (in milliseconds) for the configuration
variable "latency-monitor-threshold".
• Redis is designed to be accessed by trusted users within a trusted
environment.
• Supports hash or range partitioning (range partitioning maps a range of
objects to a specific Redis instance).
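
The two partitioning schemes can be contrasted with a small client-side sketch. The instance names, the ID ranges and the choice of CRC32 as the hash are assumptions for illustration only; no Redis client library is used here.

```python
import zlib

INSTANCES = ["redis-0", "redis-1", "redis-2"]  # hypothetical instance names


def hash_partition(key: str) -> str:
    """Hash partitioning: pick an instance from a CRC32 hash of the key."""
    return INSTANCES[zlib.crc32(key.encode("utf-8")) % len(INSTANCES)]


# Range partitioning: each entry is (exclusive upper bound of the ID range, instance).
RANGES = [(10_000, "redis-0"), (20_000, "redis-1"), (30_000, "redis-2")]


def range_partition(object_id: int) -> str:
    """Range partitioning: pick the instance whose ID range contains object_id."""
    for upper, instance in RANGES:
        if 0 <= object_id < upper:
            return instance
    raise ValueError(f"no instance configured for id {object_id}")
```

Range partitioning needs a table of ranges that must be maintained as data grows, whereas hash partitioning works for any key but gives up ordered placement.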

CouchDB:
• Written in Erlang.
• Instead of locks, CouchDB uses Multi-Version Concurrency Control
(MVCC) to manage concurrent access to the database.
• CouchDB achieves eventual consistency between multiple
databases by using incremental replication.
• Validates documents using JavaScript functions that approve or deny
each document update.
• CouchDB supports both pull replication (node acts as target) and
push replication (node acts as source).
• CouchDB is best suited for data that changes occasionally.
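
The lock-free, revision-based update check behind MVCC can be sketched as follows. This is a simplified in-memory model: the `TinyDocStore` class is hypothetical, and it uses plain integers where CouchDB uses revision strings.

```python
class ConflictError(Exception):
    """Raised when an update is based on a stale revision."""


class TinyDocStore:
    """Toy document store that rejects writes based on an outdated revision."""

    def __init__(self):
        self._docs = {}  # doc_id -> (revision number, document body)

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        return rev, dict(body)

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        # A writer must name the revision it read; no lock is ever held.
        if rev != current_rev:
            raise ConflictError(f"{doc_id}: expected rev {current_rev}, got {rev}")
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


store = TinyDocStore()
rev1 = store.put("note", {"text": "draft"})
rev2 = store.put("note", {"text": "final"}, rev=rev1)

conflict = False
try:
    store.put("note", {"text": "late edit"}, rev=rev1)  # based on a stale revision
except ConflictError:
    conflict = True
```

Readers are never blocked: they see the last committed revision, and a writer that loses the race gets a conflict it can resolve and retry.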

Azure Table Storage:
• Maximum data size is 200 TB per table.
• A single query returns a maximum of 1000 rows (entities); larger results
are retrieved in pages.
• Azure Table Storage provides ACID transactions that guarantee CRUD
operations for a single entity in a table.
• The storage access architecture of Azure Table Storage has three layers:
o Front-End (FE) layer - authenticates and authorizes the request.
o Partition layer - partitions the object data and performs load balancing.
o Distributed and replicated File System (DFS) layer - distributes and
replicates data across many clusters.
• Azure Table Storage does not provide a way to represent relationships between
data.
• To provide fault tolerance, the stored data is replicated three times within the
region and an additional three times in another region.
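
The 1000-row cap is typically handled by paging: each response carries a continuation token that the client sends back to fetch the next page. The sketch below models this with an in-memory list and an integer offset standing in for the token; `query_page` and `query_all` are hypothetical helpers, not part of any Azure SDK.

```python
PAGE_LIMIT = 1000  # per-response row cap quoted above


def query_page(entities, continuation=0):
    """Return one page and a continuation token (None when exhausted)."""
    page = entities[continuation:continuation + PAGE_LIMIT]
    next_token = continuation + PAGE_LIMIT
    return page, (next_token if next_token < len(entities) else None)


def query_all(entities):
    """Follow continuation tokens until every row has been yielded."""
    token = 0
    while token is not None:
        page, token = query_page(entities, token)
        yield from page


# Hypothetical table contents: 2500 entities in one partition.
table = [{"PartitionKey": "p1", "RowKey": str(i)} for i in range(2500)]
first_page, token = query_page(table)
all_rows = list(query_all(table))
```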

BerkeleyDB:
• Berkeley DB is an embedded database engine and is suitable for storing
key/value data.
• Key and data items are stored in simple structures called DBTs (DBT is an
acronym for database thang) that contain a reference to memory and a length.
• Berkeley DB supports concurrency in threads even in database with size.
• The program accessing Berkeley DB determines how data is to be stored in records.
• Berkeley DB has three different products:
o Berkeley DB - contains database implementations and is written in C
o Berkeley DB Java Edition - Log structured storage architecture and
coded in Pure Java.
o Berkeley DB XML - specializes in the storage of XML documents

Column-Family NOSQL databases:
HBase:
• First developed at Powerset (to power natural language
search)
• Distributed column oriented database on top of
Hadoop/HDFS.
• Continuous access to data - Multiple master nodes.
• Linear and modular scalability.
• Provides interactive commands for manipulating database
• Single row atomic operations and row level exclusive locks.
• Multiple client interfaces, such as its native Java library, Thrift, and REST
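
Single-row atomicity with row-level exclusive locks can be illustrated with a minimal sketch, assuming an in-memory store and Python threads in place of region servers. The `RowStore` class and its `increment` method are hypothetical and do not mirror the HBase client API.

```python
import threading
from collections import defaultdict


class RowStore:
    """Toy store in which every row has its own exclusive lock."""

    def __init__(self):
        self._rows = defaultdict(dict)             # row key -> {column: value}
        self._locks = defaultdict(threading.Lock)  # row key -> exclusive lock
        self._guard = threading.Lock()             # protects the two dicts above

    def _row_and_lock(self, row_key):
        with self._guard:
            return self._rows[row_key], self._locks[row_key]

    def increment(self, row_key, column, amount=1):
        """Read-modify-write on one row, atomic with respect to that row."""
        row, lock = self._row_and_lock(row_key)
        with lock:
            row[column] = row.get(column, 0) + amount
            return row[column]


store = RowStore()


def worker():
    for _ in range(1000):
        store.increment("row-1", "stats:hits")


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_count = store.increment("row-1", "stats:hits", 0)
```

Because the lock is per row, writers to different rows never wait on each other, while concurrent updates to the same row are serialized.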

BigTable:
• First developed at Google (for structured data).
• Sparse, distributed, persistent multidimensional sorted
map.
• Self-managing (servers can be added/removed
dynamically; servers adjust to load imbalance).
• Fault tolerant & Persistent.
• Designed to scale into the petabyte range.
• Tables are optimized for GFS (Google File System) by being
split into multiple tablets.
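
The "sparse, distributed, persistent multidimensional sorted map" can be pictured as a map from (row, column, timestamp) to a value, kept in sorted key order. The sketch below covers only the sorted, sparse and versioned aspects in memory; the `SparseSortedMap` class is hypothetical, and distribution and persistence are left out.

```python
import bisect


class SparseSortedMap:
    """Toy (row, column, timestamp) -> value map kept in sorted key order."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._values = {}  # same key -> value

    def put(self, row, column, timestamp, value):
        key = (row, column, -timestamp)  # negate so newer versions sort first
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get_latest(self, row, column):
        """Return the newest version of a cell, or None if the cell is empty."""
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan_rows(self, start_row, end_row):
        """Yield (row, column, timestamp, value) for start_row <= row < end_row."""
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < end_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._values[self._keys[i]]
            i += 1


table = SparseSortedMap()
table.put("com.cnn.www", "contents:", 3, "<html>v3")
table.put("com.cnn.www", "contents:", 5, "<html>v5")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
table.put("org.example", "contents:", 1, "<html>x")

latest = table.get_latest("com.cnn.www", "contents:")
com_cells = list(table.scan_rows("com.", "com.z"))
```

Keeping rows in sorted order is what makes contiguous row ranges cheap to scan and lets a table be split into tablets by row range.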

HyperTable:
• Developed as an in-house software at Zvents.
• Manages massive sparse tables with timestamped cell
versions.
• Maximum efficiency (Less hardware, power, datacenter).
• Good fit for wide range of applications.
• Clean semantics.
• High performance.

Graph NOSQL databases:
Neo4j:
• Developed by Neo Technology
• Highly scalable, robust.
• Graph structures with nodes, edges and properties to
store data.
• Provides index-free adjacency
• Neo4j is schema free – Data does not have to adhere to
any convention
• ACID – atomic, consistent, isolated and durable for logical
units of work
• Easy to get started and use.
• Support for a wide variety of languages (Java, Python, Perl,
Scala, Cypher, etc.)
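
Index-free adjacency means each node holds direct references to its relationships, so a traversal follows pointers instead of consulting a global index. A minimal property-graph sketch, with hypothetical `Node` and `Relationship` classes that are not the Neo4j API:

```python
class Node:
    """A node with arbitrary properties and direct links to its relationships."""

    def __init__(self, **properties):
        self.properties = properties
        self.outgoing = []  # direct references to Relationship objects


class Relationship:
    """A typed, directed edge that registers itself on its start node."""

    def __init__(self, start, rel_type, end, **properties):
        self.start, self.type, self.end = start, rel_type, end
        self.properties = properties
        start.outgoing.append(self)


def neighbours(node, rel_type):
    """Follow outgoing relationships of one type; no index lookup is involved."""
    return [r.end for r in node.outgoing if r.type == rel_type]


alice = Node(name="Alice")
bob = Node(name="Bob")
carol = Node(name="Carol")
Relationship(alice, "KNOWS", bob, since=2010)
Relationship(bob, "KNOWS", carol, since=2012)

friends_of_friends = [
    fof.properties["name"]
    for friend in neighbours(alice, "KNOWS")
    for fof in neighbours(friend, "KNOWS")
]
```

The cost of each hop depends on the number of relationships on the current node, not on the total size of the graph.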

Document NOSQL databases:
MongoDB:
• Developed by the software company 10gen as a service
product; later shifted to open source.
• Document Oriented Database.
• Implemented in C++ for best performance. (built for
speed).
• Super low latency access to your data (Very little CPU
overhead).
• Auto Sharding for easy scalability.
• Map/Reduce for Aggregation.
• Full index support for high performance.
• Language drivers for (Ruby/Ruby on rails, Java, C#,
JavaScript, Python, Perl, Erlang etc).
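
The map/reduce style of aggregation over documents can be sketched in plain Python. This models the idea only: the `map_reduce` helper and the sample `orders` documents are assumptions, and no MongoDB driver is involved.

```python
from collections import defaultdict


def map_reduce(documents, map_fn, reduce_fn):
    """Group the (key, value) pairs emitted by map_fn, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


# Hypothetical collection of order documents.
orders = [
    {"customer": "ann", "total": 30},
    {"customer": "bob", "total": 12},
    {"customer": "ann", "total": 8},
]

totals = map_reduce(
    orders,
    map_fn=lambda doc: [(doc["customer"], doc["total"])],  # emit (key, value)
    reduce_fn=lambda key, values: sum(values),             # combine per key
)
```

Here the map step emits one (customer, total) pair per order and the reduce step sums the values for each customer.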
