Introduction to Big Data stores:
Key Value stores:
Cassandra:
• First developed at Facebook (powered the Inbox Search)
• Uses decentralized clustered nodes
• Considered one of the most scalable NoSQL systems
• Very high availability – no single point of failure
• Flexible data storage (structured/unstructured)
• Relatively easy to configure
• Designed for high transaction rates
• Java-based; available under the Apache License 2.0
Key Value NOSQL Databases
DynamoDB:
• Amazon DynamoDB stores data on Solid State Drives (SSDs)
• DynamoDB implements cryptographic methods to authenticate
users and prevent unauthorized data access.
• Strongly consistent reads return the latest written values; atomic
counters are also supported.
• Relieves developers of the overhead of scaling and replication.
• Synchronous replication across multiple AWS Availability Zones in
a single Region.
• Combined with other AWS services such as Amazon EMR and AWS Data
Pipeline, DynamoDB can support complex analytics and data movement,
respectively.
Key Value NOSQL Databases
Riak:
• Riak adopts a masterless peer-to-peer architecture
• Written in Erlang & C, some JavaScript.
• Distributes data and performs replication across nodes with consistent
hashing.
• Riak uses HTTP/REST or a custom binary protocol to exchange data with
the cluster and its nodes.
• Riak's multi-cluster replication has two modes: fullsync (synchronization
occurs every 6 hours) and real-time (requires a synchronization trigger).
• When new nodes are added to cluster, data is rebalanced across nodes
with no downtime.
• Used by 25% of Fortune 50 companies; users include AT&T, AOL, Ask.com,
Best Buy, Boeing and Comcast.
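The consistent-hashing bullet above can be made concrete with a small sketch. This is not Riak's actual ring implementation; the class name HashRing, the vnodes parameter and the choice of SHA-1 are assumptions made for illustration. It shows how keys map to a ring of nodes and why adding a node moves only some of the keys.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Map any string to a point on the ring (SHA-1 chosen for illustration).
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (point, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node claims several points to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def owners(self, key, replicas=3):
        # Walk clockwise from the key's point, collecting distinct nodes.
        start = bisect.bisect(self._ring, (_hash(key), ""))
        found = []
        for i in range(len(self._ring)):
            node = self._ring[(start + i) % len(self._ring)][1]
            if node not in found:
                found.append(node)
            if len(found) == replicas:
                break
        return found
```

Calling owners("some-key") returns the nodes that would hold the replicas of that key. After add_node, any key whose primary owner changes has moved to the new node, which is what lets a cluster rebalance without a full reshuffle.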
Key Value NOSQL Databases
Redis:
• Redis adopts Master-Slave architecture
• Slaves are allowed to communicate with each other.
• Redis is written in ANSI C and is best suited for rapidly changing
data of predictable size, e.g. stock analysis.
• By default, latency monitoring is disabled; the user can enable it by
setting a threshold value in the "latency-monitor-threshold" setting.
• Redis is designed to be accessed by trusted-users within trusted
environment.
• Supports hash or range partitioning (mapping a range of objects to a
specific Redis instance)
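The two partitioning schemes named in the last bullet can be sketched as follows. This is client-side illustration only, not Redis code: the instance names, the use of CRC32 and the span of 10,000 IDs per instance are all assumptions.

```python
import zlib

INSTANCES = ["redis-0", "redis-1", "redis-2"]  # hypothetical instance names


def hash_partition(key: str) -> str:
    # Hash partitioning: CRC32 of the key modulo the number of instances.
    return INSTANCES[zlib.crc32(key.encode("utf-8")) % len(INSTANCES)]


def range_partition(user_id: int, span: int = 10_000) -> str:
    # Range partitioning: IDs 0..9999 go to the first instance,
    # 10000..19999 to the second, and everything beyond to the last.
    index = min(user_id // span, len(INSTANCES) - 1)
    return INSTANCES[index]
```

Hash partitioning works for any key but scatters related keys, while range partitioning keeps neighbouring IDs together at the cost of maintaining the range table.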
Key Value NOSQL Databases
CouchDB:
• Written in Erlang.
• Instead of locks, CouchDB uses Multi-Version Concurrency Control
(MVCC) to manage concurrent access to the database.
• CouchDB achieves eventual consistency between multiple
databases by using incremental replication.
• Validates documents using JavaScript functions that approve or reject
each document update.
• CouchDB supports both pull replication (node acts as target) and
push replication (node acts as source).
• CouchDB is best suited for data that changes occasionally.
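The lock-free MVCC idea above can be illustrated with a toy in-memory store. This is a sketch of the general technique, not CouchDB's implementation: TinyDocStore, Conflict and the shortened SHA-256 revision suffix are hypothetical. A writer must present the revision it read; if another writer got there first, the update is rejected instead of blocking.

```python
import hashlib
import json


class Conflict(Exception):
    """Raised when an update is based on a stale revision."""


class TinyDocStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (revision, body)

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        return rev, dict(body)

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            # Someone else updated the document since it was read.
            raise Conflict(f"expected {current_rev}, got {rev}")
        generation = int(current_rev.split("-")[0]) + 1 if current_rev else 1
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        new_rev = f"{generation}-{hashlib.sha256(payload).hexdigest()[:8]}"
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev
```

Readers never wait: they see whichever revision was current when they asked, and only the losing writer has to re-read and retry.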
Key Value NOSQL Databases
Azure Table Storage:
• Maximum data size is 200 TB per table.
• A single query returns a maximum of 1,000 entities; larger result sets
are retrieved page by page.
• Azure Table Storage provides ACID guarantees for CRUD operations on a
single entity in a table.
• The storage access architecture of Azure Table Storage has three layers:
o Front-End (FE) layer - authenticates and authorizes the request.
o Partition layer - partitions the object data and performs load balancing.
o Distributed and replicated File System (DFS) layer - distributes and
replicates data across many clusters.
• Azure Table Storage does not provide a way to represent relationships between
data.
• To provide fault tolerance, the stored data is replicated three times within
the region and an additional three times in another region.
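The 1,000-row cap mentioned above means a client has to page through larger result sets. The sketch below simulates that loop over an in-memory list; query_page and query_all are hypothetical helpers, and the integer offset stands in for the opaque continuation token that the real service returns.

```python
PAGE_LIMIT = 1000  # per-query cap described above


def query_page(rows, continuation=0, limit=PAGE_LIMIT):
    """Return one page plus a token for the next page (None when done)."""
    page = rows[continuation:continuation + limit]
    next_token = continuation + limit
    return page, (next_token if next_token < len(rows) else None)


def query_all(rows):
    # Keep requesting pages until the service stops handing back a token.
    token = 0
    while token is not None:
        page, token = query_page(rows, token)
        yield from page
```

For 2,500 rows this issues three queries of 1,000, 1,000 and 500 rows.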
Key Value NOSQL Databases
BerkeleyDB:
• Berkeley DB is an embedded database engine and is suitable for storing
key/value data.
• Key and data items are stored in simple structures called DBTs (DBT is an
acronym for database thang) that contain a reference to memory and a length.
• Berkeley DB supports concurrent access to a database from multiple threads.
• The program accessing Berkeley DB determines how data is to be stored in records.
• Berkeley DB has three different products:
o Berkeley DB - contains database implementations and is written in C
o Berkeley DB Java Edition - log-structured storage architecture, written
in pure Java.
o Berkeley DB XML - specializes in the storage of XML documents
Column-Family NOSQL databases:
HBase:
• First developed at Powerset (to power natural language
search)
• Distributed column oriented database on top of
Hadoop/HDFS.
• Continuous access to data - Multiple master nodes.
• Linear and modular scalability.
• Provides interactive commands for manipulating database
• Single row atomic operations and row level exclusive locks.
• Multiple clients like its native Java library, Thrift, and REST
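The single-row atomicity bullet above can be sketched with one exclusive lock per row. This is an in-process illustration of the idea, not HBase code: RowStore and its methods are hypothetical names.

```python
import threading
from collections import defaultdict


class RowStore:
    def __init__(self):
        self._rows = defaultdict(dict)
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects the two tables themselves

    def _slot(self, row_key):
        # Look up (or create) the row and its lock under the guard.
        with self._guard:
            return self._locks[row_key], self._rows[row_key]

    def increment(self, row_key, column, amount=1):
        # Read-modify-write under the row's exclusive lock, so concurrent
        # callers touching the same row cannot interleave.
        lock, row = self._slot(row_key)
        with lock:
            row[column] = row.get(column, 0) + amount
            return row[column]

    def read(self, row_key):
        lock, row = self._slot(row_key)
        with lock:
            return dict(row)
```

Operations on different rows take different locks and proceed in parallel; nothing here makes a change spanning two rows atomic, which matches the single-row scope stated above.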
Column-Family NOSQL databases:
BigTable:
• First developed at Google (for structured data).
• Sparse, distributed, persistent multidimensional sorted
map.
• Self-managing (servers can be added/removed
dynamically; servers adjust to load imbalance).
• Fault tolerant & Persistent.
• Designed to scale into the petabyte range.
• Tables are optimized for GFS (Google File System) by being
split into multiple tablets.
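The phrase "sparse, distributed, persistent multidimensional sorted map" is easier to picture with a toy version of the "sparse" and "sorted" parts. The sketch below is single-process and in-memory, so it shows neither distribution nor persistence; SparseTable and its methods are hypothetical, and timestamps are assumed to be integers.

```python
import bisect


class SparseTable:
    """Maps (row, column, timestamp) to a value, kept in sorted key order."""

    def __init__(self):
        self._keys = []   # sorted (row, column, -timestamp) tuples
        self._cells = {}  # key tuple -> value; absent cells cost nothing

    def put(self, row, column, timestamp, value):
        key = (row, column, -timestamp)  # negate so the newest sorts first
        if key not in self._cells:
            bisect.insort(self._keys, key)
        self._cells[key] = value

    def get(self, row, column):
        # Newest version of a cell, or None if the cell was never written.
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._cells[self._keys[i]]
        return None

    def scan(self, start_row, end_row):
        # Rows are stored in sorted order, so a row-range scan is one slice.
        lo = bisect.bisect_left(self._keys, (start_row,))
        hi = bisect.bisect_left(self._keys, (end_row,))
        return [(k[0], k[1], -k[2], self._cells[k]) for k in self._keys[lo:hi]]
```

Because keys are sorted by row, a contiguous row range can be cut out and served separately, which is the idea behind splitting a table into tablets.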
Column-Family NOSQL databases:
HyperTable:
• Developed as an in-house software at Zvents.
• Manages massive sparse tables with timestamped cell
versions.
• Maximum efficiency (Less hardware, power, datacenter).
• Good fit for wide range of applications.
• Clean semantics.
• High performance.
Graph NOSQL databases:
Neo4j:
• Developed by Neo Technology
• Highly scalable, robust.
• Graph structures with nodes, edges and properties to
store data.
• Provides index-free adjacency
• Neo4j is schema free – Data does not have to adhere to
any convention
• ACID – atomic, consistent, isolated and durable for logical
units of work
• Easy to get started and use.
• Support for a wide variety of languages (Java, Python, Perl,
Scala, etc.) and the Cypher query language
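Index-free adjacency, listed above, means each node holds direct references to its neighbours, so a traversal follows pointers instead of consulting a global index. The sketch below illustrates the idea with plain Python objects; Node, relate and reachable are hypothetical names, not the Neo4j API.

```python
from collections import deque


class Node:
    def __init__(self, name, **properties):
        self.name = name
        self.properties = properties
        self.edges = []  # direct references: (relationship type, neighbour)

    def relate(self, rel_type, other):
        self.edges.append((rel_type, other))


def reachable(start, rel_type):
    # Breadth-first traversal that follows object references only.
    seen, queue, order = {start.name}, deque([start]), []
    while queue:
        node = queue.popleft()
        for kind, neighbour in node.edges:
            if kind == rel_type and neighbour.name not in seen:
                seen.add(neighbour.name)
                order.append(neighbour.name)
                queue.append(neighbour)
    return order
```

The cost of each hop depends on the number of edges at the current node, not on the total size of the graph.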
Document NOSQL databases:
MongoDB:
• Developed by the software company 10gen, originally as part of a
service product, and later released as open source.
• Document Oriented Database.
• Implemented in C++ for best performance. (built for
speed).
• Super low latency access to your data (Very little CPU
overhead).
• Auto Sharding for easy scalability.
• Map/Reduce for Aggregation.
• Full index support for high performance.
• Language drivers for Ruby/Ruby on Rails, Java, C#,
JavaScript, Python, Perl, Erlang, etc.
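The map/reduce aggregation bullet above follows a general pattern that fits in a few lines of plain Python. This is a sketch of the pattern only and does not use the MongoDB driver; map_reduce, the orders collection and its field names are made up for illustration.

```python
from collections import defaultdict


def map_reduce(documents, map_fn, reduce_fn):
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):  # map: emit (key, value) pairs
            groups[key].append(value)
    # reduce: collapse each key's values into a single result
    return {key: reduce_fn(key, values) for key, values in groups.items()}


orders = [
    {"customer": "ann", "total": 20},
    {"customer": "bob", "total": 5},
    {"customer": "ann", "total": 7},
]

totals = map_reduce(
    orders,
    map_fn=lambda doc: [(doc["customer"], doc["total"])],
    reduce_fn=lambda key, values: sum(values),
)
```

Here totals ends up holding one summed amount per customer. In a sharded deployment the map step can run next to the data on each shard, with only the grouped values travelling to the reduce step.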

Big data stores


Editor's notes

  1. Cassandra (an Apache project) is a NoSQL key-value distributed storage system designed for storing and managing huge amounts of structured or unstructured data over many nodes. Cassandra was first developed at Facebook and has been an Apache top-level project since 2010. Like many other NoSQL systems, Cassandra is designed to run on cheap commodity hardware. It runs on many decentralized clustered nodes and offers very elastic scalability: capacity can be added and brought online on the fly. This makes Cassandra an "always on" solution. Because of its distributed architecture, Cassandra also has no single point of failure. Cassandra is designed never to go down. Ever. Some design aspects of Cassandra resemble a traditional database management system, and some of the terminology will look familiar to SQL/DDL database developers. However, Cassandra (like most other NoSQL solutions) does not support a normalized data model. Cassandra is hugely popular and is generally considered the most widely implemented of the NoSQL databases. Many users like its low complexity and consider it an easy and simple solution for cloud data storage. Its simplicity and elegant design make it a natural choice for many organizations.