Date : September 28, 2017
Location : USM
CSC 733: DISTRIBUTED DATABASE SYSTEMS
Presented by:
Md. Shohel Rana
W10006485
CASSANDRA - A DISTRIBUTED DATABASE SYSTEM
CONTENTS
• About Cassandra
• Features of Cassandra
• Architecture
• Components of Cassandra
• Cassandra Data Model
• Comparison between Data Models of Cassandra and RDBMS
• Installation
• Cassandra - CQLSH
ABOUT CASSANDRA
• Cassandra is a highly scalable, high-performance distributed
database designed to handle large amounts of data across many
commodity servers, providing high availability with no single point of
failure. It is a type of NoSQL database.
• It is a wide-column store, often loosely described as a column-oriented database.
• Its distribution design is based on Amazon’s Dynamo and its data
model on Google’s Bigtable.
• Created at Facebook, it differs sharply from relational database
management systems.
ABOUT CASSANDRA (CONT.)
• Cassandra implements a Dynamo-style replication model with no
single point of failure, but adds a more powerful “column family” data
model.
• Cassandra is used by some of the biggest companies, such as
Facebook, Twitter, Cisco, Rackspace, eBay, Netflix, and more.
FEATURES OF CASSANDRA
• Elastic scalability - Cassandra is highly scalable; it lets you add more hardware to
accommodate more customers and more data as required.
• Always on architecture - Cassandra has no single point of failure and it is continuously
available for business-critical applications that cannot afford a failure.
• Fast linear-scale performance - Cassandra is linearly scalable, i.e., it increases your
throughput as you increase the number of nodes in the cluster. Therefore, it maintains a
quick response time.
• Flexible data storage - Cassandra accommodates all possible data formats including:
structured, semi-structured, and unstructured. It can dynamically accommodate changes
to your data structures according to your need.
• Easy data distribution - Cassandra provides the flexibility to distribute data where you
need by replicating data across multiple data centers.
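• Transaction support - Cassandra does not offer full ACID transactions in the relational
sense. Writes are atomic and isolated within a single partition, durable through the commit
log, and consistency is tunable per operation. Lightweight transactions (compare-and-set)
cover conditional updates.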
• Fast writes - Cassandra was designed to run on cheap commodity hardware. It performs
blazingly fast writes and can store hundreds of terabytes of data, without sacrificing the
read efficiency.
ARCHITECTURE
• One or more nodes in a cluster act as replicas for a given piece of
data. If some replicas respond to a read with an out-of-date value,
Cassandra returns the most recent value to the client and then performs a
read repair in the background to update the stale replicas.
• Cassandra uses the gossip protocol in the background so that nodes can
exchange state with each other and detect faulty nodes in the cluster.
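The read path described above can be sketched in a few lines of Python. This is a toy model, not Cassandra's implementation: the `Versioned` class, the `read_with_repair` function, the node names, and the sample values are all hypothetical, and the repair runs inline instead of in the background.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    value: str
    timestamp: int  # write time; a larger timestamp means a newer value


def read_with_repair(replicas, key):
    """Return the newest value for key and overwrite stale replicas."""
    responses = [store[key] for store in replicas.values() if key in store]
    if not responses:
        return None
    newest = max(responses, key=lambda v: v.timestamp)
    # Read repair (inline here, assumed to be a background task in practice):
    # any replica that is missing the key or holds an older version gets the newest one.
    for store in replicas.values():
        current = store.get(key)
        if current is None or current.timestamp < newest.timestamp:
            store[key] = newest
    return newest.value


# Three hypothetical replicas; node_b holds the most recent write.
replicas = {
    "node_a": {"user:1": Versioned("old@example.com", 100)},
    "node_b": {"user:1": Versioned("new@example.com", 200)},
    "node_c": {"user:1": Versioned("old@example.com", 100)},
}
result = read_with_repair(replicas, "user:1")
```

After the call, the client sees the value with the highest timestamp and all three replicas hold that same version.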
ARCHITECTURE (CONT.)
COMPONENTS OF CASSANDRA
• Node − The place where data is stored.
• Data center − A collection of related nodes.
• Cluster − A component that contains one or more data centers.
• Commit log − The crash-recovery mechanism in Cassandra. Every write
operation is written to the commit log.
• Mem-table − A memory-resident data structure. After the commit log, the
data is written to the mem-table. Sometimes a single column family has
multiple mem-tables.
• SSTable − A disk file to which the data is flushed from the mem-table when its
contents reach a threshold value.
• Bloom filter − A quick, probabilistic data structure for testing
whether an element is a member of a set. It can report false positives but never false
negatives. It acts as a special kind of cache: on a read, Cassandra consults the Bloom
filter of each SSTable to skip SSTables that cannot contain the requested data.
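The components above fit together as a write path (commit log, then mem-table, then flush to an SSTable) and a read path that uses Bloom filters to skip SSTables. The following is a minimal sketch under simplifying assumptions: the `Node` and `BloomFilter` classes, the flush threshold counted in keys, and the SHA-256 based hashing are illustrative choices, not Cassandra internals.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: may report false positives, never false negatives."""

    def __init__(self, size=64, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for position in self._positions(key):
            self.bits[position] = True

    def might_contain(self, key):
        return all(self.bits[position] for position in self._positions(key))


class Node:
    def __init__(self, flush_threshold=2):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory data
        self.sstables = []     # flushed (bloom filter, data) pairs, oldest first
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. log the write for crash recovery
        self.memtable[key] = value             # 2. apply it in memory
        if len(self.memtable) >= self.flush_threshold:
            self.flush()                       # 3. flush once the threshold is reached

    def flush(self):
        bloom = BloomFilter()
        for key in self.memtable:
            bloom.add(key)
        self.sstables.append((bloom, dict(sorted(self.memtable.items()))))
        self.memtable = {}

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for bloom, data in reversed(self.sstables):   # newest SSTable first
            if bloom.might_contain(key) and key in data:
                return data[key]
        return None


node = Node(flush_threshold=2)
node.write("a", 1)
node.write("b", 2)   # reaches the threshold, so the mem-table is flushed
node.write("c", 3)   # stays in the new mem-table
```

In this sketch the commit log is never trimmed and SSTables are never compacted; both are left out to keep the example short.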
CASSANDRA DATA MODEL
• Cluster: A Cassandra database is distributed over several machines that
operate together. The outermost container of the whole system is the cluster.
• Keyspace: The outermost container for data in Cassandra. Its basic attributes are:
  • Replication factor − The number of machines in the cluster that receive
  copies of the same data.
  • Replica placement strategy − The strategy for placing replicas in the
  ring: simple strategy (rack-unaware), old network topology strategy (rack-aware),
  and network topology strategy (datacenter-aware).
  • Column families − A keyspace is a container for one or more column families.
  A column family, in turn, is a container for a collection of rows.
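To make replication factor and replica placement concrete, here is a small sketch of ring placement in the spirit of the simple strategy: hash the key to a token, find the next node clockwise on the ring, and take the following nodes until the replication factor is met. The `token` and `replicas_for` functions, the node names, and the use of SHA-256 are assumptions for illustration; Cassandra uses its own partitioners and per-node token assignments.

```python
import bisect
import hashlib


def token(value):
    """Map a string to a position on the ring (stand-in for a real partitioner)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def replicas_for(key, nodes, replication_factor):
    """Pick replication_factor distinct nodes, walking clockwise from the key's token."""
    ring = sorted((token(node), node) for node in nodes)
    tokens = [t for t, _ in ring]
    # First node whose token is greater than the key's token, wrapping around the ring.
    start = bisect.bisect_right(tokens, token(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


nodes = ["node_a", "node_b", "node_c", "node_d", "node_e"]
owners = replicas_for("employee:42", nodes, replication_factor=3)
```

With five nodes and a replication factor of 3, every key maps to three distinct nodes, and the same key always maps to the same three. A rack-aware or datacenter-aware strategy would add constraints on which nodes may be picked.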
CASSANDRA DATA MODEL (CONT.)
• Column Family: A column family is a container for an ordered collection of
rows. Each row, in turn, is an ordered collection of columns.
• Column: A column is the basic data structure of Cassandra. It holds three
values: a key (the column name), a value, and a timestamp.
• SuperColumn: A super column is a special column, so it is also a
key-value pair, but its value is a map of sub-columns.
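The nesting described above (column family, then rows, then columns carrying a timestamp) can be pictured as nested maps. The sketch below is only a mental model: the `Column` tuple, the `put` helper, and the sample row keys are hypothetical, and the rule that the write with the newer timestamp wins is modeled in the simplest possible way.

```python
from collections import namedtuple

# A column is a (name, value, timestamp) triple.
Column = namedtuple("Column", ["name", "value", "timestamp"])

# Column family: row key -> {column name -> Column}.
employees = {}


def put(column_family, row_key, name, value, timestamp):
    """Store a column; keep the existing one if it carries a newer timestamp."""
    row = column_family.setdefault(row_key, {})
    existing = row.get(name)
    if existing is None or timestamp >= existing.timestamp:
        row[name] = Column(name, value, timestamp)


put(employees, "row-1", "name", "X", timestamp=1)
put(employees, "row-1", "email", "a@a.com", timestamp=1)
put(employees, "row-2", "name", "Y", timestamp=1)          # rows need not share columns
put(employees, "row-1", "email", "b@b.com", timestamp=5)   # newer write replaces the value
put(employees, "row-1", "email", "stale@a.com", timestamp=2)  # older write is ignored
```

Note that `row-2` has no `email` column at all, which is the flexible-schema property: each row carries only the columns that were written to it.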
COMPARISON BETWEEN DATA MODELS OF CASSANDRA AND RDBMS
RDBMS | Cassandra
RDBMS deals with structured data. | Cassandra deals with unstructured data.
It has a fixed schema. | Cassandra has a flexible schema.
In RDBMS, a table is an array of arrays (ROW x COLUMN). | In Cassandra, a table is a list of “nested key-value pairs” (ROW x COLUMN key x COLUMN value).
Database is the outermost container that contains data corresponding to an application. | Keyspace is the outermost container that contains data corresponding to an application.
Tables are the entities of a database. | Tables or column families are the entities of a keyspace.
Row is an individual record in RDBMS. | Row is a unit of replication in Cassandra.
Column represents the attributes of a relation. | Column is a unit of storage in Cassandra.
RDBMS supports the concepts of foreign keys, joins. | Relationships are represented using collections.
INSTALLATION
• Prerequisites: Java JDK, Python, and a client driver (Java, Python, Ruby, C# / .NET,
Node.js, PHP, C++, Scala, Clojure, Erlang, Go, Haskell, Rust, Perl)
• Installation Methods:
  • Installation from binary tarball files
  • Installation from Debian packages
  • Installation using Windows Installer (preferred)
• Installation Procedure (Windows System): See Documents
CASSANDRA - CQLSH
• Cassandra provides a command-line shell, the Cassandra Query Language shell (CQLSH),
that allows users to communicate with it. Using this shell, you can execute
Cassandra Query Language (CQL) statements.
• CQLSH Commands:
• Documented Shell Commands: HELP, COPY, DESCRIBE, TRACING, SHOW, etc.
• CQL Data Definition Commands: CREATE KEYSPACE, ALTER, USE, DROP,
TRUNCATE, etc.
• CQL Data Manipulation Commands: INSERT, UPDATE, DELETE, BATCH, etc.
• CQL Clauses: SELECT, WHERE, ORDER BY, etc.
CASSANDRA – CQLSH (CONT.)
• CQL Example:
• Create keyspace: CREATE KEYSPACE people WITH REPLICATION = {'class' :
'SimpleStrategy', 'replication_factor' : 3};
• Create table: USE people; CREATE TABLE employees (id uuid, name varchar,
email varchar, PRIMARY KEY (id, email));
• Insert data: INSERT INTO employees (id, name, email) VALUES (uuid(), 'X', 'a@a.com');
• Retrieve data: SELECT * FROM employees;
DATA MONITORING
LIMITATION
• DataStax Versioning
• Lack of additional machines to set up multiple nodes and multiple
clusters
THANKS!
QUESTIONS?

  • 16. LIMITATION • DataStax Versioning • Lac of more machines to make multiple nodes as well as multiple clusters CSC 733: DISTRIBUTEDDATABASE SYSTEMS 16