SlideShare une entreprise Scribd logo
1  sur  36
Prepared by,
Pooja G.V
8th sem, CSE
GECH.
• Cassandra is a distributed database from Apache
that is highly scalable and designed to manage
very large amounts of structured data.
•It provides high availability with no single point of
failure.
• Open-source database
management system
(DBMS)
• Several key features of
Cassandra differentiate it
from other similar systems
Continued.....
History of Cassandra
● Cassandra was created to power
the Facebook Inbox Search
● Facebook open-sourced Cassandra in 2008 and
became an Apache Incubator project
● In 2010, Cassandra graduated to a top-level
project, regular update and releases followed
Motivation and Function
● Designed to handle large amount of data across
multiple servers
● There is a lot of unorganized data out there
● Easy to implement and deploy
● Mimics traditional relational database systems, but
with triggers and lightweight transactions
● Raw, simple data structures
General Design Features
Emphasis on performance over analysis
● Still has support for analysis tools such as Hadoop
Organization
● Rows are organized into tables
● First component of a table’s primary key is the partition key
● Rows are clustered by the remaining columns of the key
● Columns may be indexed separately from the primary key
● Tables may be created, dropped, altered at runtime without blocking
queries
Language
● CQL (Cassandra Query Language) introduced, similar to SQL (flattened
learning curve)
Peer to Peer Cluster
• Decentralized design
• No single point of failure
• No bottlenecking
Fault Tolerance/Durability
Failures happen all the time with multiple nodes
● Hardware Failure
● Bugs
● Operator error
● Power Outage, etc.
Solution: Buy cheap, redundant hardware,
replicate, maintain consistency
Fault Tolerance/Durability
• Replication
• Distribution of data to multiple data centers
Performance
• Core architectural designs allow Cassandra to
outperform its competitors
• Very good read and write throughputs
Scalability
Read and write throughput increase linearly as more machines are added
“In terms of scalability, there is a clear winner throughout our experiments.
Cassandra achieves the highest throughput for the maximum number of
nodes…” - University of Toronto
Comparisons
Apache Cassandra Google Big Table Amazon DynamoDB
Storage Type Column Column Key-Value
Best Use Write often, read less Designed for large
scalability
Large database solution
Concurrency Control MVCC Locks ACID
Characteristics High Availability
Partition Tolerance
Persistence
Consistency
High Availability
Partition Tolerance
Persistence
Consistency
High Availability
Cassandra Use Cases
Netflix
• online DVD and Blu-Ray
movie retailer
• Nielsen study showed
that 38% of Americans
use or subscribe to
Netflix
Netflix: Why Cassandra
● Using a central SQL database negatively
impacted scalability and availability
● International Expansion required Multi-
Datacenter solution
● Need for configurable Replication, Consistency,
and Resiliency in the face of failure
● Cassandra on AWS offered high levels of
scalability and availability
Jason Brown,
Senior Software
Engineer at Netflix
Hulu
• a website and a
subscription service
offering on-demand
streaming video media
• ~30 million unique
viewers per month
Hulu: Why Cassandra
• need for Availability
• need for Scalability
• Good Performance
• Nearly Linear Scalability
• Geo-Replication
• Minimal Maintenance Requirements
Andres Rangel,
Senior Software
Engineer at Hulu
Reasons for Choosing Cassandra
• Value availability over
consistency
• Require high write-throughput
• High scalability required
• No single point of failure
CAP Theorem
Cassandra’s Data Model
• Cassandra is a column oriented
NoSQL system
• Column families: sets of key-
value pairs
• A row is a collection of columns
labeled with a name
Key-Value Model
Cassandra Row
• the value of a row is itself a
sequence of key-value pairs
• such nested key-value pairs are
columns
• key = column name
• a row must contain at least 1
column
Example of Columns
Column names storing values
• key: User ID
• column names store
tweet ID values
• values of all column
names are set to “-”
(empty byte array) as
they are not used
• A Key Space is a group of
column families together. It
is only a logical grouping of
column families and
provides an isolated scope
for names
Key Space
Comparing Cassandra (C*) and RDBMS
• with RDBMS, a normalized data model is created
without considering the exact queries
• with C*, the data model is designed for specific
queries
• C*: NO joins, relationships, or foreign keys
Cassandra Query Language - CQL
• creating a keyspace - namespace of tables
CREATE KEYSPACE demo
WITH replication = {‘class’: ’SimpleStrategy’,
replication_factor’: 3};
• to use namespace:
USE demo;
Cassandra Query Language - CQL
• creating tables:
CREATE TABLE users( CREATE TABLE tweets(
email varchar, email varchar,
bio varchar, time_posted timestamp,
birthday timestamp, tweet varchar,
active boolean, PRIMARY KEY (email,
time_posted));
PRIMARY KEY (email));
Cassandra Query Language - CQL
• inserting data
INSERT INTO users (email, bio, birthday, active)
VALUES (‘john.doe@bti360.com’, ‘BT360 Teammate’,
516513600000, true);
Cassandra Query Language - CQL
• querying tables
• SELECT expression reads one or more records from
Cassandra column family and returns a result-set of rows
SELECT * FROM users;
SELECT email FROM users WHERE active = true;
Cassandra: Conclusion
• perfect for time-series data
• high performance
• Decentralization
• nearly linear scalability
• replication support
• no single points of failure
• MapReduce support
Cassandra Advantages
Cassandra Weaknesses
● no referential integrity
● querying options for retrieving data are limited
● sorting data is a design decision
● no support for atomic operations
● first think about queries, then about data model
Cassandra: Points to Consider
● Cassandra is designed as a distributed database
management system
● Cassandra write performance is always excellent, but read
performance depends on write patterns
● having a high-level understanding of some internals is a plus
References
• Lakshman, Avinash, and Prashant Malik. "Cassandra: a decentralized structured
storage system." ACM SIGOPS Operating Systems Review 44.2 (2010): 35-40.
• Hewitt, Eben. Cassandra: the definitive guide. O'Reilly Media, 2010.
• http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/a
rchitectureTOC.html
• http://www.slideshare.net/planetcassandra/a-deep-dive-into-understanding-
apache-cassandra
• http://www.slideshare.net/DataStax/evaluating-apache-cassandra-as-a-cloud-
database
• http://planetcassandra.org/functional-use-cases/
• http://marsmedia.info/en/cassandra-pros-cons-and-model.php
• http://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-
cassandra
• http://wiki.apache.org/cassandra/CassandraLimitations

Contenu connexe

Tendances

PySpark Best Practices
PySpark Best PracticesPySpark Best Practices
PySpark Best PracticesCloudera, Inc.
 
Presentation of Apache Cassandra
Presentation of Apache Cassandra Presentation of Apache Cassandra
Presentation of Apache Cassandra Nikiforos Botis
 
Cassandra & puppet, scaling data at $15 per month
Cassandra & puppet, scaling data at $15 per monthCassandra & puppet, scaling data at $15 per month
Cassandra & puppet, scaling data at $15 per monthdaveconnors
 
Comparison between mongo db and cassandra using ycsb
Comparison between mongo db and cassandra using ycsbComparison between mongo db and cassandra using ycsb
Comparison between mongo db and cassandra using ycsbsonalighai
 
Introducing Delta Live Tables: Make Reliable ETL Easy on Delta Lake
Introducing Delta Live Tables: Make Reliable ETL Easy on Delta LakeIntroducing Delta Live Tables: Make Reliable ETL Easy on Delta Lake
Introducing Delta Live Tables: Make Reliable ETL Easy on Delta LakeDatabricks
 
What is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremWhat is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremRahul Jain
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL DatabasesDerek Stainer
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinChristian Johannsen
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoopjoelcrabb
 
Design of Hadoop Distributed File System
Design of Hadoop Distributed File SystemDesign of Hadoop Distributed File System
Design of Hadoop Distributed File SystemDr. C.V. Suresh Babu
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?sudhakara st
 
Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)Eric Sun
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Designsudhakara st
 
Non relational databases-no sql
Non relational databases-no sqlNon relational databases-no sql
Non relational databases-no sqlRam kumar
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowWes McKinney
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache CassandraDataStax
 
Snowflake essentials
Snowflake essentialsSnowflake essentials
Snowflake essentialsqureshihamid
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeDatabricks
 

Tendances (20)

PySpark Best Practices
PySpark Best PracticesPySpark Best Practices
PySpark Best Practices
 
Presentation of Apache Cassandra
Presentation of Apache Cassandra Presentation of Apache Cassandra
Presentation of Apache Cassandra
 
Cassandra & puppet, scaling data at $15 per month
Cassandra & puppet, scaling data at $15 per monthCassandra & puppet, scaling data at $15 per month
Cassandra & puppet, scaling data at $15 per month
 
Comparison between mongo db and cassandra using ycsb
Comparison between mongo db and cassandra using ycsbComparison between mongo db and cassandra using ycsb
Comparison between mongo db and cassandra using ycsb
 
Introducing Delta Live Tables: Make Reliable ETL Easy on Delta Lake
Introducing Delta Live Tables: Make Reliable ETL Easy on Delta LakeIntroducing Delta Live Tables: Make Reliable ETL Easy on Delta Lake
Introducing Delta Live Tables: Make Reliable ETL Easy on Delta Lake
 
What is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremWhat is NoSQL and CAP Theorem
What is NoSQL and CAP Theorem
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL Databases
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek Berlin
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Spark
SparkSpark
Spark
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Design of Hadoop Distributed File System
Design of Hadoop Distributed File SystemDesign of Hadoop Distributed File System
Design of Hadoop Distributed File System
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
 
Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Design
 
Non relational databases-no sql
Non relational databases-no sqlNon relational databases-no sql
Non relational databases-no sql
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache Arrow
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache Cassandra
 
Snowflake essentials
Snowflake essentialsSnowflake essentials
Snowflake essentials
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
 

En vedette

Cassandra devoxx 2010
Cassandra devoxx 2010Cassandra devoxx 2010
Cassandra devoxx 2010jbellis
 
Cassandra - Deep Dive ...
Cassandra - Deep Dive ...Cassandra - Deep Dive ...
Cassandra - Deep Dive ...sameiralk
 
Cassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage SystemCassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage SystemVarad Meru
 
Cassandra - A decentralized storage system
Cassandra - A decentralized storage systemCassandra - A decentralized storage system
Cassandra - A decentralized storage systemArunit Gupta
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0Joe Stein
 
Cassandra architecture
Cassandra architectureCassandra architecture
Cassandra architectureT Jake Luciani
 
Cassandra - Research Paper Overview
Cassandra - Research Paper OverviewCassandra - Research Paper Overview
Cassandra - Research Paper Overviewsameiralk
 
The Cassandra Distributed Database
The Cassandra Distributed DatabaseThe Cassandra Distributed Database
The Cassandra Distributed DatabaseEric Evans
 
Cassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ NetflixCassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ Netflixnkorla1share
 
Shall we play a game?
Shall we play a game?Shall we play a game?
Shall we play a game?Maciej Lasyk
 

En vedette (10)

Cassandra devoxx 2010
Cassandra devoxx 2010Cassandra devoxx 2010
Cassandra devoxx 2010
 
Cassandra - Deep Dive ...
Cassandra - Deep Dive ...Cassandra - Deep Dive ...
Cassandra - Deep Dive ...
 
Cassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage SystemCassandra - A Decentralized Structured Storage System
Cassandra - A Decentralized Structured Storage System
 
Cassandra - A decentralized storage system
Cassandra - A decentralized storage systemCassandra - A decentralized storage system
Cassandra - A decentralized storage system
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0
 
Cassandra architecture
Cassandra architectureCassandra architecture
Cassandra architecture
 
Cassandra - Research Paper Overview
Cassandra - Research Paper OverviewCassandra - Research Paper Overview
Cassandra - Research Paper Overview
 
The Cassandra Distributed Database
The Cassandra Distributed DatabaseThe Cassandra Distributed Database
The Cassandra Distributed Database
 
Cassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ NetflixCassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ Netflix
 
Shall we play a game?
Shall we play a game?Shall we play a game?
Shall we play a game?
 

Similaire à Cassandra

cassandra_presentation_final
cassandra_presentation_finalcassandra_presentation_final
cassandra_presentation_finalSergioBruno21
 
Appache Cassandra
Appache Cassandra  Appache Cassandra
Appache Cassandra nehabsairam
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Fwdays
 
The No SQL Principles and Basic Application Of Casandra Model
The No SQL Principles and Basic Application Of Casandra ModelThe No SQL Principles and Basic Application Of Casandra Model
The No SQL Principles and Basic Application Of Casandra ModelRishikese MR
 
Apache Cassandra introduction
Apache Cassandra introductionApache Cassandra introduction
Apache Cassandra introductionfardinjamshidi
 
Cassandra implementation for collecting data and presenting data
Cassandra implementation for collecting data and presenting dataCassandra implementation for collecting data and presenting data
Cassandra implementation for collecting data and presenting dataChen Robert
 
Introduction to NoSQL CassandraDB
Introduction to NoSQL CassandraDBIntroduction to NoSQL CassandraDB
Introduction to NoSQL CassandraDBJanos Geronimo
 
NoSql Data Management
NoSql Data ManagementNoSql Data Management
NoSql Data Managementsameerfaizan
 
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL ServicesAmazon Web Services
 
Cassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingCassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingVassilis Bekiaris
 
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...Lviv Startup Club
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxRahul Borate
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideMohammed Fazuluddin
 

Similaire à Cassandra (20)

cassandra_presentation_final
cassandra_presentation_finalcassandra_presentation_final
cassandra_presentation_final
 
Appache Cassandra
Appache Cassandra  Appache Cassandra
Appache Cassandra
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
 
The No SQL Principles and Basic Application Of Casandra Model
The No SQL Principles and Basic Application Of Casandra ModelThe No SQL Principles and Basic Application Of Casandra Model
The No SQL Principles and Basic Application Of Casandra Model
 
Cassandra training
Cassandra trainingCassandra training
Cassandra training
 
Apache Cassandra introduction
Apache Cassandra introductionApache Cassandra introduction
Apache Cassandra introduction
 
Cassandra implementation for collecting data and presenting data
Cassandra implementation for collecting data and presenting dataCassandra implementation for collecting data and presenting data
Cassandra implementation for collecting data and presenting data
 
Big data stores
Big data  storesBig data  stores
Big data stores
 
Nosql databases
Nosql databasesNosql databases
Nosql databases
 
Introduction to NoSQL CassandraDB
Introduction to NoSQL CassandraDBIntroduction to NoSQL CassandraDB
Introduction to NoSQL CassandraDB
 
NoSql Data Management
NoSql Data ManagementNoSql Data Management
NoSql Data Management
 
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
 
NoSQL_Night
NoSQL_NightNoSQL_Night
NoSQL_Night
 
Cassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingCassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series Modeling
 
NoSQL and MongoDB
NoSQL and MongoDBNoSQL and MongoDB
NoSQL and MongoDB
 
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptx
 
Cassandra Architecture FTW
Cassandra Architecture FTWCassandra Architecture FTW
Cassandra Architecture FTW
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction Guide
 

Dernier

Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 

Dernier (20)

Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 

Cassandra

  • 1. Prepared by, Pooja G.V 8th sem, CSE GECH.
  • 2.
  • 3. • Cassandra is a distributed database from Apache that is highly scalable and designed to manage very large amounts of structured data. •It provides high availability with no single point of failure.
  • 4. • Open-source database management system (DBMS) • Several key features of Cassandra differentiate it from other similar systems Continued.....
  • 5. History of Cassandra ● Cassandra was created to power the Facebook Inbox Search ● Facebook open-sourced Cassandra in 2008 and became an Apache Incubator project ● In 2010, Cassandra graduated to a top-level project, regular update and releases followed
  • 6. Motivation and Function ● Designed to handle large amount of data across multiple servers ● There is a lot of unorganized data out there ● Easy to implement and deploy ● Mimics traditional relational database systems, but with triggers and lightweight transactions ● Raw, simple data structures
  • 7. General Design Features Emphasis on performance over analysis ● Still has support for analysis tools such as Hadoop Organization ● Rows are organized into tables ● First component of a table’s primary key is the partition key ● Rows are clustered by the remaining columns of the key ● Columns may be indexed separately from the primary key ● Tables may be created, dropped, altered at runtime without blocking queries Language ● CQL (Cassandra Query Language) introduced, similar to SQL (flattened learning curve)
  • 8. Peer to Peer Cluster • Decentralized design • No single point of failure • No bottlenecking
  • 9. Fault Tolerance/Durability Failures happen all the time with multiple nodes ● Hardware Failure ● Bugs ● Operator error ● Power Outage, etc. Solution: Buy cheap, redundant hardware, replicate, maintain consistency
  • 10. Fault Tolerance/Durability • Replication • Distribution of data to multiple data centers
  • 11. Performance • Core architectural designs allow Cassandra to outperform its competitors • Very good read and write throughputs
  • 12. Scalability Read and write throughput increase linearly as more machines are added “In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes…” - University of Toronto
  • 13. Comparisons Apache Cassandra Google Big Table Amazon DynamoDB Storage Type Column Column Key-Value Best Use Write often, read less Designed for large scalability Large database solution Concurrency Control MVCC Locks ACID Characteristics High Availability Partition Tolerance Persistence Consistency High Availability Partition Tolerance Persistence Consistency High Availability
  • 15. Netflix • online DVD and Blu-Ray movie retailer • Nielsen study showed that 38% of Americans use or subscribe to Netflix
  • 16. Netflix: Why Cassandra ● Using a central SQL database negatively impacted scalability and availability ● International Expansion required Multi- Datacenter solution ● Need for configurable Replication, Consistency, and Resiliency in the face of failure ● Cassandra on AWS offered high levels of scalability and availability Jason Brown, Senior Software Engineer at Netflix
  • 17. Hulu • a website and a subscription service offering on-demand streaming video media • ~30 million unique viewers per month
  • 18. Hulu: Why Cassandra • need for Availability • need for Scalability • Good Performance • Nearly Linear Scalability • Geo-Replication • Minimal Maintenance Requirements Andres Rangel, Senior Software Engineer at Hulu
  • 19. Reasons for Choosing Cassandra • Value availability over consistency • Require high write-throughput • High scalability required • No single point of failure
  • 22. • Cassandra is a column oriented NoSQL system • Column families: sets of key- value pairs • A row is a collection of columns labeled with a name Key-Value Model
  • 23. Cassandra Row • the value of a row is itself a sequence of key-value pairs • such nested key-value pairs are columns • key = column name • a row must contain at least 1 column
  • 25. Column names storing values • key: User ID • column names store tweet ID values • values of all column names are set to “-” (empty byte array) as they are not used
  • 26. • A Key Space is a group of column families together. It is only a logical grouping of column families and provides an isolated scope for names Key Space
  • 27. Comparing Cassandra (C*) and RDBMS • with RDBMS, a normalized data model is created without considering the exact queries • with C*, the data model is designed for specific queries • C*: NO joins, relationships, or foreign keys
  • 28. Cassandra Query Language - CQL • creating a keyspace - namespace of tables CREATE KEYSPACE demo WITH replication = {‘class’: ’SimpleStrategy’, replication_factor’: 3}; • to use namespace: USE demo;
  • 29. Cassandra Query Language - CQL • creating tables: CREATE TABLE users( CREATE TABLE tweets( email varchar, email varchar, bio varchar, time_posted timestamp, birthday timestamp, tweet varchar, active boolean, PRIMARY KEY (email, time_posted)); PRIMARY KEY (email));
  • 30. Cassandra Query Language - CQL • inserting data INSERT INTO users (email, bio, birthday, active) VALUES (‘john.doe@bti360.com’, ‘BT360 Teammate’, 516513600000, true);
  • 31. Cassandra Query Language - CQL • querying tables • SELECT expression reads one or more records from Cassandra column family and returns a result-set of rows SELECT * FROM users; SELECT email FROM users WHERE active = true;
  • 33. • perfect for time-series data • high performance • Decentralization • nearly linear scalability • replication support • no single points of failure • MapReduce support Cassandra Advantages
  • 34. Cassandra Weaknesses ● no referential integrity ● querying options for retrieving data are limited ● sorting data is a design decision ● no support for atomic operations ● first think about queries, then about data model
  • 35. Cassandra: Points to Consider ● Cassandra is designed as a distributed database management system ● Cassandra write performance is always excellent, but read performance depends on write patterns ● having a high-level understanding of some internals is a plus
  • 36. References • Lakshman, Avinash, and Prashant Malik. "Cassandra: a decentralized structured storage system." ACM SIGOPS Operating Systems Review 44.2 (2010): 35-40. • Hewitt, Eben. Cassandra: the definitive guide. O'Reilly Media, 2010. • http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/a rchitectureTOC.html • http://www.slideshare.net/planetcassandra/a-deep-dive-into-understanding- apache-cassandra • http://www.slideshare.net/DataStax/evaluating-apache-cassandra-as-a-cloud- database • http://planetcassandra.org/functional-use-cases/ • http://marsmedia.info/en/cassandra-pros-cons-and-model.php • http://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global- cassandra • http://wiki.apache.org/cassandra/CassandraLimitations

Notes de l'éditeur

  1. It combines Amazon Dynamo’s fully distributed design with Google Bigtable’s column-oriented data model.
  2. designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failur
  3. offering streaming movies through video game consoles, Apple TV, TiVo and more http://www.usatoday.com/story/life/tv/2013/09/18/netflix-hulu-amazon-nielsen-viewership-data/2831535/
  4. Central Oracle database -> everything in one place, convenient until it fails Schema changes required downtime Cassandra stores 3 local copies, 1 per zone Synchronous access, durable Replicates at destination Global Coverage business agility Local Access better latency fault isolation
  5. Geo-Replication – distribution of data across multiple regions
  6. SimpleStrategy is a replication strategy. There are two: SimpleStrategy and NetworkTopologyStrategy. SimpleStrategy is used for simple data center clusters. It is default. NetworkTopologyStrategy is used for multi-datacenter deployments
  7. Table – when schema is static Column Family – when schema is dynamic
  8. No join or subquery support, and limited support for aggregation. This is by design, to force you to denormalize into partitions that can be efficiently queried from a single replica, instead of having to gather data from across the entire cluster. Ordering is done per-partition, and is specified at table creation time. Again, this is to enforce good application design; sorting thousands or millions of rows can be fast in development, but sorting billions in production is a bad idea.
  9. It’s important to analyze how you are going to query your data. Spending time to design your schema around your query pattern can save a lot of hassle debugging performance issues while also ensuring that you can scale easily. Additionally, having a high-level understanding of some of the internals such has how deletions are implemented, how secondary indices operate, and when to use the row cache can go a long way in designing a strong application built atop Cassandra.