Introduction to Cassandra

Shimi Kiviti
@shimi_k
Motivation

            Scaling

How do you scale your database?
 ● reads
 ● writes
Influential Papers

 ● Bigtable: A Distributed Storage System for Structured Data,
   2006
 ● Dynamo: Amazon's Highly Available Key-value Store, 2007


Cassandra:
 ● Partitioning and replication - from Dynamo
 ● Log-structured column family storage - from Bigtable
Cassandra Highlights

● Symmetric - all nodes are exactly the same
   ○ No single point of failure
   ○ Linearly scalable
   ○ Ease of administration
● High availability with multiple datacenters
● Consistency vs Latency
● Read/Write anywhere
● Flexible Schema
● Column TTL
● Distributed Counters
DHT - Distributed Hash Table

● O(1) node lookup
● Explicit replication
● Linear Scalability
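
A minimal sketch of the hash ring idea in Python, not Cassandra's actual implementation. It assumes one token per node derived from an MD5 hash of a hypothetical node name, and places replicas on the next N nodes clockwise. Every node is assumed to know the whole ring, which is why finding the owner needs no routing hops between nodes (the local search itself is a binary search).

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key to a position on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy hash ring: all tokens are known locally, so lookup needs no hops."""

    def __init__(self, nodes, replication_factor):
        # One token per node, derived from the node name (hypothetical scheme).
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]
        self.n = replication_factor

    def replicas(self, key):
        # First node whose token is >= the key's token, wrapping around the ring.
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.entries)
        # Explicit replication: the next N nodes clockwise hold the copies.
        return [self.entries[(start + i) % len(self.entries)][1]
                for i in range(self.n)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("shimi")
print(owners)
```

Adding a node to this ring only changes ownership of the keys between the new token and its predecessor, which is the property behind the linear scalability claim.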
Consistency

N = Replication factor
R = Number of replicas a read waits for (R <= N)
W = Number of replicas a write waits for (W <= N)
Quorum = floor(N/2) + 1

When W + R > N, reads are fully consistent, because every read
overlaps at least one replica that took the latest write
Examples:
 ● W = 1, R = N
 ● W = N, R = 1
 ● W = Quorum, R = Quorum
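
The arithmetic above can be checked with a short Python sketch. The function names are made up for illustration; only the formulas come from the slide.

```python
def quorum(n: int) -> int:
    """Majority of n replicas: floor(n / 2) + 1."""
    return n // 2 + 1


def is_fully_consistent(n: int, r: int, w: int) -> bool:
    """True when every read set must overlap every write set (W + R > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return w + r > n


N = 3
# The three examples from the slide, plus W = 1, R = 1 as a counterexample.
for w, r in [(1, N), (N, 1), (quorum(N), quorum(N)), (1, 1)]:
    print(f"N={N} W={w} R={r} consistent={is_fully_consistent(N, r, w)}")
```

With N = 3 the quorum is 2, so quorum reads and writes give 2 + 2 > 3, while W = 1, R = 1 gives 2 <= 3 and a read may miss the latest write.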
Consistency Level

● Every request specifies its consistency level
   ○ Any
   ○ One
   ○ Two
   ○ Three
   ○ Quorum
   ○ Local Quorum
   ○ Each Quorum
   ○ All
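
As a rough illustration, the single-datacenter levels map to a number of replica acknowledgements as sketched below in Python. The function is hypothetical and deliberately leaves out three levels: Any applies to writes only and can be satisfied by a stored hint, and Local Quorum and Each Quorum count replicas per datacenter.

```python
def required_replicas(level: str, replication_factor: int) -> int:
    """Replica acknowledgements needed in a single datacenter (simplified)."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        needed = fixed[level]
    elif level == "QUORUM":
        needed = replication_factor // 2 + 1
    elif level == "ALL":
        needed = replication_factor
    else:
        raise ValueError(f"level not covered by this sketch: {level}")
    if needed > replication_factor:
        raise ValueError(
            f"{level} needs {needed} replicas, only {replication_factor} exist")
    return needed


for level in ["ONE", "TWO", "QUORUM", "ALL"]:
    print(level, required_replicas(level, replication_factor=3))
```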
Data Model

● Keyspace ~ schema
● ColumnFamilies ~ table
● Rows
● Columns
Column Family

Key1   Column   Column   Column


Key2   Column   Column
Column Family

ColumnFamily: {
  TOK: {
    chen: 1,
    ronen: 7
  }
  CityPath: {
    yuval: 5
  }
}
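
One way to picture this structure is a map of maps. The Python sketch below uses the rows from the slide; the extra "dan" column and the get_row helper are hypothetical additions for illustration.

```python
# Row key -> {column name -> value}; rows taken from the slide.
column_family = {
    "TOK": {"chen": 1, "ronen": 7},
    "CityPath": {"yuval": 5},
}

# Rows need not share columns: adding a column touches one row only.
column_family["CityPath"]["dan"] = 3  # hypothetical extra column


def get_row(cf, row_key):
    """Return a row's columns as (name, value) pairs ordered by column name."""
    return sorted(cf.get(row_key, {}).items())


print(get_row(column_family, "CityPath"))  # [('dan', 3), ('yuval', 5)]
print(get_row(column_family, "TOK"))       # [('chen', 1), ('ronen', 7)]
```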
Super Column Family
          Super1   Column Column Column
Key
          Super2   Column Column Column

 ColumnFamily: {
   Key: {
     super1: {
       name: value,
       name: value
     }
     super2: {
       name: value
     }
   }
 }
Write

● Any node
● Partitioner
● Commit log, memtable
● Wait for W responses
Write

● No reads
● No seeks
● Sequential disk access
● Atomic per row within a column family
● Fast
● Always writeable (hinted hand-off)
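
A toy Python model of the write path on one node, under simplifying assumptions: the commit log is a list rather than a file, the memtable flushes after a fixed number of keys, and replication and hinted hand-off are left out. It is meant to show why a write needs no read and no seek: it only appends to the log and updates memory.

```python
class ToyNode:
    """Toy single-node write path: commit log append, memtable update, flush."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only, stands in for the on-disk log
        self.memtable = {}     # in memory: key -> (timestamp, value)
        self.sstables = []     # immutable sorted snapshots, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value, timestamp):
        # 1. Sequential append for durability; existing data is never read.
        self.commit_log.append((key, value, timestamp))
        # 2. Update the in-memory table (toy rule: last arrival wins).
        self.memtable[key] = (timestamp, value)
        # 3. When full, flush the memtable as one sorted, immutable SSTable.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs the log


node = ToyNode()
node.write("shimi", "a", 1)
node.write("dan", "b", 2)    # second key fills the memtable and triggers a flush
node.write("shimi", "c", 3)  # lands in a fresh memtable
print(node.sstables)
print(node.memtable)
```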
Read

● Choose any node
● Partitioner
● Wait for R responses
● Tunable read repair in the background
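
Read repair can be sketched in Python as follows. This is a simplification: each replica is modeled as a dict, all replicas are asked synchronously, and the repair happens inline, whereas Cassandra waits for only R responses and repairs in the background.

```python
def read_with_repair(replicas, key):
    """Return the newest value and push it to replicas holding older data."""
    # Each replica is a dict: key -> (timestamp, value).
    responses = [(replica.get(key), replica) for replica in replicas]
    found = [version for version, _ in responses if version is not None]
    if not found:
        return None
    newest = max(found, key=lambda tv: tv[0])  # highest timestamp wins
    for version, replica in responses:
        if version != newest:
            replica[key] = newest  # repair the stale or missing copy
    return newest[1]


a = {"k": (2, "new")}
b = {"k": (1, "old")}   # stale replica
c = {}                  # replica that missed the write entirely
value = read_with_repair([a, b, c], "k")
print(value, b, c)
```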
Read

● A read may have to consult multiple SSTables
● Slower than writes
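
A small Python sketch of why reads cost more than writes: the same key can exist in the memtable and in several SSTables, so a read has to check them all and keep the newest version. The data layout (dicts of key to timestamp and value) and the sample values are assumptions for illustration.

```python
def read(key, memtable, sstables):
    """Check the memtable and every SSTable; the newest timestamp wins."""
    candidates = []
    if key in memtable:
        candidates.append(memtable[key])
    for table in sstables:  # each table: key -> (timestamp, value)
        if key in table:
            candidates.append(table[key])
    if not candidates:
        return None
    return max(candidates, key=lambda tv: tv[0])[1]


memtable = {"email": (5, "new@example.org")}
sstables = [
    {"email": (1, "old@example.org"), "passwd": (2, "mypass")},
    {"passwd": (4, "changed")},
]
print(read("email", memtable, sstables))   # new@example.org
print(read("passwd", memtable, sstables))  # changed
print(read("age", memtable, sstables))     # None
```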
Cache

● There is no need to use memcached
● There is an internal configurable cache
   ○ Key cache
   ○ Row cache
Sorting

When you perform a get, the result is sorted:
 ● Rows are sorted according to the partitioner
 ● Columns in a row are sorted according to the type of the
   column name
Partitioner

● RandomPartitioner - Uses hash values as tokens. Useful for
  distributing the load across all nodes.
  If you use it, set the nodes' tokens manually

● OrderPreservingPartitioner - You can get sorted rows, but at
  the cost of an unevenly loaded cluster
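
A hedged Python sketch of both points. It assumes the RandomPartitioner token space runs from 0 to 2^127 and spaces the manual tokens evenly across it. The md5_token helper is an illustrative stand-in for the partitioner's hash, not its exact computation.

```python
import hashlib

RING_SIZE = 2 ** 127  # assumed RandomPartitioner token space


def balanced_tokens(node_count):
    """Evenly spaced initial tokens, one per node."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def md5_token(key):
    """Illustrative hash token: MD5 digest reduced to the ring size."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % RING_SIZE


keys = ["dan", "ronen", "shimi"]
print(balanced_tokens(4))
print(sorted(keys))                 # order kept by an order-preserving partitioner
print(sorted(keys, key=md5_token))  # order under hashed tokens
```

Hashed tokens scatter neighbouring keys around the ring, which balances load but means a row range scan no longer returns keys in their natural order.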
Column Types

Available types:
 ● Bytes
 ● UTF8
 ● Ascii
 ● Long
 ● Date
 ● UUID
 ● Composite - <Type1>:<Type2>
Column Types

Examples:

Sort1 (Long vs UTF8 column names):
8            10
9      vs    8
10           9

Sort2 (Composite UTF8:Long vs UTF8 column names):
dan:8             dan:10
dan:10      vs    dan:8
shimi:1           shimi:1
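
The two orderings in the examples above can be reproduced in Python, where sorting by numeric value stands in for a Long comparator and plain string sorting stands in for a UTF8 comparator. The tuple sort is an analogy for a composite type that compares component by component.

```python
names = ["8", "9", "10"]

as_long = sorted(names, key=int)  # numeric order, like a Long column name
as_text = sorted(names)           # lexicographic order, like a UTF8 column name
print(as_long)  # ['8', '9', '10']
print(as_text)  # ['10', '8', '9']

composites = [("dan", 8), ("dan", 10), ("shimi", 1)]
typed = sorted(composites)  # compare the name first, then the number
flat = sorted(f"{name}:{num}" for name, num in composites)  # one plain string
print(typed)  # [('dan', 8), ('dan', 10), ('shimi', 1)]
print(flat)   # ['dan:10', 'dan:8', 'shimi:1']
```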
Clients

● Thrift - Cassandra's low-level driver interface
● CQL - Cassandra query language (SQL like)
● High level clients:
   ○ Python
   ○ Java
   ○ Scala
   ○ Clojure
   ○ .Net
   ○ Ruby
   ○ PHP
   ○ Perl
   ○ C++
   ○ Haskell
Cascal - Scala client

Insert column:

session.insert("app" \ "users" \ "shimi" \ "passwd" \ "mypass")

val key = "app" \ "users" \ "shimi"
session.insert(key \ "email" \ "shimi.k@...")


Get column value:

val pass = session.get(key \ "passwd")
Cascal

Get multiple columns:

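// all columns of the row
val row = session.list(key)
// columns whose names fall between "email" and "passwd"
val colRange = session.list(key, RangePredicate("email", "passwd"))
// only the named columns
val colList = session.list(key, ColumnPredicate(List("passwd", "email")))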
Cascal

Get multiple rows:

val family = "app" \ "users"
// rows whose keys fall between "dan" and "shimi"
val rowRange = session.list(family, RangePredicate("dan", "shimi"))
// only the rows with the given keys
val rowList = session.list(family, KeyPredicate("dan", "shimi"))
Cascal

Remove column:
session.remove("app" \ "users" \ "shimi" \ "passwd")


Remove row:
session.remove("app" \ "users" \ "shimi")


Batch operations:

val deleteCols = Delete(key, ColumnPredicate("age" :: "sex" :: Nil))
val insertEmail = Insert(key \ "email" \ "shimi.k@...")
session.batch(insertEmail :: deleteCols :: Nil)
Guidelines

● Keep together the data you query together
● Think about your use case and how you should fetch your
  data.
● Don't try to normalize your data
● You can't beat the disk
● Be ready to get your hands dirty
● There is no single solution for everything. You might
  consider using different solutions together
The End

Useful links:
 ● Cassandra, http://cassandra.apache.org/
 ● Wiki http://wiki.apache.org/cassandra/
 ● Cassandra mailing list
 ● IRC
 ● Bigtable, http://labs.google.com/papers/bigtable.html
 ● Dynamo, http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
 ● Cascal, https://github.com/shimi/cascal
