SlideShare une entreprise Scribd logo
1  sur  28
Cassandra  Jahangir Mohammed md.jahangir27@gmail.com
What is Cassandra? Distributed data store O(1) DHT Column-oriented Dynamo + Big Table
Why not RDBMS? Many-to-many relationships -> Joins -> Denormalization -> Multiple copies of data or redundancy Rigid schema Vertical scaling is easier than horizontal ACID, Distributed transaction, Two-phase commit Slower writes
CAP Theorem Consistency – All clients will read the same data at same time. Availability – Service always up and running. Partition tolerance – System on whole operates despite network issues.
Features Proven Rich data model Scalable Distributed & Decentralized Cross datacenter support High performance writes/reads No SPOF Schema free Tunable consistency
Limitations No ACID transactions(if needed) Eventually consistent(Tunable consistency, trade-off with performance)
ARCHITECTURE Ring Each node – unique token Tokens range from 0 to 2**127 Keys MD5 hash to determine node
RING h(key1) 0 1 N=3 B h(key2) A C F E D 1/2 9
ARCHITECTURE  P2P:  All nodes are identical No “master” node Gossip:  Protocol for intra-ring communication Each node have state information about other nodes Anti-entropy & Read Repair:  Replica synchronization mechanism Occurs during major compaction Uses Merkle trees
READ REPAIR Client Result Query Cassandra Cluster Read repair if digests differ Closest replica Result Replica A Digest Query Digest Response Digest Response Replica B Replica C
WRITE PATH Commit log: Responsible for all writes Memtable: In-memory data structure, written after commit log. SSTable:  Immutable table Memtable flushed to disk
WRITE PATH Key (CF1 , CF2 , CF3) ,[object Object]
 Number of Objects
 LifetimeMemtable ( CF1) Commit Log Binary serialized  Key ( CF1 , CF2 , CF3 ) Memtable ( CF2) FLUSH Memtable ( CF2) Data file on disk <Key name><Size of key Data><Index of columns/supercolumns>< Serialized column family>  --- --- --- --- <Key name><Size of key Data><Index of columns/supercolumns>< Serialized column family> Dedicated Disk
READ PATH
ARCHITECTURE Bloom filter: Performance booster Fast, nondeterministic algorithms In memory Used during read operation Tombstones: Deletion marker Soft delete Marker older than a set time, GC’ed
HINTED HANDOFF & COMPACTION Hinted Handoff: Node responsible down Coordinator creates hint Compaction: Merge SSTables. Keys merged Columns combined Tombstones discarded New index created
PARTITIONER Decides where row key(data) finds place in ring. Random Partitioner: MD5 hash Spreads keys evenly Inefficient range queries Order-Preserving Partitioner: Rows sorted
DATA MODEL Keyspace: Like Database.  Container for CFs. Column Family: Like Table(But, not exactly a relational database table). Container of rows. Row: Sorted collection of columns. Column: Basic unit of data structure. Triplet of name, value and timestamp
DATA MODEL Super Column: Special column. Sorted associative array of columns. Map of maps. Only one level deep. Super Column Family: Container of rows having super columns. 4-D DHT = Standard CF: [Keyspace][ColumnFamily][Key][Column]. 5-D DHT = Super CF: [Keyspace][ColumnFamily][Key][SuperColumn][SubColumn].
REPLICATION & CONSISTENCY Replication: No. of copies of data in the system. Consistency level: No. of replicas to respond.
WRITE
READ
REPLICA PLACEMENT STRATEGY	 Simple Strategy: Rack-Unaware Fast Single D.C.
SIMPLE STRATEGY
OLD NTS Rack-aware Same D.C.
NTS Rack-aware D.C. aware

Contenu connexe

Tendances

Understanding AntiEntropy in Cassandra
Understanding AntiEntropy in CassandraUnderstanding AntiEntropy in Cassandra
Understanding AntiEntropy in CassandraJason Brown
 
Apache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentialsApache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentialsJulien Anguenot
 
The Apache Cassandra ecosystem
The Apache Cassandra ecosystemThe Apache Cassandra ecosystem
The Apache Cassandra ecosystemAlex Thompson
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...DataStax
 
Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...
Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...
Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...Gluster.org
 
HBaseCon 2013: Scalable Network Designs for Apache HBase
HBaseCon 2013: Scalable Network Designs for Apache HBaseHBaseCon 2013: Scalable Network Designs for Apache HBase
HBaseCon 2013: Scalable Network Designs for Apache HBaseCloudera, Inc.
 
NoSQL with Cassandra
NoSQL with CassandraNoSQL with Cassandra
NoSQL with CassandraGasol Wu
 
Cassandra by example - the path of read and write requests
Cassandra by example - the path of read and write requestsCassandra by example - the path of read and write requests
Cassandra by example - the path of read and write requestsgrro
 
Understanding Data Consistency in Apache Cassandra
Understanding Data Consistency in Apache CassandraUnderstanding Data Consistency in Apache Cassandra
Understanding Data Consistency in Apache CassandraDataStax
 
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and BeyondScylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and BeyondScyllaDB
 
Basic stuff You Need to Know about Cassandra
Basic stuff You Need to Know about CassandraBasic stuff You Need to Know about Cassandra
Basic stuff You Need to Know about CassandraYu-Chang Ho
 
CockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseCockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseC4Media
 
Apache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide DeckApache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide DeckDataStax Academy
 
HBaseCon 2015: OpenTSDB and AsyncHBase Update
HBaseCon 2015: OpenTSDB and AsyncHBase UpdateHBaseCon 2015: OpenTSDB and AsyncHBase Update
HBaseCon 2015: OpenTSDB and AsyncHBase UpdateHBaseCon
 
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017The Hows and Whys of a Distributed SQL Database - Strange Loop 2017
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017Alex Robinson
 
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...DataStax
 
Keeping your files safe in the post-Snowden era with SXFS
Keeping your files safe in the post-Snowden era with SXFSKeeping your files safe in the post-Snowden era with SXFS
Keeping your files safe in the post-Snowden era with SXFSRobert Wojciechowski
 

Tendances (20)

4 technical-dns-workshop-day2
4 technical-dns-workshop-day24 technical-dns-workshop-day2
4 technical-dns-workshop-day2
 
Understanding AntiEntropy in Cassandra
Understanding AntiEntropy in CassandraUnderstanding AntiEntropy in Cassandra
Understanding AntiEntropy in Cassandra
 
7 technical-dns-workshop-day3
7 technical-dns-workshop-day37 technical-dns-workshop-day3
7 technical-dns-workshop-day3
 
Apache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentialsApache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentials
 
The Apache Cassandra ecosystem
The Apache Cassandra ecosystemThe Apache Cassandra ecosystem
The Apache Cassandra ecosystem
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
 
Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...
Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...
Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...
 
HBaseCon 2013: Scalable Network Designs for Apache HBase
HBaseCon 2013: Scalable Network Designs for Apache HBaseHBaseCon 2013: Scalable Network Designs for Apache HBase
HBaseCon 2013: Scalable Network Designs for Apache HBase
 
NoSQL with Cassandra
NoSQL with CassandraNoSQL with Cassandra
NoSQL with Cassandra
 
Cassandra by example - the path of read and write requests
Cassandra by example - the path of read and write requestsCassandra by example - the path of read and write requests
Cassandra by example - the path of read and write requests
 
Understanding Data Consistency in Apache Cassandra
Understanding Data Consistency in Apache CassandraUnderstanding Data Consistency in Apache Cassandra
Understanding Data Consistency in Apache Cassandra
 
5 technical-dns-workshop-day3
5 technical-dns-workshop-day35 technical-dns-workshop-day3
5 technical-dns-workshop-day3
 
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and BeyondScylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
 
Basic stuff You Need to Know about Cassandra
Basic stuff You Need to Know about CassandraBasic stuff You Need to Know about Cassandra
Basic stuff You Need to Know about Cassandra
 
CockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseCockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL Database
 
Apache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide DeckApache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide Deck
 
HBaseCon 2015: OpenTSDB and AsyncHBase Update
HBaseCon 2015: OpenTSDB and AsyncHBase UpdateHBaseCon 2015: OpenTSDB and AsyncHBase Update
HBaseCon 2015: OpenTSDB and AsyncHBase Update
 
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017The Hows and Whys of a Distributed SQL Database - Strange Loop 2017
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017
 
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
 
Keeping your files safe in the post-Snowden era with SXFS
Keeping your files safe in the post-Snowden era with SXFSKeeping your files safe in the post-Snowden era with SXFS
Keeping your files safe in the post-Snowden era with SXFS
 

En vedette (20)

GAIVOTA - Julho e Agosto de 2010
GAIVOTA - Julho e Agosto de 2010GAIVOTA - Julho e Agosto de 2010
GAIVOTA - Julho e Agosto de 2010
 
Mitos griegos
Mitos griegosMitos griegos
Mitos griegos
 
Vive notre amitié
Vive notre amitiéVive notre amitié
Vive notre amitié
 
Jasmim
JasmimJasmim
Jasmim
 
Presentation, ICEM-SIIE, Aveiro 2011
Presentation, ICEM-SIIE, Aveiro 2011Presentation, ICEM-SIIE, Aveiro 2011
Presentation, ICEM-SIIE, Aveiro 2011
 
Tui itc presentación itc 2011
Tui itc presentación itc 2011Tui itc presentación itc 2011
Tui itc presentación itc 2011
 
Jandir
JandirJandir
Jandir
 
G2M Current Events - Bertha
G2M Current Events - BerthaG2M Current Events - Bertha
G2M Current Events - Bertha
 
Nombre de la rosa
Nombre de la rosaNombre de la rosa
Nombre de la rosa
 
East Los Angeles Project
East Los Angeles Project East Los Angeles Project
East Los Angeles Project
 
Clipping TheBridge 2015
Clipping TheBridge 2015Clipping TheBridge 2015
Clipping TheBridge 2015
 
Double sided equations
Double sided equationsDouble sided equations
Double sided equations
 
Dirce
DirceDirce
Dirce
 
Bhagvad Gita - 2 (English)
Bhagvad Gita - 2 (English)Bhagvad Gita - 2 (English)
Bhagvad Gita - 2 (English)
 
GISELA
GISELAGISELA
GISELA
 
Site-internet
Site-internetSite-internet
Site-internet
 
Barça ivett
Barça ivettBarça ivett
Barça ivett
 
Tajuk 8 Konsep Ketuhanan
Tajuk 8 Konsep KetuhananTajuk 8 Konsep Ketuhanan
Tajuk 8 Konsep Ketuhanan
 
Mitos griegos
Mitos griegosMitos griegos
Mitos griegos
 
Gertrude Simmons Bonnin
Gertrude Simmons BonninGertrude Simmons Bonnin
Gertrude Simmons Bonnin
 

Similaire à Cassandra

About "Apache Cassandra"
About "Apache Cassandra"About "Apache Cassandra"
About "Apache Cassandra"Jihyun Ahn
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftAmazon Web Services
 
Getting started with Amazon Redshift
Getting started with Amazon RedshiftGetting started with Amazon Redshift
Getting started with Amazon RedshiftAmazon Web Services
 
Cassandra internals
Cassandra internalsCassandra internals
Cassandra internalsnarsiman
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephScyllaDB
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephScyllaDB
 
Cassandra & Python - Springfield MO User Group
Cassandra & Python - Springfield MO User GroupCassandra & Python - Springfield MO User Group
Cassandra & Python - Springfield MO User GroupAdam Hutson
 
Apache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data modelApache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data modelAndrey Lomakin
 
Getting Started with Amazon Redshift - AWS July 2016 Webinar Series
Getting Started with Amazon Redshift - AWS July 2016 Webinar SeriesGetting Started with Amazon Redshift - AWS July 2016 Webinar Series
Getting Started with Amazon Redshift - AWS July 2016 Webinar SeriesAmazon Web Services
 
2017 AWS DB Day | Amazon Redshift 소개 및 실습
2017 AWS DB Day | Amazon Redshift  소개 및 실습2017 AWS DB Day | Amazon Redshift  소개 및 실습
2017 AWS DB Day | Amazon Redshift 소개 및 실습Amazon Web Services Korea
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 
Cassandra for Sysadmins
Cassandra for SysadminsCassandra for Sysadmins
Cassandra for SysadminsNathan Milford
 
NewSQL - Deliverance from BASE and back to SQL and ACID
NewSQL - Deliverance from BASE and back to SQL and ACIDNewSQL - Deliverance from BASE and back to SQL and ACID
NewSQL - Deliverance from BASE and back to SQL and ACIDTony Rogerson
 
Cassandra consistency
Cassandra consistencyCassandra consistency
Cassandra consistencyzqhxuyuan
 
Modern software design in Big data era
Modern software design in Big data eraModern software design in Big data era
Modern software design in Big data eraBill GU
 
Learning Cassandra NoSQL
Learning Cassandra NoSQLLearning Cassandra NoSQL
Learning Cassandra NoSQLPankaj Khattar
 
Basics of Distributed Systems - Distributed Storage
Basics of Distributed Systems - Distributed StorageBasics of Distributed Systems - Distributed Storage
Basics of Distributed Systems - Distributed StorageNilesh Salpe
 

Similaire à Cassandra (20)

Cassandra
CassandraCassandra
Cassandra
 
About "Apache Cassandra"
About "Apache Cassandra"About "Apache Cassandra"
About "Apache Cassandra"
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
 
Getting started with Amazon Redshift
Getting started with Amazon RedshiftGetting started with Amazon Redshift
Getting started with Amazon Redshift
 
Cassandra internals
Cassandra internalsCassandra internals
Cassandra internals
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
 
Cassandra & Python - Springfield MO User Group
Cassandra & Python - Springfield MO User GroupCassandra & Python - Springfield MO User Group
Cassandra & Python - Springfield MO User Group
 
Apache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data modelApache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data model
 
Getting Started with Amazon Redshift - AWS July 2016 Webinar Series
Getting Started with Amazon Redshift - AWS July 2016 Webinar SeriesGetting Started with Amazon Redshift - AWS July 2016 Webinar Series
Getting Started with Amazon Redshift - AWS July 2016 Webinar Series
 
2017 AWS DB Day | Amazon Redshift 소개 및 실습
2017 AWS DB Day | Amazon Redshift  소개 및 실습2017 AWS DB Day | Amazon Redshift  소개 및 실습
2017 AWS DB Day | Amazon Redshift 소개 및 실습
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Cassandra training
Cassandra trainingCassandra training
Cassandra training
 
Cassandra for Sysadmins
Cassandra for SysadminsCassandra for Sysadmins
Cassandra for Sysadmins
 
NewSQL - Deliverance from BASE and back to SQL and ACID
NewSQL - Deliverance from BASE and back to SQL and ACIDNewSQL - Deliverance from BASE and back to SQL and ACID
NewSQL - Deliverance from BASE and back to SQL and ACID
 
Cassandra consistency
Cassandra consistencyCassandra consistency
Cassandra consistency
 
Modern software design in Big data era
Modern software design in Big data eraModern software design in Big data era
Modern software design in Big data era
 
L05 parallel
L05 parallelL05 parallel
L05 parallel
 
Learning Cassandra NoSQL
Learning Cassandra NoSQLLearning Cassandra NoSQL
Learning Cassandra NoSQL
 
Basics of Distributed Systems - Distributed Storage
Basics of Distributed Systems - Distributed StorageBasics of Distributed Systems - Distributed Storage
Basics of Distributed Systems - Distributed Storage
 

Dernier

Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 

Dernier (20)

Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 

Cassandra

  • 1. Cassandra Jahangir Mohammed md.jahangir27@gmail.com
  • 2. What is Cassandra? Distributed data store O(1) DHT Column-oriented Dynamo + Big Table
  • 3. Why not RDBMS? Many-to-many relationships -> Joins -> Denormalization -> Multiple copies of data or redundancy Rigid schema Vertical scaling is easier than horizontal ACID, Distributed transaction, Two-phase commit Slower writes
  • 4. CAP Theorem Consistency – All clients will read the same data at same time. Availability – Service always up and running. Partition tolerance – System on whole operates despite network issues.
  • 5.
  • 6. Features Proven Rich data model Scalable Distributed & Decentralized Cross datacenter support High performance writes/reads No SPOF Schema free Tunable consistency
  • 7. Limitations No ACID transactions(if needed) Eventually consistent(Tunable consistency, trade-off with performance)
  • 8. ARCHITECTURE Ring Each node – unique token Tokens range from 0 to 2**127 Keys MD5 hash to determine node
  • 9. RING h(key1) 0 1 N=3 B h(key2) A C F E D 1/2 9
  • 10. ARCHITECTURE P2P: All nodes are identical No “master” node Gossip: Protocol for intra-ring communication Each node have state information about other nodes Anti-entropy & Read Repair: Replica synchronization mechanism Occurs during major compaction Uses Merkle trees
  • 11. READ REPAIR Client Result Query Cassandra Cluster Read repair if digests differ Closest replica Result Replica A Digest Query Digest Response Digest Response Replica B Replica C
  • 12. WRITE PATH Commit log: Responsible for all writes Memtable: In-memory data structure, written after commit log. SSTable: Immutable table Memtable flushed to disk
  • 13.
  • 14. Number of Objects
  • 15. LifetimeMemtable ( CF1) Commit Log Binary serialized Key ( CF1 , CF2 , CF3 ) Memtable ( CF2) FLUSH Memtable ( CF2) Data file on disk <Key name><Size of key Data><Index of columns/supercolumns>< Serialized column family> --- --- --- --- <Key name><Size of key Data><Index of columns/supercolumns>< Serialized column family> Dedicated Disk
  • 17. ARCHITECTURE Bloom filter: Performance booster Fast, nondeterministic algorithms In memory Used during read operation Tombstones: Deletion marker Soft delete Marker older than a set time, GC’ed
  • 18. HINTED HANDOFF & COMPACTION Hinted Handoff: Node responsible down Coordinator creates hint Compaction: Merge SSTables. Keys merged Columns combined Tombstones discarded New index created
  • 19. PARTITIONER Decides where row key(data) finds place in ring. Random Partitioner: MD5 hash Spreads keys evenly Inefficient range queries Order-Preserving Partitioner: Rows sorted
  • 20. DATA MODEL Keyspace: Like Database. Container for CFs. Column Family: Like Table(But, not exactly a relational database table). Container of rows. Row: Sorted collection of columns. Column: Basic unit of data structure. Triplet of name, value and timestamp
  • 21. DATA MODEL Super Column: Special column. Sorted associative array of columns. Map of maps. Only one level deep. Super Column Family: Container of rows having super columns. 4-D DHT = Standard CF: [Keyspace][ColumnFamily][Key][Column]. 5-D DHT = Super CF: [Keyspace][ColumnFamily][Key][SuperColumn][SubColumn].
  • 22. REPLICATION & CONSISTENCY Replication: No. of copies of data in the system. Consistency level: No. of replicas to respond.
  • 23. WRITE
  • 24. READ
  • 25. REPLICA PLACEMENT STRATEGY Simple Strategy: Rack-Unaware Fast Single D.C.
  • 27. OLD NTS Rack-aware Same D.C.
  • 29. IMAGE REFERENCES Nathan Hurst’s Blog http://2.bp.blogspot.com/_YGilJHLjrrI/TJy3K0wshLI/AAAAAAAAAOI/ogAvf8Ckq3k/s1600/cassandra-ring2.png Sigmod presentation: Avinash et. al, Facebook Datastax http://answers.oreilly.com/topic/2408-replica-placement-strategies-when-using-cassandra/