Cassandra - Research Paper Overview


Cassandra is a decentralized structured storage system developed at Facebook to handle large amounts of structured data across many servers. It uses a distributed architecture with no single point of failure and dynamically replicates data across nodes for high availability. Cassandra uses a column-oriented data model and supports operations like insert, get, and delete. It partitions and distributes data using consistent hashing and handles failures through gossip-based cluster membership and an anti-entropy protocol.

Cassandra
A Decentralized Structured Storage System
Avinash Lakshman (Facebook), Prashant Malik (Facebook)
Presented by Sameera Nelson

Outline
• Introduction
• Data Model
• System Architecture
• Bootstrapping & Scaling
• Local Persistence
• Conclusion

What is Cassandra?
• Distributed storage system
• Manages structured data
• Highly available, no single point of failure (SPoF)
• Not a relational data model
• Handles high write throughput
◦ No impact on read efficiency

Motivation
• Operational requirements at Facebook
◦ Performance
◦ Reliability / dealing with failures
◦ Efficiency
◦ Continuous growth
• Application
◦ Facebook's Inbox Search problem

Related Work
• Google File System
◦ Distributed file system, single master/slave
• Ficus / Coda
◦ Distributed file systems
• Farsite
◦ Distributed file system, no centralized server
• Bayou
◦ Distributed relational database system
• Dynamo
◦ Distributed storage system

Data Model
Figure from Eben Hewitt’s slides.

Data Model
• Table
◦ Multidimensional map indexed by key
• Columns
◦ Grouped into column families
◦ Simple
◦ Super (nested column families)
• Column has
◦ Name / Value / Timestamp
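
The shape of this data model can be sketched with nested Python dictionaries. This is only an illustration: the `Column` class, the `table` variable, and all sample keys and values are assumptions made for the example, not Cassandra's API or storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """A column is a (name, value, timestamp) triple."""
    name: str
    value: str
    timestamp: int


# A table is a map indexed by row key. Each row maps column-family names
# to their contents (all names below are hypothetical sample data).
table = {
    "user42": {
        # Simple column family: column name -> Column
        "profile": {
            "email": Column("email", "a@example.com", timestamp=1),
        },
        # Super column family: super-column name -> {column name -> Column}
        "inbox": {
            "term:hello": {
                "msg1001": Column("msg1001", "", timestamp=7),
            },
        },
    },
}

simple = table["user42"]["profile"]["email"]
nested = table["user42"]["inbox"]["term:hello"]["msg1001"]
print(simple.value, nested.timestamp)
```

A super column family simply adds one more level of nesting between the column family and its columns.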

Supported Operations
• insert(table, key, rowMutation)
• get(table, key, columnName)
• delete(table, key, columnName)
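
To make the three operations concrete, here is a small in-memory stand-in written in Python. It is a sketch under stated assumptions: the `store` layout, the representation of a row mutation as a dict of `(value, timestamp)` pairs, and the newest-timestamp-wins rule are all illustrative choices, not the system's actual implementation.

```python
from collections import defaultdict

# Hypothetical layout: table -> row key -> column name -> (value, timestamp)
store = defaultdict(lambda: defaultdict(dict))


def insert(table, key, row_mutation):
    """Apply a row mutation: a dict of column name -> (value, timestamp)."""
    row = store[table][key]
    for name, (value, ts) in row_mutation.items():
        # Keep the write with the newest timestamp (an assumed rule).
        if name not in row or ts >= row[name][1]:
            row[name] = (value, ts)


def get(table, key, column_name):
    """Return the stored value for one column, or None if absent."""
    cell = store[table][key].get(column_name)
    return cell[0] if cell else None


def delete(table, key, column_name):
    """Remove one column from a row, if present."""
    store[table][key].pop(column_name, None)


insert("users", 1745, {"fname": ("john", 1), "lname": ("smith", 1)})
insert("users", 1745, {"fname": ("stale", 0)})  # older timestamp, ignored
delete("users", 1745, "lname")
print(get("users", 1745, "fname"), get("users", 1745, "lname"))
```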

Query Language
CREATE TABLE users (
    user_id int PRIMARY KEY,
    fname text,
    lname text
);
INSERT INTO users (user_id, fname, lname)
VALUES (1745, 'john', 'smith');
SELECT * FROM users;

Fully Distributed
• No single point of failure

Cassandra Architecture
• Partitioning
◦ Data distribution across nodes
• Replication
◦ Data duplication across nodes
• Cluster Membership
◦ Node management in the cluster (adding/deleting nodes)

Partitioning
• Partitions data using consistent hashing

Partitioning
• Each key is assigned to the relevant partition
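
As a rough illustration of consistent hashing, the Python sketch below places nodes and keys on one ring and assigns each key to the first node found clockwise from the key's position. The `Ring` class, the `token` helper, the node names, and the use of MD5 are assumptions made for the example.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string onto the ring; MD5 is an assumed, illustrative hash."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes):
        # Each node gets one position (token) on the ring.
        self._entries = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._entries]

    def owner(self, key: str) -> str:
        """Walk clockwise from the key's position to the next node token."""
        i = bisect.bisect_right(self._tokens, token(key))
        return self._entries[i % len(self._entries)][1]  # wrap past the end


ring = Ring(["node-a", "node-b", "node-c"])
print(ring.owner("user:1745"))
```

The useful property is that removing a node only moves the keys that node owned; every other key keeps its owner.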

Replication
• Based on the configured replication factor (N)

Replication
• Different replication policies
◦ Rack Unaware: replicate at the N-1 successor nodes of the coordinator on the ring
◦ Rack Aware: a leader, elected via Zookeeper, tells nodes which ranges they replicate
◦ Datacenter Aware: similar to Rack Aware, with the leader chosen at the datacenter level
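
The Rack Unaware policy is simple enough to sketch in a few lines of Python. The function name `replicas`, the node labels, and the assumption that the node list is already in ring order are all hypothetical.

```python
def replicas(ring_nodes, coordinator_index, n):
    """Return the coordinator plus its N-1 clockwise successors on the ring.

    ring_nodes is assumed to be the node list already sorted by token.
    """
    count = min(n, len(ring_nodes))  # cannot place more replicas than nodes
    return [ring_nodes[(coordinator_index + i) % len(ring_nodes)]
            for i in range(count)]


nodes = ["A", "B", "C", "D", "E"]  # hypothetical ring order
print(replicas(nodes, 3, 3))  # → ['D', 'E', 'A']
```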

Cluster Membership
• Based on Scuttlebutt
• Efficient gossip-based mechanism
• Inspired by real-life rumor spreading
• Anti-entropy protocol
◦ Repairs replicated data by comparing and reconciling differences
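
A much-simplified picture of gossip-style reconciliation, in Python: each node keeps a map of endpoint to heartbeat version, and when two nodes talk, both keep the higher version per endpoint. This is not the Scuttlebutt protocol itself; the `merge` and `gossip_round` helpers, the node names, and the versions are assumptions made for illustration.

```python
import random


def merge(local, remote):
    """Keep, per endpoint, whichever heartbeat version is higher."""
    for endpoint, version in remote.items():
        if version > local.get(endpoint, -1):
            local[endpoint] = version


def gossip_round(views, rng):
    """Every node exchanges state with one randomly chosen peer."""
    names = list(views)
    for name in names:
        peer = rng.choice([n for n in names if n != name])
        merge(views[peer], views[name])  # push our state to the peer
        merge(views[name], views[peer])  # pull the peer's state back


# Each node initially knows only its own heartbeat version.
views = {"n1": {"n1": 3}, "n2": {"n2": 5}, "n3": {"n3": 1}, "n4": {"n4": 2}}
rng = random.Random(0)  # fixed seed so the run is repeatable
rounds = 0
# Gossip until every node knows every endpoint (capped as a safety net).
while any(len(v) < len(views) for v in views.values()) and rounds < 100:
    gossip_round(views, rng)
    rounds += 1
print(rounds, views["n1"])
```

Because merging only ever raises a version, repeated exchanges move all nodes toward the same view regardless of the order in which peers are picked.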

Cluster Membership
• Failure Detection
◦ Accrual Failure Detector: if a node is faulty, its suspicion level Φ increases
Φ(t) → k as t → k, where k is the threshold variable
◦ If the node is correct, Φ(t) = 0
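
One way to compute a suspicion level of this kind is sketched below in Python. It assumes heartbeat inter-arrival times follow an exponential distribution, so the probability that the next heartbeat arrives later than `t` is `exp(-t / mean)` and Φ is `-log10` of that probability. The class name, method names, and the `THRESHOLD` value are hypothetical.

```python
import math


class AccrualFailureDetector:
    """Tracks heartbeat inter-arrival times for one peer."""

    def __init__(self):
        self._intervals = []
        self._last = None

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat and the gap since the previous one."""
        if self._last is not None:
            self._intervals.append(now - self._last)
        self._last = now

    def phi(self, now: float) -> float:
        """Suspicion level: -log10 of P(next heartbeat arrives after now).

        With exponential inter-arrival times this reduces to
        elapsed / (mean * ln 10).
        """
        if not self._intervals:
            return 0.0
        mean = sum(self._intervals) / len(self._intervals)
        elapsed = now - self._last
        return elapsed / (mean * math.log(10))


detector = AccrualFailureDetector()
for t in (0.0, 1.0, 2.0, 3.0):
    detector.heartbeat(t)

THRESHOLD = 5.0  # hypothetical value for the threshold k
print(detector.phi(3.0), detector.phi(20.0) > THRESHOLD)
```

Right after a heartbeat the suspicion level is zero, and it grows the longer the peer stays silent, so a node is suspected only once Φ crosses the chosen threshold.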

Bootstrapping & Scaling
• Bootstrapping
◦ A new node selects a random token
◦ The token is persisted locally and gossiped to the cluster
• Scaling
◦ The Cassandra bootstrap algorithm is initiated by an operator
◦ The new node takes over part of the range split from a heavily loaded node
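
The scaling step can be illustrated by choosing a token for the new node that halves the range of the busiest node. This Python sketch is an assumption-laden example: the midpoint rule, the `RING_SIZE` constant, the `split_token` function, and the load figures are hypothetical and not taken from the system.

```python
RING_SIZE = 2 ** 32  # hypothetical size of the token space


def split_token(ring, load):
    """Pick a token that halves the range of the most heavily loaded node.

    ring: list of (token, node) sorted by token; a node owns the range
          (previous token, its own token], wrapping around the ring.
    load: node -> load figure (a hypothetical metric, e.g. bytes stored).
    """
    busiest = max(load, key=load.get)
    i = next(idx for idx, (_, node) in enumerate(ring) if node == busiest)
    upper = ring[i][0]
    lower = ring[i - 1][0]  # index -1 wraps to the last node on the ring
    # Modulo handles the wrapped range; a lone node owns the whole ring.
    span = (upper - lower) % RING_SIZE or RING_SIZE
    return (lower + span // 2) % RING_SIZE


ring = [(100, "A"), (400, "B"), (900, "C")]
print(split_token(ring, {"A": 10, "B": 80, "C": 20}))  # → 250
```

A node joining with the returned token becomes responsible for the lower half of the busy node's former range.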

Local Persistence
• Write Operation
◦ In-memory data is flushed to disk after a threshold is reached
◦ Disk writes are sequential, with an index generated for each data file
◦ Data files are merged in the background
◦ Rolling commit logs
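
The write path above can be sketched as a toy Python class. Everything here is a simplified assumption: lists and dicts stand in for on-disk files, the flush threshold counts entries rather than bytes, and clearing the log on flush is a stand-in for rolling commit logs.

```python
class LocalStore:
    """Toy write path: commit log, in-memory table, flush to sorted files."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical: entry count
        self.commit_log = []   # stands in for the on-disk commit log
        self.memtable = {}     # in-memory structure
        self.data_files = []   # each flushed file: (sorted entries, index)

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log the write first
        self.memtable[key] = value            # 2. update the in-memory table
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Dump the in-memory table as one sorted file with a key index."""
        entries = sorted(self.memtable.items())
        index = {key: pos for pos, (key, _) in enumerate(entries)}
        self.data_files.append((entries, index))
        self.memtable.clear()
        self.commit_log.clear()  # flushed data no longer needs the log

    def merge_files(self):
        """Collapse all data files into one; newer files win on duplicates."""
        merged = {}
        for entries, _ in self.data_files:  # oldest first
            merged.update(entries)
        entries = sorted(merged.items())
        index = {key: pos for pos, (key, _) in enumerate(entries)}
        self.data_files = [(entries, index)]


store = LocalStore()
for k, v in [("b", 1), ("a", 2), ("c", 3), ("a", 9)]:
    store.write(k, v)
store.flush()
store.merge_files()
print(store.data_files[0][0])  # → [('a', 9), ('b', 1), ('c', 3)]
```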

Local Persistence
• Read Operation
◦ All data is indexed on the primary key
◦ Column indices are maintained
Figure: Read Data
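
A read can be pictured as checking memory first and then the data files from newest to oldest, using each file's key index instead of scanning it. The Python sketch below assumes a file is a pair of (sorted entries, key index); the `read` function and the sample files are hypothetical.

```python
def read(key, memtable, data_files):
    """Look in memory first, then in data files from newest to oldest."""
    if key in memtable:
        return memtable[key]
    for entries, index in reversed(data_files):
        pos = index.get(key)  # the per-file index avoids scanning the file
        if pos is not None:
            return entries[pos][1]
    return None


old_file = ([("a", 1), ("b", 2)], {"a": 0, "b": 1})
new_file = ([("a", 7)], {"a": 0})
print(read("a", {}, [old_file, new_file]))  # → 7
```

Searching newest first means the most recent value for a key is returned without consulting older files.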

Conclusion
• Proven high scalability, performance, and wide applicability
• Very high update throughput while delivering low latency
• Future work
◦ Adding compression
◦ Support for atomicity across keys
◦ Secondary index support

