Cassandra – A Decentralized Structured Storage System
Gemini Mobile Technologies, Inc.
NOSQL Tokyo Reading Group (http://nosqlsummer.org/city/tokyo)
August 25, 2010
Tags: #cassandra #nosql
2010/8/23 Gemini Mobile Technologies, Inc.

Cassandra: A Decentralized Structured Storage System
Authors: Avinash Lakshman, Prashant Malik.
Abstract: Cassandra is a distributed storage system for managing very large amounts of structured data spread out across many commodity servers, while providing highly available service with no single point of failure. Cassandra aims to run on top of an infrastructure of hundreds of nodes (possibly spread across different data centers). At this scale, small and large components fail continuously. The way Cassandra manages the persistent state in the face of these failures drives the reliability and scalability of the software systems relying on this service. While in many ways Cassandra resembles a database and shares many design and implementation strategies therewith, Cassandra does not support a full relational data model; instead, it provides clients with a simple data model that supports dynamic control over data layout and format. …
Appeared in: 3rd ACM SIGOPS International Workshop on Large Scale Distributed Systems and Middleware, 2009.
http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf
2010/8/23 Gemini Mobile Technologies, Inc. All rights reserved.

1. Introduction and 2. Related Work
Facebook inbox search:
• Enables users to search through their inbox. Launched 6/2008.
• Highly scalable: 250M users.
• Tolerant of server/network failures.
• Very high write throughput: “billions of writes per day”.
• Replicates data across data centers.
Related Work
• Distributed file systems: Ficus, Coda, Farsite, GFS.
• Distributed database: Bayou.
• Storage systems: Dynamo, Bigtable.
• “The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model.”

3. Data Model
Multi-level Index:
1. Table: Set of rows.
2. Key: Identifies the row.
   • Key is an arbitrary byte[].
   • Each row can contain a variable number of columns/CFs. Rows need not contain the same columns/CFs.
   • Each row can contain millions of columns/CFs.
   • Atomic operations per key per replica.
3. ColumnName: Identifies the column value(s).
   • Can be “Column”, “ColumnFamily”, “ColumnFamily:Column”, “ColumnFamily:ColumnFamily”, etc.
   • ColumnFamily (CF) is a group of Columns.
   • CFs and Columns are sorted, either time-based or name-based.
   • Columns can be added/deleted efficiently at run time.

Data Model Example: Inbox Search
Query: Find all messages of user3 with “hello”.
   Get(UserMessages, “user3”, “term:hello”)
Table: UserMessages
   Key: <userid>
   CF: “term”
   CF: <word>
   Name: <timestamp>  Val: <messageID>
[Diagram: row “user3”, CF “term”, with sub-CFs “hello”, “how”, “you”. Columns in slide order (timestamp → messageID): time4 → msg10, time12 → msg81, time4 → msg10, time4 → msg10, time12 → msg81, time1 → msg03]

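The lookup above can be sketched with nested dictionaries. This is only an illustration of the multi-level index, not Cassandra's storage format: the `get` helper is hypothetical, timestamps are plain integers, and which timestamps belong to which word is an assumption, since the slide's table does not survive intact.

```python
# Hypothetical in-memory layout: row key -> CF -> sub-CF -> {timestamp: messageID}.
# The assignment of timestamps to words is assumed for illustration.
user_messages = {
    "user3": {
        "term": {
            "hello": {4: "msg10", 12: "msg81"},
            "how": {4: "msg10"},
        }
    }
}


def get(table, key, column_name):
    """Resolve a colon-separated column path such as "term:hello"."""
    node = table[key]
    for part in column_name.split(":"):
        node = node[part]
    return node


hits = get(user_messages, "user3", "term:hello")
# Columns are kept sorted; here we sort by timestamp when reading.
messages_by_time = [msg for _, msg in sorted(hits.items())]
print(messages_by_time)
```

Each level of the path narrows the search: the key selects the row, “term” selects the CF, and “hello” selects the group of columns holding the matching message IDs.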
4. API
Simple get/put operations:
• Insert(table, key, rowMutation)
   Single column, multiple columns, or a batch of multiple keys.
• Get(table, key, columnName)
   Key: single key or key range.
   columnName: “slice” range or name.
• Delete(table, key, columnName)
Also, specify Consistency Level.

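A minimal single-process sketch of the three operations, assuming a flat row of sorted column names. The class `ToyStore` and its parameters `start` and `end` are hypothetical names for illustration; the real API is remote, also takes a consistency level, and supports key ranges, none of which is modeled here.

```python
class ToyStore:
    """Toy, single-node stand-in for Insert/Get/Delete (not the real client API)."""

    def __init__(self):
        self._tables = {}

    def insert(self, table, key, row_mutation):
        # row_mutation maps column names to values; one or many columns at once.
        row = self._tables.setdefault(table, {}).setdefault(key, {})
        row.update(row_mutation)

    def get(self, table, key, column_name=None, start=None, end=None):
        row = self._tables.get(table, {}).get(key, {})
        if column_name is not None:
            # Lookup by exact column name.
            return {column_name: row[column_name]} if column_name in row else {}
        # "Slice": every column whose name falls inside [start, end], in sorted order.
        return {
            name: row[name]
            for name in sorted(row)
            if (start is None or name >= start) and (end is None or name <= end)
        }

    def delete(self, table, key, column_name):
        self._tables.get(table, {}).get(key, {}).pop(column_name, None)


store = ToyStore()
store.insert("UserMessages", "user3", {"a": 1, "b": 2, "c": 3})
sliced = store.get("UserMessages", "user3", start="b")
store.delete("UserMessages", "user3", "b")
remaining = store.get("UserMessages", "user3")
```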
5. System Architecture
• Data partitioned to a subset of nodes: Consistent Hashing
• Data replicated to multiple nodes for redundancy and performance: Quorum using “preference list” of nodes
• Node management:
   • Membership algorithm to know which nodes are up/down: “Accrual failure detection + Gossip”
   • Bootstrapping to add a node: Manual operation + “Seed” nodes
[Diagram: NodeA, NodeB, NodeC, NodeD; labels “Consistent Hash” and “Gossip”]

5.1 Partitioning Algorithm: Consistent Hashing
• Each node is assigned a random position on the ring.
• Key k is hashed to a fixed circular space.
• Nodes are assigned by walking clockwise from the hash location.
• Example: Nodes A, B, C, D, E, F, G are assigned to the ring. Hash(k) is between A and B. Since there are 3 replicas, choose the next 3 nodes on the ring (i.e., B, C, D).
[Diagram: ring with nodes A, B, C, D, E, F, G; labels “Hash(k)” and “Node assignment”]

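The clockwise walk can be sketched as follows. The ring size, the use of MD5 for `hash_key`, and the evenly spaced node positions are assumptions made for the example (the slide says positions are random); the class and method names are hypothetical.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # assumed size of the circular hash space


def hash_key(key: str) -> int:
    """Map a key to a ring position (MD5 is an arbitrary choice here)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")  # value in [0, 2**32)


class Ring:
    def __init__(self, node_positions):
        # node_positions: {node name: position on the ring}
        self._ring = sorted((pos % RING_SIZE, node) for node, pos in node_positions.items())
        self._positions = [pos for pos, _ in self._ring]

    def replicas_at(self, position, n=3):
        """Walk clockwise from `position` and return the next n distinct nodes."""
        start = bisect.bisect_left(self._positions, position % RING_SIZE)
        count = min(n, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]

    def replicas(self, key, n=3):
        return self.replicas_at(hash_key(key), n)


# Nodes A..G at positions 0, 100, ..., 600 (evenly spaced only for readability).
ring = Ring({name: i * 100 for i, name in enumerate("ABCDEFG")})
print(ring.replicas_at(50))   # a position between A and B -> ['B', 'C', 'D']
print(ring.replicas_at(650))  # past G, so the walk wraps around -> ['A', 'B', 'C']
```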
5.1 Consistent Hashing
• Key advantage: Adding, deleting, or re-allocating nodes is cheap. It affects only the keys of immediate neighbor nodes.
• Hash function:
   • Locality
   • Load distribution
• Load balancing by moving lightly loaded nodes along the ring to relieve heavily loaded nodes.

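A small sketch of why adding a node is cheap, assuming single ownership (no replication) and hand-picked positions. The node name X and the `owner` helper are hypothetical. Only keys on the arc just before the new node change owner, and all of them come from one neighbor.

```python
import bisect


def owner(positions, key_pos):
    """First node at or after key_pos when walking clockwise, with wrap-around."""
    ordered = sorted(positions.items(), key=lambda item: item[1])
    points = [pos for _, pos in ordered]
    index = bisect.bisect_left(points, key_pos) % len(ordered)
    return ordered[index][0]


before = {"A": 0, "B": 100, "C": 200, "D": 300}
after = dict(before, X=150)  # hypothetical new node X placed between B and C

key_positions = range(0, 400, 10)
moved = [k for k in key_positions if owner(before, k) != owner(after, k)]

print(moved)                                # only keys in the arc (100, 150]
print({owner(before, k) for k in moved})    # all previously owned by C
print({owner(after, k) for k in moved})     # all now owned by X
```

With replication, a few more neighbors take part, but the change stays local to the ring segment around the new node.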
5.2 Replication
• Each data item is replicated at multiple nodes (N).
• Each key is assigned to a “coordinator” node by the consistent hash function.
• The “coordinator” node replicates the key to an additional N-1 nodes.
• “Consistency Level” is set by the client per read/write request: ZERO, ONE, ALL, ANY, QUORUM.
• Zookeeper is used to elect a leader node and distribute the “preference list”.
• The leader node owns the “preference list” that maps key to node list.

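The consistency levels can be read as "how many replica acknowledgements the coordinator waits for". The mapping below is a sketch under assumed semantics for each level (the slide only lists the names), and `required_acks` is a hypothetical helper, not a Cassandra function.

```python
from enum import Enum


class ConsistencyLevel(Enum):
    ZERO = "ZERO"
    ANY = "ANY"
    ONE = "ONE"
    QUORUM = "QUORUM"
    ALL = "ALL"


def required_acks(level: ConsistencyLevel, n: int) -> int:
    """Acknowledgements to wait for, given replication factor n (assumed semantics)."""
    if level is ConsistencyLevel.ZERO:
        return 0            # fire and forget
    if level in (ConsistencyLevel.ANY, ConsistencyLevel.ONE):
        return 1            # ANY is assumed to also accept a hinted write; modeled as one ack
    if level is ConsistencyLevel.QUORUM:
        return n // 2 + 1   # strict majority of the N replicas
    return n                # ALL


def is_satisfied(level: ConsistencyLevel, n: int, acks: int) -> bool:
    return acks >= required_acks(level, n)


for level in ConsistencyLevel:
    print(level.name, required_acks(level, 3))
```

With N = 3, QUORUM reads and QUORUM writes each wait for 2 replicas, so any read set overlaps any write set in at least one node.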
5.3 Membership
• Each node locally determines whether any other node in the system is up or down.
• Φ (phi) Accrual Failure Detector:
   • Instead of a boolean value (up or down), compute a numeric value Φ representing the suspicion level for each monitored node.
   • Φ is computed using inter-arrival times of gossip messages from other nodes in the cluster.
   • If Φ exceeds a particular threshold, the node is considered “down”.
   • In an experiment with 100 nodes and a threshold of 5, the average time to detect a failure was 15 seconds.

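A minimal sketch of a Φ computation, assuming inter-arrival times follow an exponential distribution, so that Φ = -log10(P(no message for this long)) = elapsed / (mean × ln 10). The class name, window size, and time units are hypothetical; the slide does not specify the distribution.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Toy accrual failure detector for one monitored node (exponential model assumed)."""

    def __init__(self, threshold=5.0, window=1000):
        self.threshold = threshold
        self.intervals = deque(maxlen=window)  # sliding window of inter-arrival times
        self.last_arrival = None

    def heartbeat(self, now):
        """Record the arrival time of a gossip message from the monitored node."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def phi(self, now):
        if not self.intervals:
            return 0.0  # not enough history to suspect anything
        mean = sum(self.intervals) / len(self.intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self.last_arrival
        # -log10(exp(-elapsed / mean)) simplifies to elapsed / (mean * ln 10)
        return elapsed / (mean * math.log(10))

    def is_down(self, now):
        return self.phi(now) > self.threshold


detector = PhiAccrualDetector(threshold=5.0)
for t in (0.0, 1.0, 2.0, 3.0):   # heartbeats one second apart
    detector.heartbeat(t)

print(detector.phi(4.0), detector.is_down(4.0))    # short silence: low suspicion
print(detector.phi(20.0), detector.is_down(20.0))  # long silence: above threshold
```

Φ grows with the length of the silence relative to the usual gap between messages, so the threshold adapts to each node's observed gossip rate rather than using a fixed timeout.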
5.4 Bootstrapping, 5.5 Scaling the Cluster
• New nodes check their configuration for “seed” nodes to get initial gossip data such as “preference” lists.
• Adding/removing nodes is not done automatically. It requires a manual command-line operation.
• A new node needs to have data moved to it from other nodes. Operational transfer rate: 40 MB/s. Work is under way to improve this by copying data from multiple replicas, a la BitTorrent.

5.6 Local Persistence
READ
• Bloom filter to reduce SSTable access
• Check SSTables in time order
[Diagram labels: WRITE, In-Memory Table, Commit Log, SSTable (three shown)]
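The read path can be sketched as below: check the in-memory table, then the SSTables from newest to oldest, and use a Bloom filter to skip SSTables that cannot contain the key. The write path shown (append to commit log, update the in-memory table, flush when full) is an assumption drawn from the diagram labels. All class names, sizes, and the flush limit are hypothetical.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _indexes(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for idx in self._indexes(key):
            self.bits |= 1 << idx

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> idx) & 1 for idx in self._indexes(key))


class SSTable:
    """Immutable sorted table; a dict stands in for the on-disk file."""

    def __init__(self, data):
        self.data = dict(data)
        self.bloom = BloomFilter()
        for key in self.data:
            self.bloom.add(key)

    def get(self, key):
        return self.data.get(key)  # stands in for a disk access


class Store:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only log, written first (assumed order)
        self.memtable = {}     # in-memory table
        self.sstables = []     # oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.sstables.append(SSTable(self.memtable))  # flush
            self.memtable = {}

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):      # newest first
            if table.bloom.might_contain(key):     # skip tables that cannot match
                value = table.get(key)
                if value is not None:
                    return value
        return None


store = Store(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 10), ("c", 3), ("d", 4)]:
    store.write(key, value)
print(store.read("a"), store.read("b"), store.read("d"), store.read("zzz"))
```

Reading newest first means the latest value for a key wins, and a false positive from the Bloom filter costs only one wasted lookup, never a wrong answer.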

Summary of "Cassandra" for 3rd nosql summer reading in Tokyo

Epilogue
• Active Apache project with good documentation:
   http://cassandra.apache.org/
   http://wiki.apache.org/cassandra/ArticlesAndPresentations
• In use at companies like Digg, Facebook, Twitter, Reddit, Rackspace.
• Largest production cluster has over 100 TB of data on over 150 machines.