SlideShare une entreprise Scribd logo
1  sur  29
Cassandra
FUNdamentals Overview
Main points

Structured log storage

Columns ordered by name inside key

Rows ordered by hash of row key *

Column family storage

Fully distributed peer-to-peer

Partitioned by row key

Dynamo consistency
Structured log storage

No writes in place for you

JVM heap is reserved for memtables

Memtables are sorted

Memtables reach a specific size they are
flushed to disk
− Creates sstable file
− Bloom filter file
− Index file
Compaction

SSTables merged

Deleted columns physically removed

Two compaction strategies
− Sized
− LevelDB
Commit logs

Every write/delete operation goes to commit
log

If a node were to shutdown with un-flushed
memtables (every shutdown really)

Replay the commit logs
Columns ordered inside key

Cassandra likes wide rows
− Up to 2 billion
− (but not really would be a 32GB row)

set mystuff['ecapriolo']['a']='1'

set mystuff['ecapriolo']['b']='2'

set mystuff['ecapriolo']['c']='3'
...

slice mystuff['ecapriolo'] ['b'] ['g']
Rows ordered by hash of row key

All columns of row 'a1' on the same node

But all columns of row 'a2' may not be on
same node

Reduces hot spots

But there is no total ordering based on row
keys
Peer to Peer

Node list and token range is gossip-ed

Each node responsible for local storage and
requests

When a new node joins it take some token
range away from other nodes.
Ed Ed Ed
stacey stacey stacey
bob bob bob
Replication 3
Dynamo consistency

Operations have a requested Consistency
Level
− ONE
− QUORUM

CL nodes ack the operation before the user
receives ack

If an operation fails it is safe to retry *
Fully distributed. The good

Highly available

Redundant

Fault tolerant
Fully distributed! The bad

Locks

Counters

Tombstones

Consistency
Hadoop
Hadoop and Cassandra

ColumnFamilyInputFormat
− Takes a ColumnFamily as input
− Map(ByteBuffer[] key,
SortedMap<ByteBuffer,Column>

ColumnFamilyOutputFormat
− Writes out to a column family
− OutputFormat ByteBuffer,List<Mutation>
Hadoop optimizations

Tasks run with locality if c* and h same node

InputFormat can leverage c* secondary
indexes

OutputFormat can use bulk loader
− C* writes are helluva fast anyway
Hive and Cassandra

Hive support similar to the hbase handler
support

Create a hive table specifying properties
similar to those in map reduce

hive> CREATE EXTERNAL TABLE
Users(userid string, name string, email
string, phone string)
STORED BY
'org.apache.hadoop.hive.cassandra.Ca
ssandraStorageHandler' WITH
Other support out there

github.com/edwardcapriolo/hive-cassandra-
udfs
− Delete UDF
− Composite splitter/builder UDFS

Not very hard to roll your own input format
− OneRowInputFormat
− ListOfRowsInputFormat
Pig Cassandra

Nice support for pig/cassandra

Pigmalian library

But I don't use it
− Cause I use hive
− You should as well
− And get my book :)
Comparison between c*
and “other noSQL”

I know your talking about hbase :)

Cassandra does not store multiple versions of
column
− Last update wins
− Use UUID as part of column name instead

The row keys are not globally ordered *
− Unless you are using ByteOrderPartitioner (no one
should use this)
Comparison between c*
and “other noSQL”

Each c* replica actively servers reads & writes

Cassandra directly manages its storage

Shards are pre-defined tokens (no auto-split)

Qualifier/column name can NOT be null
Key Performance tips
Know your data

Design for the long tail scenarios
− With design x our largest customer will have
10000000000000 columns in one row

How large will this column family be in 5
months?

What is the request rate?

How random is the read pattern
Understanding write-once files

Deletes are writes that get compacted away
later

Can you optimize from blind writes?

What percent of your application is
update/insert?
Profiling / Dark Launch

Compression

Compaction strategy
Metrics

Collect the JMX information
− Column family
− Caches

Set milestone alerts (traps)
Hardware

Fast disk (you almost always want SSD)

RAM
− Caches, bloom filters, young gen

CPU
− Garbage collector, deserialization + compaction
needs cpu to work
Anti patterns

Using one row key as a queue

Doing N reads to satisfy a request

Read before write

Using collection support in place of wide rows

Encoding
Questions?

Contenu connexe

Tendances

IBM DB2 LUW UDB DBA Training by www.etraining.guru
IBM DB2 LUW UDB DBA Training by www.etraining.guruIBM DB2 LUW UDB DBA Training by www.etraining.guru
IBM DB2 LUW UDB DBA Training by www.etraining.guruRavikumar Nandigam
 
Meet Hadoop Family: part 4
Meet Hadoop Family: part 4Meet Hadoop Family: part 4
Meet Hadoop Family: part 4caizer_x
 
Archlinux install
Archlinux installArchlinux install
Archlinux installsambismo
 
Installing Apache Hive, internal and external table, import-export
Installing Apache Hive, internal and external table, import-export Installing Apache Hive, internal and external table, import-export
Installing Apache Hive, internal and external table, import-export Rupak Roy
 
Cassandra South Bay Meetup - Backup And Restore For Apache Cassandra
Cassandra South Bay Meetup - Backup And Restore For Apache CassandraCassandra South Bay Meetup - Backup And Restore For Apache Cassandra
Cassandra South Bay Meetup - Backup And Restore For Apache Cassandraaaronmorton
 
Configuringahadoop
ConfiguringahadoopConfiguringahadoop
Configuringahadoopmensb
 
Basic command of hadoop
Basic command of hadoopBasic command of hadoop
Basic command of hadoopAhmad Kabeer
 
Research computing at ILRI
Research computing at ILRIResearch computing at ILRI
Research computing at ILRIILRI
 
Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2Rohit Agrawal
 
Glusterfs session #9 index xlator
Glusterfs session #9   index xlatorGlusterfs session #9   index xlator
Glusterfs session #9 index xlatorPranith Karampuri
 
Data analysis on hadoop
Data analysis on hadoopData analysis on hadoop
Data analysis on hadoopFrank Y
 
Gluster dev session #3 xlator interface
Gluster dev session #3   xlator interfaceGluster dev session #3   xlator interface
Gluster dev session #3 xlator interfacePranith Karampuri
 
Glusterfs session #18 intro to fuse and its trade offs
Glusterfs session #18 intro to fuse and its trade offsGlusterfs session #18 intro to fuse and its trade offs
Glusterfs session #18 intro to fuse and its trade offsPranith Karampuri
 
Whirlwind tour of Hadoop and HIve
Whirlwind tour of Hadoop and HIveWhirlwind tour of Hadoop and HIve
Whirlwind tour of Hadoop and HIveEdward Capriolo
 
Glusterfs session #5 inode t, fd-t lifecycles
Glusterfs session #5   inode t, fd-t lifecyclesGlusterfs session #5   inode t, fd-t lifecycles
Glusterfs session #5 inode t, fd-t lifecyclesPranith Karampuri
 

Tendances (20)

IBM DB2 LUW UDB DBA Training by www.etraining.guru
IBM DB2 LUW UDB DBA Training by www.etraining.guruIBM DB2 LUW UDB DBA Training by www.etraining.guru
IBM DB2 LUW UDB DBA Training by www.etraining.guru
 
Meet Hadoop Family: part 4
Meet Hadoop Family: part 4Meet Hadoop Family: part 4
Meet Hadoop Family: part 4
 
Php dba cache
Php dba cachePhp dba cache
Php dba cache
 
Archlinux install
Archlinux installArchlinux install
Archlinux install
 
Installing Apache Hive, internal and external table, import-export
Installing Apache Hive, internal and external table, import-export Installing Apache Hive, internal and external table, import-export
Installing Apache Hive, internal and external table, import-export
 
Cassandra South Bay Meetup - Backup And Restore For Apache Cassandra
Cassandra South Bay Meetup - Backup And Restore For Apache CassandraCassandra South Bay Meetup - Backup And Restore For Apache Cassandra
Cassandra South Bay Meetup - Backup And Restore For Apache Cassandra
 
Xusage
XusageXusage
Xusage
 
Configuringahadoop
ConfiguringahadoopConfiguringahadoop
Configuringahadoop
 
Tablespaces
TablespacesTablespaces
Tablespaces
 
Basic command of hadoop
Basic command of hadoopBasic command of hadoop
Basic command of hadoop
 
Research computing at ILRI
Research computing at ILRIResearch computing at ILRI
Research computing at ILRI
 
Cassandra+Hadoop
Cassandra+HadoopCassandra+Hadoop
Cassandra+Hadoop
 
Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2
 
Glusterfs session #9 index xlator
Glusterfs session #9   index xlatorGlusterfs session #9   index xlator
Glusterfs session #9 index xlator
 
Data analysis on hadoop
Data analysis on hadoopData analysis on hadoop
Data analysis on hadoop
 
Cgghh
CgghhCgghh
Cgghh
 
Gluster dev session #3 xlator interface
Gluster dev session #3   xlator interfaceGluster dev session #3   xlator interface
Gluster dev session #3 xlator interface
 
Glusterfs session #18 intro to fuse and its trade offs
Glusterfs session #18 intro to fuse and its trade offsGlusterfs session #18 intro to fuse and its trade offs
Glusterfs session #18 intro to fuse and its trade offs
 
Whirlwind tour of Hadoop and HIve
Whirlwind tour of Hadoop and HIveWhirlwind tour of Hadoop and HIve
Whirlwind tour of Hadoop and HIve
 
Glusterfs session #5 inode t, fd-t lifecycles
Glusterfs session #5   inode t, fd-t lifecyclesGlusterfs session #5   inode t, fd-t lifecycles
Glusterfs session #5 inode t, fd-t lifecycles
 

En vedette

Cassandra 1.2 by Eddie Satterly
Cassandra 1.2 by Eddie SatterlyCassandra 1.2 by Eddie Satterly
Cassandra 1.2 by Eddie SatterlyDataStax Academy
 
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break GlassCassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glassaaronmorton
 
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"DataStax Academy
 
Jvm operation casual talks
Jvm operation casual talksJvm operation casual talks
Jvm operation casual talksoranie Narut
 
Cassandraのしくみ データの読み書き編
Cassandraのしくみ データの読み書き編Cassandraのしくみ データの読み書き編
Cassandraのしくみ データの読み書き編Yuki Morishita
 
The data model is dead, long live the data model
The data model is dead, long live the data modelThe data model is dead, long live the data model
The data model is dead, long live the data modelPatrick McFadin
 

En vedette (7)

Cassandra 1.2 by Eddie Satterly
Cassandra 1.2 by Eddie SatterlyCassandra 1.2 by Eddie Satterly
Cassandra 1.2 by Eddie Satterly
 
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break GlassCassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass
 
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
 
Jvm operation casual talks
Jvm operation casual talksJvm operation casual talks
Jvm operation casual talks
 
Cassandraのしくみ データの読み書き編
Cassandraのしくみ データの読み書き編Cassandraのしくみ データの読み書き編
Cassandraのしくみ データの読み書き編
 
The data model is dead, long live the data model
The data model is dead, long live the data modelThe data model is dead, long live the data model
The data model is dead, long live the data model
 
Book notes- Book-I & Book-III
Book notes- Book-I & Book-IIIBook notes- Book-I & Book-III
Book notes- Book-I & Book-III
 

Similaire à Cassandra4Hadoop

Nyc summit intro_to_cassandra
Nyc summit intro_to_cassandraNyc summit intro_to_cassandra
Nyc summit intro_to_cassandrazznate
 
Introduciton to Apache Cassandra for Java Developers (JavaOne)
Introduciton to Apache Cassandra for Java Developers (JavaOne)Introduciton to Apache Cassandra for Java Developers (JavaOne)
Introduciton to Apache Cassandra for Java Developers (JavaOne)zznate
 
C* Summit 2013: Can't we all just get along? MariaDB and Cassandra by Colin C...
C* Summit 2013: Can't we all just get along? MariaDB and Cassandra by Colin C...C* Summit 2013: Can't we all just get along? MariaDB and Cassandra by Colin C...
C* Summit 2013: Can't we all just get along? MariaDB and Cassandra by Colin C...DataStax Academy
 
MariaDB and Cassandra Interoperability
MariaDB and Cassandra InteroperabilityMariaDB and Cassandra Interoperability
MariaDB and Cassandra InteroperabilityColin Charles
 
Cassandra - A Distributed Database System
Cassandra - A Distributed Database System Cassandra - A Distributed Database System
Cassandra - A Distributed Database System Md. Shohel Rana
 
Apache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data modelApache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data modelAndrey Lomakin
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at  FacebookHadoop and Hive Development at  Facebook
Hadoop and Hive Development at FacebookS S
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebookelliando dias
 
Storage cassandra
Storage   cassandraStorage   cassandra
Storage cassandraPL dream
 
You know, for search. Querying 24 Billion Documents in 900ms
You know, for search. Querying 24 Billion Documents in 900msYou know, for search. Querying 24 Billion Documents in 900ms
You know, for search. Querying 24 Billion Documents in 900msJodok Batlogg
 
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan OttTrivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan OttTrivadis
 
MySQL for Oracle DBAs
MySQL for Oracle DBAsMySQL for Oracle DBAs
MySQL for Oracle DBAsMark Leith
 
Meetup cassandra for_java_cql
Meetup cassandra for_java_cqlMeetup cassandra for_java_cql
Meetup cassandra for_java_cqlzznate
 
Cassandra and Rails at LA NoSQL Meetup
Cassandra and Rails at LA NoSQL MeetupCassandra and Rails at LA NoSQL Meetup
Cassandra and Rails at LA NoSQL MeetupMichael Wynholds
 
On Rails with Apache Cassandra
On Rails with Apache CassandraOn Rails with Apache Cassandra
On Rails with Apache CassandraStu Hood
 
Tobi Bosede - PyCassa Setting Up and Using Apache Cassandra with Python in Wi...
Tobi Bosede - PyCassa Setting Up and Using Apache Cassandra with Python in Wi...Tobi Bosede - PyCassa Setting Up and Using Apache Cassandra with Python in Wi...
Tobi Bosede - PyCassa Setting Up and Using Apache Cassandra with Python in Wi...PyData
 
ETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk LoadingETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk Loadingalex_araujo
 
Cassandra - A decentralized storage system
Cassandra - A decentralized storage systemCassandra - A decentralized storage system
Cassandra - A decentralized storage systemArunit Gupta
 

Similaire à Cassandra4Hadoop (20)

Nyc summit intro_to_cassandra
Nyc summit intro_to_cassandraNyc summit intro_to_cassandra
Nyc summit intro_to_cassandra
 
Introduciton to Apache Cassandra for Java Developers (JavaOne)
Introduciton to Apache Cassandra for Java Developers (JavaOne)Introduciton to Apache Cassandra for Java Developers (JavaOne)
Introduciton to Apache Cassandra for Java Developers (JavaOne)
 
C* Summit 2013: Can't we all just get along? MariaDB and Cassandra by Colin C...
C* Summit 2013: Can't we all just get along? MariaDB and Cassandra by Colin C...C* Summit 2013: Can't we all just get along? MariaDB and Cassandra by Colin C...
C* Summit 2013: Can't we all just get along? MariaDB and Cassandra by Colin C...
 
MariaDB and Cassandra Interoperability
MariaDB and Cassandra InteroperabilityMariaDB and Cassandra Interoperability
MariaDB and Cassandra Interoperability
 
Cassandra - A Distributed Database System
Cassandra - A Distributed Database System Cassandra - A Distributed Database System
Cassandra - A Distributed Database System
 
Apache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data modelApache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data model
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at  FacebookHadoop and Hive Development at  Facebook
Hadoop and Hive Development at Facebook
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebook
 
Cassndra (4).pptx
Cassndra (4).pptxCassndra (4).pptx
Cassndra (4).pptx
 
Storage cassandra
Storage   cassandraStorage   cassandra
Storage cassandra
 
You know, for search. Querying 24 Billion Documents in 900ms
You know, for search. Querying 24 Billion Documents in 900msYou know, for search. Querying 24 Billion Documents in 900ms
You know, for search. Querying 24 Billion Documents in 900ms
 
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan OttTrivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
 
MySQL for Oracle DBAs
MySQL for Oracle DBAsMySQL for Oracle DBAs
MySQL for Oracle DBAs
 
Meetup cassandra for_java_cql
Meetup cassandra for_java_cqlMeetup cassandra for_java_cql
Meetup cassandra for_java_cql
 
Cassandra and Rails at LA NoSQL Meetup
Cassandra and Rails at LA NoSQL MeetupCassandra and Rails at LA NoSQL Meetup
Cassandra and Rails at LA NoSQL Meetup
 
On Rails with Apache Cassandra
On Rails with Apache CassandraOn Rails with Apache Cassandra
On Rails with Apache Cassandra
 
Tobi Bosede - PyCassa Setting Up and Using Apache Cassandra with Python in Wi...
Tobi Bosede - PyCassa Setting Up and Using Apache Cassandra with Python in Wi...Tobi Bosede - PyCassa Setting Up and Using Apache Cassandra with Python in Wi...
Tobi Bosede - PyCassa Setting Up and Using Apache Cassandra with Python in Wi...
 
ETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk LoadingETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk Loading
 
NoSQL Session II
NoSQL Session IINoSQL Session II
NoSQL Session II
 
Cassandra - A decentralized storage system
Cassandra - A decentralized storage systemCassandra - A decentralized storage system
Cassandra - A decentralized storage system
 

Plus de DataStax Academy

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftDataStax Academy
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsDataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingDataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackDataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache CassandraDataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready CassandraDataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First ClusterDataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with DseDataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraDataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraDataStax Academy
 

Plus de DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 

Dernier

Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 

Dernier (20)

Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 

Cassandra4Hadoop

  • 3. Main points  Structured log storage  Columns ordered by name inside key  Rows ordered by hash of row key *  Column family storage  Fully distributed peer-to-peer  Partitioned by row key  Dynamo consistency
  • 4. Structured log storage  No writes in place for you  JVM heap is reserved for memtables  Memtables are sorted  Memtables reach a specific size they are flushed to disk − Creates sstable file − Bloom filter file − Index file
  • 5. Compaction  SSTables merged  Deleted columns physically removed  Two compaction strategies − Sized − LevelDB
  • 6. Commit logs  Every write/delete operation goes to commit log  If a node were to shutdown with un-flushed memtables (every shutdown really)  Replay the commit logs
  • 7. Columns ordered inside key  Cassandra likes wide rows − Up to 2 billion − (but not really would be a 32GB row)  set mystuff['ecapriolo']['a']='1'  set mystuff['ecapriolo']['b']='2'  set mystuff['ecapriolo']['c']='3' ...  slice mystuff['ecapriolo'] ['b'] ['g']
  • 8. Rows ordered by hash of row key  All columns of row 'a1' on the same node  But all columns of row 'a2' may not be on same node  Reduces hot spots  But there is no total ordering based on row keys
  • 9. Peer to Peer  Node list and token range is gossip-ed  Each node responsible for local storage and requests  When a new node joins it take some token range away from other nodes.
  • 10. Ed Ed Ed stacey stacey stacey bob bob bob Replication 3
  • 11. Dynamo consistency  Operations have a requested Consistency Level − ONE − QUORUM  CL nodes ack the operation before the user receives ack  If an operation fails it is safe to retry *
  • 12. Fully distributed. The good  Highly available  Redundant  Fault tolerant
  • 13. Fully distributed! The bad  Locks  Counters  Tombstones  Consistency
  • 15. Hadoop and Cassandra  ColumnFamilyInputFormat − Takes a ColumnFamily as input − Map(ByteBuffer[] key, SortedMap<ByteBuffer,Column>  ColumnFamilyOutputFormat − Writes out to a column family − OutputFormat ByteBuffer,List<Mutation>
  • 16. Hadoop optimizations  Tasks run with locality if c* and h same node  InputFormat can leverage c* secondary indexes  OutputFormat can use bulk loader − C* writes are helluva fast anyway
  • 17. Hive and Cassandra  Hive support similar to the hbase handler support  Create a hive table specifying properties similar to those in map reduce  hive> CREATE EXTERNAL TABLE Users(userid string, name string, email string, phone string) STORED BY 'org.apache.hadoop.hive.cassandra.Ca ssandraStorageHandler' WITH
  • 18. Other support out there  github.com/edwardcapriolo/hive-cassandra- udfs − Delete UDF − Composite splitter/builder UDFS  Not very hard to roll your own input format − OneRowInputFormat − ListOfRowsInputFormat
  • 19. Pig Cassandra  Nice support for pig/cassandra  Pigmalian library  But I don't use it − Cause I use hive − You should as well − And get my book :)
  • 20. Comparison between c* and “other noSQL”  I know your talking about hbase :)  Cassandra does not store multiple versions of column − Last update wins − Use UUID as part of column name instead  The row keys are not globally ordered * − Unless you are using ByteOrderPartitioner (no one should use this)
  • 21. Comparison between c* and “other noSQL”  Each c* replica actively servers reads & writes  Cassandra directly manages its storage  Shards are pre-defined tokens (no auto-split)  Qualifier/column name can NOT be null
  • 23. Know your data  Design for the long tail scenarios − With design x our largest customer will have 10000000000000 columns in one row  How large will this column family be in 5 months?  What is the request rate?  How random is the read pattern
  • 24. Understanding write-once files  Deletes are writes that get compacted away later  Can you optimize from blind writes?  What percent of your application is update/insert?
  • 25. Profiling / Dark Launch  Compression  Compaction strategy
  • 26. Metrics  Collect the JMX information − Column family − Caches  Set milestone alerts (traps)
  • 27. Hardware  Fast disk (you almost always want SSD)  RAM − Caches, bloom filters, young gen  CPU − Garbage collector, deserialization + compaction needs cpu to work
  • 28. Anti patterns  Using one row key as a queue  Doing N reads to satisfy a request  Read before write  Using collection support in place of wide rows  Encoding