SlideShare une entreprise Scribd logo
1  sur  59
High-order bits from Cassandra & Hadoop srisatishambati @srisatish
Thank You!  svccg in first page of search results for “cloud” on google!
NoSQL- Know your queries.
points Usecases Why cassandra? Usecase: Hadoop, Brisk FUD:Consistency  Why facebook is not using Cassandra? Anti-patterns Community, Code, Tools Q&A
Users. Netflix. Key by Customer, read-heavy Key by Customer:Movie, write-heavy
TimeSeries: (several customers) periodic readings:  dev0, dev1…deviceID:metric:timestamp ->value Metrics typically way larger dataset than users.
Why Cassandra?
Operational simplicity peer-to-peer
Operational simplicity peer-to-peer
Replication:  Multi-datacenter Multi-region ec2 Multi-availability zones
reads local dc1 dc2 Replication:  Multi-datacenter Multi-region ec2, aws Multi-availability zones
4.21.2011,  Amazon Web Services outage: “Movie marathons on Netflix awaiting AWS to come back up.”  #ec2disabled
4.21.2011,  Amazon Web Services outage: Netflix was running on AWS.
fast durable writes.  fast reads.
Writes Sequential, append-only. ~1-5ms
Writes Sequential, append-only. ~1-5ms On cloud: ephemeral disks rock!
Reads  Local Key & row caches, (also, jna-based 0xffheap) indexes, materialized
Reads  Local Key & row caches, (also, jna-based 0xffheap) indexes, materialized ssds: improved read performance!
Distribution between nodes  Gossip Anti-entropy Failure-detector L i g h t w e i g h t
Clients: cql, thrift pycassa, phpcassa  hector, pelops (scala, ruby, clojure)
Usecase #3: hadoop Hdfs cassandra hive Logs         stats          analytics
Brisk Truly peer-to-peer hadoop.
mv computation not data
Parallel Execution View
jobtracker, tasktracker hdfs: namenode, datanode
cloudera amazon: elastic map reduce hortonworks mapR brisk
Tools & Analytics  Hive, Pig, R Karmasphere Datameer … dozens of stealth startups!
Namenode decomposition, explained.
Use column families (tables) inode sblock
near-real time hadoop Low latency: cassandra_dc nodes Batch Analytics: brisk_dc nodes
FUD,  acronym: fear, uncertainty, doubt.
Consistency:  R + W > N     ORACLE, 2-node: R=1, W=2, N=2,(T=2) DNS * N is replication factor. Not to be confused with T=total #of nodes
Tune-able, flexibility. For High Consistency:   read:quorum, write:quorum For High Availability:  	high W, low R.
Inbox Search:  600+cores.120+TB (2008) Went from 100-500m users. Average NoSQL deployment size: ~6-12 nodes.
Usecase #5: search Apache Solr + Cassandra = Solandra Other inbox/file Searches: xobni, c3 github.com/tjake/solandra
“Eventual consistency is harder to program.” mostly immutable data. complex systems at scale.
Miscellaneous, Myth: data-loss, partial rows. writes are durable.
Anti-Patterns Transactions Joins Read before write
Anti-Patterns for cloud ebs jvm, virtualized single region
Three good reasons for Cassandra...
Tools AMIs, OpsCenter, DataStax AppDynamics Netflix just builds AMIs for deployment!
B e a u t i f u l   C   0   d   e = new code(); //less is more ~90k.java.concurrent.@annotate.  bloomfilters, merkletrees. non-blocking, staged-event-driven. bigtable, dynamo.
Current & Future Focus: Distributed Counters, CQL. Simple client. operational smoothening.  compaction.
Community Robust. Rapid. # Professional support from DataStax. Filesysteminnovatin from Acunu engineers: independent,startups, large companies, Rackspace, Twitter, Netflix.. Come join the efforts!
Usecase #4:  first NoSQL, then scale! simpledb  Cassandra mongodb Cassandra
Copyright: xkcd
Copyright: plantoys … more than one way to do it!
Summary - high scale peer-to-peer datastore best friend for  multi-region, multi-zone availability. Hadoop – HDFS engulfing the DataWorld
Q&A @srisatish
NoSQL- Know your queries.

Contenu connexe

Tendances

Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...
Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...
Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...
DataStax
 

Tendances (20)

The Automation Factory
The Automation FactoryThe Automation Factory
The Automation Factory
 
Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...
Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...
Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...
 
Spark application on ec2 cluster
Spark application on ec2 clusterSpark application on ec2 cluster
Spark application on ec2 cluster
 
Cassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra CommunityCassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra Community
 
SSTable Reader Cassandra Day Denver 2014
SSTable Reader Cassandra Day Denver 2014SSTable Reader Cassandra Day Denver 2014
SSTable Reader Cassandra Day Denver 2014
 
Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...
Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...
Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...
 
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
 
Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...
Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...
Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...
 
Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...
Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...
Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...
 
Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...
Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...
Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...
 
Learn Cassandra at edureka!
Learn Cassandra at edureka!Learn Cassandra at edureka!
Learn Cassandra at edureka!
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 
Instaclustr webinar 2017 feb 08 japan
Instaclustr webinar 2017 feb 08   japanInstaclustr webinar 2017 feb 08   japan
Instaclustr webinar 2017 feb 08 japan
 
Up and running with pyspark
Up and running with pysparkUp and running with pyspark
Up and running with pyspark
 
Scylla db deck, july 2017
Scylla db deck, july 2017Scylla db deck, july 2017
Scylla db deck, july 2017
 
Introduction to Cassandra: Replication and Consistency
Introduction to Cassandra: Replication and ConsistencyIntroduction to Cassandra: Replication and Consistency
Introduction to Cassandra: Replication and Consistency
 
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
 
Time to Science, Time to Results. Accelerating Scientific research in the Cloud
Time to Science, Time to Results. Accelerating Scientific research in the CloudTime to Science, Time to Results. Accelerating Scientific research in the Cloud
Time to Science, Time to Results. Accelerating Scientific research in the Cloud
 
From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...
From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...
From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...
 
Introduction to cassandra 2014
Introduction to cassandra 2014Introduction to cassandra 2014
Introduction to cassandra 2014
 

Similaire à High order bits from cassandra & hadoop

Spinnaker VLDB 2011
Spinnaker VLDB 2011Spinnaker VLDB 2011
Spinnaker VLDB 2011
sandeep_tata
 
Developing with Cassandra
Developing with CassandraDeveloping with Cassandra
Developing with Cassandra
Sperasoft
 

Similaire à High order bits from cassandra & hadoop (20)

Netflix and Open Source
Netflix and Open SourceNetflix and Open Source
Netflix and Open Source
 
MySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion QueriesMySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion Queries
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_data
 
Apache Cassandra Lunch #72: Databricks and Cassandra
Apache Cassandra Lunch #72: Databricks and CassandraApache Cassandra Lunch #72: Databricks and Cassandra
Apache Cassandra Lunch #72: Databricks and Cassandra
 
Cassandra & Python - Springfield MO User Group
Cassandra & Python - Springfield MO User GroupCassandra & Python - Springfield MO User Group
Cassandra & Python - Springfield MO User Group
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skies
 
On Rails with Apache Cassandra
On Rails with Apache CassandraOn Rails with Apache Cassandra
On Rails with Apache Cassandra
 
Apache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the fireApache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the fire
 
C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...
C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...
C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr...
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and Spark
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.ir
 
Spinnaker VLDB 2011
Spinnaker VLDB 2011Spinnaker VLDB 2011
Spinnaker VLDB 2011
 
Developing with Cassandra
Developing with CassandraDeveloping with Cassandra
Developing with Cassandra
 
Introduction to parallel iterative deep learning on hadoop’s next​ generation...
Introduction to parallel iterative deep learning on hadoop’s next​ generation...Introduction to parallel iterative deep learning on hadoop’s next​ generation...
Introduction to parallel iterative deep learning on hadoop’s next​ generation...
 
No Sql
No SqlNo Sql
No Sql
 
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
 
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaLambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
 
Minnebar 2013 - Scaling with Cassandra
Minnebar 2013 - Scaling with CassandraMinnebar 2013 - Scaling with Cassandra
Minnebar 2013 - Scaling with Cassandra
 
NoSql Database
NoSql DatabaseNoSql Database
NoSql Database
 
Cassandra
CassandraCassandra
Cassandra
 

Plus de srisatish ambati

How to Stop Worrying and Start Caching in Java
How to Stop Worrying and Start Caching in JavaHow to Stop Worrying and Start Caching in Java
How to Stop Worrying and Start Caching in Java
srisatish ambati
 

Plus de srisatish ambati (11)

H2O Open Dallas 2016 keynote for Business Transformation
H2O Open Dallas 2016 keynote for Business TransformationH2O Open Dallas 2016 keynote for Business Transformation
H2O Open Dallas 2016 keynote for Business Transformation
 
Digital Transformation with AI and Data - H2O.ai and Open Source
Digital Transformation with AI and Data - H2O.ai and Open SourceDigital Transformation with AI and Data - H2O.ai and Open Source
Digital Transformation with AI and Data - H2O.ai and Open Source
 
Top 10 Performance Gotchas for scaling in-memory Algorithms.
Top 10 Performance Gotchas for scaling in-memory Algorithms.Top 10 Performance Gotchas for scaling in-memory Algorithms.
Top 10 Performance Gotchas for scaling in-memory Algorithms.
 
Cacheconcurrencyconsistency cassandra svcc
Cacheconcurrencyconsistency cassandra svccCacheconcurrencyconsistency cassandra svcc
Cacheconcurrencyconsistency cassandra svcc
 
Jvm goes big_data_sfjava
Jvm goes big_data_sfjavaJvm goes big_data_sfjava
Jvm goes big_data_sfjava
 
jvm goes to big data
jvm goes to big datajvm goes to big data
jvm goes to big data
 
Svccg nosql 2011_sri-cassandra
Svccg nosql 2011_sri-cassandraSvccg nosql 2011_sri-cassandra
Svccg nosql 2011_sri-cassandra
 
Cache is King ( Or How To Stop Worrying And Start Caching in Java) at Chicago...
Cache is King ( Or How To Stop Worrying And Start Caching in Java) at Chicago...Cache is King ( Or How To Stop Worrying And Start Caching in Java) at Chicago...
Cache is King ( Or How To Stop Worrying And Start Caching in Java) at Chicago...
 
How to Stop Worrying and Start Caching in Java
How to Stop Worrying and Start Caching in JavaHow to Stop Worrying and Start Caching in Java
How to Stop Worrying and Start Caching in Java
 
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...
 
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)
 

Dernier

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Dernier (20)

Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 

High order bits from cassandra & hadoop