SlideShare une entreprise Scribd logo
1  sur  29
Télécharger pour lire hors ligne
Developing with Cassandra
In memory Key-Value:
Redis
CouchBase
File system
BerkeleyDB
Distributed file system
HDFS
Velocity
Volume
Variety
In memory NewSQL /
domain specific
VoltDB
Traditional
[SQL] RDBMS
MongoDB
Distributed DBMS
Cassandra
HBase
CAP theorem:
• Consistency
• Availability
• Partition tolerance
- pick two
Eventual consistency
Transactional?
• No
Choose the DB
Why select Cassandra?
• [relatively] easy to setup
• [relatively] easy to use
• ~zero routine ops
• it works (!!) as promised:
o real-time replication
o node/site failure recovery
o zero load writes
o double of nodes = double of speed
Because Cassandra is Fast!
But needs some time to deliver
• 12'000 WPS on a laptop
• ~0.1 / 1 ms constant latency for writes/reads
Check References:
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf
Scalability
Good For:
• log-like data
TTL helps
• massive writes
1M WPS enough?
• simple real-time analytics
Not So Good For:
• dump of junk
(consider HDFS)
• OLAP
(depends on "O")
Good and Not So Good
Distributed DBMS
Just DBMS - closed monolithic solution
o not a platform to run custom code (as MongoDB);
o not an extension (as HBase);
o highly optimized
No-master, eventually consistent
NoSQL
Data model - Key-Value
http://cassandra.apache.org
Apache Cassandra
Developed at Facebook for Inbox search
Released to open source in 2008
In use:
• Netflix - main non-content data store~500 Cassandra nodes (2012)
• eBay - recommendation system"dozens of nodes", 200 TB storage (2012)
• Twitter - tweet analysis100 + TB of data
• More clients: (http://www.datastax.com/cassandrausers)
History
1.0 - October 2011
1.1 - April 2012
1.2 - January 2013
2.0 - expected this summer (2013)
June 26 2013 - 158 bugs, 89 worth to notice
Sperasoft Experience:
• hit 1 bug in production (stability issue)
• hit 1 bug in QA (in a crafted case)
Mature & Agile
Apache .tar.gz and Debian
packageshttp://cassandra.apache.org/download/
DataStax DSC - Cassandra + OpsCenter
http://planetcassandra.org/Download/DataStaxCommunityEdition
Embedded – for funct. tests on Java apps
Maven
Documentation
http://wiki.apache.org/cassandra/
http://www.datastax.com/docs
Distributions
http://www.slideshare.net/planetcassandra/5-andy-cobley-raspberry-pi
CPU: ARM 700 MHz
RAM: 500 MB
Storage: SD card
Price: $25
200 WPS!
What Hardware?
Bare metal
CPU: 8 cores (4 works too)
RAM: 16 - 64 GB (min 8 GB)
Storage: rotating disks 3 - 5 TB total (SSD better)
VM works too, but...
Storage: local disks, avoid NAS
More on Hardware
Software Yes & No (production)
?
Software for Development: All Yes
Plus more: Java, Python, C#, PHP, Ruby, Clojure, Go, R, ...
KeyspaceKeyspace
Column
family
Column
family
~ RDBMS DB / Schema
~ RDBMS table
Key1
Key2
Value
Column
Row
Clustering key
Partitioning key Map < ... , Map < ... , ... > >
Data Model
Node 3Node 2Node 1
CFCFCF
1
2
3
Client
Parallel reads,writes
1
2
3
Data on Discs
https://github.com/datastax/java-driver
Client API Options
Thrift RPC Native protocol + CQL3
Apache Thrift Custom protocol
Synchronous Asynchronous
Schema-less Static schema
Store & Forward Cursors promised in 2.0
API for any language Java; Python, C# coming
Cryptic API JDBC-like API
Supported yet Going forward
• Forget RDB design principles
• Forget abstract data model
- shape data for queries
• No joins - materialized views
• Data duplication - OK
• Remember eventual consistency
• Queries are precious
• Use right data types - timestamp, uuid
Why? Because NoSQL is a low level tool for high optimization.
Data Modeling for NoSQL
Do & Don't
Patterns and Anti-patterns
Query
Wait
Read
Query
Wait
Read
Parallel it:
slo-o-ow
Sequential Read
CREATE TABLE timeline (
event uuid,
timestamp timeuuid,
...
PRIMARY KEY (event, timestamp)
);
event:date
timestamp
...
CREATE TABLE timeline (
event uuid,
date long,
timestamp timeuuid,
...
PRIMARY KEY ((event, date), timestamp)
);
event
timestamp
...
Still bad - need sharding
http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra
Long rows - Cassandra handle 2G columns, but... slo-o-ow
Timeline
Insert = Update = Delete
A B C D1
Y1
Z1
1
A Y Z1
a b c d
UPDATE ... SET b = 'Y' WHERE id = 1
INSERT INTO ... SET (id, c) values (1, 'Z')
DELETE d FROM ... WHERE id = 1
SELECT * FROM ... WHERE id = 1
have to fetch 4 rows
slo-o-ow
Plan Data Immutable
http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
Q
Queue: INSERT INTO ...
SET( name, enqueued_at, payload )
VALUES ( 'Q', now(), ... )
Dequeue: DELETE payload FROM ...
WHERE name = 'Q'
AND enqueued_at = ...
Pick the next: SELECT * FROM ...
WHERE id = 1 LIMIT 1
*Q
*Q
*Q
*Q
Q
Q
*? ? ?Q
have to fetch 4 rowsslo-o-ow
Queue
Remember - eventual consistency.
Concurrent updates
=> wrong count
SELECT count(*)
FROM ... WHERE .... ;
Full scan over the selection
=>
Default 10'000 rows limit
=> wrong count
Have an integer column and increment it
CREATE TABLE count_table (
id
uuid,
value counter,
PRIMARY KEY (id)
);
...
UPDATE count_table
SET value = value + 1
WHERE id = ... ;
Counter column family
http://www.datastax.com/documentation/cassandr
a/1.2/cassandra/cql_using/use_counter_t.html
slo-o-ow
mess
mess
How Many?
CREATE TABLE blob (
id uuid,
data blob,
PRIMARY KEY (id)
);
id
chunk_no
data
CREATE TABLE blob (
id uuid,
chunk_no int,
data blob,
PRIMARY KEY (id, chunk_no)
);
id data
http://wiki.apache.org/cassandra/FAQ#large_file_and_blob_storage
http://wiki.apache.org/cassandra/CassandraLimitations
OutOfMemory
Blobs
Unbounded Queries
Result set
Client
OutOfMemory
OutOfMemory
Cursors. Planned to 2.0.
https://issues.apache.org/jira/browse/CASSANDRA-4415
C* clients use RPC yet
Cassandra
slo-o-ow
Column names - data... yet
Keep them short.
Node 3Node 1
CF
3
Client
2
Node 3Node 2
Why??
slo-o-ow
hot spots
Limit Client to a node
Helpful Links
Sperasoft @ slideshare: http://www.slideshare.net/Sperasoft
Sperasoft @ speakerdeck: https://speakerdeck.com/sperasoft
Sperasoft @ github: https://github.com/Sperasoft/Workshop

Contenu connexe

Tendances

DataStax: Extreme Cassandra Optimization: The Sequel
DataStax: Extreme Cassandra Optimization: The SequelDataStax: Extreme Cassandra Optimization: The Sequel
DataStax: Extreme Cassandra Optimization: The SequelDataStax Academy
 
Scaling Cassandra for Big Data
Scaling Cassandra for Big DataScaling Cassandra for Big Data
Scaling Cassandra for Big DataDataStax Academy
 
Tuning tips for Apache Spark Jobs
Tuning tips for Apache Spark JobsTuning tips for Apache Spark Jobs
Tuning tips for Apache Spark JobsSamir Bessalah
 
Elastic HBase on Mesos - HBaseCon 2015
Elastic HBase on Mesos - HBaseCon 2015Elastic HBase on Mesos - HBaseCon 2015
Elastic HBase on Mesos - HBaseCon 2015Cosmin Lehene
 
(SDD403) Amazon RDS for MySQL Deep Dive | AWS re:Invent 2014
(SDD403) Amazon RDS for MySQL Deep Dive | AWS re:Invent 2014(SDD403) Amazon RDS for MySQL Deep Dive | AWS re:Invent 2014
(SDD403) Amazon RDS for MySQL Deep Dive | AWS re:Invent 2014Amazon Web Services
 
Understanding DSE Search by Matt Stump
Understanding DSE Search by Matt StumpUnderstanding DSE Search by Matt Stump
Understanding DSE Search by Matt StumpDataStax
 
Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyDataStax Academy
 
C* Summit 2013: Cassandra at Instagram by Rick Branson
C* Summit 2013: Cassandra at Instagram by Rick BransonC* Summit 2013: Cassandra at Instagram by Rick Branson
C* Summit 2013: Cassandra at Instagram by Rick BransonDataStax Academy
 
Debugging & Tuning in Spark
Debugging & Tuning in SparkDebugging & Tuning in Spark
Debugging & Tuning in SparkShiao-An Yuan
 
Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...
Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...
Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...DataStax
 
Ceph Object Storage Performance Secrets and Ceph Data Lake Solution
Ceph Object Storage Performance Secrets and Ceph Data Lake SolutionCeph Object Storage Performance Secrets and Ceph Data Lake Solution
Ceph Object Storage Performance Secrets and Ceph Data Lake SolutionKaran Singh
 
Cassandra Summit 2014: Reading Cassandra SSTables Directly for Offline Data A...
Cassandra Summit 2014: Reading Cassandra SSTables Directly for Offline Data A...Cassandra Summit 2014: Reading Cassandra SSTables Directly for Offline Data A...
Cassandra Summit 2014: Reading Cassandra SSTables Directly for Offline Data A...DataStax Academy
 
Cassandra Summit 2014: Performance Tuning Cassandra in AWS
Cassandra Summit 2014: Performance Tuning Cassandra in AWSCassandra Summit 2014: Performance Tuning Cassandra in AWS
Cassandra Summit 2014: Performance Tuning Cassandra in AWSDataStax Academy
 
Scylla Summit 2018: Keeping Your Latency SLAs No Matter What!
Scylla Summit 2018: Keeping Your Latency SLAs No Matter What!Scylla Summit 2018: Keeping Your Latency SLAs No Matter What!
Scylla Summit 2018: Keeping Your Latency SLAs No Matter What!ScyllaDB
 
(SDD409) Amazon RDS for PostgreSQL Deep Dive | AWS re:Invent 2014
(SDD409) Amazon RDS for PostgreSQL Deep Dive | AWS re:Invent 2014(SDD409) Amazon RDS for PostgreSQL Deep Dive | AWS re:Invent 2014
(SDD409) Amazon RDS for PostgreSQL Deep Dive | AWS re:Invent 2014Amazon Web Services
 
AWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data AnalyticsAWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data AnalyticsKeeyong Han
 
Scylla Summit 2018: What's New in Scylla Manager?
Scylla Summit 2018: What's New in Scylla Manager?Scylla Summit 2018: What's New in Scylla Manager?
Scylla Summit 2018: What's New in Scylla Manager?ScyllaDB
 
Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0J.B. Langston
 
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, Heroku
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, HerokuPostgres & Redis Sitting in a Tree- Rimas Silkaitis, Heroku
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, HerokuRedis Labs
 
Automation of Hadoop cluster operations in Arm Treasure Data
Automation of Hadoop cluster operations in Arm Treasure DataAutomation of Hadoop cluster operations in Arm Treasure Data
Automation of Hadoop cluster operations in Arm Treasure DataYan Wang
 

Tendances (20)

DataStax: Extreme Cassandra Optimization: The Sequel
DataStax: Extreme Cassandra Optimization: The SequelDataStax: Extreme Cassandra Optimization: The Sequel
DataStax: Extreme Cassandra Optimization: The Sequel
 
Scaling Cassandra for Big Data
Scaling Cassandra for Big DataScaling Cassandra for Big Data
Scaling Cassandra for Big Data
 
Tuning tips for Apache Spark Jobs
Tuning tips for Apache Spark JobsTuning tips for Apache Spark Jobs
Tuning tips for Apache Spark Jobs
 
Elastic HBase on Mesos - HBaseCon 2015
Elastic HBase on Mesos - HBaseCon 2015Elastic HBase on Mesos - HBaseCon 2015
Elastic HBase on Mesos - HBaseCon 2015
 
(SDD403) Amazon RDS for MySQL Deep Dive | AWS re:Invent 2014
(SDD403) Amazon RDS for MySQL Deep Dive | AWS re:Invent 2014(SDD403) Amazon RDS for MySQL Deep Dive | AWS re:Invent 2014
(SDD403) Amazon RDS for MySQL Deep Dive | AWS re:Invent 2014
 
Understanding DSE Search by Matt Stump
Understanding DSE Search by Matt StumpUnderstanding DSE Search by Matt Stump
Understanding DSE Search by Matt Stump
 
Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al Tobey
 
C* Summit 2013: Cassandra at Instagram by Rick Branson
C* Summit 2013: Cassandra at Instagram by Rick BransonC* Summit 2013: Cassandra at Instagram by Rick Branson
C* Summit 2013: Cassandra at Instagram by Rick Branson
 
Debugging & Tuning in Spark
Debugging & Tuning in SparkDebugging & Tuning in Spark
Debugging & Tuning in Spark
 
Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...
Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...
Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...
 
Ceph Object Storage Performance Secrets and Ceph Data Lake Solution
Ceph Object Storage Performance Secrets and Ceph Data Lake SolutionCeph Object Storage Performance Secrets and Ceph Data Lake Solution
Ceph Object Storage Performance Secrets and Ceph Data Lake Solution
 
Cassandra Summit 2014: Reading Cassandra SSTables Directly for Offline Data A...
Cassandra Summit 2014: Reading Cassandra SSTables Directly for Offline Data A...Cassandra Summit 2014: Reading Cassandra SSTables Directly for Offline Data A...
Cassandra Summit 2014: Reading Cassandra SSTables Directly for Offline Data A...
 
Cassandra Summit 2014: Performance Tuning Cassandra in AWS
Cassandra Summit 2014: Performance Tuning Cassandra in AWSCassandra Summit 2014: Performance Tuning Cassandra in AWS
Cassandra Summit 2014: Performance Tuning Cassandra in AWS
 
Scylla Summit 2018: Keeping Your Latency SLAs No Matter What!
Scylla Summit 2018: Keeping Your Latency SLAs No Matter What!Scylla Summit 2018: Keeping Your Latency SLAs No Matter What!
Scylla Summit 2018: Keeping Your Latency SLAs No Matter What!
 
(SDD409) Amazon RDS for PostgreSQL Deep Dive | AWS re:Invent 2014
(SDD409) Amazon RDS for PostgreSQL Deep Dive | AWS re:Invent 2014(SDD409) Amazon RDS for PostgreSQL Deep Dive | AWS re:Invent 2014
(SDD409) Amazon RDS for PostgreSQL Deep Dive | AWS re:Invent 2014
 
AWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data AnalyticsAWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data Analytics
 
Scylla Summit 2018: What's New in Scylla Manager?
Scylla Summit 2018: What's New in Scylla Manager?Scylla Summit 2018: What's New in Scylla Manager?
Scylla Summit 2018: What's New in Scylla Manager?
 
Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0
 
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, Heroku
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, HerokuPostgres & Redis Sitting in a Tree- Rimas Silkaitis, Heroku
Postgres & Redis Sitting in a Tree- Rimas Silkaitis, Heroku
 
Automation of Hadoop cluster operations in Arm Treasure Data
Automation of Hadoop cluster operations in Arm Treasure DataAutomation of Hadoop cluster operations in Arm Treasure Data
Automation of Hadoop cluster operations in Arm Treasure Data
 

En vedette

BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraVictor Coustenoble
 
Spark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataSpark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataVictor Coustenoble
 
Real-time Cassandra
Real-time CassandraReal-time Cassandra
Real-time CassandraAcunu
 
Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012jbellis
 
Cassandra DataTables Using RESTful API
Cassandra DataTables Using RESTful APICassandra DataTables Using RESTful API
Cassandra DataTables Using RESTful APISimran Kedia
 
Using Cassandra with your Web Application
Using Cassandra with your Web ApplicationUsing Cassandra with your Web Application
Using Cassandra with your Web Applicationsupertom
 
Cassandra NodeJS driver & NodeJS Paris
Cassandra NodeJS driver & NodeJS ParisCassandra NodeJS driver & NodeJS Paris
Cassandra NodeJS driver & NodeJS ParisDuyhai Doan
 
Cassandra Java APIs Old and New – A Comparison
Cassandra Java APIs Old and New – A ComparisonCassandra Java APIs Old and New – A Comparison
Cassandra Java APIs Old and New – A Comparisonshsedghi
 
Application Development with Apache Cassandra as a Service
Application Development with Apache Cassandra as a ServiceApplication Development with Apache Cassandra as a Service
Application Development with Apache Cassandra as a ServiceWSO2
 
Node.js and Cassandra
Node.js and CassandraNode.js and Cassandra
Node.js and CassandraStratio
 
NodeJS : Communication and Round Robin Way
NodeJS : Communication and Round Robin WayNodeJS : Communication and Round Robin Way
NodeJS : Communication and Round Robin WayEdureka!
 
69 claves para conocer Big Data
69 claves para conocer Big Data69 claves para conocer Big Data
69 claves para conocer Big DataStratebi
 

En vedette (12)

BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache Cassandra
 
Spark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataSpark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational Data
 
Real-time Cassandra
Real-time CassandraReal-time Cassandra
Real-time Cassandra
 
Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012
 
Cassandra DataTables Using RESTful API
Cassandra DataTables Using RESTful APICassandra DataTables Using RESTful API
Cassandra DataTables Using RESTful API
 
Using Cassandra with your Web Application
Using Cassandra with your Web ApplicationUsing Cassandra with your Web Application
Using Cassandra with your Web Application
 
Cassandra NodeJS driver & NodeJS Paris
Cassandra NodeJS driver & NodeJS ParisCassandra NodeJS driver & NodeJS Paris
Cassandra NodeJS driver & NodeJS Paris
 
Cassandra Java APIs Old and New – A Comparison
Cassandra Java APIs Old and New – A ComparisonCassandra Java APIs Old and New – A Comparison
Cassandra Java APIs Old and New – A Comparison
 
Application Development with Apache Cassandra as a Service
Application Development with Apache Cassandra as a ServiceApplication Development with Apache Cassandra as a Service
Application Development with Apache Cassandra as a Service
 
Node.js and Cassandra
Node.js and CassandraNode.js and Cassandra
Node.js and Cassandra
 
NodeJS : Communication and Round Robin Way
NodeJS : Communication and Round Robin WayNodeJS : Communication and Round Robin Way
NodeJS : Communication and Round Robin Way
 
69 claves para conocer Big Data
69 claves para conocer Big Data69 claves para conocer Big Data
69 claves para conocer Big Data
 

Similaire à Developing with Cassandra

Cassandra at Pollfish
Cassandra at PollfishCassandra at Pollfish
Cassandra at PollfishPollfish
 
High concurrency,
Low latency analytics
using Spark/Kudu
 High concurrency,
Low latency analytics
using Spark/Kudu High concurrency,
Low latency analytics
using Spark/Kudu
High concurrency,
Low latency analytics
using Spark/KuduChris George
 
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them AllScylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them AllScyllaDB
 
Quick trip around the Cosmos - Things every astronaut supposed to know
Quick trip around the Cosmos - Things every astronaut supposed to knowQuick trip around the Cosmos - Things every astronaut supposed to know
Quick trip around the Cosmos - Things every astronaut supposed to knowRafał Hryniewski
 
ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016Tzach Livyatan
 
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UKIntroduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UKSkills Matter
 
What's new in Jewel and Beyond
What's new in Jewel and BeyondWhat's new in Jewel and Beyond
What's new in Jewel and BeyondSage Weil
 
Data Grids with Oracle Coherence
Data Grids with Oracle CoherenceData Grids with Oracle Coherence
Data Grids with Oracle CoherenceBen Stopford
 
Cassandra Summit 2014: Down with Tweaking! Removing Tunable Complexity for Ca...
Cassandra Summit 2014: Down with Tweaking! Removing Tunable Complexity for Ca...Cassandra Summit 2014: Down with Tweaking! Removing Tunable Complexity for Ca...
Cassandra Summit 2014: Down with Tweaking! Removing Tunable Complexity for Ca...DataStax Academy
 
Puppet and Apache CloudStack
Puppet and Apache CloudStackPuppet and Apache CloudStack
Puppet and Apache CloudStackPuppet
 
Infrastructure as code with Puppet and Apache CloudStack
Infrastructure as code with Puppet and Apache CloudStackInfrastructure as code with Puppet and Apache CloudStack
Infrastructure as code with Puppet and Apache CloudStackke4qqq
 
Postgres the hardway
Postgres the hardwayPostgres the hardway
Postgres the hardwayDave Pitts
 
OSv at Cassandra Summit
OSv at Cassandra SummitOSv at Cassandra Summit
OSv at Cassandra SummitDon Marti
 
Reusable, composable, battle-tested Terraform modules
Reusable, composable, battle-tested Terraform modulesReusable, composable, battle-tested Terraform modules
Reusable, composable, battle-tested Terraform modulesYevgeniy Brikman
 
Under The Hood Of A Shard-Per-Core Database Architecture
Under The Hood Of A Shard-Per-Core Database ArchitectureUnder The Hood Of A Shard-Per-Core Database Architecture
Under The Hood Of A Shard-Per-Core Database ArchitectureScyllaDB
 
OCF.tw's talk about "Introduction to spark"
OCF.tw's talk about "Introduction to spark"OCF.tw's talk about "Introduction to spark"
OCF.tw's talk about "Introduction to spark"Giivee The
 
High-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and JavaHigh-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and Javasunnygleason
 
3 Dundee-Spark Overview for C* developers
3 Dundee-Spark Overview for C* developers3 Dundee-Spark Overview for C* developers
3 Dundee-Spark Overview for C* developersChristopher Batey
 

Similaire à Developing with Cassandra (20)

Cassandra at Pollfish
Cassandra at PollfishCassandra at Pollfish
Cassandra at Pollfish
 
Cassandra at Pollfish
Cassandra at PollfishCassandra at Pollfish
Cassandra at Pollfish
 
High concurrency,
Low latency analytics
using Spark/Kudu
 High concurrency,
Low latency analytics
using Spark/Kudu High concurrency,
Low latency analytics
using Spark/Kudu
High concurrency,
Low latency analytics
using Spark/Kudu
 
Making KVS 10x Scalable
Making KVS 10x ScalableMaking KVS 10x Scalable
Making KVS 10x Scalable
 
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them AllScylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
 
Quick trip around the Cosmos - Things every astronaut supposed to know
Quick trip around the Cosmos - Things every astronaut supposed to knowQuick trip around the Cosmos - Things every astronaut supposed to know
Quick trip around the Cosmos - Things every astronaut supposed to know
 
ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016
 
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UKIntroduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
 
What's new in Jewel and Beyond
What's new in Jewel and BeyondWhat's new in Jewel and Beyond
What's new in Jewel and Beyond
 
Data Grids with Oracle Coherence
Data Grids with Oracle CoherenceData Grids with Oracle Coherence
Data Grids with Oracle Coherence
 
Cassandra Summit 2014: Down with Tweaking! Removing Tunable Complexity for Ca...
Cassandra Summit 2014: Down with Tweaking! Removing Tunable Complexity for Ca...Cassandra Summit 2014: Down with Tweaking! Removing Tunable Complexity for Ca...
Cassandra Summit 2014: Down with Tweaking! Removing Tunable Complexity for Ca...
 
Puppet and Apache CloudStack
Puppet and Apache CloudStackPuppet and Apache CloudStack
Puppet and Apache CloudStack
 
Infrastructure as code with Puppet and Apache CloudStack
Infrastructure as code with Puppet and Apache CloudStackInfrastructure as code with Puppet and Apache CloudStack
Infrastructure as code with Puppet and Apache CloudStack
 
Postgres the hardway
Postgres the hardwayPostgres the hardway
Postgres the hardway
 
OSv at Cassandra Summit
OSv at Cassandra SummitOSv at Cassandra Summit
OSv at Cassandra Summit
 
Reusable, composable, battle-tested Terraform modules
Reusable, composable, battle-tested Terraform modulesReusable, composable, battle-tested Terraform modules
Reusable, composable, battle-tested Terraform modules
 
Under The Hood Of A Shard-Per-Core Database Architecture
Under The Hood Of A Shard-Per-Core Database ArchitectureUnder The Hood Of A Shard-Per-Core Database Architecture
Under The Hood Of A Shard-Per-Core Database Architecture
 
OCF.tw's talk about "Introduction to spark"
OCF.tw's talk about "Introduction to spark"OCF.tw's talk about "Introduction to spark"
OCF.tw's talk about "Introduction to spark"
 
High-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and JavaHigh-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and Java
 
3 Dundee-Spark Overview for C* developers
3 Dundee-Spark Overview for C* developers3 Dundee-Spark Overview for C* developers
3 Dundee-Spark Overview for C* developers
 

Plus de Sperasoft

особенности работы с Locomotion в Unreal Engine 4
особенности работы с Locomotion в Unreal Engine 4особенности работы с Locomotion в Unreal Engine 4
особенности работы с Locomotion в Unreal Engine 4Sperasoft
 
концепт и архитектура геймплея в Creach: The Depleted World
концепт и архитектура геймплея в Creach: The Depleted Worldконцепт и архитектура геймплея в Creach: The Depleted World
концепт и архитектура геймплея в Creach: The Depleted WorldSperasoft
 
Опыт разработки VR игры для UE4
Опыт разработки VR игры для UE4Опыт разработки VR игры для UE4
Опыт разработки VR игры для UE4Sperasoft
 
Организация работы с UE4 в команде до 20 человек
Организация работы с UE4 в команде до 20 человек Организация работы с UE4 в команде до 20 человек
Организация работы с UE4 в команде до 20 человек Sperasoft
 
Gameplay Tags
Gameplay TagsGameplay Tags
Gameplay TagsSperasoft
 
Data Driven Gameplay in UE4
Data Driven Gameplay in UE4Data Driven Gameplay in UE4
Data Driven Gameplay in UE4Sperasoft
 
Code and Memory Optimisation Tricks
Code and Memory Optimisation Tricks Code and Memory Optimisation Tricks
Code and Memory Optimisation Tricks Sperasoft
 
The theory of relational databases
The theory of relational databasesThe theory of relational databases
The theory of relational databasesSperasoft
 
Automated layout testing using Galen Framework
Automated layout testing using Galen FrameworkAutomated layout testing using Galen Framework
Automated layout testing using Galen FrameworkSperasoft
 
Sperasoft talks: Android Security Threats
Sperasoft talks: Android Security ThreatsSperasoft talks: Android Security Threats
Sperasoft talks: Android Security ThreatsSperasoft
 
Sperasoft Talks: RxJava Functional Reactive Programming on Android
Sperasoft Talks: RxJava Functional Reactive Programming on AndroidSperasoft Talks: RxJava Functional Reactive Programming on Android
Sperasoft Talks: RxJava Functional Reactive Programming on AndroidSperasoft
 
Sperasoft‬ talks j point 2015
Sperasoft‬ talks j point 2015Sperasoft‬ talks j point 2015
Sperasoft‬ talks j point 2015Sperasoft
 
Effective Мeetings
Effective МeetingsEffective Мeetings
Effective МeetingsSperasoft
 
Unreal Engine 4 Introduction
Unreal Engine 4 IntroductionUnreal Engine 4 Introduction
Unreal Engine 4 IntroductionSperasoft
 
JIRA Development
JIRA DevelopmentJIRA Development
JIRA DevelopmentSperasoft
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to ElasticsearchSperasoft
 
MOBILE DEVELOPMENT with HTML, CSS and JS
MOBILE DEVELOPMENT with HTML, CSS and JSMOBILE DEVELOPMENT with HTML, CSS and JS
MOBILE DEVELOPMENT with HTML, CSS and JSSperasoft
 
Quick Intro Into Kanban
Quick Intro Into KanbanQuick Intro Into Kanban
Quick Intro Into KanbanSperasoft
 
ECMAScript 6 Review
ECMAScript 6 ReviewECMAScript 6 Review
ECMAScript 6 ReviewSperasoft
 
Console Development in 15 minutes
Console Development in 15 minutesConsole Development in 15 minutes
Console Development in 15 minutesSperasoft
 

Plus de Sperasoft (20)

особенности работы с Locomotion в Unreal Engine 4
особенности работы с Locomotion в Unreal Engine 4особенности работы с Locomotion в Unreal Engine 4
особенности работы с Locomotion в Unreal Engine 4
 
концепт и архитектура геймплея в Creach: The Depleted World
концепт и архитектура геймплея в Creach: The Depleted Worldконцепт и архитектура геймплея в Creach: The Depleted World
концепт и архитектура геймплея в Creach: The Depleted World
 
Опыт разработки VR игры для UE4
Опыт разработки VR игры для UE4Опыт разработки VR игры для UE4
Опыт разработки VR игры для UE4
 
Организация работы с UE4 в команде до 20 человек
Организация работы с UE4 в команде до 20 человек Организация работы с UE4 в команде до 20 человек
Организация работы с UE4 в команде до 20 человек
 
Gameplay Tags
Gameplay TagsGameplay Tags
Gameplay Tags
 
Data Driven Gameplay in UE4
Data Driven Gameplay in UE4Data Driven Gameplay in UE4
Data Driven Gameplay in UE4
 
Code and Memory Optimisation Tricks
Code and Memory Optimisation Tricks Code and Memory Optimisation Tricks
Code and Memory Optimisation Tricks
 
The theory of relational databases
The theory of relational databasesThe theory of relational databases
The theory of relational databases
 
Automated layout testing using Galen Framework
Automated layout testing using Galen FrameworkAutomated layout testing using Galen Framework
Automated layout testing using Galen Framework
 
Sperasoft talks: Android Security Threats
Sperasoft talks: Android Security ThreatsSperasoft talks: Android Security Threats
Sperasoft talks: Android Security Threats
 
Sperasoft Talks: RxJava Functional Reactive Programming on Android
Sperasoft Talks: RxJava Functional Reactive Programming on AndroidSperasoft Talks: RxJava Functional Reactive Programming on Android
Sperasoft Talks: RxJava Functional Reactive Programming on Android
 
Sperasoft‬ talks j point 2015
Sperasoft‬ talks j point 2015Sperasoft‬ talks j point 2015
Sperasoft‬ talks j point 2015
 
Effective Мeetings
Effective МeetingsEffective Мeetings
Effective Мeetings
 
Unreal Engine 4 Introduction
Unreal Engine 4 IntroductionUnreal Engine 4 Introduction
Unreal Engine 4 Introduction
 
JIRA Development
JIRA DevelopmentJIRA Development
JIRA Development
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
 
MOBILE DEVELOPMENT with HTML, CSS and JS
MOBILE DEVELOPMENT with HTML, CSS and JSMOBILE DEVELOPMENT with HTML, CSS and JS
MOBILE DEVELOPMENT with HTML, CSS and JS
 
Quick Intro Into Kanban
Quick Intro Into KanbanQuick Intro Into Kanban
Quick Intro Into Kanban
 
ECMAScript 6 Review
ECMAScript 6 ReviewECMAScript 6 Review
ECMAScript 6 Review
 
Console Development in 15 minutes
Console Development in 15 minutesConsole Development in 15 minutes
Console Development in 15 minutes
 

Dernier

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 

Dernier (20)

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 

Developing with Cassandra

  • 2. In memory Key-Value: Redis CouchBase File system BerkeleyDB Distributed file system HDFS Velocity Volume Variety In memory NewSQL / domain specific VoltDB Traditional [SQL] RDBMS MongoDB Distributed DBMS Cassandra HBase CAP theorem: • Consistency • Availability • Partition tolerance - pick two Eventual consistency Transactional? • No Choose the DB
  • 3. Why select Cassandra? • [relatively] easy to setup • [relatively] easy to use • ~zero routine ops • it works (!!) as promised: o real-time replication o node/site failure recovery o zero load writes o double of nodes = double of speed
  • 4. Because Cassandra is Fast! But needs some time to deliver • 12'000 WPS on a laptop • ~0.1 / 1 ms constant latency for writes/reads
  • 6. Good For: • log-like data TTL helps • massive writes 1M WPS enough? • simple real-time analytics Not So Good For: • dump of junk (consider HDFS) • OLAP (depends on "O") Good and Not So Good
  • 7. Distributed DBMS Just DBMS - closed monolithic solution o not a platform to run custom code (as MongoDB); o not an extension (as HBase); o highly optimized No-master, eventually consistent NoSQL Data model - Key-Value http://cassandra.apache.org Apache Cassandra
  • 8. Developed at Facebook for Inbox search Released to open source in 2008 In use: • Netflix - main non-content data store~500 Cassandra nodes (2012) • eBay - recommendation system"dozens of nodes", 200 TB storage (2012) • Twitter - tweet analysis100 + TB of data • More clients: (http://www.datastax.com/cassandrausers) History
  • 9. 1.0 - October 2011 1.1 - April 2012 1.2 - January 2013 2.0 - expected this summer (2013) June 26 2013 - 158 bugs, 89 worth to notice Sperasoft Experience: • hit 1 bug in production (stability issue) • hit 1 bug in QA (in a crafted case) Mature & Agile
  • 10. Apache .tar.gz and Debian packageshttp://cassandra.apache.org/download/ DataStax DSC - Cassandra + OpsCenter http://planetcassandra.org/Download/DataStaxCommunityEdition Embedded – for funct. tests on Java apps Maven Documentation http://wiki.apache.org/cassandra/ http://www.datastax.com/docs Distributions
  • 11. http://www.slideshare.net/planetcassandra/5-andy-cobley-raspberry-pi CPU: ARM 700 MHz RAM: 500 MB Storage: SD card Price: $25 200 WPS! What Hardware?
  • 12. Bare metal CPU: 8 cores (4 works too) RAM: 16 - 64 GB (min 8 GB) Storage: rotating disks 3 - 5 TB total (SSD better) VM works too, but... Storage: local disks, avoid NAS More on Hardware
  • 13. Software Yes & No (production) ?
  • 14. Software for Development: All Yes Plus more: Java, Python, C#, PHP, Ruby, Clojure, Go, R, ...
  • 15. KeyspaceKeyspace Column family Column family ~ RDBMS DB / Schema ~ RDBMS table Key1 Key2 Value Column Row Clustering key Partitioning key Map < ... , Map < ... , ... > > Data Model
  • 16. Node 3Node 2Node 1 CFCFCF 1 2 3 Client Parallel reads,writes 1 2 3 Data on Discs
  • 17. https://github.com/datastax/java-driver Client API Options Thrift RPC Native protocol + CQL3 Apache Thrift Custom protocol Synchronous Asynchronous Schema-less Static schema Store & Forward Cursors promised in 2.0 API for any language Java; Python, C# coming Cryptic API JDBC-like API Supported yet Going forward
  • 18. • Forget RDB design principles • Forget abstract data model - shape data for queries • No joins - materialized views • Data duplication - OK • Remember eventual consistency • Queries are precious • Use right data types - timestamp, uuid Why? Because NoSQL is a low level tool for high optimization. Data Modeling for NoSQL
  • 19. Do & Don't Patterns and Anti-patterns
  • 21. CREATE TABLE timeline ( event uuid, timestamp timeuuid, ... PRIMARY KEY (event, timestamp) ); event:date timestamp ... CREATE TABLE timeline ( event uuid, date long, timestamp timeuuid, ... PRIMARY KEY ((event, date), timestamp) ); event timestamp ... Still bad - need sharding http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra Long rows - Cassandra handle 2G columns, but... slo-o-ow Timeline
  • 22. Insert = Update = Delete A B C D1 Y1 Z1 1 A Y Z1 a b c d UPDATE ... SET b = 'Y' WHERE id = 1 INSERT INTO ... SET (id, c) values (1, 'Z') DELETE d FROM ... WHERE id = 1 SELECT * FROM ... WHERE id = 1 have to fetch 4 rows slo-o-ow Plan Data Immutable
  • 23. http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets Q Queue: INSERT INTO ... SET( name, enqueued_at, payload ) VALUES ( 'Q', now(), ... ) Dequeue: DELETE payload FROM ... WHERE name = 'Q' AND enqueued_at = ... Pick the next: SELECT * FROM ... WHERE id = 1 LIMIT 1 *Q *Q *Q *Q Q Q *? ? ?Q have to fetch 4 rowsslo-o-ow Queue
  • 24. Remember - eventual consistency. Concurrent updates => wrong count SELECT count(*) FROM ... WHERE .... ; Full scan over the selection => Default 10'000 rows limit => wrong count Have an integer column and increment it CREATE TABLE count_table ( id uuid, value counter, PRIMARY KEY (id) ); ... UPDATE count_table SET value = value + 1 WHERE id = ... ; Counter column family http://www.datastax.com/documentation/cassandr a/1.2/cassandra/cql_using/use_counter_t.html slo-o-ow mess mess How Many?
  • 25. CREATE TABLE blob ( id uuid, data blob, PRIMARY KEY (id) ); id chunk_no data CREATE TABLE blob ( id uuid, chunk_no int, data blob, PRIMARY KEY (id, chunk_no) ); id data http://wiki.apache.org/cassandra/FAQ#large_file_and_blob_storage http://wiki.apache.org/cassandra/CassandraLimitations OutOfMemory Blobs
  • 26. Unbounded Queries Result set Client OutOfMemory OutOfMemory Cursors. Planned to 2.0. https://issues.apache.org/jira/browse/CASSANDRA-4415 C* clients use RPC yet Cassandra slo-o-ow
  • 27. Column names - data... yet Keep them short.
  • 28. Node 3Node 1 CF 3 Client 2 Node 3Node 2 Why?? slo-o-ow hot spots Limit Client to a node
  • 29. Helpful Links Sperasoft @ slideshare: http://www.slideshare.net/Sperasoft Sperasoft @ speakerdeck: https://speakerdeck.com/sperasoft Sperasoft @ github: https://github.com/Sperasoft/Workshop