Victor Coustenoble
@vizanalytics
Cassandra 2.2 & 3.0
http://www.datastax.com/dev/blog/cassandra-2-2
Where did 2.2 come from?
Don't start Thrift rpc by default (CASSANDRA-9319)
New features
• 2.2
- JSON
- User defined functions
- User defined aggregates
- Other useful features
- http://docs.datastax.com/en/cassandra/2.2/cassandra/features.html
- http://www.datastax.com/dev/blog/cassandra-2-2
• 3.0
- New storage engine (CASSANDRA-8099)
- A new way to denormalise/duplicate: materialized views
So who’s taken some data out of C* and serialised it as JSON?
Hello JSON
• CREATE TABLE user (username text PRIMARY KEY,
first_name text, last_name text, emails set<text>,
country text);
• INSERT INTO user JSON '{"username": "chbatey",
"first_name": "Christopher", "last_name": "Batey",
"emails": ["christopher.batey@datastax.com"]}';
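On the client side, this means a map-like object can be handed over as one JSON string instead of being mapped column by column. A minimal Python sketch using only the standard library; `build_json_insert` is a hypothetical helper, not a driver API, and a real application would more likely bind the JSON string as a query parameter than format it into the statement.

```python
import json


def build_json_insert(table, row):
    """Return an INSERT ... JSON statement for `row` (hypothetical helper)."""
    payload = json.dumps(row)
    # CQL string literals escape a single quote by doubling it.
    escaped = payload.replace("'", "''")
    return f"INSERT INTO {table} JSON '{escaped}';"


user = {
    "username": "chbatey",
    "first_name": "Christopher",
    "last_name": "Batey",
    "emails": ["christopher.batey@datastax.com"],
}
statement = build_json_insert("user", user)
print(statement)
```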
Goodbye Serialisation!
JSON + User Defined Types
• CREATE TYPE movie (title text, time timestamp,
description text);
• ALTER TABLE user ADD movies set<frozen<movie>>;
• UPDATE user SET movies = {{ title:'Batman',
time:'2011-02-03T04:05:00+0000', description:
'This film rocks' }} where username = 'chbatey';
Out it comes
User Defined Functions
• Run code on the server (dangerous!)
- Disabled by default
• Java and JavaScript supported out of the box
• Any language that supports the Java Scripting API
(Java, JavaScript, Ruby, Python …)
UDF example
CREATE TABLE user (
username text primary key,
first_name text ,
last_name text ,
emails set<text> ,
country text);
Concat function
CREATE FUNCTION name ( first_name text, last_name text )
CALLED ON NULL INPUT
RETURNS text LANGUAGE java
AS 'return first_name + " " + last_name;';
cqlsh:twotwo> select name(first_name, last_name) FROM user;
twotwo.name(first_name, last_name)
------------------------------------
Victor Coustenoble
User Defined Aggregates
CREATE AGGREGATE average ( int )
SFUNC averageState
STYPE tuple<int,bigint>
FINALFUNC averageFinal
INITCOND (0, 0);
• SFUNC: called for every row, state passed between calls
• STYPE: state type, also the state function's return type (a CQL type)
• FINALFUNC: optional function called on the final state
• INITCOND: initial state
State function (like a UDF)
CREATE FUNCTION averageState ( state tuple<int,bigint>, value int )
CALLED ON NULL INPUT
RETURNS tuple<int,bigint>
LANGUAGE java
AS '
if (value != null) {
state.setInt(0, state.getInt(0)+1);
state.setLong(1, state.getLong(1)+value.intValue());
}
return state;
';
• First argument: the state type; remaining arguments: the aggregated columns
Final function
CREATE FUNCTION averageFinal ( state tuple<int,bigint> )
CALLED ON NULL INPUT
RETURNS double
LANGUAGE java
AS '
if (state.getInt(0) == 0) return null;
double r = (double) state.getLong(1) / state.getInt(0);
return Double.valueOf(r);
';
• State type: tuple<int,bigint>
• Overall return type: double
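How the pieces of an aggregate fit together can be sketched in plain Python. This is only a simulation of the call sequence (initial state, state function per row, final function once), not Cassandra's implementation; `run_aggregate` is a hypothetical driver loop, and the state is a (count, total) pair like the tuple above.

```python
def average_state(state, value):
    """State function: called once per row; state is (count, total)."""
    count, total = state
    if value is not None:  # null inputs leave the state unchanged
        count, total = count + 1, total + value
    return (count, total)


def average_final(state):
    """Final function: called once on the final state."""
    count, total = state
    if count == 0:
        return None
    return total / count  # Python's / is true division


def run_aggregate(values, sfunc, initcond, finalfunc=None):
    """Hypothetical driver loop standing in for the server."""
    state = initcond
    for value in values:
        state = sfunc(state, value)
    return finalfunc(state) if finalfunc else state


print(run_aggregate([1, 2, None, 4], average_state, (0, 0), average_final))
```

Without a final function the aggregate simply returns the last state, which is why that clause is optional.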
Putting it all together
Customer events
CREATE FUNCTION countEventTypes( state map<text, int>, type text )
CALLED ON NULL INPUT
RETURNS map<text, int>
LANGUAGE java AS '
Integer count = (Integer) state.get(type);
if (count == null) count = 1;
else count = count + 1;
state.put(type, count);
return state; ';
CREATE AGGREGATE count_by_type(text)
SFUNC countEventTypes
STYPE map<text, int>
INITCOND {};
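The same counting logic, sketched in Python to show what the aggregate returns for a handful of rows. The event type values are made up for illustration; the slides do not list any.

```python
def count_event_types(state, event_type):
    """State function: increment the counter for one event type."""
    state[event_type] = state.get(event_type, 0) + 1
    return state


def count_by_type(event_types):
    state = {}                      # INITCOND {}
    for event_type in event_types:  # state function runs once per row
        state = count_event_types(state, event_type)
    return state                    # no final function: the state is the result


events = ["BUY", "VIEW", "VIEW", "RETURN", "VIEW"]  # made-up sample values
print(count_by_type(events))
```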
Customer events
Built in aggregates
• count
• max
• min
• avg
• sum
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/AggregateFcts.java
Built in time functions
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/TimeFcts.java
Built in aggregates in action
1/ “Materialised views” with Spark
2/ Pure C*
2/ Pure C*
JSON, UDF and UDA available in DevCenter
Role-based access
Other bits and pieces…
• Compressed commit log
• Resumable bootstrapping
• New types
- smallint - short
- tinyint - byte
- date
- time
• Warnings now sent back to client
- batch too large
http://www.datastax.com/dev/blog/new-in-cassandra-3-0-materialized-views
New Storage Engine
• CASSANDRA-8099
• More efficient storage
• Aware of CQL structure
• Reduce sstable size
• Reduce memory used
• …
Customer events table
CREATE TABLE IF NOT EXISTS customer_events (
customer_id text,
staff_id text,
store_type text,
time timeuuid,
event_type text,
PRIMARY KEY (customer_id, time));
CREATE INDEX ON customer_events (staff_id);
Indexes to the rescue?
customer_events table
customer_id | time                | staff_id
------------+---------------------+---------
chbatey     | 2015-03-03 08:52:45 | trevor
chbatey     | 2015-03-03 08:52:54 | trevor
chbatey     | 2015-03-03 08:53:11 | bill
chbatey     | 2015-03-03 08:53:18 | bill
rusty       | 2015-03-03 08:56:57 | bill
rusty       | 2015-03-03 08:57:02 | bill
rusty       | 2015-03-03 08:57:20 | trevor
staff_id index
staff_id | customer_id
---------+------------
trevor   | chbatey
trevor   | chbatey
bill     | chbatey
bill     | chbatey
bill     | rusty
bill     | rusty
trevor   | rusty
Secondary indexes are local
• The staff_id partition in the secondary index is not
distributed like a normal table
• The secondary index entries are only stored on the node
that contains the customer_id partition
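A small Python simulation of what "local" means for a lookup by staff_id. The two nodes A and B and the partition ownership are assumptions for illustration (one customer_id partition per node); this is not how Cassandra routes queries internally, only the consequence for the reader.

```python
from collections import defaultdict

events = [
    # (customer_id, time, staff_id)
    ("chbatey", "08:52:45", "trevor"),
    ("chbatey", "08:52:54", "trevor"),
    ("chbatey", "08:53:11", "bill"),
    ("chbatey", "08:53:18", "bill"),
    ("rusty", "08:56:57", "bill"),
    ("rusty", "08:57:02", "bill"),
    ("rusty", "08:57:20", "trevor"),
]
owner = {"chbatey": "A", "rusty": "B"}  # assumed partition placement

# Each node indexes only the rows it stores.
local_index = {"A": defaultdict(list), "B": defaultdict(list)}
for customer_id, time, staff_id in events:
    node = owner[customer_id]
    local_index[node][staff_id].append((customer_id, time))


def query_by_staff(staff_id):
    """Return (nodes contacted, matching rows) for a staff_id lookup."""
    contacted, rows = [], []
    for node, index in local_index.items():
        contacted.append(node)  # no node can be skipped
        rows.extend(index.get(staff_id, []))
    return contacted, rows


nodes, rows = query_by_staff("bill")
print(nodes, len(rows))
```

Because the index is partitioned by customer_id rather than staff_id, the lookup has to ask every node, even for a staff member who has no rows at all.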
Indexes to the rescue?
Node A (stores the chbatey partition), local staff_id index
staff_id | customer_id
---------+------------
trevor   | chbatey
trevor   | chbatey
bill     | chbatey
bill     | chbatey
Node B (stores the rusty partition), local staff_id index
staff_id | customer_id
---------+------------
bill     | rusty
bill     | rusty
trevor   | rusty
customer_events table
customer_id | time                | staff_id
------------+---------------------+---------
chbatey     | 2015-03-03 08:52:45 | trevor
chbatey     | 2015-03-03 08:52:54 | trevor
chbatey     | 2015-03-03 08:53:11 | bill
chbatey     | 2015-03-03 08:53:18 | bill
rusty       | 2015-03-03 08:56:57 | bill
rusty       | 2015-03-03 08:57:02 | bill
rusty       | 2015-03-03 08:57:20 | trevor
staff_id index
staff_id | customer_id
---------+------------
trevor   | chbatey
trevor   | chbatey
bill     | chbatey
bill     | chbatey
bill     | rusty
bill     | rusty
trevor   | rusty
Do it yourself index ?
CREATE TABLE IF NOT EXISTS customer_events (
customer_id text,
staff_id text,
store_type text,
time timeuuid,
event_type text,
PRIMARY KEY (customer_id, time));
CREATE TABLE IF NOT EXISTS customer_events_by_staff (
customer_id text,
staff_id text,
store_type text,
time timeuuid,
event_type text,
PRIMARY KEY (staff_id, time));
1.2 Logged batches
[Diagram: client, C, BATCH LOG, and two BL-R nodes]
BL-R: Batch log replica
Pattern
• Write only:
- Duplicate with a different primary key
- (Optional) Logged batch for eventual consistency
• Full updates:
- No real difference
• Partial updates:
- No staff id in update?
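The write-only and partial-update cases can be sketched with two Python dicts standing in for the two tables. This is a toy model under stated assumptions: the function names and sample values are hypothetical, and batching is left out entirely.

```python
events_by_customer = {}  # key: (customer_id, time)
events_by_staff = {}     # key: (staff_id, time)


def write_event(customer_id, staff_id, time, event_type):
    """Write-only case: store the same row under both primary keys."""
    row = {"customer_id": customer_id, "staff_id": staff_id,
           "time": time, "event_type": event_type}
    events_by_customer[(customer_id, time)] = dict(row)
    events_by_staff[(staff_id, time)] = dict(row)


def update_event_type(customer_id, time, event_type, staff_id=None):
    """Partial update: the duplicate can only be found if staff_id is known."""
    events_by_customer[(customer_id, time)]["event_type"] = event_type
    if staff_id is None:
        return False  # duplicate not updated: its key needs staff_id
    events_by_staff[(staff_id, time)]["event_type"] = event_type
    return True


write_event("chbatey", "trevor", "t1", "BUY")  # made-up sample values
in_sync = update_event_type("chbatey", "t1", "RETURN")
print(in_sync, events_by_staff[("trevor", "t1")]["event_type"])
```

An update that does not carry the staff id leaves the duplicate stale unless the application first reads the original row to recover it.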
Score Data Model
CREATE TABLE scores
(
user TEXT,
game TEXT,
year INT,
month INT,
day INT,
score INT,
PRIMARY KEY (user, game, year, month, day)
);
Materialized Views
CREATE MATERIALIZED VIEW alltimehigh AS
SELECT user FROM scores
WHERE game IS NOT NULL AND
score IS NOT NULL AND
user IS NOT NULL AND
year IS NOT NULL AND
month IS NOT NULL AND
day IS NOT NULL
PRIMARY KEY (game, score, user, year, month, day)
WITH CLUSTERING ORDER BY (score desc);
Materialized Views
INSERT INTO scores (user, game, year, month, day, score) VALUES ('pcmanus', 'Coup', 2015, 05, 01, 4000);
INSERT INTO scores (user, game, year, month, day, score) VALUES ('jbellis', 'Coup', 2015, 05, 03, 1750);
INSERT INTO scores (user, game, year, month, day, score) VALUES ('yukim', 'Coup', 2015, 05, 03, 2250);
INSERT INTO scores (user, game, year, month, day, score) VALUES ('tjake', 'Coup', 2015, 05, 03, 500);
INSERT INTO scores (user, game, year, month, day, score) VALUES ('jmckenzie', 'Coup', 2015, 06, 01, 2000);
INSERT INTO scores (user, game, year, month, day, score) VALUES ('iamaleksey', 'Coup', 2015, 06, 01, 2500);
INSERT INTO scores (user, game, year, month, day, score) VALUES ('tjake', 'Coup', 2015, 06, 02, 1000);
INSERT INTO scores (user, game, year, month, day, score) VALUES ('pcmanus', 'Coup', 2015, 06, 02, 2000);
SELECT user, score FROM alltimehigh WHERE game = 'Coup';
user | score
-----------+-------
pcmanus | 4000
iamaleksey | 2500
yukim | 2250
jmckenzie | 2000
pcmanus | 2000
jbellis | 1750
tjake | 1000
tjake | 500
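The ordering of that result follows from the view's primary key. A Python sketch that reproduces it from the inserted rows, assuming the clustering columns after score sort ascending; it imitates the view's ordering only and says nothing about how the server maintains the view.

```python
scores = [
    # (user, game, year, month, day, score)
    ("pcmanus", "Coup", 2015, 5, 1, 4000),
    ("jbellis", "Coup", 2015, 5, 3, 1750),
    ("yukim", "Coup", 2015, 5, 3, 2250),
    ("tjake", "Coup", 2015, 5, 3, 500),
    ("jmckenzie", "Coup", 2015, 6, 1, 2000),
    ("iamaleksey", "Coup", 2015, 6, 1, 2500),
    ("tjake", "Coup", 2015, 6, 2, 1000),
    ("pcmanus", "Coup", 2015, 6, 2, 2000),
]


def alltimehigh(game):
    """Rows of one view partition, ordered like the view's clustering key."""
    rows = [r for r in scores if r[1] == game]
    # score descending, then user, year, month, day ascending
    rows.sort(key=lambda r: (-r[5], r[0], r[2], r[3], r[4]))
    return [(r[0], r[5]) for r in rows]


for user, score in alltimehigh("Coup"):
    print(f"{user:<10} | {score}")
```

The tie at 2000 is broken by the next clustering column, user, which is why jmckenzie is listed before pcmanus.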
KillrWeather data model
Combining aggregates + MVs
How it works…
For more details
http://www.datastax.com/dev/blog/new-in-cassandra-3-0-materialized-views
https://issues.apache.org/jira/browse/CASSANDRA-6477
Fine print
• All primary key columns of the base table must be present in the view's primary key
• If any part of the view's primary key is NULL, the row won't
appear in the materialised view
• Performance will be a factor!
- More operations to complete (read-before-write,
consistency check …)
- Batch writes for MV
• Bad for low cardinality data (hot spot)
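The NULL rule mirrors the IS NOT NULL clauses in the view definition. A minimal Python sketch of that filter for the scores example; `appears_in_view` is a hypothetical name and the rows are plain dicts.

```python
VIEW_PRIMARY_KEY = ("game", "score", "user", "year", "month", "day")


def appears_in_view(row):
    """True when every view primary key column has a value."""
    return all(row.get(column) is not None for column in VIEW_PRIMARY_KEY)


complete = {"user": "tjake", "game": "Coup", "year": 2015,
            "month": 6, "day": 2, "score": 1000}
no_score = dict(complete, score=None)  # score was never written

print(appears_in_view(complete), appears_in_view(no_score))
```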
Conclusions
• We still denormalise and duplicate to achieve scalability
and performance
• We just let C* do it for us :)
Find Out More
• Documentation: http://www.datastax.com/docs
• Developer Blog: http://www.datastax.com/dev/blog
• Academy: https://academy.datastax.com
• Community Site: http://planetcassandra.org

Cassandra 2.2 & 3.0

  • 3. Where did 2.2 come from?
  • 4. Don't start Thrift rpc by default (CASSANDRA-9319)
  • 5. New features • 2.2 - JSON - User defined functions - User defined aggregates - Other useful features - http://docs.datastax.com/en/cassandra/2.2/cassandra/features.html - http://www.datastax.com/dev/blog/cassandra-2-2 • 3.0 - New storage engine (8099) - A new way to denormalise/duplicate : Materialized View
  • 6. So who’s taken some data out of C* and serialised it as JSON?
  • 7. Hello JSON • create TABLE user (username text primary key, first_name text , last_name text , emails set<text> , country text); • INSERT INTO user JSON '{"username": "chbatey", "first_name":"Christopher", "last_name": "Batey", “emails":["christopher.batey@datastax.com"]}';
  • 9. JSON + User Defined Types • CREATE TYPE movie (title text, time timestamp, description text); • ALTER TABLE user ADD movies set<frozen<movie>>; • UPDATE user SET movies = {{ title:'Batman', time:'2011-02-03T04:05:00+0000', description: 'This film rocks' }} where username = 'chbatey';
  • 11. • Run code on the server !Dangerous! - Disabled by default • Java + Java Script supported out of the box • Any language that supports the Java Scripting API (Java, Javascript, Ruby, Python …) User Defined Functions
  • 12. UDF example CREATE TABLE user ( username text primary key, first_name text , last_name text , emails set<text> , country text);
  • 13. Concat function CREATE FUNCTION name ( first_name text, last_name text ) CALLED ON NULL INPUT RETURNS text LANGUAGE java AS ‘return first_name + " " + last_name;’; cqlsh:twotwo> select name(first_name, last_name) FROM user; twotwo.name(first_name, last_name) ------------------------------------ Victor Coustenoble
  • 14. User Defined Aggregates CREATE AGGREGATE average ( int ) SFUNC averageState STYPE tuple<int,bigint> FINALFUNC averageFinal INITCOND (0, 0); Called for every row state passed between Initial state Return type (CQL) Optional function called on final state
  • 15. State function (like a UDF) CREATE FUNCTION averageState ( state tuple<int,bigint>, value int ) CALLED ON NULL INPUT RETURNS tuple<int,bigint> LANGUAGE java AS ' if (value != null) { state.setInt(0, state.getInt(0)+1); state.setLong(1, state.getLong(1)+val.intValue()); } return state; '; Type Columns
  • 16. Final function CREATE FUNCTION averageFinal ( state tuple<int,bigint> ) CALLED ON NULL INPUT RETURNS double LANGUAGE java AS ' if (state.getInt(0) == 0) return null; double r = state.getLong(1) / state.getInt(0); return Double.valueOf(r); '; State typeOverall return type
  • 17. Putting it all together
  • 18. Customer events CREATE AGGREGATE count_by_type(text) SFUNC countEventTypes STYPE map<text, int> INITCOND {}; CREATE FUNCTION countEventTypes( state map<text, int>, type text ) CALLED ON NULL INPUT RETURNS map<text, int> LANGUAGE java AS ' Integer count = (Integer) state.get(type); if (count == null) count = 1; else count = count + 1; state.put(type, count); return state; ';
  • 20. Built in aggregates • count • max • min • avg • sum https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/AggregateFcts.java
  • 21. Built in time functions https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/TimeFcts.java
  • 22. Built in aggregates in action
  • 26. JSON, UDF and UDA available in DevCenter
  • 28. Other bits and pieces… • Compressed commit log • Resumable bootstrapping • New types - smallint - short - tinyint - byte - date - time • Warnings now sent back to client - batch too large
  • 30. New Storage Engine • CASSANDRA-8099 • More efficient storage • Aware of CQL structure • Reduce sstable size • Reduce memory used • …
  • 31. Customer events table CREATE TABLE IF NOT EXISTS customer_events ( customer_id text, staff_id text, store_type text, time timeuuid , event_type text, PRIMARY KEY (customer_id, time)); CREATE INDEX ON customer_events (staff_id);
  • 32. Indexes to the rescue? customer_id time staff_id chbatey 2015-03-03 08:52:45 trevor chbatey 2015-03-03 08:52:54 trevor chbatey 2015-03-03 08:53:11 bill chbatey 2015-03-03 08:53:18 bill rusty 2015-03-03 08:56:57 bill rusty 2015-03-03 08:57:02 bill rusty 2015-03-03 08:57:20 trevor staff_id customer_id trevor chbatey trevor chbatey bill chbatey bill chbatey bill rusty bill rusty trevor rusty
  • 33. Secondary indexes are local • The staff_id partition in the secondary index is not distributed like a normal table • The secondary index entries are only stored on the node that contains the customer_id partition
  • 34. Indexes to the rescue? staff_id customer_id trevor chbatey trevor chbatey bill chbatey bill chbatey staff_id customer_id bill rusty bill rusty trevor rusty A B chbatey rusty customer_id time staff_id chbatey 2015-03-03 08:52:45 trevor chbatey 2015-03-03 08:52:54 trevor chbatey 2015-03-03 08:53:11 bill chbatey 2015-03-03 08:53:18 bill rusty 2015-03-03 08:56:57 bill rusty 2015-03-03 08:57:02 bill rusty 2015-03-03 08:57:20 trevor customer_events table staff_id customer_id trevor chbatey trevor chbatey bill chbatey bill chbatey bill rusty bill rusty trevor rusty staff_id index
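Why a local index makes a lookup by staff_id expensive can be illustrated with a toy cluster. This is a simplified model, not how Cassandra is implemented: `LocalIndexSketch`, its `Node` type, and the hash-based placement in `owner` are all assumptions made for the illustration, and replication is ignored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: rows are placed by customer_id, and each node indexes only its own rows.
final class LocalIndexSketch {
    static final class Node {
        // Local index: staff_id -> customer_ids of rows stored on this node.
        final Map<String, List<String>> localIndex = new HashMap<>();
    }

    final List<Node> nodes = new ArrayList<>();
    int nodesContacted = 0;

    LocalIndexSketch(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new Node());
        }
    }

    // Assumed placement rule: hash of the partition key picks the node.
    Node owner(String customerId) {
        return nodes.get(Math.floorMod(customerId.hashCode(), nodes.size()));
    }

    // A write lands on the node owning customer_id; the index entry stays on that node.
    void insert(String customerId, String staffId) {
        owner(customerId).localIndex
                .computeIfAbsent(staffId, k -> new ArrayList<>())
                .add(customerId);
    }

    // staff_id says nothing about placement, so every node has to be asked.
    List<String> findByStaff(String staffId) {
        List<String> result = new ArrayList<>();
        for (Node node : nodes) {
            nodesContacted++;
            result.addAll(node.localIndex.getOrDefault(staffId, Collections.emptyList()));
        }
        return result;
    }
}
```

In this model a query by customer_id touches one node, while a query by staff_id touches all of them, however few rows match.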
  • 35. Do it yourself index ? CREATE TABLE IF NOT EXISTS customer_events ( customer_id text, staff_id text, store_type text, time timeuuid , event_type text, PRIMARY KEY (customer_id, time)); CREATE TABLE IF NOT EXISTS customer_events_by_staff ( customer_id text, staff_id text, store_type text, time timeuuid , event_type text, PRIMARY KEY (staff_id, time));
  • 36. 1.2 Logged batches client C BATCH LOG BL-R BL-R BL-R: Batch log replica
  • 37. Pattern • Write only: - Duplicate with a different primary key - (Optional) Logged batch for eventual consistency • Full updates: - No real difference • Partial updates: - No staff id in update?
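The partial-update problem in the pattern above can be sketched with two in-memory maps standing in for the two tables. Everything here is hypothetical scaffolding (`DualWriteSketch`, the string keys, the `long` timestamp in place of a timeuuid); it only shows why an update that lacks the staff id cannot reach the duplicate row without reading first.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a hand-maintained duplicate table.
final class DualWriteSketch {
    // Stands in for customer_events: key is (customer_id, time).
    final Map<String, String> byCustomer = new HashMap<>();
    // Stands in for customer_events_by_staff: key is (staff_id, time).
    final Map<String, String> byStaff = new HashMap<>();

    // Full write: both primary keys are known, so both copies are written.
    void insert(String customerId, String staffId, long time, String eventType) {
        byCustomer.put(customerId + "|" + time, eventType);
        byStaff.put(staffId + "|" + time, eventType);
    }

    // Update of event_type. Returns false when staffId is unknown (null):
    // the duplicate row's key cannot be built, so that copy goes stale.
    boolean updateEventType(String customerId, String staffId, long time, String eventType) {
        byCustomer.put(customerId + "|" + time, eventType);
        if (staffId == null) {
            return false;
        }
        byStaff.put(staffId + "|" + time, eventType);
        return true;
    }
}
```

Closing that gap by hand means a read of the base row before every partial update, which is the kind of work a materialized view takes over.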
  • 38. Score Data Model CREATE TABLE scores ( user TEXT, game TEXT, year INT, month INT, day INT, score INT, PRIMARY KEY (user, game, year, month, day) );
  • 39. Materialized Views CREATE MATERIALIZED VIEW alltimehigh AS SELECT user FROM scores WHERE game IS NOT NULL AND score IS NOT NULL AND user IS NOT NULL AND year IS NOT NULL AND month IS NOT NULL AND day IS NOT NULL PRIMARY KEY (game, score, user, year, month, day) WITH CLUSTERING ORDER BY (score desc);
  • 40. Materialized Views INSERT INTO scores (user, game, year, month, day, score) VALUES ('pcmanus', 'Coup', 2015, 05, 01, 4000); INSERT INTO scores (user, game, year, month, day, score) VALUES ('jbellis', 'Coup', 2015, 05, 03, 1750); INSERT INTO scores (user, game, year, month, day, score) VALUES ('yukim', 'Coup', 2015, 05, 03, 2250); INSERT INTO scores (user, game, year, month, day, score) VALUES ('tjake', 'Coup', 2015, 05, 03, 500); INSERT INTO scores (user, game, year, month, day, score) VALUES ('jmckenzie', 'Coup', 2015, 06, 01, 2000); INSERT INTO scores (user, game, year, month, day, score) VALUES ('iamaleksey', 'Coup', 2015, 06, 01, 2500); INSERT INTO scores (user, game, year, month, day, score) VALUES ('tjake', 'Coup', 2015, 06, 02, 1000); INSERT INTO scores (user, game, year, month, day, score) VALUES ('pcmanus', 'Coup', 2015, 06, 02, 2000); SELECT user, score FROM alltimehigh WHERE game = 'Coup'; user | score -----------+------- pcmanus | 4000 iamaleksey | 2500 yukim | 2250 jmckenzie | 2000 pcmanus | 2000 jbellis | 1750 tjake | 1000 tjake | 500
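The ordering of that result follows from the view's primary key: rows are grouped by game, then ordered by score descending, with user, year, month and day as tie-breakers. A plain-Java model of that ordering, assuming a hypothetical `AllTimeHighSketch` class and sorting at read time (a real view is stored already sorted), could look like this:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of the alltimehigh view; not Cassandra code.
final class AllTimeHighSketch {
    static final class Row {
        final String user;
        final String game;
        final int year;
        final int month;
        final int day;
        final Integer score; // nullable, like a regular column in the base table

        Row(String user, String game, int year, int month, int day, Integer score) {
            this.user = user;
            this.game = game;
            this.year = year;
            this.month = month;
            this.day = day;
            this.score = score;
        }
    }

    // Clustering order of the view: score DESC, then user, year, month, day ascending.
    static final Comparator<Row> VIEW_ORDER = (a, b) -> {
        int c = Integer.compare(b.score, a.score);
        if (c == 0) c = a.user.compareTo(b.user);
        if (c == 0) c = Integer.compare(a.year, b.year);
        if (c == 0) c = Integer.compare(a.month, b.month);
        if (c == 0) c = Integer.compare(a.day, b.day);
        return c;
    };

    // Models SELECT user, score FROM alltimehigh WHERE game = ?
    static List<String> allTimeHigh(List<Row> scores, String game) {
        List<Row> view = new ArrayList<>();
        for (Row row : scores) {
            // Rows with a null score have no place in the view (score IS NOT NULL).
            if (row.score != null && game.equals(row.game)) {
                view.add(row);
            }
        }
        view.sort(VIEW_ORDER);
        List<String> result = new ArrayList<>();
        for (Row row : view) {
            result.add(row.user + ":" + row.score);
        }
        return result;
    }
}
```

The two rows with a score of 2000 come out as jmckenzie before pcmanus because user is the next clustering column after score.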
  • 44. Fine print • All primary key columns of the base table must be present in your view • If any part of the view's primary key is NULL, the row won't appear in the materialised view • Performance will be a factor! - More operations to complete (read-before-write, consistency check …) - Batch writes for MV • Bad for low cardinality data (hot spot)
  • 45. Conclusions • We still denormalise and duplicate to achieve scalability and performance • We just let C* do it for us :)
  • 46. Find Out More • Documentation: http://www.datastax.com/docs • Developer Blog: http://www.datastax.com/dev/blog • Academy: https://academy.datastax.com • Community Site: http://planetcassandra.org

Editor's Notes

  1. also a toJson and fromJson if you want individual fields
  2. User defined types??
  3. time is modelled as a long, nanoseconds since midnight
  4. time is modelled as a long, nanoseconds since midnight
  5. Looks good so far. The first problem, however, is that a single query can result in many partitions being queried. We know why this is bad.
  6. Each of the segments of the index table
  7. Start by writing it out to a batch log on 2 other replicas. Downside: look at the extra round trips, extra complexity, serial reads.