Introduction to Cassandra
DuyHai DOAN
Apache Cassandra Evangelist
DataStax
•  Founded in April 2010
•  We contribute a lot to Apache Cassandra™
•  400+ customers (25 of the Fortune 100), 450+ employees
•  Headquarters in the San Francisco Bay Area
•  EU headquarters in London, offices in France and Germany
•  DataStax Enterprise = OSS Cassandra + extra features
© 2016 DataStax, All Rights Reserved.
Cassandra history
•  created at Facebook
•  open-sourced in 2008
•  current version: 3.2
•  column-oriented ☞ distributed table
5 Cassandra key points
•  Linear scalability
•  Continuous availability
•  Multi Data-center native
•  Operational simplicity
•  Spark integration
1) Linear scalability
[Diagram: C* clusters of increasing size. NetcoSports: 3 nodes, ≈3 GB. Largest: 1k+ nodes, PB+. Label: YOU]
2) Continuous availability
•  thanks to the Dynamo architecture
3) Multi Data-centers
•  out-of-the-box (config only)
•  AWS config for multi-regions DCs
•  GCE support
•  Microsoft Azure support
•  CloudStack support
Multi DC usages
Data locality, disaster recovery
[Diagram: two rings of C* nodes, New York (DC1) and London (DC2), linked by async replication]
Multi DC usages
Virtual DC for workload segregation
[Diagram: two rings of C* nodes in the same room, Production (LIVE) and Analytics (Spark), linked by async replication]
Multi DC usages
Prod data copy for back-up/benchmark
•  Use LOCAL_XXX consistency levels
[Diagram: production ring replicating asynchronously to "My tiny test DC", which is READ-ONLY!!!]
4) Operational simplicity
•  1 node = 1 process + 2 config files (cassandra.yaml + cassandra-rackdc.properties)
•  deployment automation
•  OpsCenter for
•  monitoring
•  provisioning*
•  services* (repair, performance, …)
* only with Datastax Enterprise
5) Spark integration
•  Cassandra + Spark = awesome!
•  Spark/Cassandra connector = most advanced connector right now for NoSQL databases
•  predicates push-down
•  early filtering
•  dataframe integration
•  Analytics, aggregation, streaming …
Main Cassandra use-cases
Cassandra use-cases
•  Messaging
•  Collections/Playlists
•  Fraud detection
•  Recommendation/Personalization
•  Internet of things/Sensor data
Q & A
Layers
•  Cluster
•  Amazon Dynamo paper
•  masterless
•  Storage engine
•  Google Bigtable
•  columns/column families ☞ distributed tables
Data Distribution
The tokens
Random hash of #partition → token = hash(#p)
Hash: ] −x, x ]
Hash range: 2^64 values
x = 2^64/2
[Diagram: ring of 8 C* nodes]
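The mapping from #partition to token can be sketched in a few lines of Python. This is only an illustration under stated assumptions: MD5 truncated to 64 bits stands in for the hash (Cassandra's default partitioner uses Murmur3), the name `token_for` is hypothetical, and the signed 64-bit range used here, [−x, x − 1], differs from the slide's ] −x, x ] by one endpoint.

```python
import hashlib

X = 2**64 // 2  # x = 2^64 / 2, half of the hash range


def token_for(partition_key: str) -> int:
    """Map a partition key to a signed 64-bit token.

    Illustrative hash only (truncated MD5), not Cassandra's Murmur3.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    # Keep the first 8 bytes and read them as a signed 64-bit integer.
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


# The same key always yields the same token, so any node can compute
# where a partition lives without asking a master.
tokens = {key: token_for(key) for key in ("user_id1", "user_id2", "user_id3")}
```

Because the token depends only on the partition key, every node computes the same placement independently.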
Token ranges
A: ] −x, −3x/4 ]
B: ] −3x/4, −2x/4 ]
C: ] −2x/4, −x/4 ]
D: ] −x/4, 0 ]
E: ] 0, x/4 ]
F: ] x/4, 2x/4 ]
G: ] 2x/4, 3x/4 ]
H: ] 3x/4, x ]
[Diagram: ring of 8 C* nodes, one per token range]
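Finding the node that owns a token is then a search over the range boundaries. The sketch below assumes eight equal ranges named A to H, each open on the left and closed on the right; the names `UPPER_BOUNDS` and `owner` are hypothetical.

```python
from bisect import bisect_left

X = 2**63  # x = 2^64 / 2
NODES = "ABCDEFGH"

# Upper bound of each range: A ends at -3x/4, B at -2x/4, ..., H at x.
UPPER_BOUNDS = [(-3 + i) * X // 4 for i in range(8)]


def owner(token: int) -> str:
    """Return the node whose range ]lower, upper] contains the token."""
    if not -X < token <= X:
        raise ValueError("token outside ]-x, x]")
    # The first upper bound >= token identifies the owning range.
    return NODES[bisect_left(UPPER_BOUNDS, token)]
```

With sorted boundaries the lookup is a binary search, so its cost grows only logarithmically with the number of ranges.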
Distributed tables
[Diagram: rows user_id1 … user_id5 spread over the ring of token ranges A–H]
CREATE TABLE users(
    user_id int,
    …,
    PRIMARY KEY(user_id)
);
Linear scalability
[Diagram: ring of 8 nodes, A–H]
Today = high load
•  disk occupation 80%
•  CPU 70%
•  saturated memory
Scaling out
[Diagram: ring of 10 nodes, A–J]
+2 nodes
•  disk occupation 50%
•  CPU 50%
•  memory ✌︎
Automatic data rebalancing
•  each node gives up some tokens
•  flag to throttle network bandwidth
•  streamingthroughput
Automatic data re-balancing with virtual nodes
[Diagram: token ownership per node, A–H before and A–J after adding 2 nodes]
Q & A
Replication Model & Consistency
Failure tolerance
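Replication factor (RF) = 3
[Diagram: ring of 8 nodes, A–H; a write is stored on three consecutive nodes, numbered 1, 2 and 3]
•  node 1 stores ranges {A, H, G}
•  node 2 stores ranges {B, A, H}
•  node 3 stores ranges {C, B, A}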
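The sets {A, H, G}, {B, A, H} and {C, B, A} follow from a simple rule: each range is stored on its owner and on the next RF − 1 nodes around the ring. The sketch below assumes that plain clockwise placement, ignoring racks and data centers; the function names are hypothetical.

```python
RING = list("ABCDEFGH")  # nodes in ring order, one token range each
RF = 3                   # replication factor


def replicas_for(range_name, ring=RING, rf=RF):
    """Nodes storing a range: its owner plus the next rf - 1 nodes on the ring."""
    start = ring.index(range_name)
    return [ring[(start + i) % len(ring)] for i in range(rf)]


def ranges_stored_by(node, ring=RING, rf=RF):
    """Ranges held by a node: its own plus those of the rf - 1 preceding nodes."""
    pos = ring.index(node)
    return [ring[(pos - i) % len(ring)] for i in range(rf)]
```

For example, `ranges_stored_by("A")` gives A, H and G, and `replicas_for("A")` gives nodes A, B and C.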
Coordinator node
Responsible for handling requests (read/write)
Every node can be coordinator
•  masterless
•  round robin master for each request
•  no SPOF
•  proxy role
[Diagram: a request reaches one node of the ring A–H, which acts as coordinator and forwards it to replicas 1, 2 and 3]
Consistency level
Tunable at runtime
•  ONE
•  QUORUM (strict majority w.r.t. RF)
•  ALL
Applicable to any request (read/write)
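The three levels translate into a number of replicas that must answer before the coordinator replies. A small sketch of that arithmetic follows; the function names are hypothetical, and the LOCAL_XXX variants are left out.

```python
def quorum(rf: int) -> int:
    """Strict majority of the replicas."""
    return rf // 2 + 1


def required_acks(level: str, rf: int) -> int:
    """Replicas that must respond for a request at the given consistency level."""
    return {"ONE": 1, "QUORUM": quorum(rf), "ALL": rf}[level]


def tolerated_failures(level: str, rf: int) -> int:
    """Replicas that may be down while the request can still succeed."""
    return rf - required_acks(level, rf)
```

With RF = 3, QUORUM needs 2 replicas and tolerates 1 failure, while ALL tolerates none.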
Consistency in action
RF = 3, Write ONE, Read ONE
Replicas: B A A (data replication in progress …)
Write ONE: B ☞ ack
Read ONE: A
Consistency in action
RF = 3, Write ONE, Read QUORUM
Replicas: B A A (data replication in progress …)
Write ONE: B ☞ ack
Read QUORUM: A
Consistency in action
RF = 3, Write ONE, Read ALL
Replicas: B A A (data replication in progress …)
Write ONE: B ☞ ack
Read ALL: B
Consistency in action
RF = 3, Write QUORUM, Read ONE
Replicas: B B A (data replication in progress …)
Write QUORUM: B ☞ ack
Read ONE: A
Consistency in action
RF = 3, Write QUORUM, Read QUORUM
Replicas: B B A (data replication in progress …)
Write QUORUM: B ☞ ack
Read QUORUM: B
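The five scenarios reduce to one question: can the replicas answering a read all differ from the replicas that acknowledged the write? The sketch below checks this by brute force. It models the worst case, where no other replica has received the write yet, and ignores repair mechanisms; `can_read_stale` is a hypothetical name.

```python
from itertools import combinations


def can_read_stale(rf: int, write_acks: int, read_acks: int) -> bool:
    """True if some read set misses every replica that acknowledged the write."""
    replicas = range(rf)
    for written in combinations(replicas, write_acks):
        for read in combinations(replicas, read_acks):
            if not set(written) & set(read):
                return True  # the read touched only stale replicas
    return False


# RF = 3: (write acks, read acks) for the five scenarios.
scenarios = {
    "Write ONE, Read ONE": (1, 1),
    "Write ONE, Read QUORUM": (1, 2),
    "Write ONE, Read ALL": (1, 3),
    "Write QUORUM, Read ONE": (2, 1),
    "Write QUORUM, Read QUORUM": (2, 2),
}
stale_possible = {name: can_read_stale(3, w, r) for name, (w, r) in scenarios.items()}
```

The result matches the usual rule of thumb: a read is guaranteed to see the latest write when write acks + read acks > RF.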
Consistency level = trade-off
Consistency level
•  ONE: fast, may not read the latest written value
•  QUORUM: strict majority w.r.t. replication factor, good balance
•  ALL: paranoid, slow, loss of high availability
Consistency level common patterns
ONE read + ONE write
☞ available for reads/writes even if (RF − 1) replicas are down
QUORUM read + QUORUM write
☞ available for reads/writes even if ⌊(RF − 1)/2⌋ replicas are down (1 replica with RF = 3)
Q & A
Last Write Wins & Compaction
Last Write Wins (LWW)
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
•  #partition: jdoe
•  each column is stored with an auto-generated timestamp (μs)
•  jdoe: age (t1) = 33, name (t1) = John DOE
UPDATE users SET age = 34 WHERE login = 'jdoe';
•  SSTable1: jdoe: age (t1) = 33, name (t1) = John DOE
•  SSTable2: jdoe: age (t2) = 34
DELETE age FROM users WHERE login = 'jdoe';
•  SSTable1: jdoe: age (t1) = 33, name (t1) = John DOE
•  SSTable2: jdoe: age (t2) = 34
•  SSTable3: jdoe: age (t3) = tombstone
SELECT age FROM users WHERE login = 'jdoe';
•  SSTable1: age (t1) = 33 ✕
•  SSTable2: age (t2) = 34 ✕
•  SSTable3: age (t3) = tombstone ✓ (most recent timestamp wins)
Compaction
•  SSTable1: jdoe: age (t1) = 33, name (t1) = John DOE
•  SSTable2: jdoe: age (t2) = 34
•  SSTable3: jdoe: age (t3) = tombstone
☞ New SSTable: jdoe: age (t3) = tombstone, name (t1) = John DOE
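The merge rule behind compaction, and behind reads that span several SSTables, fits in a few lines. This is an in-memory illustration, not Cassandra's storage format: timestamps 1, 2 and 3 stand for t1, t2 and t3, `TOMBSTONE` is a hypothetical marker object, and the tombstone is kept in the output (Cassandra only purges tombstones after `gc_grace_seconds`).

```python
TOMBSTONE = object()  # marker for a deleted cell


def compact(*sstables):
    """Merge SSTables, keeping for each (partition, column) the cell with the highest timestamp."""
    merged = {}
    for sstable in sstables:
        for partition, cells in sstable.items():
            row = merged.setdefault(partition, {})
            for column, (timestamp, value) in cells.items():
                if column not in row or timestamp > row[column][0]:
                    row[column] = (timestamp, value)
    return merged


def read(sstable, partition, column):
    """Return the live value of a column, or None if it is absent or deleted."""
    cell = sstable.get(partition, {}).get(column)
    if cell is None or cell[1] is TOMBSTONE:
        return None
    return cell[1]


sstable1 = {"jdoe": {"age": (1, 33), "name": (1, "John DOE")}}
sstable2 = {"jdoe": {"age": (2, 34)}}
sstable3 = {"jdoe": {"age": (3, TOMBSTONE)}}
new_sstable = compact(sstable1, sstable2, sstable3)
```

Since only timestamps decide the winner, the order in which SSTables are merged does not change the result.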
Basic Data Modeling
Table creation
CREATE TABLE users (
    login text,
    name text,
    age int,
    …
    PRIMARY KEY(login));
☞ login is the partition key (#partition)
DML statements
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
UPDATE users SET age = 34 WHERE login = 'jdoe';
DELETE age FROM users WHERE login = 'jdoe';
SELECT age FROM users WHERE login = 'jdoe';
What about joins?
How can I join data between tables?
How can I model 1 – N relationships?
How to model a mailbox?
[Diagram: User 1 – n Emails]
Compound primary key
CREATE TABLE mailbox (
    login text,
    message_id timeuuid,
    interlocutor text,
    message text,
    PRIMARY KEY((login), message_id));
•  login: partition key
•  message_id: clustering column, ensures uniqueness
Compound primary key
[Partitions are not ordered; rows inside a partition are ordered by clustering column (date)]
rsmith
    2014-11-21 16:00:00 | 'bobm', 'It’s really…'
    2014-11-21 17:32:12 | 'bobm', 'It depends..'
    2014-11-21 21:21:09 | 'bobm', 'Don’t do…'
    …
hsue
    2014-11-21 11:04:43 | 'jdoe', 'Hi, …'
    2014-11-21 11:22:43 | 'rsmith', 'Hello,…'
jdoe
    2014-11-21 11:00:00 | 'hsue', 'Hi there!'
    2014-11-21 11:22:43 | 'rsmith', 'Hello,…'
    2014-11-21 13:06:19 | 'bobm', 'Do you…'
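One way to picture this layout is a map from partition key to a list of rows kept sorted by clustering column. The sketch below is an in-memory analogy only: the `Mailbox` class is hypothetical, and date strings stand in for timeuuid values because they sort chronologically.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class Mailbox:
    """Partitions keyed by login; rows in a partition sorted by message_id."""

    def __init__(self):
        self._partitions = defaultdict(list)  # login -> [(message_id, interlocutor, message)]

    def insert(self, login, message_id, interlocutor, message):
        rows = self._partitions[login]
        ids = [row[0] for row in rows]
        pos = bisect_left(ids, message_id)
        row = (message_id, interlocutor, message)
        if pos < len(rows) and rows[pos][0] == message_id:
            rows[pos] = row  # same primary key: overwrite (upsert)
        else:
            rows.insert(pos, row)  # keep the partition sorted

    def slice(self, login, start, end):
        """Rows of one partition with start <= message_id <= end, in order."""
        rows = self._partitions.get(login, [])
        ids = [row[0] for row in rows]
        return rows[bisect_left(ids, start):bisect_right(ids, end)]


box = Mailbox()
box.insert("jdoe", "2014-11-21 13:06:19", "bobm", "Do you…")
box.insert("jdoe", "2014-11-21 11:00:00", "hsue", "Hi there!")
box.insert("jdoe", "2014-11-21 11:22:43", "rsmith", "Hello,…")
morning = box.slice("jdoe", "2014-11-21 00:00:00", "2014-11-21 12:00:00")
```

A date-interval query inside one partition becomes a contiguous slice of a sorted list, which is why range queries on the clustering column are cheap.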
Queries
Get message by user and message_id (date)
SELECT * FROM mailbox WHERE login = 'jdoe'
and message_id = '2014-11-21 16:00:00';
Get message by user and date interval
SELECT * FROM mailbox WHERE login = 'jdoe'
and message_id <= '2014-11-25 23:59:59'
and message_id >= '2014-11-20 00:00:00';
Queries
Get message by message_id only (#partition not provided)
SELECT * FROM mailbox WHERE message_id = '2014-11-21 16:00:00';
Get message by date interval (#partition not provided)
SELECT * FROM mailbox WHERE message_id <= '2014-11-25 23:59:59'
and message_id >= '2014-11-20 00:00:00';
Without #partition
No #partition
☞ no token
☞ where are my data?
[Diagram: ring of 8 C* nodes, each marked with a question mark]
Queries
Get message by user range (range query on #partition)
SELECT * FROM mailbox WHERE login >= 'hsue' and login <= 'jdoe';
Get message by user pattern (non exact match on #partition)
SELECT * FROM mailbox WHERE login like '%doe%';
WHERE clause restrictions
All DML queries must provide #partition
Only exact match (=) on #partition, range queries (<, ≤, >, ≥) not allowed
•  ☞ full cluster scan
On clustering columns, only range queries (<, ≤, >, ≥) and exact match (=)
WHERE clause only possible
•  on columns defined in PRIMARY KEY
•  on indexed columns
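These restrictions can be expressed as a small checker. The sketch below encodes only the rules listed above for the mailbox table; it is a simplification, and real CQL has further cases (such as IN and ALLOW FILTERING) that it ignores. The function name and the `(column, operator)` input format are hypothetical.

```python
PARTITION_KEY = ["login"]      # from PRIMARY KEY((login), message_id)
CLUSTERING = ["message_id"]
RANGE_OPS = {"<", "<=", ">", ">="}


def check_where(restrictions, indexed=()):
    """Check a list of (column, operator) pairs; return the list of violated rules."""
    problems = []
    ops_by_column = {}
    for column, op in restrictions:
        ops_by_column.setdefault(column, []).append(op)

    # Rule 1: #partition must be provided, with exact match only.
    for column in PARTITION_KEY:
        ops = ops_by_column.get(column, [])
        if not ops:
            problems.append(f"#partition column '{column}' not provided")
        elif ops != ["="]:
            problems.append(f"only exact match (=) is allowed on #partition column '{column}'")

    for column, ops in ops_by_column.items():
        if column in PARTITION_KEY:
            continue
        if column in CLUSTERING:
            # Rule 2: clustering columns accept = and range operators only.
            bad = [op for op in ops if op != "=" and op not in RANGE_OPS]
            if bad:
                problems.append(f"unsupported operator(s) {bad} on clustering column '{column}'")
        elif column not in indexed:
            # Rule 3: other columns must be indexed.
            problems.append(f"column '{column}' is neither in the PRIMARY KEY nor indexed")
    return problems
```

An empty result means the query satisfies the rules; each string in a non-empty result names one violated rule.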
WHERE clause restrictions
What if I want to perform an "arbitrary" WHERE clause?
•  search form scenario, dynamic search fields
DO NOT RE-INVENT THE WHEEL!
•  ☞ Apache Solr (Lucene) integration (DataStax Enterprise Search)
•  ☞ Same JVM, 1-cluster-2-products (Solr & Cassandra)
SELECT * FROM users WHERE solr_query = 'age:[33 TO *] AND gender:male';
SELECT * FROM users WHERE solr_query = 'lastname:*schwei?er';
Q & A
Advanced Data Modeling
Collection types
CREATE TABLE users (
login text,
name text,
age int,
friends set<text>,
hobbies list<text>,
languages map<int, text>,
…
PRIMARY KEY(login));
User Defined Type (UDT)
Instead of
CREATE TABLE users (
login text,
…
street_number int,
street_name text,
postcode int,
country text,
…
PRIMARY KEY(login));
Use a UDT:
CREATE TYPE address (
street_number int,
street_name text,
postcode int,
country text);
CREATE TABLE users (
login text,
…
location frozen <address>,
…
PRIMARY KEY(login));
UDT Insert
INSERT INTO users(login,name, location) VALUES (
'jdoe',
'John DOE',
{
street_number: 124,
street_name: 'Congress Avenue',
postcode: 95054,
country: 'USA'
});
JSON syntax for INSERT/UPDATE/DELETE
CREATE TABLE users (
id text PRIMARY KEY,
age int,
state text );
INSERT INTO users JSON '{"id": "user123", "age": 42, "state": "TX"}';
INSERT INTO users(id, age, state) VALUES('me', fromJson('20'), 'CA');
UPDATE users SET age = fromJson('25') WHERE id = fromJson('"me"');
DELETE FROM users WHERE id = fromJson('"me"');
JSON syntax for SELECT
> SELECT JSON * FROM users WHERE id = 'me';
[json]
----------------------------------------
{"id": "me", "age": 25, "state": "CA"}
> SELECT JSON age,state FROM users WHERE id = 'me';
[json]
----------------------------------------
{"age": 25, "state": "CA"}
> SELECT age, toJson(state) FROM users WHERE id = 'me';
age | system.tojson(state)
-----+----------------------
25 | "CA"
Why Materialized Views ?
Relieve the pain of manual denormalization
CREATE TABLE user(
id int PRIMARY KEY,
country text,
…);
CREATE TABLE user_by_country(
country text,
id int,
…,
PRIMARY KEY(country, id));
Materialized View in Action
CREATE MATERIALIZED VIEW user_by_country
AS SELECT country, id, firstname, lastname
FROM user
WHERE country IS NOT NULL AND id IS NOT NULL
PRIMARY KEY(country, id);
CREATE TABLE user_by_country (
country text,
id int,
firstname text,
lastname text,
PRIMARY KEY(country, id));
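What the view saves you from writing by hand can be modeled in memory: every write to the base table must also remove the stale entry from the denormalized table and add the new one. The sketch below is a single-process analogy with hypothetical names; it says nothing about how Cassandra propagates view updates between replicas.

```python
class Users:
    """Base table `user` plus a denormalized `user_by_country` kept in sync."""

    def __init__(self):
        self.user = {}             # id -> row
        self.user_by_country = {}  # (country, id) -> {firstname, lastname}

    def upsert(self, id, country, firstname, lastname):
        old = self.user.get(id)
        if old is not None:
            # Drop the entry filed under the previous country, if any.
            self.user_by_country.pop((old["country"], id), None)
        self.user[id] = {"id": id, "country": country,
                         "firstname": firstname, "lastname": lastname}
        if country is not None:  # mirrors WHERE country IS NOT NULL
            self.user_by_country[(country, id)] = {"firstname": firstname,
                                                   "lastname": lastname}

    def ids_in(self, country):
        """User ids for one country, in clustering (id) order."""
        return sorted(i for (c, i) in self.user_by_country if c == country)
```

Forgetting the removal step when a user changes country leaves a stale row behind, which is the kind of bookkeeping error manual denormalization invites.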
User Defined Functions (UDF)
CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS]
maxOf (col1 int, col2 int)
CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT
RETURNS int
LANGUAGE java
AS $$
return Math.max(col1, col2);
$$;
SELECT maxOf(col1, col2) FROM table WHERE id = xxx;
User Defined Aggregates (UDA)
CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS]
sum(bigint)
SFUNC accumulatorFunction
STYPE bigint
[FINALFUNC finalFunction]
INITCOND 0;
CREATE FUNCTION accumulatorFunction(accu bigint, column bigint)
RETURNS NULL ON NULL INPUT RETURNS bigint LANGUAGE java
AS $$ return accu + column; $$;
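An aggregate defined this way is a fold: INITCOND is the starting state, SFUNC combines the state with each value, and the optional FINALFUNC transforms the final state. The Python sketch below mirrors that structure; `make_aggregate` and the `average` example are hypothetical illustrations, and null handling is left out.

```python
from functools import reduce


def make_aggregate(sfunc, initcond, finalfunc=None):
    """Build an aggregate from a state function, an initial state and an optional final function."""
    def aggregate(values):
        state = reduce(sfunc, values, initcond)  # apply SFUNC to each value in turn
        return finalfunc(state) if finalfunc else state
    return aggregate


def accumulator_function(accu, column):
    """State function of the sum aggregate."""
    return accu + column


# sum: STYPE bigint, INITCOND 0, no FINALFUNC
sum_aggregate = make_aggregate(accumulator_function, 0)

# average: the state is a (count, total) pair, reduced to a number at the end
average = make_aggregate(
    lambda state, column: (state[0] + 1, state[1] + column),
    (0, 0),
    lambda state: state[1] / state[0] if state[0] else None,
)
```

The average example shows why FINALFUNC exists: the running state can carry more information than the value finally returned.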
Q & A
@doanduyhai
duy_hai.doan@datastax.com
https://academy.datastax.com/
Thank You

SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
 
Cassandra + Spark (You’ve got the lighter, let’s start a fire)
Cassandra + Spark (You’ve got the lighter, let’s start a fire)Cassandra + Spark (You’ve got the lighter, let’s start a fire)
Cassandra + Spark (You’ve got the lighter, let’s start a fire)
 
From PoCs to Production
From PoCs to ProductionFrom PoCs to Production
From PoCs to Production
 
Data day texas: Cassandra and the Cloud
Data day texas: Cassandra and the CloudData day texas: Cassandra and the Cloud
Data day texas: Cassandra and the Cloud
 
DuyHai DOAN - Real time analytics with Cassandra and Spark - NoSQL matters Pa...
DuyHai DOAN - Real time analytics with Cassandra and Spark - NoSQL matters Pa...DuyHai DOAN - Real time analytics with Cassandra and Spark - NoSQL matters Pa...
DuyHai DOAN - Real time analytics with Cassandra and Spark - NoSQL matters Pa...
 
Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...
Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...
Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...
 
BlackStor - World's fastest & most reliable Cloud Native Software Defined Sto...
BlackStor - World's fastest & most reliable Cloud Native Software Defined Sto...BlackStor - World's fastest & most reliable Cloud Native Software Defined Sto...
BlackStor - World's fastest & most reliable Cloud Native Software Defined Sto...
 
What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017 What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017
 
Introduction to Cassandra & Data model
Introduction to Cassandra & Data modelIntroduction to Cassandra & Data model
Introduction to Cassandra & Data model
 
JavaOne 2016: Getting Started with Apache Spark: Use Scala, Java, Python, or ...
JavaOne 2016: Getting Started with Apache Spark: Use Scala, Java, Python, or ...JavaOne 2016: Getting Started with Apache Spark: Use Scala, Java, Python, or ...
JavaOne 2016: Getting Started with Apache Spark: Use Scala, Java, Python, or ...
 
Big Data Analytics with Spark
Big Data Analytics with SparkBig Data Analytics with Spark
Big Data Analytics with Spark
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
 
Performance is not an Option - gRPC and Cassandra
Performance is not an Option - gRPC and CassandraPerformance is not an Option - gRPC and Cassandra
Performance is not an Option - gRPC and Cassandra
 
Making sense of your data jug
Making sense of your data   jugMaking sense of your data   jug
Making sense of your data jug
 
"Real-time data processing with Spark & Cassandra", jDays 2015 Speaker: "Duy-...
"Real-time data processing with Spark & Cassandra", jDays 2015 Speaker: "Duy-..."Real-time data processing with Spark & Cassandra", jDays 2015 Speaker: "Duy-...
"Real-time data processing with Spark & Cassandra", jDays 2015 Speaker: "Duy-...
 
MySQL Optimizer: What's New in 8.0
MySQL Optimizer: What's New in 8.0MySQL Optimizer: What's New in 8.0
MySQL Optimizer: What's New in 8.0
 
OpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoOpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ Criteo
 
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014
 
Data Modeling Basics for the Cloud with DataStax
Data Modeling Basics for the Cloud with DataStaxData Modeling Basics for the Cloud with DataStax
Data Modeling Basics for the Cloud with DataStax
 
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
 

Plus de Duyhai Doan

Pourquoi Terraform n'est pas le bon outil pour les déploiements automatisés d...
Pourquoi Terraform n'est pas le bon outil pour les déploiements automatisés d...Pourquoi Terraform n'est pas le bon outil pour les déploiements automatisés d...
Pourquoi Terraform n'est pas le bon outil pour les déploiements automatisés d...Duyhai Doan
 
Le futur d'apache cassandra
Le futur d'apache cassandraLe futur d'apache cassandra
Le futur d'apache cassandraDuyhai Doan
 
Spark zeppelin-cassandra at synchrotron
Spark zeppelin-cassandra at synchrotronSpark zeppelin-cassandra at synchrotron
Spark zeppelin-cassandra at synchrotronDuyhai Doan
 
Algorithme distribués pour big data saison 2 @DevoxxFR 2016
Algorithme distribués pour big data saison 2 @DevoxxFR 2016Algorithme distribués pour big data saison 2 @DevoxxFR 2016
Algorithme distribués pour big data saison 2 @DevoxxFR 2016Duyhai Doan
 
Cassandra UDF and Materialized Views
Cassandra UDF and Materialized ViewsCassandra UDF and Materialized Views
Cassandra UDF and Materialized ViewsDuyhai Doan
 
Apache zeppelin, the missing component for the big data ecosystem
Apache zeppelin, the missing component for the big data ecosystemApache zeppelin, the missing component for the big data ecosystem
Apache zeppelin, the missing component for the big data ecosystemDuyhai Doan
 
Distributed algorithms for big data @ GeeCon
Distributed algorithms for big data @ GeeConDistributed algorithms for big data @ GeeCon
Distributed algorithms for big data @ GeeConDuyhai Doan
 
Algorithmes distribues pour le big data @ DevoxxFR 2015
Algorithmes distribues pour le big data @ DevoxxFR 2015Algorithmes distribues pour le big data @ DevoxxFR 2015
Algorithmes distribues pour le big data @ DevoxxFR 2015Duyhai Doan
 

Plus de Duyhai Doan (8)

Pourquoi Terraform n'est pas le bon outil pour les déploiements automatisés d...
Pourquoi Terraform n'est pas le bon outil pour les déploiements automatisés d...Pourquoi Terraform n'est pas le bon outil pour les déploiements automatisés d...
Pourquoi Terraform n'est pas le bon outil pour les déploiements automatisés d...
 
Le futur d'apache cassandra
Le futur d'apache cassandraLe futur d'apache cassandra
Le futur d'apache cassandra
 
Spark zeppelin-cassandra at synchrotron
Spark zeppelin-cassandra at synchrotronSpark zeppelin-cassandra at synchrotron
Spark zeppelin-cassandra at synchrotron
 
Algorithme distribués pour big data saison 2 @DevoxxFR 2016
Algorithme distribués pour big data saison 2 @DevoxxFR 2016Algorithme distribués pour big data saison 2 @DevoxxFR 2016
Algorithme distribués pour big data saison 2 @DevoxxFR 2016
 
Cassandra UDF and Materialized Views
Cassandra UDF and Materialized ViewsCassandra UDF and Materialized Views
Cassandra UDF and Materialized Views
 
Apache zeppelin, the missing component for the big data ecosystem
Apache zeppelin, the missing component for the big data ecosystemApache zeppelin, the missing component for the big data ecosystem
Apache zeppelin, the missing component for the big data ecosystem
 
Distributed algorithms for big data @ GeeCon
Distributed algorithms for big data @ GeeConDistributed algorithms for big data @ GeeCon
Distributed algorithms for big data @ GeeCon
 
Algorithmes distribues pour le big data @ DevoxxFR 2015
Algorithmes distribues pour le big data @ DevoxxFR 2015Algorithmes distribues pour le big data @ DevoxxFR 2015
Algorithmes distribues pour le big data @ DevoxxFR 2015
 

Dernier

ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Dernier (20)

ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Cassandra introduction 2016

  • 1. Introduction to Cassandra. DuyHai DOAN, Apache Cassandra Evangelist
  • 2. DataStax •  Founded in April 2010 •  We contribute a lot to Apache Cassandra™ •  400+ customers (25 of the Fortune 100), 450+ employees •  Headquarters in the San Francisco Bay Area •  EU headquarters in London, offices in France and Germany •  DataStax Enterprise = OSS Cassandra + extra features. © 2016 DataStax, All Rights Reserved.
  • 3. Cassandra history •  created at Facebook •  open-sourced in 2008 •  current version: 3.2 •  column-oriented ☞ distributed table
  • 4. 5 Cassandra key points •  Linear scalability •  Continuous availability •  Multi Data-center native •  Operational simplicity •  Spark integration
  • 5. 1) Linear scalability [figure: cluster sizes from NetcoSports (3 nodes, ≈3GB) up to 1k+ nodes and PB+ of data, with YOU placed on that scale]
  • 6. 2) Continuous availability •  thanks to the Dynamo architecture
  • 7. 3) Multi Data-centers •  out-of-the-box (config only) •  AWS config for multi-region DCs •  GCE support •  Microsoft Azure support •  CloudStack support
  • 8. Multi DC usages: data locality, disaster recovery [figure: two rings, New York (DC1) and London (DC2), linked by async replication]
  • 9. Multi DC usages: virtual DC for workload segregation [figure: two rings in the same room, Production (LIVE) and Analytics (Spark), linked by async replication]
  • 10. Multi DC usages: prod data copy for back-up/benchmark [figure: production ring and "My tiny test DC" (READ-ONLY!!!), linked by async replication] •  Use LOCAL_XXX consistency levels
  • 11. 4) Operational simplicity •  1 node = 1 process + 2 config files (cassandra.yaml + cassandra-rackdc.properties) •  deployment automation •  OpsCenter for monitoring, provisioning* and services* (repair, performance, …) (* only with DataStax Enterprise)
  • 12. 4) Operational simplicity
  • 13. 5) Spark integration •  Cassandra + Spark = awesome! •  Spark/Cassandra connector = most advanced connector right now for NoSQL databases: predicate push-down, early filtering, DataFrame integration •  Analytics, aggregation, streaming …
  • 14. Main Cassandra use-cases
  • 15. Cassandra use-cases •  Messaging •  Collections/Playlists •  Fraud detection •  Recommendation/Personalization •  Internet of things/Sensor data
  • 17. Q & A
  • 18. Layers •  Cluster: Amazon Dynamo paper, masterless •  Storage engine: Google Bigtable, columns/column families ☞ distributed tables
  • 19. Data Distribution
  • 20. The tokens •  Random hash of #partition → token = hash(#p) •  Hash: ]−x, x] •  hash range: 2^64 values •  x = 2^64/2 [figure: ring of 8 C* nodes]
  • 21. Token ranges •  A: ]−x, −3x/4] •  B: ]−3x/4, −2x/4] •  C: ]−2x/4, −x/4] •  D: ]−x/4, 0] •  E: ]0, x/4] •  F: ]x/4, 2x/4] •  G: ]2x/4, 3x/4] •  H: ]3x/4, x] [figure: ring of 8 C* nodes, one range per node]
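The mapping from a partition key to a token, and from a token to one of the eight ranges A to H, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `token` function uses MD5 as a hypothetical stand-in for the real partitioner (Cassandra's default is Murmur3), and `NODES`, `UPPER_BOUNDS` and `owner` are names invented for this sketch.

```python
import hashlib
from bisect import bisect_left

X = 2**63  # x = 2^64 / 2, tokens live in ]-x, x]
NODES = "ABCDEFGH"
# Inclusive upper bound of each range: A ends at -3x/4, B at -2x/4, ..., H at x.
UPPER_BOUNDS = [-X + (i + 1) * (2 * X // len(NODES)) for i in range(len(NODES))]


def token(partition_key: str) -> int:
    """Hypothetical partitioner: hash the key to a signed 64-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def owner(tok: int) -> str:
    """Return the range ]lower, upper] that contains the token."""
    # bisect_left finds the first upper bound that is >= tok.
    return NODES[bisect_left(UPPER_BOUNDS, tok)]


for key in ("user_id1", "user_id2", "user_id3"):
    print(key, "->", owner(token(key)))
```

Because the hash spreads keys roughly uniformly over the token space, each of the eight ranges receives about one eighth of the partitions.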
  • 22. Distributed tables •  CREATE TABLE users(user_id int, …, PRIMARY KEY(user_id)); [figure: ring of nodes A to H, rows user_id1 … user_id5 spread across the nodes]
  • 23. Distributed tables [figure: ring of nodes A to H, rows user_id1 … user_id5 spread across the nodes]
  • 24. Linear scalability [figure: ring of nodes A to H] Today = high load •  disk occupation 80% •  CPU 70% •  saturated memory
  • 25. Scaling out [figure: ring of nodes A to H plus new nodes I and J] +2 nodes •  disk occupation 50% •  CPU 50% •  memory ✌︎ Automatic data rebalancing •  each node gives up some tokens •  flag to throttle network bandwidth: streaming throughput
  • 26. Automatic data re-balancing with virtual nodes [figure: token ownership of nodes A to H before, and of nodes A to J after adding 2 nodes]
  • 27. Q & A
  • 28. Replication Model & Consistency
  • 29. Failure tolerance •  Replication factor (RF) = 3 [figure: ring of nodes A to H; node 1 stores ranges {A, H, G}, node 2 stores {B, A, H}, node 3 stores {C, B, A}]
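The replica placement on this slide can be reproduced with a small sketch. It assumes the simplest placement rule, where a range is stored on its owner and on the next RF − 1 nodes clockwise on the ring; the function names are invented for illustration, and each node is named after the range it owns.

```python
NODES = list("ABCDEFGH")  # ring order; node "A" owns range A, and so on


def replicas(range_name: str, rf: int = 3) -> list[str]:
    """Nodes holding a copy of the range: the owner plus the next rf - 1 nodes."""
    i = NODES.index(range_name)
    return [NODES[(i + k) % len(NODES)] for k in range(rf)]


def ranges_stored_on(node: str, rf: int = 3) -> list[str]:
    """Ranges a node stores: its own plus the rf - 1 preceding ranges."""
    i = NODES.index(node)
    return [NODES[(i - k) % len(NODES)] for k in range(rf)]


print(ranges_stored_on("A"))  # ['A', 'H', 'G']
print(ranges_stored_on("B"))  # ['B', 'A', 'H']
print(replicas("A"))          # ['A', 'B', 'C']
```

With RF = 3, every range survives the loss of any two nodes, since a third copy remains somewhere on the ring.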
  • 30. Coordinator node •  Responsible for handling requests (read/write) •  Every node can be coordinator: masterless, round robin master for each request, no SPOF, proxy role [figure: ring of nodes A to H; a request reaches the coordinator, which forwards it to replicas 1, 2 and 3]
  • 31. Consistency level •  Tunable at runtime: ONE, QUORUM (strict majority w.r.t. RF), ALL •  Applicable to any request (read/write)
  • 32. Consistency in action •  RF = 3, Write ONE, Read ONE •  Write ONE: B is acknowledged while data replication is still in progress (replicas hold B, A, A) •  Read ONE: A (a stale replica may answer)
  • 33. Consistency in action •  RF = 3, Write ONE, Read QUORUM •  Write ONE: B is acknowledged while data replication is still in progress (replicas hold B, A, A) •  Read QUORUM: A (both contacted replicas may still hold A)
  • 34. Consistency in action •  RF = 3, Write ONE, Read ALL •  Write ONE: B is acknowledged while data replication is still in progress (replicas hold B, A, A) •  Read ALL: B (the up-to-date replica is always contacted)
  • 35. Consistency in action •  RF = 3, Write QUORUM, Read ONE •  Write QUORUM: B is acknowledged while data replication is still in progress (replicas hold B, B, A) •  Read ONE: A (the one stale replica may answer)
  • 36. Consistency in action •  RF = 3, Write QUORUM, Read QUORUM •  Write QUORUM: B is acknowledged while data replication is still in progress (replicas hold B, B, A) •  Read QUORUM: B (at least one contacted replica holds B)
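The five scenarios above can be checked by brute force. The sketch below is a simplified model, not Cassandra's read path: each replica holds a (value, timestamp) pair, a read contacts some subset of replicas of the required size, and the newest value among them is returned. The function and variable names are invented for illustration.

```python
from itertools import combinations


def possible_reads(replica_values, replicas_contacted):
    """Every value a read could return, depending on which replicas answer."""
    results = set()
    for contacted in combinations(replica_values, replicas_contacted):
        newest = max(contacted, key=lambda cell: cell[1])  # highest timestamp
        results.add(newest[0])
    return results


# RF = 3; value A was written at t=1, value B at t=2.
after_write_one = [("B", 2), ("A", 1), ("A", 1)]
after_write_quorum = [("B", 2), ("B", 2), ("A", 1)]

print(possible_reads(after_write_one, 1))     # read ONE: A or B
print(possible_reads(after_write_one, 2))     # read QUORUM: A or B
print(possible_reads(after_write_one, 3))     # read ALL: only B
print(possible_reads(after_write_quorum, 1))  # read ONE: A or B
print(possible_reads(after_write_quorum, 2))  # read QUORUM: only B
```

A stale value A is only ruled out when the set of replicas read is guaranteed to overlap the set of replicas that acknowledged the write.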
  • 37. Consistency level = trade-off
  • 38. Consistency level •  ONE: fast, may not read latest written value
  • 39. Consistency level •  QUORUM: strict majority w.r.t. Replication Factor, good balance
  • 40. Consistency level •  ALL: paranoid, slow, loss of high availability
  • 41. Consistency level common patterns •  ONE read + ONE write ☞ available for read/write even with (RF − 1) replicas down •  QUORUM read + QUORUM write ☞ available for read/write as long as a strict majority of replicas is up, i.e. with up to RF − (⌊RF/2⌋ + 1) replicas down (1 replica for RF = 3)
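The arithmetic behind these patterns is short enough to write down. The following sketch uses invented helper names and covers only the three levels named in this deck; it computes how many replicas each level needs, how many may be down, and whether a read/write pair is guaranteed to see the latest write (replicas read + replicas written > RF).

```python
def quorum(rf: int) -> int:
    """Strict majority of the replicas."""
    return rf // 2 + 1


def required(level: str, rf: int) -> int:
    """Number of replicas that must answer for the given consistency level."""
    return {"ONE": 1, "QUORUM": quorum(rf), "ALL": rf}[level]


def tolerated_failures(level: str, rf: int) -> int:
    """Replicas that can be down while requests at this level still succeed."""
    return rf - required(level, rf)


def reads_see_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when any read set must overlap any acknowledged write set."""
    return required(read_level, rf) + required(write_level, rf) > rf


for level in ("ONE", "QUORUM", "ALL"):
    print(level, "tolerates", tolerated_failures(level, 3), "replica(s) down")
```

For RF = 3 this gives 2, 1 and 0 tolerated failures for ONE, QUORUM and ALL, which is the trade-off between availability and consistency that the deck describes.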
  • 42. Q & A
  • 43. Last Write Wins & Compaction
  • 44. Last Write Wins (LWW) •  INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33); •  #partition jdoe: age = 33, name = John DOE
  • 45. Last Write Wins (LWW) •  INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33); •  jdoe: age (t1) = 33, name (t1) = John DOE •  t1 = auto-generated timestamp (μs)
  • 46. Last Write Wins (LWW) •  UPDATE users SET age = 34 WHERE login = 'jdoe'; •  SSTable1: jdoe: age (t1) = 33, name (t1) = John DOE •  SSTable2: jdoe: age (t2) = 34
  • 47. Last Write Wins (LWW) •  DELETE age FROM users WHERE login = 'jdoe'; •  SSTable1: jdoe: age (t1) = 33, name (t1) = John DOE •  SSTable2: jdoe: age (t2) = 34 •  SSTable3: jdoe: age (t3) = ☒ (tombstone)
  • 48. Last Write Wins (LWW) •  SELECT age FROM users WHERE login = 'jdoe'; ??? •  SSTable1: jdoe: age (t1) = 33, name (t1) = John DOE •  SSTable2: jdoe: age (t2) = 34 •  SSTable3: jdoe: age (t3) = ☒ (tombstone)
  • 49. Last Write Wins (LWW) •  SELECT age FROM users WHERE login = 'jdoe'; •  SSTable1: age (t1) = 33 ✕ •  SSTable2: age (t2) = 34 ✕ •  SSTable3: age (t3) = ☒ (tombstone) ✓ the most recent timestamp wins
  • 50. Compaction •  SSTable1, SSTable2 and SSTable3 are merged into a new SSTable: jdoe: age (t3) = ☒ (tombstone), name (t1) = John DOE
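The merge performed by a read or by compaction can be sketched as a reduction over cells keyed by (partition, column), keeping the cell with the highest timestamp. This is a toy model with invented names: SSTables are plain dictionaries, `TOMBSTONE` is a sentinel object, and timestamp ties and the eventual purging of old tombstones are ignored.

```python
TOMBSTONE = object()  # marker written by DELETE


def compact(sstables):
    """Merge SSTables, keeping for each cell the value with the latest timestamp."""
    merged = {}
    for sstable in sstables:
        for cell_key, (value, timestamp) in sstable.items():
            current = merged.get(cell_key)
            if current is None or timestamp > current[1]:
                merged[cell_key] = (value, timestamp)
    return merged


def read(sstables, partition, column):
    """Return the live value of a cell, or None if absent or deleted."""
    cell = compact(sstables).get((partition, column))
    if cell is None or cell[0] is TOMBSTONE:
        return None
    return cell[0]


sstable1 = {("jdoe", "age"): (33, 1), ("jdoe", "name"): ("John DOE", 1)}
sstable2 = {("jdoe", "age"): (34, 2)}
sstable3 = {("jdoe", "age"): (TOMBSTONE, 3)}

print(read([sstable1, sstable2], "jdoe", "age"))             # 34
print(read([sstable1, sstable2, sstable3], "jdoe", "age"))   # None
print(read([sstable1, sstable2, sstable3], "jdoe", "name"))  # John DOE
```

The merged result keeps the tombstone for `age` rather than dropping the cell, because the tombstone is what shadows the older values 33 and 34.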
  • 51. Basic Data Modeling
  • 52. Table creation •  CREATE TABLE users (login text, name text, age int, … PRIMARY KEY(login)); •  login = partition key (#partition)
  • 53. DML statements •  INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33); •  UPDATE users SET age = 34 WHERE login = 'jdoe'; •  DELETE age FROM users WHERE login = 'jdoe'; •  SELECT age FROM users WHERE login = 'jdoe';
  • 54. What about joins? •  How can I join data between tables? •  How can I model 1 – N relationships? •  How to model a mailbox? [figure: User 1 – n Emails]
  • 55. Compound primary key •  CREATE TABLE mailbox (login text, message_id timeuuid, interlocutor text, message text, PRIMARY KEY((login), message_id)); •  login = partition key •  message_id = clustering column (uniqueness)
  • 56. Compound primary key •  Partitions are not ordered; rows inside a partition are ordered by clustering column (date) •  rsmith: 2014-11-21 16:00:00 (bobm: "It’s really…"), 2014-11-21 17:32:12 (bobm: "It depends.."), 2014-11-21 21:21:09 (bobm: "Don’t do…"), … •  hsue: 2014-11-21 11:04:43 (jdoe: "Hi, …"), 2014-11-21 11:22:43 (rsmith: "Hello,…") •  jdoe: 2014-11-21 11:00:00 (hsue: "Hi there!"), 2014-11-21 11:22:43 (rsmith: "Hello,…"), 2014-11-21 13:06:19 (bobm: "Do you…")
  • 57. Queries •  Get message by user and message_id (date): SELECT * FROM mailbox WHERE login = 'jdoe' and message_id = '2014-11-21 16:00:00'; •  Get message by user and date interval: SELECT * FROM mailbox WHERE login = 'jdoe' and message_id <= '2014-11-25 23:59:59' and message_id >= '2014-11-20 00:00:00';
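Why these two queries are cheap can be seen in a toy model of the mailbox table: a dictionary keyed by partition key, whose values are row lists kept sorted by clustering column. The structure and the `select_range` helper are assumptions made for illustration, and plain date strings stand in for timeuuid values, as on the slides.

```python
from bisect import bisect_left, bisect_right

# One entry per partition (login); rows are kept sorted by message date.
mailbox = {
    "jdoe": [
        ("2014-11-21 11:00:00", "hsue", "Hi there!"),
        ("2014-11-21 11:22:43", "rsmith", "Hello,…"),
        ("2014-11-21 13:06:19", "bobm", "Do you…"),
    ],
    "hsue": [
        ("2014-11-21 11:04:43", "jdoe", "Hi, …"),
        ("2014-11-21 11:22:43", "rsmith", "Hello,…"),
    ],
}


def select_range(table, login, start, end):
    """Rows of one partition whose clustering value lies in [start, end]."""
    rows = table.get(login, [])          # direct lookup by partition key
    dates = [row[0] for row in rows]
    # Binary search on the sorted clustering column, both bounds inclusive.
    return rows[bisect_left(dates, start):bisect_right(dates, end)]


for row in select_range(mailbox, "jdoe", "2014-11-20 00:00:00", "2014-11-25 23:59:59"):
    print(row)
```

The partition key selects one entry directly and the sorted clustering column turns a date interval into a contiguous slice, so no other partition has to be visited.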
  • 58. Queries •  Get message by message_id only: SELECT * FROM mailbox WHERE message_id = '2014-11-21 16:00:00'; ??? •  Get message by date interval: SELECT * FROM mailbox WHERE message_id <= '2014-11-25 23:59:59' and message_id >= '2014-11-20 00:00:00'; ???
  • 59. Queries •  Get message by message_id only (#partition not provided): SELECT * FROM mailbox WHERE message_id = '2014-11-21 16:00:00'; •  Get message by date interval (#partition not provided): SELECT * FROM mailbox WHERE message_id <= '2014-11-25 23:59:59' and message_id >= '2014-11-20 00:00:00';
  • 60. Without #partition •  No #partition ☞ no token ☞ where are my data? [figure: ring of 8 C* nodes, each marked ❓]
  • 61. Queries •  Get message by user range (range query on #partition): SELECT * FROM mailbox WHERE login >= 'hsue' and login <= 'jdoe'; •  Get message by user pattern (non-exact match on #partition): SELECT * FROM mailbox WHERE login like '%doe%';
  • 62. WHERE clause restrictions •  All DML queries must provide #partition •  Only exact match (=) on #partition; range queries (<, ≤, >, ≥) are not allowed ☞ they would require a full cluster scan •  On clustering columns, only range queries (<, ≤, >, ≥) and exact match (=) •  WHERE clause only possible on columns defined in PRIMARY KEY and on indexed columns
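These rules can be written as a small checker. The sketch below encodes only the restrictions listed on this slide, not the full CQL grammar; `check_where` and its parameters are hypothetical names, and a WHERE clause is modelled as a list of (column, operator) pairs.

```python
RANGE_OPS = {"<", "<=", ">", ">="}


def check_where(restrictions, partition_key, clustering_columns, indexed=()):
    """Return the list of rule violations; an empty list means the query is allowed."""
    problems = []
    ops_by_column = {}
    for column, op in restrictions:
        ops_by_column.setdefault(column, []).append(op)

    # Rule 1 and 2: #partition must be provided, with exact match only.
    for column in partition_key:
        ops = ops_by_column.get(column)
        if not ops:
            problems.append(f"#partition column {column!r} not provided")
        elif any(op != "=" for op in ops):
            problems.append(f"only '=' is allowed on #partition column {column!r}")

    for column, ops in ops_by_column.items():
        if column in partition_key:
            continue
        if column in clustering_columns:
            # Rule 3: clustering columns accept '=' and range operators.
            if any(op != "=" and op not in RANGE_OPS for op in ops):
                problems.append(f"unsupported operator on clustering column {column!r}")
        elif column not in indexed:
            # Rule 4: other columns must be indexed.
            problems.append(f"column {column!r} is neither in the PRIMARY KEY nor indexed")
    return problems


pk, cc = ["login"], ["message_id"]
print(check_where([("login", "="), ("message_id", ">=")], pk, cc))
print(check_where([("message_id", "=")], pk, cc))
print(check_where([("login", "LIKE")], pk, cc))
```

Applied to the mailbox table, the first query passes, while the query without `login` and the pattern match on `login` are both reported as violations.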
  • 63. WHERE clause restrictions. What if I want to perform an "arbitrary" WHERE clause? •  search form scenario, dynamic search fields
  • 64. WHERE clause restrictions. What if I want to perform an "arbitrary" WHERE clause? •  search form scenario, dynamic search fields. DO NOT RE-INVENT THE WHEEL! •  ☞ Apache Solr (Lucene) integration (DataStax Enterprise Search) •  ☞ Same JVM, 1-cluster-2-products (Solr & Cassandra)
  • 65. WHERE clause restrictions. What if I want to perform an "arbitrary" WHERE clause? •  search form scenario, dynamic search fields. DO NOT RE-INVENT THE WHEEL! •  ☞ Apache Solr (Lucene) integration (DataStax Enterprise Search) •  ☞ Same JVM, 1-cluster-2-products (Solr & Cassandra). SELECT * FROM users WHERE solr_query = 'age:[33 TO *] AND gender:male'; SELECT * FROM users WHERE solr_query = 'lastname:*schwei?er';
  • 66. Q & A
  • 67. Advanced Data Modeling
  • 68. Collection types. CREATE TABLE users ( login text, name text, age int, friends set<text>, hobbies list<text>, languages map<int, text>, … PRIMARY KEY(login));
  • 69. User Defined Type (UDT). Instead of: CREATE TABLE users ( login text, … street_number int, street_name text, postcode int, country text, … PRIMARY KEY(login));
  • 70. User Defined Type (UDT). CREATE TYPE address ( street_number int, street_name text, postcode int, country text); CREATE TABLE users ( login text, … location frozen<address>, … PRIMARY KEY(login));
  • 71. UDT Insert. INSERT INTO users(login, name, location) VALUES ( 'jdoe', 'John DOE', { street_number: 124, street_name: 'Congress Avenue', postcode: 95054, country: 'USA' });
  • 72. JSON syntax for INSERT/UPDATE/DELETE. CREATE TABLE users ( id text PRIMARY KEY, age int, state text ); INSERT INTO users JSON '{"id": "user123", "age": 42, "state": "TX"}'; INSERT INTO users(id, age, state) VALUES('me', fromJson('20'), 'CA'); UPDATE users SET age = fromJson('25') WHERE id = fromJson('"me"'); DELETE FROM users WHERE id = fromJson('"me"');
  • 73. JSON syntax for SELECT. > SELECT JSON * FROM users WHERE id = 'me'; [json] ---------------------------------------- {"id": "me", "age": 25, "state": "CA"} > SELECT JSON age, state FROM users WHERE id = 'me'; [json] ---------------------------------------- {"age": 25, "state": "CA"} > SELECT age, toJson(state) FROM users WHERE id = 'me'; age | system.tojson(state) -----+---------------------- 25 | "CA"
  • 74. Why Materialized Views? Relieve the pain of manual denormalization. CREATE TABLE user( id int PRIMARY KEY, country text, …); CREATE TABLE user_by_country( country text, id int, …, PRIMARY KEY(country, id));
  • 75. Materialized View In Action. CREATE MATERIALIZED VIEW user_by_country AS SELECT country, id, firstname, lastname FROM user WHERE country IS NOT NULL AND id IS NOT NULL PRIMARY KEY(country, id); CREATE TABLE user_by_country ( country text, id int, firstname text, lastname text, PRIMARY KEY(country, id));
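What the view on slide 75 spares the application can be sketched as follows. This is a toy in-memory model under stated assumptions: `user`, `user_by_country` and `upsert_user` are hypothetical Python names mirroring the CQL tables, and real Cassandra performs this bookkeeping server-side with its own consistency rules.

```python
user = {}             # base table: id -> row
user_by_country = {}  # "view": country -> {id: row}

def upsert_user(user_id, country, firstname, lastname):
    """Write to the base table and keep the denormalized copy in sync."""
    old = user.get(user_id)
    if old is not None and old["country"] is not None:
        # The row may move to another view partition: drop the stale copy.
        partition = user_by_country[old["country"]]
        del partition[user_id]
        if not partition:
            del user_by_country[old["country"]]
    row = {"id": user_id, "country": country,
           "firstname": firstname, "lastname": lastname}
    user[user_id] = row
    # Mirrors WHERE country IS NOT NULL: rows without a country are not in the view.
    if country is not None:
        user_by_country.setdefault(country, {})[user_id] = row

upsert_user(1, "FR", "Jean", "Dupont")
upsert_user(2, "FR", "Marie", "Martin")
upsert_user(3, None, "John", "Doe")
upsert_user(1, "US", "Jean", "Dupont")  # user 1 changes country
```

With manual denormalization, every write path in the application has to repeat the delete-then-insert step shown here; the materialized view moves that responsibility to the database.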
  • 76. User Defined Functions (UDF). CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS] maxOf (col1 int, col2 int) CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT RETURNS int LANGUAGE java AS $$ return Math.max(col1, col2); $$; SELECT maxOf(col1, col2) FROM table WHERE id = xxx;
  • 77. User Defined Aggregates (UDA). CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS] sum(bigint) SFUNC accumulatorFunction STYPE bigint [FINALFUNC finalFunction] INITCOND 0; CREATE FUNCTION accumulatorFunction(accu bigint, column bigint) RETURNS NULL ON NULL INPUT RETURNS bigint LANGUAGE java AS $$ return accu + column; $$;
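The parts of the aggregate on slide 77 fit together like a fold: start from INITCOND, apply SFUNC once per row, then optionally apply FINALFUNC. The sketch below shows that shape in plain Python; `run_aggregate` and `accumulator` are hypothetical names, and null handling is left out.

```python
def run_aggregate(values, sfunc, initcond, finalfunc=None):
    """Fold `values` into a state, mimicking SFUNC / INITCOND / FINALFUNC."""
    state = initcond                 # INITCOND
    for value in values:
        state = sfunc(state, value)  # SFUNC called once per row
    return finalfunc(state) if finalfunc is not None else state

def accumulator(accu, column):
    """Counterpart of accumulatorFunction on the slide."""
    return accu + column

total = run_aggregate([1, 2, 3], accumulator, 0)
doubled = run_aggregate([1, 2, 3], accumulator, 0, finalfunc=lambda s: s * 2)
```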
  • 78. Q & A
  • 79. Thank You. @doanduyhai, duy_hai.doan@datastax.com, https://academy.datastax.com/