C* Path: Denormalize your data
Eric Zoerner | Software Developer, eBuddy BV

#CASSANDRAEU

Cassandra Summit Europe 2013
London
CASSANDRASUMMITEU
About eBuddy

XMS

Cassandra in eBuddy Messaging Platform
• User Data Service
• User Discovery Service
• Persistent Session Store
• Message History
• Location-based Discovery

Some Statistics
• Current size of data
– 1.4 TB total (replication of 3x); 467 GB actual data
• 12 million sessions (11 million users plus groups)
• Almost a billion rows in one column family (inverse social graph)

C* Path

The Problem (a “classic”)
Complex Object → ? → Key-Value Store (RDB table, NoSQL, etc.)

Person
    name: String
    birthdate: Date
    nickname: String

Address (Person 1 to * Address)
    street: String
    city: String
    province: String
    postalCode: String
    countryCode: String

Phone (Person 1 to * Phone)
    name: String
    number: String

Some Strategies

Serialization!

Normalization!

Person
id    name   birthdate    nickname
110   John   1985-04-06   Jack
111   Mary   1979-11-30   Mary

Address
person_id   address_id   street         city
110         001          123 Main St    New York
110         002          456 Singel     Amsterdam
111         003          78 Hoofd Str   London

Phone
person_id   name     phone
110         mobile   +15551234
111         home     +44884800
111         mobile   +44030393

Decomposition!

path                  value
name/                 John
addresses/@0/street   123 Main St.
phones/@0/number      +31123456789
...                   ...

Strategies Comparison
                     Serialization   Normalization   Decomposition
Single Write         ✔               ✘               ✔
Single Read          ✔               ✘               ✔
Consistent Updates   ✔               ✔               not enforced
Structural Access    ✘               ✔               ✔
Cycles               ✔               ✔               ✘

C* Path
Open Source Java Library for decomposing complex objects into Path-Value pairs and storing them in Cassandra
https://github.com/ebuddy/c-star-path
Artifacts available at Maven Central.

C* Path: Decomposition
• Easy to Use
• Simple API
• Good for Cassandra because:
– Structural Access: Write parts of objects without reading first
– Good for denormalizing data: can read or write large complex objects with one read or write operation

How does it work?

API Example - Write to a Path
StructuredDataSupport<UUID> dao = … ;
UUID rowKey = … ;
Pojo pojo = … ;

Path path = dao.createPath("some", "path", "to", "my", "pojo");

dao.writeToPath(rowKey, path, pojo);

API Example - Read from a Path
Path path = dao.createPath("some", "path", "to", "my", "pojo");

Pojo pojo = dao.readFromPath(rowKey, path, new TypeReference<Pojo>() { });

API Example - Delete
dao.deletePath(rowKey, path);

API Example - Batch Operations
BatchContext batch = dao.beginBatch();

dao.writeToPath(rowKey1, path, pojo1, batch);
dao.writeToPath(rowKey2, path, pojo2, batch);
dao.deletePath(rowKey3, path, batch);

dao.applyBatch(batch);

Read or write at any level of a path
Person person = …;

Path path = dao.createPath("x");
dao.writeToPath(rowKey, path, person);

Path pathToName = path.withElements("name");
String name = dao.readFromPath(rowKey, pathToName, stringTypeReference);

Write Implementation: Decomposition
• Step 1:
– Convert domain object into basic structure of Maps, Lists, and simple values. Uses the Jackson (FasterXML) library for this and honors the Jackson annotations

• Step 2:
– Decompose this basic structure into a map of paths to simple values (i.e. String, Number, Boolean), done by Decomposer

• Step 3:
– Write this map as key-value pairs in the database
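
Step 2 can be sketched in plain Java as a recursive walk. This is only an illustration, not the library's Decomposer: the class name DecomposeSketch and its methods are hypothetical, step 1 is assumed to have already produced the Maps and Lists, URL encoding of elements is left out, and every path is written with a trailing slash as a simplifying choice.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not the library's Decomposer.
final class DecomposeSketch {

    // Flattens nested Maps and Lists into path -> simple value pairs.
    static Map<String, Object> decompose(Object structure) {
        Map<String, Object> out = new LinkedHashMap<>();
        walk("", structure, out);
        return out;
    }

    private static void walk(String prefix, Object node, Map<String, Object> out) {
        if (node instanceof Map) {
            // Map keys become path elements.
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                walk(prefix + e.getKey() + "/", e.getValue(), out);
            }
        } else if (node instanceof List) {
            // List positions become "@index" path elements.
            List<?> list = (List<?>) node;
            for (int i = 0; i < list.size(); i++) {
                walk(prefix + "@" + i + "/", list.get(i), out);
            }
        } else {
            // Simple value (String, Number, Boolean): emit one path-value pair.
            out.put(prefix, node);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("street", "Singel 45");
        address.put("place", "Amsterdam");

        Map<String, Object> person = new LinkedHashMap<>();
        person.put("name", "John");
        person.put("addresses", Arrays.asList(address));

        decompose(person).forEach((path, value) -> System.out.println(path + " = " + value));
    }
}
```

Each resulting entry corresponds to one key-value pair written in step 3.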

Example Decomposition - step 1
Simplify structure into regular Maps, Lists, and simple values

Map
    name = "John"
    birthdate = "-39080932298"
    nickname = "Jack"
    addresses = <List>
        [0] = <Map>
            street = "123 Main"
            place = "New York"
        [1] = <Map>
            street = "Singel 45"
            place = "Amsterdam"
    phones = <List>
        [0] = <Map>
            number = "+31651234567"
            name = "mobile"

Example Decomposition - step 2
path                  value
name/                 "John"
birthdate/            "-39080932298"
nickname/             "Jack"
addresses/@0/street   "123 Main St."
addresses/@0/place    "New York"
addresses/@1/street   "Singel 45"
addresses/@1/place    "Amsterdam"
phones/@0/name        "mobile"
phones/@0/number      "+31651234567"

Read implementation: Composition
• Step 1:
– Read path-value pairs from database

• Step 2:
– “Merge” path-value maps back into basic structure (Maps, Lists, simple values), done by Composer

• Step 3:
– Use Jackson to convert basic structure back into domain object using a TypeReference
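
The merge in step 2 can be sketched as follows. This is an assumption-laden illustration rather than the library's Composer: ComposeSketch and its methods are hypothetical names, paths are assumed to be slash-separated with "@index" list elements, and URL decoding as well as the Jackson conversion of step 3 are omitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not the library's Composer.
final class ComposeSketch {

    // Rebuilds nested Maps and Lists from path -> value pairs.
    @SuppressWarnings("unchecked")
    static Object compose(Map<String, Object> pathValues) {
        Map<String, Object> root = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : pathValues.entrySet()) {
            String[] elements = e.getKey().split("/"); // a trailing "/" yields no empty element
            Map<String, Object> node = root;
            for (int i = 0; i < elements.length - 1; i++) {
                // Throws ClassCastException if a simple value already sits at this path,
                // which is what an inconsistent update would produce.
                node = (Map<String, Object>) node.computeIfAbsent(
                        elements[i], k -> new LinkedHashMap<String, Object>());
            }
            node.put(elements[elements.length - 1], e.getValue());
        }
        return listify(root);
    }

    // Turns every map whose keys are all list indices ("@0", "@1", ...) into a List.
    @SuppressWarnings("unchecked")
    private static Object listify(Object node) {
        if (!(node instanceof Map)) {
            return node;
        }
        Map<String, Object> map = (Map<String, Object>) node;
        boolean isList = !map.isEmpty();
        for (String key : map.keySet()) {
            isList &= key.startsWith("@");
        }
        if (!isList) {
            map.replaceAll((key, value) -> listify(value));
            return map;
        }
        // Order elements by numeric index, not by string order.
        TreeMap<Integer, Object> byIndex = new TreeMap<>();
        map.forEach((key, value) -> byIndex.put(Integer.parseInt(key.substring(1)), listify(value)));
        return new ArrayList<>(byIndex.values());
    }

    public static void main(String[] args) {
        Map<String, Object> flat = new LinkedHashMap<>();
        flat.put("name/", "John");
        flat.put("addresses/@0/street/", "123 Main St.");
        flat.put("addresses/@1/street/", "Singel 45");
        System.out.println(compose(flat));
    }
}
```

The resulting structure of Maps and Lists is what step 3 would hand to Jackson together with the TypeReference.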

Design & Challenges

Path Encoding
• Paths stored as strings
• Forward slashes in paths (but hidden by Path API)
• Path elements are internally URL encoded allowing use of special characters in the implementation
• Special characters: @ for list indices (@0, @1, @2, ...)
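
A minimal sketch of why URL encoding keeps the delimiters unambiguous. PathEncodingSketch and its methods are hypothetical names, and the exact encoding the library uses may differ; the sketch relies only on the standard java.net.URLEncoder, which escapes both "/" and "@".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration; the library's internal encoding may differ.
final class PathEncodingSketch {

    // URL-encodes each element so that "/" or "@" inside user data
    // cannot be mistaken for a path delimiter or a list index marker.
    static String encode(String... elements) {
        StringBuilder sb = new StringBuilder();
        for (String element : elements) {
            try {
                sb.append(URLEncoder.encode(element, "UTF-8")).append('/');
            } catch (UnsupportedEncodingException e) {
                throw new IllegalStateException(e); // UTF-8 is always available
            }
        }
        return sb.toString();
    }

    // List indices use the reserved "@" prefix and are written unencoded.
    static String listIndex(int index) {
        return "@" + index + "/";
    }

    public static void main(String[] args) {
        // prints addresses/@0/street/
        System.out.println(encode("addresses") + listIndex(0) + encode("street"));
        // prints a%2Fb/user%40home/
        System.out.println(encode("a/b", "user@home"));
    }
}
```

Because a literal "@" in user data is stored as "%40", an element that starts with an unencoded "@" can only be a list index.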

Challenge: “Shrinking Lists”
➀ Write a list.
dao.writeToPath(key, "x", {"1","2"});
x/@0/   "1"
x/@1/   "2"

➁ Write a shorter list.
dao.writeToPath(key, "x", {"3"});
x/@0/   "3"
x/@1/   "2"

➂ Read the list.
dao.readFromPath(key, "x", new TypeReference<List<String>>() {});
{"3","2"} ✘

Solution: Implementation writes a list terminator value.

dao.writeToPath(key, "x", {"1","2"});
x/@0/   "1"
x/@1/   "2"
x/@2/   0xFFFFFFFF

dao.writeToPath(key, "x", {"3"});
x/@0/   "3"
x/@1/   0xFFFFFFFF
x/@2/   0xFFFFFFFF

dao.readFromPath(key, "x", new TypeReference<List<String>>() {});
{"3"} ✔

Unfortunately, this is only a partial solution, because it is still possible to read “stale” list elements using a positional index in the path.

This can be avoided by doing a delete before a write, but for performance reasons the library will not do that automatically.

Conclusion: The user must know what they are doing and understand the implementation.
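
The terminator idea, and the reason it is only a partial fix, can be reproduced with an in-memory stand-in for a single row. Everything in this sketch is hypothetical (the class ListTerminatorSketch, the TERMINATOR object standing in for 0xFFFFFFFF, the TreeMap standing in for Cassandra columns); it is not the library's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of one row; not the library's implementation.
final class ListTerminatorSketch {

    // Stand-in for the 0xFFFFFFFF marker; the real encoding is an implementation detail.
    static final Object TERMINATOR = new Object();

    // Column name (path) -> column value.
    final Map<String, Object> row = new TreeMap<>();

    // Writes the elements followed by a terminator. Existing columns are
    // overwritten but never deleted, mirroring a write without a prior delete.
    void writeList(String path, List<String> values) {
        for (int i = 0; i < values.size(); i++) {
            row.put(path + "/@" + i + "/", values.get(i));
        }
        row.put(path + "/@" + values.size() + "/", TERMINATOR);
    }

    // Reads elements by index until the terminator (or a missing column) is reached.
    List<String> readList(String path) {
        List<String> result = new ArrayList<>();
        for (int i = 0; ; i++) {
            Object value = row.get(path + "/@" + i + "/");
            if (value == null || value == TERMINATOR) {
                return result;
            }
            result.add((String) value);
        }
    }

    public static void main(String[] args) {
        ListTerminatorSketch store = new ListTerminatorSketch();
        store.writeList("x", Arrays.asList("1", "2"));
        store.writeList("x", Arrays.asList("3"));
        System.out.println(store.readList("x"));             // prints [3]
        System.out.println(store.row.containsKey("x/@2/"));  // prints true: the old column is still there
    }
}
```

Reading the whole list stops at the first terminator, but a column beyond it is still present and would be returned by a read that addresses it directly by index.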

Challenge: Inconsistent Updates
Because objects can be updated at any path, there is no protection against a write “corrupting” an object structure.

Path path = dao.createPath("x");
dao.writeToPath(key, path, person1);

x/address/street/   "Singel 45"
x/name/             "John"

path = dao.createPath("x","name");
dao.writeToPath(key, path, person1);   ✘

x/address/street/        "Singel 45"
x/name/                  "John"
x/name/address/street/   "Singel 45"
x/name/name/             "John"

Solution: Don’t do that!

If it does happen: The implementation provides a way to still get the “corrupted” data as simple structures, but an attempt to convert to a now incompatible POJO will fail.

Conclusion: The user must know what they are doing and understand the implementation.

Issue: Sorting
Question: What about sorting path elements as something other than strings, such as numerical or time-based UUID elements?

Instead of storing paths as strings, the implementation could have used DynamicComposite.

We tried it.

It can work. CQL supports it as a user-defined type.

Unfortunately it causes cqlsh to crash, making it difficult to “browse” the data.

Using DynamicComposite for paths is still under consideration for a future version.
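
To make the motivation concrete, the sketch below contrasts string order with numeric order for list-index elements. It does not use DynamicComposite; PathSortingSketch is a hypothetical name and the code only demonstrates the ordering difference that a typed path element would address.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of string order versus numeric order.
final class PathSortingSketch {

    // Order obtained when path elements are compared as plain strings.
    static List<String> stringOrder(List<String> elements) {
        return new ArrayList<>(new TreeSet<>(elements));
    }

    // Order obtained when the index after "@" is compared as a number.
    static List<String> numericOrder(List<String> elements) {
        List<String> sorted = new ArrayList<>(elements);
        sorted.sort(Comparator.comparingInt((String e) -> Integer.parseInt(e.substring(1))));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> elements = Arrays.asList("@2", "@10", "@1");
        System.out.println(stringOrder(elements));  // prints [@1, @10, @2]
        System.out.println(numericOrder(elements)); // prints [@1, @2, @10]
    }
}
```

With string comparison, "@10" sorts before "@2", so a range query over path strings would not return list elements in index order once a list has more than ten entries.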

Cassandra Data Model

Thrift

Column family:

row key   column name         column value
<UUID>    x/address/street/   “Singel 45”
          x/name              “John”
          …                   …

- OR -

Super column family (coming soon):

row key   super column name   column name       column value
<UUID>    x                   address/street/   “Singel 45”
                              name              “John”
                              …                 …

Thrift implementation relies on the Hector client.

ColumnFamilyOperations<K,String,Object> operations =
    new ColumnFamilyTemplate<K,String,Object>(
        keyspace, KeySerializer, StringSerializer, StructureSerializer);

StructuredDataSupport<K> dao = new ThriftStructuredDataSupport<K>(operations);

CQL
CREATE TABLE person (
    key text,
    path text,
    value text,
    PRIMARY KEY (key, path)
)

• Cannot use the path itself as a column name because it is “dynamic”
• Dynamic column family

CQL: Data Model Constraints
CREATE TABLE person (
    key text,
    path text,
    value text,
    PRIMARY KEY (key, path)
)

• Need to do a range (“slice”) query on the path, so path must be a clustering key.
• Also, the path must be the first clustering key, since otherwise we would have to provide an equals condition on the previous clustering keys in a query.
• One might try putting a secondary index on the path instead of making it a clustering key, but this doesn’t work since Cassandra indexes only work with equals conditions:

Bad Request: No indexed columns present in by-columns clause with Equal operator
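
One way to express such a slice is a half-open range over the path prefix. The sketch below is an assumption about how a prefix can be turned into range bounds, not a description of the library's query code: PathSliceSketch, upperBound, and sliceQuery are hypothetical names, the prefix is assumed to be non-empty and ASCII, and a TreeMap stands in for the clustering order of one partition.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration of a prefix slice; not the library's query code.
final class PathSliceSketch {

    // Exclusive upper bound for all strings that start with the given
    // non-empty prefix: the prefix with its last character incremented.
    static String upperBound(String prefix) {
        int last = prefix.length() - 1;
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }

    // CQL text for the slice; table and column names follow the CREATE TABLE above.
    static String sliceQuery() {
        return "SELECT path, value FROM person WHERE key = ? AND path >= ? AND path < ?";
    }

    public static void main(String[] args) {
        // Sorted map standing in for the clustering order within one partition.
        SortedMap<String, String> row = new TreeMap<>();
        row.put("x/address/place/", "Amsterdam");
        row.put("x/address/street/", "Singel 45");
        row.put("x/name/", "John");

        String prefix = "x/address/";
        System.out.println(sliceQuery());
        // Only the two address columns fall inside the range.
        System.out.println(row.subMap(prefix, upperBound(prefix)));
    }
}
```

The equality on key plus the range on path is the shape of query that requires path to be the first clustering column.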

CQL
CQL implementation relies on the DataStax Java driver.

StructuredDataSupport<K> dao =
    new CqlStructuredDataSupport<K>(String tableName,
                                    String partitionKeyColumnName,
                                    String pathColumnName,
                                    String valueColumnName,
                                    Session session);

And the rest…

Planned Features

• Sets with simple values: element values stored in path
• DynamicComposites?
• Multiple row reads and writes
• Slice queries on path ranges

Credits and Acknowledgements
• Thanks to Joost van de Wijgerd at eBuddy for his ideas and feedback
• Jackson JSON Processor, which is core to the C* Path implementation
  http://wiki.fasterxml.com/JacksonHome
• Image credits:
  Slide: Some Strategies; image name: binary; author: noegranado; link: http://www.flickr.com/photos/43360884@N04/6949896929/

#CASSANDRAEU

CASSANDRASUMMITEU
C* Path
Open Source Java Library for decomposing
complex objects into Path-Value pairs —
and storing them in Cassandra
https://github.com/

ebuddy/c-star-path

!
!

*

Artifacts available at Maven Central.

#CASSANDRAEU

CASSANDRASUMMITEU

Contenu connexe

Tendances

Perform Like a frAg Star
Perform Like a frAg StarPerform Like a frAg Star
Perform Like a frAg Starrenaebair
 
Should I Use Scalding or Scoobi or Scrunch?
Should I Use Scalding or Scoobi or Scrunch? Should I Use Scalding or Scoobi or Scrunch?
Should I Use Scalding or Scoobi or Scrunch? DataWorks Summit
 
An introduction to CouchDB
An introduction to CouchDBAn introduction to CouchDB
An introduction to CouchDBDavid Coallier
 
OSCON 2011 Learning CouchDB
OSCON 2011 Learning CouchDBOSCON 2011 Learning CouchDB
OSCON 2011 Learning CouchDBBradley Holt
 
WordPress Café: Using WordPress as a Framework
WordPress Café: Using WordPress as a FrameworkWordPress Café: Using WordPress as a Framework
WordPress Café: Using WordPress as a FrameworkExove
 
SFScon17 - Patrick Puecher: "Exploring data with Elasticsearch and Kibana"
SFScon17 - Patrick Puecher: "Exploring data with Elasticsearch and Kibana"SFScon17 - Patrick Puecher: "Exploring data with Elasticsearch and Kibana"
SFScon17 - Patrick Puecher: "Exploring data with Elasticsearch and Kibana"South Tyrol Free Software Conference
 

Tendances (7)

Callimachus
CallimachusCallimachus
Callimachus
 
Perform Like a frAg Star
Perform Like a frAg StarPerform Like a frAg Star
Perform Like a frAg Star
 
Should I Use Scalding or Scoobi or Scrunch?
Should I Use Scalding or Scoobi or Scrunch? Should I Use Scalding or Scoobi or Scrunch?
Should I Use Scalding or Scoobi or Scrunch?
 
An introduction to CouchDB
An introduction to CouchDBAn introduction to CouchDB
An introduction to CouchDB
 
OSCON 2011 Learning CouchDB
OSCON 2011 Learning CouchDBOSCON 2011 Learning CouchDB
OSCON 2011 Learning CouchDB
 
WordPress Café: Using WordPress as a Framework
WordPress Café: Using WordPress as a FrameworkWordPress Café: Using WordPress as a Framework
WordPress Café: Using WordPress as a Framework
 
SFScon17 - Patrick Puecher: "Exploring data with Elasticsearch and Kibana"
SFScon17 - Patrick Puecher: "Exploring data with Elasticsearch and Kibana"SFScon17 - Patrick Puecher: "Exploring data with Elasticsearch and Kibana"
SFScon17 - Patrick Puecher: "Exploring data with Elasticsearch and Kibana"
 

Similaire à C* Summit EU 2013: Denormalizing Your Data: A Java Library to Support Structured Data in Cassandra

Cassandra and Spark, closing the gap between no sql and analytics codemotio...
Cassandra and Spark, closing the gap between no sql and analytics   codemotio...Cassandra and Spark, closing the gap between no sql and analytics   codemotio...
Cassandra and Spark, closing the gap between no sql and analytics codemotio...Duyhai Doan
 
PySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupPySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupFrens Jan Rumph
 
Fast track to getting started with DSE Max @ ING
Fast track to getting started with DSE Max @ INGFast track to getting started with DSE Max @ ING
Fast track to getting started with DSE Max @ INGDuyhai Doan
 
Lightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and CassandraLightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and Cassandranickmbailey
 
Spark cassandra integration, theory and practice
Spark cassandra integration, theory and practiceSpark cassandra integration, theory and practice
Spark cassandra integration, theory and practiceDuyhai Doan
 
Overiew of Cassandra and Doradus
Overiew of Cassandra and DoradusOveriew of Cassandra and Doradus
Overiew of Cassandra and Doradusrandyguck
 
Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)Wes McKinney
 
Spring Data Cassandra
Spring Data CassandraSpring Data Cassandra
Spring Data Cassandraniallmilton
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Helena Edelson
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksDatabricks
 
Introduction to NoSQL CassandraDB
Introduction to NoSQL CassandraDBIntroduction to NoSQL CassandraDB
Introduction to NoSQL CassandraDBJanos Geronimo
 
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integrationIndexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integrationCesare Cugnasco
 
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...DataStax
 
Suicide Risk Prediction Using Social Media and Cassandra
Suicide Risk Prediction Using Social Media and CassandraSuicide Risk Prediction Using Social Media and Cassandra
Suicide Risk Prediction Using Social Media and CassandraKen Krugler
 
Spark Summit EU talk by Ross Lawley
Spark Summit EU talk by Ross LawleySpark Summit EU talk by Ross Lawley
Spark Summit EU talk by Ross LawleySpark Summit
 
How To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own DatasourceHow To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own DatasourceMongoDB
 
Elassandra schema management - Apache Con 2019
Elassandra schema management - Apache Con 2019Elassandra schema management - Apache Con 2019
Elassandra schema management - Apache Con 2019Vincent Royer
 

Similaire à C* Summit EU 2013: Denormalizing Your Data: A Java Library to Support Structured Data in Cassandra (20)

Cassandra and Spark, closing the gap between no sql and analytics codemotio...
Cassandra and Spark, closing the gap between no sql and analytics   codemotio...Cassandra and Spark, closing the gap between no sql and analytics   codemotio...
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
 
PySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupPySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark Meetup
 
Fast track to getting started with DSE Max @ ING
Fast track to getting started with DSE Max @ INGFast track to getting started with DSE Max @ ING
Fast track to getting started with DSE Max @ ING
 
C* path
C* pathC* path
C* path
 
Presentation
PresentationPresentation
Presentation
 
Lightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and CassandraLightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and Cassandra
 
Spark cassandra integration, theory and practice
Spark cassandra integration, theory and practiceSpark cassandra integration, theory and practice
Spark cassandra integration, theory and practice
 
Overiew of Cassandra and Doradus
Overiew of Cassandra and DoradusOveriew of Cassandra and Doradus
Overiew of Cassandra and Doradus
 
Cassandra Overview
Cassandra OverviewCassandra Overview
Cassandra Overview
 
Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)
 
Spring Data Cassandra
Spring Data CassandraSpring Data Cassandra
Spring Data Cassandra
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
 
Introduction to NoSQL CassandraDB
Introduction to NoSQL CassandraDBIntroduction to NoSQL CassandraDB
Introduction to NoSQL CassandraDB
 
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integrationIndexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integration
 
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
 
Suicide Risk Prediction Using Social Media and Cassandra
Suicide Risk Prediction Using Social Media and CassandraSuicide Risk Prediction Using Social Media and Cassandra
Suicide Risk Prediction Using Social Media and Cassandra
 
Spark Summit EU talk by Ross Lawley
Spark Summit EU talk by Ross LawleySpark Summit EU talk by Ross Lawley
Spark Summit EU talk by Ross Lawley
 
How To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own DatasourceHow To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own Datasource
 
Elassandra schema management - Apache Con 2019
Elassandra schema management - Apache Con 2019Elassandra schema management - Apache Con 2019
Elassandra schema management - Apache Con 2019
 

Plus de DataStax Academy

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftDataStax Academy
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsDataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingDataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackDataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache CassandraDataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready CassandraDataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First ClusterDataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with DseDataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraDataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraDataStax Academy
 

Plus de DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 

C* Summit EU 2013: Denormalizing Your Data: A Java Library to Support Structured Data in Cassandra

  • 1. C* Path: Denormalize your data Eric Zoerner | Software Developer, eBuddy BV #CASSANDRAEU Cassandra Summit Europe 2013 London CASSANDRASUMMITEU
  • 8. Cassandra in eBuddy Messaging Platform
      • User Data Service
      • User Discovery Service
      • Persistent Session Store
      • Message History
      • Location-based Discovery
  • 9. Some Statistics
      • Current size of data
        – 1.4 TB total (3x replication); 467 GB actual data
      • 12 million sessions (11 million users plus groups)
      • Almost a billion rows in one column family (inverse social graph)
  • 11. The Problem (a “classic”): how to store a Complex Object in a Key-Value Store (RDB table, NoSQL, etc.)
      [Class diagram: one Person (name: String, birthdate: Date, nickname: String) has many Addresses (street, city, province, postalCode, countryCode: all String) and many Phones (name: String, number: String)]
  • 14. Some Strategies
      Serialization!
      Normalization!
        Person
          id   name  birthdate   nickname
          110  John  1985-04-06  Jack
          111  Mary  1979-11-30  Mary
        Address
          person_id  address_id  street        city
          110        001         123 Main St   New York
          110        002         456 Singel    Amsterdam
          111        003         78 Hoofd Str  London
        Phone
          person_id  name    phone
          110        mobile  +15551234
          111        home    +44884800
          111        mobile  +44030393
      Decomposition!
          path                  value
          name/                 John
          addresses/@0/street   123 Main St.
          phones/@0/number      +31123456789
          ...                   ...
  • 15. Strategies Comparison
                           Serialization  Normalization  Decomposition
      Single Write         ✔              ✘              ✔
      Single Read          ✔              ✘              ✔
      Consistent Updates   ✔              ✔              not enforced
      Structural Access    ✘              ✔              ✔
      Cycles               ✔              ✔              ✘
  • 16. C* Path: Open Source Java Library for decomposing complex objects into Path-Value pairs and storing them in Cassandra
      https://github.com/ebuddy/c-star-path
      * Artifacts available at Maven Central.
  • 19. C* Path: Decomposition
      • Easy to Use
      • Simple API
      • Good for Cassandra because:
        – Structural Access: write parts of objects without reading first
        – Good for denormalizing data: large complex objects can be read or written with one read or write operation
  • 20. How does it work?
  • 23. API Example - Write to a Path
      StructuredDataSupport<UUID> dao = … ;
      UUID rowKey = … ;
      Pojo pojo = … ;
      Path path = dao.createPath("some", "path", "to", "my", "pojo");
      dao.writeToPath(rowKey, path, pojo);
  • 25. API Example - Read from a Path
      Path path = dao.createPath("some", "path", "to", "my", "pojo");
      Pojo pojo = dao.readFromPath(rowKey, path, new TypeReference<Pojo>() { });
  • 26. API Example - Delete
      dao.deletePath(rowKey, path);
  • 27. API Example - Batch Operations
      BatchContext batch = dao.beginBatch();
      dao.writeToPath(rowKey1, path, pojo1, batch);
      dao.writeToPath(rowKey2, path, pojo2, batch);
      dao.deletePath(rowKey3, path, batch);
      dao.applyBatch(batch);
  • 29. Read or write at any level of a path
      Person person = …;
      Path path = dao.createPath("x");
      dao.writeToPath(rowKey, path, person);
      Path pathToName = path.withElements("name");
      String name = dao.readFromPath(rowKey, pathToName, stringTypeReference);
  • 32. Write Implementation: Decomposition
      • Step 1:
        – Convert the domain object into a basic structure of Maps, Lists, and simple values. This uses the Jackson (FasterXML) library and honors the Jackson annotations.
      • Step 2:
        – Decompose this basic structure into a map of paths to simple values (i.e. String, Number, Boolean); done by Decomposer.
      • Step 3:
        – Write this map as key-value pairs in the database.
  • 33. Example Decomposition - step 1: Simplify structure into regular Maps, Lists, and simple values
      [Class diagram: one Person (name, birthdate, nickname) with many Addresses and many Phones, as on slide 11]
  • 34. Example Decomposition - step 1: Simplify structure into regular Maps, Lists, and simple values
      Map
        name = "John"
        birthdate = "-39080932298"
        nickname = "Jack"
        addresses = <List>
          [0] = <Map>
            street = "123 Main"
            place = "New York"
          [1] = <Map>
            street = "Singel 45"
            place = "Amsterdam"
        phones = <List>
          [0] = <Map>
            number = "+31651234567"
            name = "mobile"
  • 35. Example Decomposition - step 2
      path                  value
      name/                 "John"
      birthdate/            "-39080932298"
      nickname/             "Jack"
      addresses/@0/street   "123 Main St."
      addresses/@0/place    "New York"
      addresses/@1/street   "Singel 45"
      addresses/@1/place    "Amsterdam"
      phones/@0/name        "mobile"
      phones/@0/number      "+31651234567"
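
The decomposition step can be sketched in plain Java as a recursive walk over nested Maps and Lists. This is only an illustration of the idea and not the library's Decomposer: the class name `DecomposeSketch` and its methods are hypothetical, every path here ends with a trailing slash, and path elements are not URL-encoded.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative stand-in (hypothetical name); NOT the library's Decomposer class. */
final class DecomposeSketch {

    /** Flattens nested Maps and Lists into a sorted map of path -> simple value. */
    static Map<String, Object> decompose(Object structure) {
        Map<String, Object> out = new TreeMap<>();
        walk("", structure, out);
        return out;
    }

    private static void walk(String prefix, Object node, Map<String, Object> out) {
        if (node instanceof Map) {
            // Map entry: append the key as the next path element.
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) node).entrySet()) {
                walk(prefix + entry.getKey() + "/", entry.getValue(), out);
            }
        } else if (node instanceof List) {
            // List element: append "@<index>" as the next path element.
            List<?> list = (List<?>) node;
            for (int i = 0; i < list.size(); i++) {
                walk(prefix + "@" + i + "/", list.get(i), out);
            }
        } else {
            // Simple value (String, Number, Boolean): emit one path-value pair.
            out.put(prefix, node);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("street", "123 Main St.");
        address.put("place", "New York");

        Map<String, Object> person = new LinkedHashMap<>();
        person.put("name", "John");
        person.put("addresses", List.of(address));

        decompose(person).forEach((path, value) -> System.out.println(path + " -> " + value));
    }
}
```

Each resulting entry corresponds to one key-value pair (one column) to be written in step 3.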
  • 38. Read implementation: Composition
      • Step 1:
        – Read path-value pairs from the database.
      • Step 2:
        – “Merge” the path-value maps back into the basic structure (Maps, Lists, simple values); done by Composer.
      • Step 3:
        – Use Jackson to convert the basic structure back into a domain object using a TypeReference.
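
The “merge” in step 2 can be sketched as the inverse of decomposition: split each path into elements, rebuild nested maps, then turn every map whose keys are all `@<index>` elements into a list. This is a simplified illustration under stated assumptions, not the library's Composer: `ComposeSketch` is a hypothetical name, paths are assumed to be well-formed and not URL-encoded, and list terminator values are not handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative stand-in (hypothetical name); NOT the library's Composer class. */
@SuppressWarnings("unchecked")
final class ComposeSketch {

    /** Rebuilds nested Maps and Lists from a map of path -> simple value. */
    static Object compose(Map<String, Object> pathValues) {
        Map<String, Object> root = new TreeMap<>();
        for (Map.Entry<String, Object> entry : pathValues.entrySet()) {
            // "addresses/@0/street/" -> ["addresses", "@0", "street"]
            String[] elements = entry.getKey().split("/");
            Map<String, Object> node = root;
            for (int i = 0; i < elements.length - 1; i++) {
                // Assumes no simple value is stored at an intermediate path.
                node = (Map<String, Object>) node.computeIfAbsent(
                        elements[i], key -> new TreeMap<String, Object>());
            }
            node.put(elements[elements.length - 1], entry.getValue());
        }
        return toLists(root);
    }

    /** Converts maps keyed only by "@<index>" elements into lists, recursively. */
    private static Object toLists(Object node) {
        if (!(node instanceof Map)) {
            return node;
        }
        Map<String, Object> map = (Map<String, Object>) node;
        boolean isList = !map.isEmpty()
                && map.keySet().stream().allMatch(k -> k.startsWith("@"));
        if (isList) {
            // Sort numerically so that @10 comes after @2.
            TreeMap<Integer, Object> byIndex = new TreeMap<>();
            map.forEach((k, v) -> byIndex.put(Integer.parseInt(k.substring(1)), toLists(v)));
            return new ArrayList<>(byIndex.values());
        }
        Map<String, Object> result = new TreeMap<>();
        map.forEach((k, v) -> result.put(k, toLists(v)));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> flat = new LinkedHashMap<>();
        flat.put("name/", "John");
        flat.put("addresses/@0/street/", "123 Main St.");
        flat.put("addresses/@1/street/", "Singel 45");
        System.out.println(compose(flat));
    }
}
```

The resulting structure is what step 3 would hand to Jackson for conversion into a domain object.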
  • 40. Path Encoding
      • Paths are stored as strings
      • Path elements are separated by forward slashes (hidden by the Path API)
      • Path elements are URL-encoded internally, which lets the implementation reserve special characters
      • Special characters: @ for list indices (@0, @1, @2, ...)
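
To illustrate why URL-encoding the elements frees up `/` and `@` for internal use, here is a minimal sketch using the JDK's `URLEncoder`. The class and method names are hypothetical, and the library's actual encoding rules may differ in detail.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Illustrative only (hypothetical names); not the library's Path implementation. */
final class PathEncodingSketch {

    /** URL-encodes each user-supplied element and joins them with '/'. */
    static String encodePath(String... elements) {
        StringBuilder sb = new StringBuilder();
        for (String element : elements) {
            sb.append(URLEncoder.encode(element, StandardCharsets.UTF_8)).append('/');
        }
        return sb.toString();
    }

    /** Appends an internal list index; the '@' here is deliberately NOT encoded. */
    static String withIndex(String encodedPath, int index) {
        return encodedPath + "@" + index + "/";
    }

    /** Decodes a single stored element back to the user-visible form. */
    static String decodeElement(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A user element containing '/' or '@' cannot collide with the
        // separator or with a list index, because both are percent-encoded.
        System.out.println(encodePath("a/b", "c@d"));
        System.out.println(withIndex(encodePath("x"), 0));
    }
}
```

Because a user-supplied `@0` is stored as `%400`, it can never be mistaken for the list index `@0` written by the implementation.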
  • 43. Challenge: “Shrinking Lists”
      ➀ Write a list.
          dao.writeToPath(key, "x", {"1","2"});
          x/@0/  "1"
          x/@1/  "2"
      ➁ Write a shorter list.
          dao.writeToPath(key, "x", {"3"});
          x/@0/  "3"
          x/@1/  "2"
      ➂ Read the list.
          dao.readFromPath(key, "x", new TypeReference<List<String>>() {});
          {"3","2"} ✘
  • 44. Challenge: “Shrinking Lists” ✔ Solution: Implementation writes a list terminator value.
          dao.writeToPath(key, "x", {"1","2"});
          x/@0/  "1"
          x/@1/  "2"
          x/@2/  0xFFFFFFFF
          dao.writeToPath(key, "x", {"3"});
          x/@0/  "3"
          x/@1/  0xFFFFFFFF
          x/@2/  0xFFFFFFFF
          dao.readFromPath(key, "x", new TypeReference<List<String>>() {});
          {"3"} ✔
  • 45. Challenge: “Shrinking Lists” ✔ Solution: Implementation writes a list terminator value.
      Unfortunately, this is only a partial solution, because it is still possible to read “stale” list elements using a positional index in the path.
      This can be avoided by doing a delete before a write, but for performance reasons the library will not do that automatically.
      Conclusion: The user must know what they are doing and understand the implementation.
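
The terminator behaviour, including the stale-element caveat, can be reproduced with an in-memory stand-in for a row. This is a sketch under assumptions: a `TreeMap` plays the role of the Cassandra row, writes are plain upserts with no deletes, a sentinel object replaces the 0xFFFFFFFF value, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory illustration (hypothetical names); not the library's implementation. */
final class ListTerminatorSketch {

    /** Stands in for the 0xFFFFFFFF terminator value shown on the slides. */
    static final Object TERMINATOR = new Object();

    /** Upserts every element plus a terminator after the last one; deletes nothing. */
    static void writeList(Map<String, Object> row, String path, List<String> values) {
        for (int i = 0; i < values.size(); i++) {
            row.put(path + "/@" + i + "/", values.get(i));
        }
        row.put(path + "/@" + values.size() + "/", TERMINATOR);
    }

    /** Reads elements in index order and stops at the first terminator. */
    static List<String> readList(Map<String, Object> row, String path) {
        List<String> result = new ArrayList<>();
        for (int i = 0; ; i++) {
            Object value = row.get(path + "/@" + i + "/");
            if (value == null || value == TERMINATOR) {
                return result;
            }
            result.add((String) value);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> row = new TreeMap<>();
        writeList(row, "x", List.of("1", "2", "3"));
        writeList(row, "x", List.of("9"));

        // Reading the whole list stops at the terminator written at x/@1/.
        System.out.println(readList(row, "x"));
        // A positional read past the terminator still finds the old element.
        System.out.println(row.get("x/@2/"));
    }
}
```

Deleting the old path before the second write would remove the stale `x/@2/` entry, at the cost of an extra operation.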
  • 47. Challenge: Inconsistent Updates
      Because objects can be updated at any path, there is no protection against a write “corrupting” an object structure.
          Path path = dao.createPath("x");
          dao.writeToPath(key, path, person1);
          x/address/street/  "Singel 45"
          x/name/            "John"
          path = dao.createPath("x", "name");
          dao.writeToPath(key, path, person1); ✘
          x/address/street/       "Singel 45"
          x/name/                 "John"
          x/name/address/street/  "Singel 45"
          x/name/name/            "John"
  • 48. Challenge: Inconsistent Updates ✔ Solution: Don’t do that!
      * If it does happen: the implementation provides a way to still get the “corrupted” data as simple structures, but an attempt to convert it to a now-incompatible POJO will fail.
      Conclusion: The user must know what they are doing and understand the implementation.
  • 51. Issue: Sorting
      Question: What about sorting path elements as something other than strings, such as numerical or time-based UUID elements?
      Instead of storing paths as strings, the implementation could have used DynamicComposite.
      We tried it.
  • 52. Issue: Sorting
      Using DynamicComposite for paths can work. CQL supports it as a user-defined type.
      Unfortunately it causes cqlsh to crash, making it difficult to “browse” the data.
  • 53. Issue: Sorting
      Using DynamicComposite for paths is still under consideration for a future version.
  • 55. Thrift
      Column family (row key: <UUID>):
          column name          column value
          x/address/street/    "Singel 45"
          x/name               "John"
          …                    …
      - OR -
      Super column family (coming soon; row key: <UUID>, super column name: x):
          column name          column value
          address/street/      "Singel 45"
          name                 "John"
          …                    …
  • 56. Thrift
      The Thrift implementation relies on the Hector client.
          ColumnFamilyOperations<K,String,Object> operations =
              new ColumnFamilyTemplate<K,String,Object>(keyspace, KeySerializer, StringSerializer, StructureSerializer);
          StructuredDataSupport<K> dao = new ThriftStructuredDataSupport<K>(operations);
  • 57. CQL
          CREATE TABLE person (
              key text,
              path text,
              value text,
              PRIMARY KEY (key, path)
          )
      • Cannot use the path itself as a column name because it is “dynamic”
      • Dynamic column family
  • 58. CQL: Data Model Constraints
          CREATE TABLE person (
              key text,
              path text,
              value text,
              PRIMARY KEY (key, path)
          )
      • A range (“slice”) query on the path is needed, so the path must be a clustering key.
      • The path must also be the first clustering key; otherwise a query would have to provide an equals condition on the preceding clustering keys.
      • One might try putting a secondary index on the path instead of making it a clustering key, but this doesn’t work, because Cassandra secondary indexes only work with equals conditions:
          Bad Request: No indexed columns present in by-columns clause with Equal operator
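
A slice on a path prefix amounts to a range condition on the clustering column. The sketch below shows one way to compute such a range and demonstrates it on a sorted in-memory map standing in for a single partition. The query text, the upper-bound trick (incrementing the last character of the prefix), and all names are illustrative assumptions, not the library's actual query; the comparison also assumes ASCII paths so that Java string order matches Cassandra's ordering of `text` values.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** Illustrative only (hypothetical names); not the library's query code. */
final class PathSliceSketch {

    /** Hypothetical range query against the person table from the slides. */
    static final String SLICE_CQL =
            "SELECT path, value FROM person WHERE key = ? AND path >= ? AND path < ?";

    /** Smallest string greater than every string starting with a non-empty prefix. */
    static String upperBound(String prefix) {
        int last = prefix.length() - 1;
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }

    /** Simulates the slice on a sorted in-memory "partition" of path -> value. */
    static NavigableMap<String, String> slice(NavigableMap<String, String> partition, String prefix) {
        return partition.subMap(prefix, true, upperBound(prefix), false);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> partition = new TreeMap<>();
        partition.put("x/address/street/", "Singel 45");
        partition.put("x/name/", "John");
        partition.put("y/name/", "Mary");

        // Bind values for SLICE_CQL would be: key, "x/", upperBound("x/").
        System.out.println(slice(partition, "x/"));
    }
}
```

Because the rows of a partition are stored sorted by `path`, such a range can be served as one contiguous read, which is why the path needs to be the first clustering key.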
  • 59. CQL
      The CQL implementation relies on the DataStax Java driver.
          StructuredDataSupport<K> dao =
              new CqlStructuredDataSupport<K>(String tableName, String partitionKeyColumnName, String pathColumnName, String valueColumnName, Session session);
  • 61. Planned Features
      • Sets with simple values: element values stored in path
      • DynamicComposites?
      • Multiple row reads and writes
      • Slice queries on path ranges
  • 62. Credits and Acknowledgements
      • Thanks to Joost van de Wijgerd at eBuddy for his ideas and feedback
      • Jackson JSON Processor, which is core to the C* Path implementation: http://wiki.fasterxml.com/JacksonHome
      • Image credits: slide “Some Strategies”, image “binary” by noegranado, http://www.flickr.com/photos/43360884@N04/6949896929/