Big Data Camp LA 2014, Don't re-invent the Big-Data Wheel, Building real-time, Big Data applications on Cassandra with the open-source Kiji project by Clint Kelly of WibiData
1. Don’t Reinvent the Big-Data Wheel!
Clint Kelly - @clintwkelly
WibiData
Building real-time, Big Data applications on Cassandra with the open-source Kiji project
Big Data Camp LA
14 June 2014
121. How does it work?
[Architecture diagram. Labels: KijiSchema (Cassandra); Data Science; Engineering; Channels; Write; Read; Stream; Query; Data; User 1, User 2, User 3; KijiREST; KijiHive; KijiExpress; KijiMR; KijiScoring; Scorer; Kiji Model Repository; markers C (x3) and R (x4).]
147. Timestamped versions
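[Figure: example Kiji table row, labeled 0xfa “bob”. Columns shown as family:qualifier: songs:let it be, inter:search, inter:clicks, payment:cardnum, payment:address, rec:scorer1, rec:scorer2, rec:scorer3. The songs:let it be and rec:scorer3 cells appear several times (stacked versions). Timestamps shown: 1396560123 and 1395650231.]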
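A minimal sketch of the idea of timestamped versions, assuming a toy in-memory row rather than the real Kiji API. The class name `VersionedRow`, its methods, and the cell values are hypothetical; only the column name and the two timestamps come from the slide.

```python
from collections import defaultdict


class VersionedRow:
    """Toy row: every (family, qualifier) column keeps several timestamped values."""

    def __init__(self):
        # (family, qualifier) -> {timestamp: value}
        self._cells = defaultdict(dict)

    def put(self, family, qualifier, timestamp, value):
        # Writing at a new timestamp adds a version instead of overwriting.
        self._cells[(family, qualifier)][timestamp] = value

    def get(self, family, qualifier, max_versions=1):
        # Return up to max_versions (timestamp, value) pairs, newest first.
        versions = self._cells.get((family, qualifier), {})
        newest_first = sorted(versions.items(), reverse=True)
        return newest_first[:max_versions]


row = VersionedRow()
# Hypothetical values; the timestamps are the two shown on the slide.
row.put("inter", "clicks", 1395650231, "click-A")
row.put("inter", "clicks", 1396560123, "click-B")

latest = row.get("inter", "clicks")
history = row.get("inter", "clicks", max_versions=2)
```

Here `latest` holds only the newest cell, while `history` returns both versions in newest-first order.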
148. Complex data types
record Search {
  string search_term;
  long session_id;
  device_type device;
}
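To illustrate what storing a complex record in a single cell can look like, here is a rough Python mirror of the `Search` record. It is a sketch under stated assumptions: JSON stands in for Avro's binary encoding so the example needs no third-party library, `device_type` is assumed to be an enum, and its symbols (`PHONE`, `DESKTOP`) are hypothetical since the slide does not list them.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum


class DeviceType(Enum):
    # Hypothetical symbols; the slide does not show device_type's definition.
    PHONE = "PHONE"
    DESKTOP = "DESKTOP"


@dataclass(frozen=True)
class Search:
    # Field names and order follow the record on the slide.
    search_term: str
    session_id: int
    device: DeviceType


def encode(search):
    """Serialize the record to bytes (JSON here, as a stand-in for Avro)."""
    payload = asdict(search)
    payload["device"] = search.device.value
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def decode(blob):
    """Rebuild the record from the bytes stored in a cell."""
    fields = json.loads(blob.decode("utf-8"))
    return Search(fields["search_term"], fields["session_id"],
                  DeviceType(fields["device"]))


cell = Search("let it be", 42, DeviceType.PHONE)  # hypothetical values
blob = encode(cell)
```

The whole record travels as one opaque value, so the table layout does not need a separate column per field.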
161. Locality group ➔ Column family
CREATE TABLE loc_grp
162. Entity ID ➔ Primary key
CREATE TABLE loc_grp (city text, user text,
PRIMARY KEY (city, user) )
WITH CLUSTERING ORDER BY (user ASC);
163. Family, Qualifier, Version ➔ Clustering Columns
CREATE TABLE loc_grp (city text, user text,
family text, qualifier text, version bigint,
PRIMARY KEY (city, user, family, qualifier, version) )
WITH CLUSTERING ORDER BY (user ASC, family ASC, qualifier ASC,
version DESC);
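The effect of this primary key can be sketched in plain Python. The sketch assumes the standard CQL reading of the statement above: `city` is the partition key, and the remaining key columns order cells inside a partition, with `version` descending so the newest cell comes first. The function names and the sample city "LA" are hypothetical.

```python
from collections import defaultdict


def clustering_key(cell):
    # user ASC, family ASC, qualifier ASC, version DESC.
    # Negating the numeric version gives descending order.
    return (cell["user"], cell["family"], cell["qualifier"], -cell["version"])


def layout(cells):
    """Group cells by partition key, then sort each partition by clustering key."""
    partitions = defaultdict(list)
    for cell in cells:
        partitions[cell["city"]].append(cell)
    for partition in partitions.values():
        partition.sort(key=clustering_key)
    return dict(partitions)


cells = [
    {"city": "LA", "user": "bob", "family": "inter",
     "qualifier": "clicks", "version": 1395650231},
    {"city": "LA", "user": "bob", "family": "songs",
     "qualifier": "let it be", "version": 1395650231},
    {"city": "LA", "user": "bob", "family": "inter",
     "qualifier": "clicks", "version": 1396560123},
]

stored = layout(cells)
newest_click = stored["LA"][0]
```

With this ordering, the most recent version of a column is the first cell read for that family and qualifier.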
164. Column values ➔ Blobs
CREATE TABLE loc_grp (city text, user text,
family text, qualifier text, version bigint, value blob,
PRIMARY KEY (city, user, family, qualifier, version) )
WITH CLUSTERING ORDER BY (user ASC, family ASC, qualifier ASC,
version DESC);
180. Operations across locality groups
Kiji locality group ➔ C* column family
Read across locality groups
➔ multiple C* reads (async API!)
Compare-and-set across locality groups
➔ not allowed in C* Kiji
Lose transactional consistency
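The two points above can be sketched with Python's `asyncio`. This is not the Kiji or Cassandra driver API: the dictionary `FAKE_COLUMN_FAMILIES`, the locality-group names, and the functions are hypothetical stand-ins. The sketch shows one read issued per locality group, run concurrently and merged, plus a guard that rejects a compare-and-set spanning more than one locality group.

```python
import asyncio

# Hypothetical stand-in: one Cassandra column family per Kiji locality group.
FAKE_COLUMN_FAMILIES = {
    "lg_interactions": {"bob": {"inter:clicks": "click-B"}},
    "lg_recommendations": {"bob": {"rec:scorer1": "song-1"}},
}


async def read_locality_group(group, entity_id):
    # Placeholder for a non-blocking driver call against one column family.
    await asyncio.sleep(0)
    return FAKE_COLUMN_FAMILIES[group].get(entity_id, {})


async def read_row(entity_id, groups):
    """Issue one read per locality group concurrently and merge the results."""
    partials = await asyncio.gather(
        *(read_locality_group(group, entity_id) for group in groups)
    )
    merged = {}
    for partial in partials:
        merged.update(partial)
    return merged


def check_cas_scope(check_group, write_groups):
    """Reject a compare-and-set whose check and writes span locality groups."""
    if any(group != check_group for group in write_groups):
        raise ValueError("compare-and-set must stay within one locality group")


row = asyncio.run(read_row("bob", ["lg_interactions", "lg_recommendations"]))
```

Because each locality group is read separately, the merged row is assembled from independent reads rather than from one atomic operation, which is the consistency trade-off the slide names.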