Realtime Analytics with Cassandra
2. • Motivation / alternatives
• What is it?
• How does it work?
• Approximate Analytics
• What's it good for?
2
Analytics
4. Why bother?
“Companies that can harness big data will
trample data incompetents”
The Economist, May 26th 2011
5. [Slide graphic: web log events flooding the screen. Column headers: time, page, session id, duration, ... A typical row: 14:58:03.877 /docs/access/chapter8.txt 99.1.10.178 52]
6. Combining “big” and “real-time” is hard
• Live & historical aggregates...
• Trends...
• Drill downs and roll ups
7. Solution Con
Scalability
$$$
Not realtime
Spartan query semantics => complex, DIY solutions
8. • Motivation / alternatives
• What is it?
• How does it work?
• Approximate Analytics
• What's it good for?
9. [Diagram: Click stream events, Sensor data, etc → Acunu Analytics → counter updates]
• Aggregate incrementally, on the fly
• Store live + historical aggregates
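A minimal sketch of what aggregating incrementally can look like, assuming each group only needs a running count and sum. The class name, the bucket key, and the sample events are made up for illustration; they are not taken from the product.

```python
from collections import defaultdict


class RunningAggregate:
    """Keeps COUNT and AVG for one group without storing the raw events."""

    def __init__(self):
        self.count = 0
        self.total = 0

    def add(self, value):
        # Fold one event into the aggregate as it arrives.
        self.count += 1
        self.total += value

    @property
    def average(self):
        return self.total / self.count if self.count else 0.0


# One aggregate per (hour, category) bucket (hypothetical key layout).
# Older buckets remain as the historical record while the newest one
# keeps changing as live events arrive.
aggregates = defaultdict(RunningAggregate)

events = [
    {"hour": "21:00", "category": "docs", "loadTime": 120},
    {"hour": "21:00", "category": "docs", "loadTime": 80},
    {"hour": "22:00", "category": "docs", "loadTime": 200},
]

for event in events:
    aggregates[(event["hour"], event["category"])].add(event["loadTime"])
```

Because only the count and the sum are kept, the cost of an update does not grow with the number of events already seen.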
10. {
time : TIME(HOUR; MIN; SEC),
page : PATH(/),
category : STRING,
loadTime : LONG
}
{
select : ["COUNT", "AVG(loadTime)"],
where : "time, ?path",
group : "time, ?category"
}
12. • Motivation / alternatives
• What is it?
• How does it work?
• Approximate Analytics
• What's it good for?
13. Measures: count, count distinct (session), avg(duration), ...
Grouped by: day, geography, browser, ...
14. Data Definition
time : TIME(HOUR; MIN; SEC),
cust_id : LONG,
session_id : LONG,
geography : STRING,
browser : STRING,
load_time : LONG
Query Patterns
{ select: "COUNT",
patterns: [
{ where : "?time", group : "?time" },
{ where : "", group : "geography" },
{ where : "", group : "browser" }
]
}, {
select: ["COUNT_DISTINCT(session_id)",
"AVG(load_time)"],
where: "time", group: ""
}
15. Incoming event:
{
cust_id: user01,
session_id: 102,
geography: UK,
browser: IE,
time: 22:02,
}
Counters before the event is applied:
21:00 all→1345 :00→45 :01→62 :02→87 ...
22:00 all→3221 :00→22 :01→19 :02→104 ...
... ...
UK all→228 user01→1 user14→12 user99→7 ...
US all→354 user01→4 user04→8 user56→17 ...
...
UK, 22:00 all→1904 ...
∅ all→87314 UK→238 US→354 ...
16. Incoming event:
{
cust_id: user01,
session_id: 102,
geography: UK,
browser: IE,
time: 22:02,
}
Counters after the event is applied:
21:00 all→1345 :00→45 :01→62 :02→87 ...
22:00 all→3222 :00→22 :01→19 :02→105 ...
... ...
UK all→229 user01→2 user14→12 user99→7 ...
US all→354 user01→4 user04→8 user56→17 ...
...
UK, 22:00 all→1905 ...
∅ all→87315 UK→239 US→354 ...
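The fan-out from one event to several counters can be sketched as follows. This is an illustration only: a nested in-memory dictionary stands in for the stored counter rows, and the function name `record` and the row-naming scheme are assumptions, not the product's API.

```python
from collections import defaultdict

GLOBAL_ROW = ""  # stands in for the row shown as ∅ on the slide

# counters[row][column] -> count, mirroring the "row  column→count" layout.
counters = defaultdict(lambda: defaultdict(int))


def record(event):
    """Increment every counter that a single event contributes to."""
    hour, minute = event["time"].split(":")
    hour_row = f"{hour}:00"
    geo = event["geography"]

    # Row per hour: an hourly total plus a per-minute breakdown.
    counters[hour_row]["all"] += 1
    counters[hour_row][f":{minute}"] += 1

    # Row per geography: a total plus a per-customer breakdown.
    counters[geo]["all"] += 1
    counters[geo][event["cust_id"]] += 1

    # Row per (geography, hour) pair.
    counters[f"{geo}, {hour_row}"]["all"] += 1

    # Global row: grand total plus a per-geography breakdown.
    counters[GLOBAL_ROW]["all"] += 1
    counters[GLOBAL_ROW][geo] += 1


record({"cust_id": "user01", "session_id": 102, "geography": "UK",
        "browser": "IE", "time": "22:02"})
```

Each event costs a fixed number of increments, which is what lets the counters stay current as events stream in.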
17. 21:00 all→1345 :00→45 :01→62 :02→87 ...
22:00 all→3221 :00→22 :01→19 :02→104 ...
... ...
UK all→228 user01→1 user14→12 user99→7 ...
US all→354 user01→4 user04→8 user56→17 ...
...
UK, 22:00 all→1904 ...
∅ all→87314 UK→238 US→354 ...
22. Queries and the counter rows that answer them:
where time 21:00-22:00, count(*):
21:00 all→1345 :00→45 :01→62 :02→87 ...
where time 22:00-23:00, group by minute:
22:00 all→3222 :00→22 :01→19 :02→105 ...
... ...
where geography=UK, group all by user:
UK all→229 user01→2 user14→12 user99→7 ...
US all→354 user01→4 user04→8 user56→17 ...
...
UK, 22:00 all→1905 ...
count all; group all by geo:
∅ all→87315 UK→239 US→354 ...
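Read this way, each query becomes a lookup in the precomputed counters rather than a scan over raw events. A minimal sketch, assuming the counters are held in a plain dictionary; the helper names `count` and `group_by` are hypothetical, and the numbers are copied from the slide with the elided columns left out.

```python
# Counter rows copied from the slide (elided "..." columns omitted).
counters = {
    "21:00": {"all": 1345, ":00": 45, ":01": 62, ":02": 87},
    "22:00": {"all": 3222, ":00": 22, ":01": 19, ":02": 105},
    "UK": {"all": 229, "user01": 2, "user14": 12, "user99": 7},
    "US": {"all": 354, "user01": 4, "user04": 8, "user56": 17},
    "UK, 22:00": {"all": 1905},
    "": {"all": 87315, "UK": 239, "US": 354},  # the global (∅) row
}


def count(row):
    """count(*) for a row: read the precomputed total."""
    return counters[row]["all"]


def group_by(row):
    """Breakdown for a row: every column except the total."""
    return {col: n for col, n in counters[row].items() if col != "all"}


hourly_total = count("21:00")    # where time 21:00-22:00, count(*)
per_minute = group_by("22:00")   # where time 22:00-23:00, group by minute
uk_by_user = group_by("UK")      # where geography=UK, group all by user
grand_total = count("")          # count all
by_geo = group_by("")            # group all by geo
```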
23. • Motivation / alternatives
• What is it?
• How does it work?
• Approximate Analytics
• What's it good for?
25. Count Distinct
Plan A: keep a list of all the things you’ve seen
count them at query time
Quick to update
... but at scale ...
Takes lots of space
Takes a long time to query
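Plan A can be sketched in a few lines. The class name and the sample session ids are illustrative only.

```python
class PlanADistinct:
    """Exact distinct count: remember everything, count at query time."""

    def __init__(self):
        self.items = []

    def add(self, item):
        # Quick to update: appending is constant time.
        self.items.append(item)

    def count_distinct(self):
        # Query time: deduplicate everything seen so far, which costs
        # time and memory proportional to the number of stored items.
        return len(set(self.items))


sessions = PlanADistinct()
for session_id in [102, 175, 102, 52, 175, 1234]:
    sessions.add(session_id)
```

The stored list grows with every event, which is where the space and query-time costs come from at scale.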
26. Approximate Distinct
max # leading zeroes seen so far
item  hash            leading zeroes  max so far
x     00101001110...  2               2
y     11010100111...  0               2
z     00011101011...  3               3
...
... to see a max of M takes about 2^M items
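A rough sketch of the single-counter idea: hash each item, track the largest run of leading zeroes, and guess 2^max. The choice of SHA-1 truncated to 32 bits and the function names are assumptions for illustration; the slide does not say which hash is used.

```python
import hashlib

HASH_BITS = 32  # assumed hash width


def hash_bits(item):
    """Map an item to a HASH_BITS-wide integer (SHA-1 is an arbitrary choice)."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def leading_zeroes(value, width=HASH_BITS):
    # bit_length() is 0 for value 0, so an all-zero hash yields `width`.
    return width - value.bit_length()


def rough_distinct(items):
    """Estimate the distinct count as 2^(max leading zeroes seen)."""
    max_zeroes = 0
    for item in items:
        max_zeroes = max(max_zeroes, leading_zeroes(hash_bits(item)))
    return 2 ** max_zeroes
```

Repeated items hash to the same value and so never change the maximum, which is why only one small number has to be stored. The estimate is always a power of two and has high variance on its own.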
27. Approximate Distinct
to reduce variance, average over m = 2^k sub-streams
item  hash            index, zeroes  max so far
x     00101001110...  0, 0           0,0,0,0
y     11010100111...  3, 1           0,0,0,1
z     00011101011...  0, 1           1,0,0,1
...
take the harmonic mean
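The sub-stream version can be sketched along the lines of the published HyperLogLog estimator. The slide does not name the algorithm, so treating it as HyperLogLog is an assumption, as are the hash (SHA-1 truncated to 64 bits), the value k = 10, and the function names. Registers store zeroes + 1, the HyperLogLog convention, so that 0 can mean "sub-stream never hit".

```python
import hashlib
import math

K = 10          # number of index bits (assumed value)
M = 2 ** K      # m = 2^k sub-streams
HASH_BITS = 64  # assumed hash width


def hash_bits(item):
    """Map an item to a HASH_BITS-wide integer (SHA-1 is an arbitrary choice)."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def split(hashed, k=K, width=HASH_BITS):
    """Top k bits pick the sub-stream; count leading zeroes in the rest."""
    rest_width = width - k
    index = hashed >> rest_width
    rest = hashed & ((1 << rest_width) - 1)
    return index, rest_width - rest.bit_length()


def estimate_distinct(items):
    registers = [0] * M
    for item in items:
        index, zeroes = split(hash_bits(item))
        # Store zeroes + 1 so that 0 still means "sub-stream never hit".
        registers[index] = max(registers[index], zeroes + 1)

    # Harmonic mean of 2^register across sub-streams, scaled by m and by
    # the HyperLogLog bias constant (this form is valid for m >= 128).
    alpha = 0.7213 / (1 + 1.079 / M)
    estimate = alpha * M * M / sum(2.0 ** -r for r in registers)

    # Small-range correction: with many empty sub-streams, fall back to
    # linear counting on the number of empty registers.
    empty = registers.count(0)
    if estimate <= 2.5 * M and empty:
        estimate = M * math.log(M / empty)
    return estimate
```

With m sub-streams the memory use is m small integers regardless of how many items are seen, and the typical relative error of this estimator is about 1.04 / sqrt(m).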
28. • Motivation / alternatives
• What is it?
• How does it work?
• Approximate Analytics
• What's it good for?
30. What’s Coming?
• Ad Hoc: same queries, but without the need
to pre-define them
• Geolocation: support for location-based
events and queries
• Drill down: see the events that make up any
given aggregate
31. • Motivation / alternatives
• What is it?
• How does it work?
• Approximate Analytics
• What's it good for?
32. • Manufacturing
• Social Media
• Ad Analytics
• Systems Monitoring
• Financial Services
• Oil + Gas
33. “Up and running in about 4 hours”
“We found out a competitor was scraping our data”
“We keep discovering use cases we hadn’t thought of”