The document summarizes a talk on the evolution of architectures for building real-time analytics systems. It describes moving from a single-node architecture using Node.js and Socket.io to a distributed architecture built on Erlang processes and libraries such as gproc (pub/sub) and Bullet (on top of Cowboy). The latest architecture, Swirl, is a lightweight distributed stream processing system that uses Erlang terms and processes to filter, aggregate, and reduce streams of events in real time across multiple nodes.
2. AdGear is a full-stack ad platform for publishers and advertisers, with advanced analytics, attribution measurement, ad serving, and real-time bidding technology.
4. Real-time reporting... why?
• help clients to make informed decisions
  • should I increase the bid price?
  • should I bid on exchange X?
• inventory control (brand safety)
• debugging (bot detection, creative audits)
8. Problems
• no SMP support
• each process needs to be monitored
• requires load-balancing (nginx)
• duplicated state (per process)
• duplicated work (de-serialization)
• bad error handling (event loop explodes)
• callbacks...
11. Architecture #2
1. receive buffered events, split and de-serialize
2. each event is sent to a collector process (3) using gproc (pub/sub) for filtering
3. collector (gen_server) aggregates messages using ETS counters and flushes every second
4. bullet handler serializes the aggregates (tab2list to JSON)
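The collector in step 3 can be sketched in a few lines. This is an illustrative reconstruction, not AdGear's code: the module and stream names are invented, and it uses the OTP 18+ `ets:update_counter/4` default-object form for brevity.

```erlang
%% Illustrative sketch of architecture #2's collector: subscribes
%% to a stream via a gproc pub/sub property, counts events in ETS,
%% and flushes the aggregates every second.
-module(collector_sketch).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(FLUSH_INTERVAL, 1000).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    %% subscribe: any process can publish with
    %% gproc:send({p, l, {stream, bids}}, {stream_event, Key})
    gproc:reg({p, l, {stream, bids}}),
    Tid = ets:new(counters, [set, private]),
    erlang:send_after(?FLUSH_INTERVAL, self(), flush),
    {ok, Tid}.

handle_info({stream_event, Key}, Tid) ->
    %% bump the counter for Key, creating it on first use
    ets:update_counter(Tid, Key, 1, {Key, 0}),
    {noreply, Tid};
handle_info(flush, Tid) ->
    Aggregates = ets:tab2list(Tid),
    ets:delete_all_objects(Tid),
    %% the real system hands Aggregates to the bullet handler
    %% for JSON serialization (tab2list to json)
    io:format("aggregates: ~p~n", [Aggregates]),
    erlang:send_after(?FLUSH_INTERVAL, self(), flush),
    {noreply, Tid}.

handle_call(_Request, _From, Tid) -> {reply, ok, Tid}.
handle_cast(_Msg, Tid) -> {noreply, Tid}.
```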
12. Problems
• ssh_channel process and collector process are bottlenecks
• number of messages increases with the number of clients
• requires lots of bandwidth for large streams
• limited filtering (match specs)
13. Improvements...
(6 months ago)
• optimize collector's message loop (gen_server to proc_lib)
• use SSH compression
  • added support for OpenSSH zlib compression * (R16B02)
* https://github.com/lpgauth/otp/tree/openssh_zlib
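The gen_server-to-proc_lib optimization trades the generic callback machinery for a bare receive loop on the hot path. A minimal sketch of that pattern, with illustrative names:

```erlang
%% Sketch of a proc_lib-based collector loop: same counting
%% logic as a gen_server collector, but a plain receive avoids
%% gen_server's per-message decode/dispatch overhead.
-module(collector_loop).
-export([start_link/0, init/1]).

start_link() ->
    proc_lib:start_link(?MODULE, init, [self()]).

init(Parent) ->
    proc_lib:init_ack(Parent, {ok, self()}),
    Tid = ets:new(counters, [set, private]),
    erlang:send_after(1000, self(), flush),
    loop(Tid).

loop(Tid) ->
    receive
        {event, Key} ->
            ets:update_counter(Tid, Key, 1, {Key, 0}),
            loop(Tid);
        flush ->
            %% hand the aggregates off, then start a fresh interval
            _Aggregates = ets:tab2list(Tid),
            ets:delete_all_objects(Tid),
            erlang:send_after(1000, self(), flush),
            loop(Tid)
    end.
```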
17. What did I just agree to...
• I only have 3 days to build this...
• the bid request stream is too large to aggregate in a central location (1+ Gbit/s, 80K+ events/s)
18. Strategy for demo
1. move aggregation upstream
2. use ETS match select to find table ids (filtering)
3. increment counters in-process (no message!)
4. periodically flush aggregates via message to the collector node
5. collector node increments local counters and periodically flushes aggregates to the bullet handler
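Steps 2–3 above can be sketched as follows. The flow-table layout, the `register_flow/4` helper, and representing a filter as a fun are assumptions made for illustration, not Swirl's actual data model:

```erlang
%% Sketch of in-process aggregation: the emitting process looks
%% up matching flows with an ETS match spec and increments
%% counters locally, so no per-event message is sent.
-module(upstream_agg).
-export([register_flow/4, emit/3]).

%% FlowTable holds objects {{flow, StreamName, Filter}, CountersTid}
register_flow(FlowTable, StreamName, Filter, CountersTid) ->
    ets:insert(FlowTable, {{flow, StreamName, Filter}, CountersTid}).

emit(FlowTable, StreamName, Event) ->
    %% match spec: all {Filter, CountersTid} pairs registered
    %% under this stream name
    MatchSpec = [{{{flow, StreamName, '$1'}, '$2'}, [],
                  [{{'$1', '$2'}}]}],
    Flows = ets:select(FlowTable, MatchSpec),
    lists:foreach(
        fun({Filter, CountersTid}) ->
            case Filter(Event) of
                true ->
                    %% keying counters by exchange is illustrative
                    Key = maps:get(exchange, Event, undefined),
                    ets:update_counter(CountersTid, Key, 1, {Key, 0});
                false ->
                    ok
            end
        end, Flows).
```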
25. Mapper Node
1. process “emits” an event
2. look up in ETS whether a flow matches the stream name and filter
3. if there's a match, call flow_mod:map/4
4. if map returns counters, increment them in ETS
5. swirl_mapper periodically flushes aggregates to the reducer node
26. Reducer Node
1. swirl_tracker receives mapper aggregates and forwards them to the reducer
2. reducer increments counters in ETS
3. reducer flushes counters to flow_mod:reduce/4
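Both the mapper and reducer slides revolve around a user-supplied flow module. The arities (flow_mod:map/4, flow_mod:reduce/4) come from the slides, but the argument names and return shape below are assumptions; consult Swirl's source for the real callback contract:

```erlang
%% Hypothetical flow module in the shape the slides describe:
%% map/4 runs on mapper nodes per matching event; reduce/4 runs
%% periodically on the reducer node with the merged counters.
-module(bid_count_flow).
-export([map/4, reduce/4]).

%% returning a counter tuple tells the mapper what to increment
%% in ETS (argument names and return shape are assumed)
map(_FlowId, _StreamName, Event, _MapperOpts) ->
    Exchange = maps:get(exchange, Event, undefined),
    {{Exchange, bids}, 1}.

%% receives the aggregates the reducer flushes each period
reduce(_FlowId, _Period, Aggregates, _ReducerOpts) ->
    io:format("aggregates: ~p~n", [Aggregates]).
```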
27. Swirl-ql
• SQL WHERE-clause-like syntax
• supported operators:
  • AND / OR
  • <, <=, =, >, <>
  • IN (x, y) / NOT IN (x, y, z)
  • IS NULL / IS NOT NULL (undefined)
* https://github.com/lpgauth/swirl-ql
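Putting the operators together, a filter over a bid-request stream might look like the snippet below. The parse/evaluate function names are assumptions about the swirl-ql API; the repository above has the authoritative interface.

```erlang
%% Hypothetical usage: compile a WHERE-clause-style filter once,
%% then evaluate it against each event's variables.
{ok, ExpTree} = swirl_ql:parse(
    "exchange_id IN (10, 12) AND bid_price > 150"
    " AND category IS NOT NULL"),
true = swirl_ql:evaluate(ExpTree, [
    {exchange_id, 12},
    {bid_price, 160},
    {category, <<"news">>}
]).
```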