London hug

Real-time and Long-time with
Storm and Hadoop
©MapR Technologies - Confidential

 Contact:
– tdunning@maprtech.com
– @ted_dunning

 Slides and such:
– http://info.mapr.com/ted-uk-05-2012

 Hash tag: #mapr_uk

Collective notes: http://bit.ly/JDCRhc


Company Background
 MapR provides the industry’s best Hadoop Distribution
– Combines the best of the Hadoop community
contributions with significant internally
financed infrastructure development
 Background of Team
– Deep management bench with extensive analytic,
storage, virtualization, and open source experience
– Google, EMC, Cisco, VMware, Network Appliance, IBM,
Microsoft, Apache Foundation, Aster Data, Brio, ParAccel
 Proven
– MapR used across industries (Financial Services, Media,
Telecom, Health Care, Internet Services, Government)
– Strategic OEM relationship with EMC and Cisco
– Over 1,000 installs


Expanding Hadoop Use Cases

 Hadoop APIs for Hadoop applications
 NFS for file-based applications
 ODBC (JDBC) for SQL-based applications
 Real-time applications
 Mission-critical and SLA-dependent applications

Blue = MapR Innovations

MapR’s Complete Distribution for Apache Hadoop
 Integrated, tested, hardened and supported
 100% Hadoop, HBase, HDFS API compatible
 Easy portability/migration between distributions
 Unique advanced features
 No changes required to Hadoop applications
 Runs on commodity hardware

[Figure: MapR distribution stack]
– MapR Control System: MapR Heatmap™; LDAP, NIS integration; quotas, alerts, alarms; CLI, REST API
– Components: Hive, Pig, Oozie, Sqoop, HBase, Whirr, Mahout, Cascading, Nagios integration, Ganglia integration, Flume, ZooKeeper
– MapR’s Storage Services™: Direct Access NFS, Real-Time Streaming, Volumes, Mirrors, Snapshots, Data Placement
– No NameNode architecture, high-performance Direct Shuffle, stateful failover and self-healing


So what about that real-time stuff?


The Challenge

 Hadoop is great at processing vats of data
– But sucks for real-time (by design!)

 Storm is great for real-time processing
– But lacks any way to deal with batch processing

 It sounds like there isn’t a solution
– Neither fashionable solution handles everything


This is not a problem.

It’s an opportunity!


Hadoop is Not Very Real-time

[Figure: timeline t ending at “now”. Data up to the latest full period is fully processed; newer data is unprocessed. The Hadoop job takes this long for this data.]


Need to Plug the Hole in Hadoop

 We have real-time data with limited state
– Exactly what Storm does
– And what Hadoop does not

 We also have long-term analytics with lots of state
– Exactly what Hadoop does
– And what Storm does not

 Can Storm and Hadoop be combined?


Real-time and Long-time together

[Figure: timeline t ending at “now”, showing a blended view. Hadoop works great back here (older data); Storm works here (most recent data).]


An Example

 I want to know how many queries I get
– Per second, minute, day, week
 Results should be available
– within <2 seconds 99.9+% of the time
– within 30 seconds almost always
 History should last >3 years
 Should work for 0.001 q/s up to 100,000 q/s
 Failure tolerant, yadda, yadda


Rough Design – Data Flow

[Figure: data-flow diagram. Components: Search Engine, Query Event Spout, Counter Bolt, Logger Bolt, Raw Logs, Semi Agg, Snap, Hadoop Aggregator, Long agg]


Counter Bolt Detail

 Input: Labels to count
 Output: Short-term semi-aggregated counts
– (time-window, label, count)
 Input is logged until next flush
 Non-zero counts emitted on flush if
– event count reaches threshold (typical 100K)
– time since last count reaches threshold (typical 1-10s)
 Tuples acked when counts emitted
 Double count probability is > 0 but very small
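
The flush rule above can be sketched outside Storm as a small Python class. This is only an illustration under stated assumptions: the class name, the `emit` callback, the 60-second window width and the injectable `clock` are all hypothetical, and the acking of input tuples is indicated by a comment rather than implemented.

```python
import time
from collections import Counter


class SemiAggregatingCounter:
    """Buffers label counts and flushes them as (time_window, label, count)."""

    def __init__(self, emit, window_s=60, max_events=100_000, max_age_s=1.0,
                 clock=time.time):
        self.emit = emit              # callback that receives one output tuple
        self.window_s = window_s      # assumed width of a time window, seconds
        self.max_events = max_events  # flush after this many buffered events
        self.max_age_s = max_age_s    # ... or this long after the last flush
        self.clock = clock            # injectable so the sketch can be tested
        self.counts = Counter()       # (window_start, label) -> count
        self.pending = 0              # events buffered since the last flush
        self.last_flush = clock()

    def record(self, label):
        now = self.clock()
        window_start = int(now // self.window_s) * self.window_s
        self.counts[(window_start, label)] += 1
        self.pending += 1
        if (self.pending >= self.max_events
                or now - self.last_flush >= self.max_age_s):
            self.flush()

    def flush(self):
        # Only labels that were seen are in the Counter, so only non-zero
        # counts are emitted.
        for (window_start, label), n in sorted(self.counts.items()):
            self.emit((window_start, label, n))
        # In a real bolt, the buffered input tuples would be acked here.
        self.counts.clear()
        self.pending = 0
        self.last_flush = self.clock()
```

Because the time check only runs when an event arrives, a real bolt would also need a timer (or tick) to flush during quiet periods; that is left out of this sketch.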


Counter Bolt Counterintuitivity

 Counts are emitted for same label, same time window many times
– these are semi-aggregated
– this is a feature
– tuples can be acked within 1s
– time windows can be much longer than 1s
 No need to send same label to same bolt
– speeds failure recovery


Design Flexibility

 Tuples can be ack’ed as soon as they hit the log
– counter can recover state on failure
– log is burn after write
 Count flush interval can be extended without extending tuple
timeout
– Decreases currency of counts in semi-aggregates
 Total bandwidth for log is typically not huge
– All of twitter @10,000 messages per second = 10K x 2KB = 20MB/s
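
The first point in this list (ack once the tuple hits the log, recover state on failure, discard the log after the counts are written) could look roughly like the sketch below. It is an assumption-heavy illustration: the function names and the JSON-lines log format are hypothetical, and a local file with `fsync` stands in for the distributed persistence layer the design actually relies on.

```python
import json
import os
import tempfile
from collections import Counter


def log_event(log_file, window_start, label):
    """Append one event to the log and force it to disk."""
    log_file.write(json.dumps([window_start, label]) + "\n")
    log_file.flush()
    os.fsync(log_file.fileno())
    # The input tuple could be acked at this point.


def recover_counts(path):
    """Rebuild the in-memory counts by replaying the log after a crash."""
    counts = Counter()
    if os.path.exists(path):
        with open(path) as f:
            for line in f:
                window_start, label = json.loads(line)
                counts[(window_start, label)] += 1
    return counts


def flush_and_burn(path, counts, emit):
    """Emit semi-aggregated counts, then discard the log (burn after write)."""
    for (window_start, label), n in sorted(counts.items()):
        emit((window_start, label, n))
    if os.path.exists(path):
        os.remove(path)
```

A crash between emitting and removing the log would replay the same events on recovery, which is one way the small double-count probability can arise.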


Counter Bolt No-nos

 Cannot accumulate entire period in-memory
– Tuples must be ack’ed much sooner
– State must be persisted before ack’ing
– State can easily grow too large to handle without disk access
 Cannot persist entire count table at once
– Incremental persistence required


Guarantees

 Counter output volume is small-ish
– the greater of k tuples per 100K inputs or k tuple/s
– 1 tuple/s/label/bolt for this exercise
 Persistence layer must provide guarantees
– distributed against node failure
– must have either readable flush or closed-append
 HDFS is distributed, but provides no guarantees and has strange semantics

 MapRfs is distributed, provides all necessary guarantees


Failure Modes

 Bolt failure
– buffered tuples will go unacked
– after timeout, tuples will be resent
– timeout ≈ 10s
– if failure occurs after persistence, before acking, then double-counting is
possible
 Storage (with MapR)
– most failures invisible
– a few continue within 0-2s, some take 10s
– catastrophic cluster restart can take 2-3 min
– logger can buffer this much easily


Presentation Layer

 Presentation must
– read recent output of Logger bolt
– read relevant output of Hadoop jobs
– combine semi-aggregated records
 User will see
– counts that increment within 0-2 s of events
– seamless meld of short and long-term data
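
One way to combine the two sources is to sum semi-aggregated records on top of the batch results. The sketch below is a guess at the shape of that merge, not the deck's implementation: `blended_counts`, the dictionary layout and the `cutoff` parameter (the first window the Hadoop job has not yet covered) are all assumptions.

```python
from collections import Counter


def blended_counts(long_term, semi_aggregates, cutoff):
    """Merge Hadoop output with recent semi-aggregated records.

    long_term:       {(window_start, label): count} from the batch job,
                     assumed complete for windows before `cutoff`
    semi_aggregates: iterable of (window_start, label, count); the same
                     (window_start, label) may appear many times
    cutoff:          first window_start not yet covered by the batch job
    """
    totals = Counter(long_term)
    for window_start, label, n in semi_aggregates:
        if window_start >= cutoff:  # skip data the batch job already counted
            totals[(window_start, label)] += n
    return totals
```

Since the merge is a plain sum, it does not matter how many bolts emitted records for a label or how often they flushed.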


Example 2 – Real-time learning

 My system has to
– learn a response model and select training data
– in real-time
 Data rate up to 100K queries per second


Door Number 3 – A/B testing in real-time

 I have 15 versions of my landing page
 Each visitor is assigned to a version
– Which version?
 A conversion or sale or whatever can happen
– How long to wait?
 Some versions of the landing page are horrible
– Don’t want to give them traffic


Real-time Constraints

 Selection must happen in <20 ms almost all the time
 Training events must be handled in <20 ms
 Failover must happen within 5 seconds
 Client should timeout and back-off
– no need for an answer after 500ms
 State persistence required


Rough Design

[Figure: data-flow diagram. Components: Selector Layer, DRPC Spout, Query Event Spout, Timed Join, Counter Bolt, Model, Conversion Detector, Logger Bolt, Model State, Raw Logs]


A Quick Diversion

 You see a coin
– What is the probability of heads?
– Could it be larger or smaller than that?
 I flip the coin and while it is in the air ask again
 I catch the coin and ask again
 I look at the coin (and you don’t) and ask again
 Why does the answer change?
– And did it ever have a single value?


A First Conclusion

 Probability as expressed by humans is subjective and depends on
information and experience


A Second Diversion

 What is the mass of the moon?
– 1/2 degree @ 385 Mm = ~ 3.8 Mm diameter (really about 3.4-ish)
– V = 1/6 x pi x 3.8³ x 10¹⁸ m³ = ~ 29 x 10¹⁸ m³ (really about 22)
– m = rho V = 4 Mg/m³ x 29 x 10¹⁸ m³ = 1.2 x 10²³ kg (really about 0.7)
 Is that the exact number?
– Shouldn’t we have confidence bounds?

 Wikipedia says: 7.3477 × 10²² kg
– Is that the exact number?
– Shouldn’t they have confidence bounds?


A Second Conclusion

 A single number is a bad way to express uncertain knowledge

 A distribution of values might be better
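
As a rough illustration of expressing the moon estimate as a distribution, the sketch below redoes the back-of-envelope calculation many times with randomly drawn inputs. The uniform ranges for apparent size, distance and density are invented for the example and are not from the talk; only the formula (diameter from angle and distance, volume of a sphere, mass = rho V) follows the slide.

```python
import math
import random


def sample_moon_mass(rng):
    """Draw one plausible mass (kg) from assumed input uncertainties."""
    angle_deg = rng.uniform(0.45, 0.55)     # assumed range: apparent size
    distance_m = rng.uniform(360e6, 410e6)  # assumed range: distance, metres
    density = rng.uniform(3000.0, 5000.0)   # assumed range: density, kg/m^3
    diameter = math.radians(angle_deg) * distance_m
    volume = math.pi / 6.0 * diameter ** 3  # volume of a sphere from diameter
    return density * volume


def mass_interval(n=10_000, seed=1):
    """Return (5th percentile, median, 95th percentile) of sampled masses."""
    rng = random.Random(seed)
    masses = sorted(sample_moon_mass(rng) for _ in range(n))
    return masses[int(0.05 * n)], masses[n // 2], masses[int(0.95 * n)]
```

Calling `mass_interval()` gives a low, middle and high value instead of a single number, which is the kind of answer the two questions about confidence bounds are asking for.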


I Dunno


5 and 5


2 and 10


Bayesian Bandit

 Compute distributions based on data
 Sample p1 and p2 from these distributions
 Put a coin in bandit 1 if p1 > p2
 Else, put the coin in bandit 2
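
The four steps above can be sketched for any number of bandits. This is a minimal simulation under assumptions that are not stated on the slide: payoffs are win/lose, the distributions are Beta posteriors starting from a uniform prior (the `+ 1` terms), and the names `choose_bandit` and `run` are hypothetical.

```python
import random


def choose_bandit(wins, losses, rng):
    """Sample a payoff rate for each bandit and pick the largest sample."""
    samples = [rng.betavariate(w + 1, l + 1) for w, l in zip(wins, losses)]
    return max(range(len(samples)), key=samples.__getitem__)


def run(true_rates, steps, seed=0):
    """Simulate `steps` coins; return per-bandit (wins, losses) counts."""
    rng = random.Random(seed)
    wins = [0] * len(true_rates)
    losses = [0] * len(true_rates)
    for _ in range(steps):
        i = choose_bandit(wins, losses, rng)
        if rng.random() < true_rates[i]:  # simulated payoff, unknown to chooser
            wins[i] += 1
        else:
            losses[i] += 1
    return wins, losses
```

For example, `run([0.1, 0.5], 2000)` should end up putting most coins into the second bandit, while still occasionally trying the first one early on when the counts are small.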


And it works!

[Figure: regret (y-axis, 0 to 0.12) versus n (x-axis, 0 to 1100) for ε-greedy with ε = 0.05 and for the Bayesian Bandit with Gamma-Normal]


Video Demo


The Code

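 Select an alternative
# k is an n x 2 matrix of counts, one row per alternative
# (assumed layout: column 1 = failures, column 2 = successes)
select = function(k) {
  n = dim(k)[1]
  p0 = rep(0, length.out=n)
  for (i in 1:n) {
    # sample a plausible success rate from the Beta posterior
    p0[i] = rbeta(1, k[i,2]+1, k[i,1]+1)
  }
  return (which.max(p0))
}

 Select and learn
# test(i) is assumed to try alternative i and return the column to increment
learn = function(k, steps) {
  for (z in 1:steps) {
    i = select(k)
    j = test(i)
    k[i,j] = k[i,j]+1
  }
  return (k)
}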

 But we already know how to count!


The Basic Idea

 We can encode a distribution by sampling
 Sampling allows unification of exploration and exploitation

 Can be extended to more general response models



MapR’s Innovations


Thank You

