ACM DEBS 2015: Realtime
Streaming Analytics
Patterns
Srinath Perera
Sriskandarajah Suhothayan
WSO2 Inc.
Data Analytics ( Big Data)
o Scientists have been doing this for
about 25 years, with MPI (1991)
on special hardware
o Took off with Google’s
MapReduce paper (2004);
Apache Hadoop, Hive, and a
whole ecosystem followed.
o Later Spark emerged, and it is
faster.
o But, processing takes time.
Value of Some Insights Degrades
Fast!
o For some use cases (e.g. stock
markets, traffic, surveillance,
patient monitoring) the value
of insights degrades very
quickly with time.
o E.g. stock markets and speed of
light
o We need technology that can produce outputs fast
o Static Queries, but need very fast output (Alerts, Realtime
control)
o Dynamic and Interactive Queries ( Data exploration)
History
▪Realtime Analytics are not new
either!!
- Active Databases (2000+)
- Stream processing (Aurora, Borealis
(2005+) and later Storm)
- Distributed Streaming Operators (e.g.
database research topic around
2005)
- CEP Vendor Roadmap (from
http://www.complexevents.com/2014/12/03/cep-tooling-market-survey-2014/)
Data Analytics Landscape
Realtime Interactive Analytics
o Usually done to support
interactive queries
o Index data to make it
readily accessible so
you can respond to queries
fast. (e.g. Apache Drill)
o Tools like Druid, VoltDB and
SAP Hana can do this with all
data in memory to make
things really fast.
Realtime Streaming Analytics
o Process data without storing it (as data comes in)
o Queries are fixed ( Static)
o Triggers when given conditions are met.
o Technologies
o Stream Processing ( Apache Storm, Apache Samza)
o Complex Event Processing/CEP (WSO2 CEP, Esper,
StreamBase)
o MicroBatches ( Spark Streaming)
Realtime Football Analytics
● Video: https://www.youtube.com/watch?v=nRI6buQ0NOM
● More Info: http://www.slideshare.net/hemapani/strata-2014-talktracking-a-soccer-game-with-big-data
Why Realtime Streaming Analytics
Patterns?
o Reason 1: Usual advantages
o Give us better understanding
o Give us better vocabulary to teach and
communicate
o Tools can implement them
o ..
o Reason 2: Under the theme of realtime analytics, a lot of
people get carried away with the word count
example. Patterns show that word count is just the tip of
the iceberg.
Earlier Work on Patterns
o Patterns from SQL ( project, join, filter etc)
o Event Processing Technical Society’s (EPTS)
reference architecture
o higher-level patterns such as tracking, prediction and
learning in addition to low-level operators that
come from SQL-like languages.
o Esper’s Solution Patterns Document (50 patterns)
o Coral8 White Paper
Basic Patterns
o Pattern 1: Preprocessing ( filter, transform, enrich,
project .. )
o Pattern 2: Alerts and Thresholds
o Pattern 3: Simple Counting and Counting with
Windows
o Pattern 4: Joining Event Streams
o Pattern 5: Data Correlation, Missing Events, and
Erroneous Data
Patterns for Handling Trends
o Pattern 7: Detecting Temporal Event Sequence
Patterns
o Pattern 8: Tracking ( track something over space or
time)
o Pattern 9: Detecting Trends ( rise, fall, turn, triple
bottom)
o Pattern 13: Online Control
Mixed Patterns
o Pattern 6: Interacting with Databases
o Pattern 10: Running the same Query in Batch and
Realtime Pipelines
o Pattern 11: Detecting and switching to Detailed
Analysis
o Pattern 12: Using a Machine Learning Model
Realtime Streaming
Analytics Tools
Implementing Realtime Analytics
o It is tempting to write custom code; a filter looks very
easy. But it soon gets too complex. Don’t!
o Option 1: Stream Processing (e.g. Storm). Kind of
works. It is like Map Reduce, you have to write code.
o Option 2: Spark Streaming - more compact than
Storm, but cannot do some stateful operations.
o Option 3: Complex Event Processing - compact, SQL
like language, fast
Stream Processing
o Program a set of processors and wire them up; data
flows through the graph.
o A middleware framework handles data flow,
distribution, and fault tolerance (e.g. Apache Storm,
Samza)
o Processors may be in the same machine or multiple
machines
Writing a Storm Program
o Write Spout(s)
o Write Bolt(s)
o Wire them up
o Run
Write Bolts
We will use a shorthand
like on the left to explain
public static class WordCount extends BaseBasicBolt {
    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        // .. do something …
        collector.emit(new Values(word, count));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}
Wire up and Run
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8)
       .shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12)
       .fieldsGrouping("split", new Fields("word"));

Config conf = new Config();
if (args != null && args.length > 0) {
    conf.setNumWorkers(3);
    StormSubmitter.submitTopologyWithProgressBar(
        args[0], conf, builder.createTopology());
} else {
    conf.setMaxTaskParallelism(3);
    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("word-count", conf,
        builder.createTopology());
    ...
}
}
Complex Event Processing
Micro Batches ( e.g. Spark
Streaming)
o Process data in small batches,
and then combine results for
final results (e.g. Spark)
o Works for simple aggregates,
but tricky to do this for complex
operations (e.g. Event
Sequences)
o Can do it with MapReduce as
well if the deadlines are not too
tight.
o SQL-like data processing
languages (e.g. Apache Hive)
o Since many understand SQL,
Hive made large-scale data
processing (Big Data) accessible
to many
o Expressive, short, and sweet.
o Define core operations that
cover 90% of problems
o Let experts dig in when they
like!
SQL Like Query Languages
o Easy to follow from SQL
o Expressive, short, and sweet.
o Define core operations that cover 90% of problems
o Let experts dig in when they like!
CEP = SQL for Realtime
Analytics
Pattern
Implementations
Code and other details
o Sample code - https://github.com/suhothayan/DEBS-2015-Realtime-Analytics-Patterns
o WSO2 CEP
o pack http://svn.wso2.org/repos/wso2/people/suho/packs/cep/4.0.0/debs2015/wso2cep-4.0.0-SNAPSHOT.zip
o docs - https://docs.wso2.com/display/CEP400/WSO2+Complex+Event+Processor+Documentation
o Apache Storm - https://storm.apache.org/
o We have packs in a pendrive
Pattern 1: Preprocessing
o What? Cleanup and prepare data via operations like
filter, project, enrich, split, and transformations
o Usecases?
o From twitter data stream: we extract author,
timestamp and location fields and then filter
them based on the location of the author.
o From temperature stream: we extract the
temperature & room number of the sensor and
filter by them.
Filter
from TempStream [ roomNo > 245 and roomNo <= 365]
select roomNo, temp
insert into ServerRoomTempStream ;
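For comparison, the same filter and projection can be written by hand. The following is a minimal plain-Java sketch, not Storm or Siddhi code; the class and method names (FilterSketch, TempEvent, serverRoomTemps) are made up for illustration, and only the condition 245 < roomNo <= 365 and the selected fields come from the query above.

```java
import java.util.ArrayList;
import java.util.List;

final class FilterSketch {
    /** One event of the slide's TempStream: (deviceID, roomNo, temp). */
    static final class TempEvent {
        final long deviceID;
        final int roomNo;
        final double temp;

        TempEvent(long deviceID, int roomNo, double temp) {
            this.deviceID = deviceID;
            this.roomNo = roomNo;
            this.temp = temp;
        }
    }

    /** Keeps events with 245 < roomNo <= 365 and projects each to {roomNo, temp}. */
    static List<double[]> serverRoomTemps(List<TempEvent> events) {
        List<double[]> out = new ArrayList<>();
        for (TempEvent e : events) {
            if (e.roomNo > 245 && e.roomNo <= 365) {      // filter
                out.add(new double[] {e.roomNo, e.temp}); // project
            }
        }
        return out;
    }
}
```

Even this small case needs an event class and a loop, which is the point the slides make about hand-written code versus a three-line query.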
In Storm
In CEP ( Siddhi)
Architecture of WSO2 CEP
CEP Event Adapters
Support for several transports (network access)
● SOAP
● HTTP
● JMS
● SMTP
● SMS
● Thrift
● Kafka
● Websocket
● MQTT
Supports database writes using Map messages
● Cassandra
● RDBMs
Supports custom event adaptors via its pluggable architecture!
Stream Definition (Data Model)
{
'name':'soft.drink.coop.sales', 'version':'1.0.0',
'nickName': 'Soft_Drink_Sales', 'description': 'Soft drink sales',
'metaData':[
{'name':'region','type':'STRING'}
],
'correlationData':[
{'name':'transactionID','type':'STRING'}
],
'payloadData':[
{'name':'brand','type':'STRING'},
{'name':'quantity','type':'INT'},
{'name':'total','type':'INT'},
{'name':'user','type':'STRING'}
]
}
Projection
define stream TempStream
(deviceID long, roomNo int, temp double);
from TempStream
select roomNo, temp
insert into OutputStream ;
Inferred Streams
from TempStream
select roomNo, temp
insert into OutputStream ;
define stream OutputStream
(roomNo int, temp double);
Enrich
from TempStream
select roomNo, temp, 'C' as scale
insert into OutputStream ;
define stream OutputStream
(roomNo int, temp double, scale string);
from TempStream
select deviceID, roomNo, avg(temp) as avgTemp
insert into OutputStream ;
Transformation
from TempStream
select concat(deviceID, '-', roomNo) as uid,
toFahrenheit(temp) as tempInF,
'F' as scale
insert into OutputStream ;
Split
from TempStream
select roomNo, temp
insert into RoomTempStream ;
from TempStream
select deviceID, temp
insert into DeviceTempStream ;
Pattern 2: Alerts and Thresholds
o What? Detect a condition and generate an alert
when it holds (e.g. alarm on high
temperature).
o These alerts can be based on a simple value or
more complex conditions such as rate of increase
etc.
o Usecases?
o Raise alert when vehicle going too fast
o Alert when a room is too hot
Filter Alert
from TempStream [ roomNo > 245 and roomNo <= 365
and temp > 40 ]
select roomNo, temp
insert into AlertServerRoomTempStream ;
Pattern 3: Simple Counting and
Counting with Windows
o What? aggregate functions like Min, Max,
Percentiles, etc
o Often they can be counted without storing any
data
o Most useful when used with a window
o Usecases?
o Most metrics need a time bound so we can
compare ( errors per day, transactions per
second)
o Linux Load Average gives us an idea of the overall
trend by reporting the last 1-, 5-, and 15-minute averages.
Types of windows
o Sliding windows vs. Batch (tumbling) windows
o Time vs. Length windows
Also supports
o Unique window
o First unique window
o External time window
Window
In Storm
Aggregation
In CEP (Siddhi)
from TempStream
select roomNo, avg(temp) as avgTemp
insert into HotRoomsStream ;
Sliding Time Window
from TempStream#window.time(1 min)
select roomNo, avg(temp) as avgTemp
insert all events into AvgRoomTempStream ;
Group By
from TempStream#window.time(1 min)
select roomNo, avg(temp) as avgTemp
group by roomNo
insert all events into HotRoomsStream ;
Batch Time Window
from TempStream#window.timeBatch(5 min)
select roomNo, avg(temp) as avgTemp
group by roomNo
insert all events into HotRoomsStream ;
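To make the sliding time window concrete, here is a minimal plain-Java sketch of a running average over the last N milliseconds. It is an illustration only, not how Siddhi implements windows: the class name SlidingTimeAverage is hypothetical, timestamps are passed in explicitly, events are assumed to arrive in timestamp order, and it tracks a single key (a group by roomNo would keep one such object per room).

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class SlidingTimeAverage {
    private static final class Reading {
        final long timestampMillis;
        final double value;

        Reading(long timestampMillis, double value) {
            this.timestampMillis = timestampMillis;
            this.value = value;
        }
    }

    private final long windowMillis;
    private final Deque<Reading> window = new ArrayDeque<>();
    private double sum = 0.0;

    SlidingTimeAverage(long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("windowMillis must be positive");
        }
        this.windowMillis = windowMillis;
    }

    /** Adds a reading and returns the average over (timestamp - window, timestamp]. */
    double add(long timestampMillis, double value) {
        window.addLast(new Reading(timestampMillis, value));
        sum += value;
        // Evict readings that have slid out of the window; the new one always stays.
        while (window.peekFirst().timestampMillis <= timestampMillis - windowMillis) {
            sum -= window.removeFirst().value;
        }
        return sum / window.size();
    }
}
```

A batch (tumbling) window would instead collect readings until the window closes, emit one result, and start again from empty.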
Pattern 4: Joining Event Streams
o What? Create a new event stream by joining
multiple streams
o The complication comes with time: streams are unbounded,
so the join needs at least one window
o Usecases?
o To detect when a player has kicked the ball in
a football game.
o To correlate TempStream and the state of the
regulator and trigger control commands
Join with Storm
Join
define stream TempStream
(deviceID long, roomNo int, temp double);
define stream RegulatorStream
(deviceID long, roomNo int, isOn bool);
from TempStream[temp > 30.0]#window.time(1 min) as T
join RegulatorStream[isOn == false]#window.length(1) as R
on T.roomNo == R.roomNo
select T.roomNo, R.deviceID, 'start' as action
insert into RegulatorActionStream ;
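The idea behind this join can be sketched in plain Java as a lookup of retained state from one stream when an event arrives on the other. This is a simplification under stated assumptions, not a reimplementation of the query: it keeps the latest regulator state per room (the query keeps only the single last regulator event), it reacts only to temperature events, and it omits the 1 minute window. The names RegulatorJoinSketch, onRegulator, and onTemp are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class RegulatorJoinSketch {
    private static final double HOT = 30.0;

    // roomNo -> deviceID of a regulator whose latest reported state is "off"
    private final Map<Integer, Long> offRegulators = new HashMap<>();

    /** Retains the latest state of the regulator in a room. */
    void onRegulator(long deviceID, int roomNo, boolean isOn) {
        if (isOn) {
            offRegulators.remove(roomNo);
        } else {
            offRegulators.put(roomNo, deviceID);
        }
    }

    /** Returns the deviceID of the regulator to start, or null when no action is needed. */
    Long onTemp(int roomNo, double temp) {
        if (temp <= HOT) {
            return null;
        }
        return offRegulators.get(roomNo);
    }
}
```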
In CEP (Siddhi)
Pattern 5: Data Correlation, Missing
Events, and Erroneous Data
o What? find correlations and use that to detect and
handle missing and erroneous Data
o Use Cases?
o Detecting a missing event (e.g., detect a
customer request that has not been responded to
within 1 hour of its receipt)
o Detecting erroneous data (e.g., Detecting failed
sensors using a set of sensors that monitor
overlapping regions. We can use those
redundant data to find erroneous sensors and
remove those data from further processing)
Missing Event in Storm
Missing Event in CEP
In CEP (Siddhi)
from RequestStream#window.time(1 hour)
insert expired events into ExpiryStream ;
from r1=RequestStream -> r2=Response[id == r1.id] or
r3=ExpiryStream[id == r1.id]
select r1.id as id ...
having r2.id == null
insert into AlertStream ;
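The same missing-event logic can be sketched in plain Java as a table of pending requests that is swept for timeouts. This is an illustrative sketch under assumptions: the class MissingResponseDetector and its methods are hypothetical, time is passed in explicitly, and the caller is expected to invoke expire periodically (in a real system a timer would do so).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MissingResponseDetector {
    private final long timeoutMillis;
    // request id -> time the request was received, in arrival order
    private final Map<String, Long> pending = new LinkedHashMap<>();

    MissingResponseDetector(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void onRequest(String id, long nowMillis) {
        pending.put(id, nowMillis);
    }

    void onResponse(String id) {
        pending.remove(id);
    }

    /** Returns and forgets the ids of requests unanswered for at least the timeout. */
    List<String> expire(long nowMillis) {
        List<String> alerts = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMillis - e.getValue() >= timeoutMillis) {
                alerts.add(e.getKey());
                it.remove();
            }
        }
        return alerts;
    }
}
```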
Pattern 6: Interacting with Databases
o What? Combine realtime data with historical
data
o Use Cases?
o On a transaction, looking up the customer age
using ID from customer database to detect fraud
(enrichment)
o Checking a transaction against blacklists and
whitelists in the database
o Receive an input from the user (e.g., the daily
discount amount may be updated in the
database, and then the query will pick it up
automatically without human intervention).
In Storm
Querying Databases
In CEP (Siddhi)
Event Table
define table CardUserTable (name string, cardNum long) ;
@from(eventtable = 'rdbms' , datasource.name = 'CardDataSource' ,
table.name = 'UserTable', caching.algorithm = 'LRU')
define table CardUserTable (name string, cardNum long) ;
Cache types supported
● Basic: A size-based algorithm based on FIFO.
● LRU (Least Recently Used): The least recently used event is dropped
when cache is full.
● LFU (Least Frequently Used): The least frequently used event is dropped
when cache is full.
Join : Event Table
define stream Purchase (price double, cardNo long, place string);
define table CardUserTable (name string, cardNum long) ;
from Purchase#window.length(1) join CardUserTable
on Purchase.cardNo == CardUserTable.cardNum
select Purchase.cardNo as cardNo,
CardUserTable.name as name,
Purchase.price as price
insert into PurchaseUserStream ;
Insert : Event Table
define stream FraudStream (price double, cardNo long, userName string);
define table BlacklistedUserTable (name string, cardNum long) ;
from FraudStream
select userName as name, cardNo as cardNum
insert into BlacklistedUserTable ;
Update : Event Table
define stream LoginStream (userID string,
islogin bool, loginTime long);
define table LastLoginTable (userID string, time long) ;
from LoginStream
select userID, loginTime as time
update LastLoginTable
on LoginStream.userID == LastLoginTable.userID ;
Pattern 7: Detecting Temporal
Event Sequence Patterns
o What? detect a temporal sequence of events or
condition arranged in time
o Use Cases?
o Detect suspicious activities like small transaction
immediately followed by a large transaction
o Detect ball possession in a football game
o Detect suspicious financial patterns like large buy
and sell behaviour within a small time period
In Storm
Pattern
In CEP (Siddhi)
Pattern
define stream Purchase (price double, cardNo long,place string);
from every (a1 = Purchase[price < 100] -> a3= ..) ->
a2 = Purchase[price >10000 and a1.cardNo == a2.cardNo]
within 1 day
select a1.cardNo as cardNo, a2.price as price, a2.place as place
insert into PotentialFraud ;
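A hand-written version of this sequence detection needs per-card state. The plain-Java sketch below is a simplification under assumptions: it remembers only the most recent small purchase per card, ignores the elided a3 step of the query, and fires at most once per small purchase. The class name SmallThenLargeDetector is hypothetical; the thresholds (100, 10000) and the 1 day bound come from the query.

```java
import java.util.HashMap;
import java.util.Map;

final class SmallThenLargeDetector {
    private static final long DAY_MILLIS = 24L * 60 * 60 * 1000;

    // cardNo -> timestamp of the most recent small purchase on that card
    private final Map<Long, Long> lastSmallPurchaseAt = new HashMap<>();

    /** Returns true when this purchase completes a small-then-large sequence within a day. */
    boolean onPurchase(long cardNo, double price, long timestampMillis) {
        if (price < 100) {
            lastSmallPurchaseAt.put(cardNo, timestampMillis);
            return false;
        }
        if (price > 10000) {
            // Consume the remembered small purchase so it matches at most once.
            Long smallAt = lastSmallPurchaseAt.remove(cardNo);
            return smallAt != null && timestampMillis - smallAt <= DAY_MILLIS;
        }
        return false;
    }
}
```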
Pattern 8: Tracking
o What? Track something over space and time and detect given conditions
o Use Cases?
o Tracking a fleet of vehicles, making sure that
they adhere to speed limits, routes, and Geo-
fences.
o Tracking wildlife, making sure they are alive (they
will not move if they are dead) and making sure
they will not go out of the reservation.
o Tracking airline luggage and making sure they
have not been sent to wrong destinations
o Tracking a logistic network and figuring out
bottlenecks and unexpected conditions.
TFL: Traffic Analytics
Built using TFL ( Transport for London) open data feeds.
http://goo.gl/9xNiCm http://goo.gl/04tX6k
Pattern 9: Detecting Trends
o What? Detect an overall trend over time (e.g. in a
time series) and bring it to attention.
o Useful in stock markets, SLA enforcement, auto
scaling, predictive maintenance
o Use Cases?
o Rise, Fall of values and Turn (switch from rise to
a fall)
o Outliers - deviate from the current trend by a
large value
o Complex trends like “Triple Bottom” and “Cup
and Handle” [17].
Trend in Storm
Build and apply a state machine
In CEP (Siddhi)
Sequence
from t1=TempStream,
t2=TempStream [(isNull(t2[last].temp) and t1.temp<temp) or
(t2[last].temp < temp and not(isNull(t2[last].temp)))]+
within 5 min
select t1.temp as initialTemp,
t2[last].temp as finalTemp,
t1.deviceID,
t1.roomNo
insert into IncreasingHotRoomsStream ;
In CEP (Siddhi)
Partition
partition with (roomNo of TempStream)
begin
from t1=TempStream,
t2=TempStream [(isNull(t2[last].temp) and t1.temp<temp)
or (t2[last].temp < temp and not(isNull(t2[last].temp)))]+
within 5 min
select t1.temp as initialTemp,
t2[last].temp as finalTemp,
t1.deviceID,
t1.roomNo
insert into IncreasingHotRoomsStream ;
end;
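The state machine behind this kind of trend detection can be sketched in plain Java as a per-room counter of consecutive rises. This is a simplified illustration, not the Siddhi sequence semantics: it ignores the 5 min bound, and the names RisingRunTracker, onReading, and initialTemp are hypothetical. Keeping one state entry per roomNo plays the role of the partition.

```java
import java.util.HashMap;
import java.util.Map;

final class RisingRunTracker {
    // roomNo -> {initialTemp, lastTemp, number of consecutive rises}
    private final Map<Integer, double[]> state = new HashMap<>();

    /** Returns how many consecutive rises the room has seen, ending at this reading. */
    int onReading(int roomNo, double temp) {
        double[] s = state.get(roomNo);
        if (s == null || temp <= s[1]) {
            // First reading, or the rise was broken: start a new run here.
            state.put(roomNo, new double[] {temp, temp, 0});
            return 0;
        }
        s[1] = temp;
        s[2] += 1;
        return (int) s[2];
    }

    /** First temperature of the room's current run, or NaN if the room is unseen. */
    double initialTemp(int roomNo) {
        double[] s = state.get(roomNo);
        return s == null ? Double.NaN : s[0];
    }
}
```

A caller could raise an alert once the returned count reaches a chosen threshold, reporting initialTemp and the latest reading as the start and end of the rise.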
Detecting Trends in Real Life
o Paper “A Complex Event Processing
Toolkit for Detecting Technical Chart
Patterns” (HPBC 2015) used the idea to
identify stock chart patterns
o Used kernel regression for smoothing
and detected maxima and minima.
o Then any pattern can be written as a
temporal event sequence.
Pattern 10: Lambda Architecture
o What? Run the same query in both realtime and
batch pipelines. This uses realtime analytics to fill
the lag in batch analytics results.
o Also called “Lambda Architecture” (a term coined by Nathan
Marz). See also Jay Kreps’s “Questioning the Lambda Architecture”
o Use Cases?
o For example, if batch processing takes 15
minutes, results would always lag 15 minutes
behind the current data. Here realtime processing
fills the gap.
Lambda Architecture. How?
Pattern 11: Detecting and switching
to Detailed Analysis
o What? detect a condition that suggests some
anomaly, and further analyze it using historical data.
o Use Cases?
o Use basic rules to detect Fraud (e.g., large transaction),
then pull out all transactions done against that credit
card for a larger time period (e.g., 3 months data) from
batch pipeline and run a detailed analysis
o While monitoring weather, detect conditions like high
temperature or low pressure in a given region, and then
start a high resolution localized forecast for that region.
o Detect good customers (e.g., through expenditure of
more than $1000 within a month), and then run a
detailed model to decide the potential of offering a deal.
Pattern 11: How?
Pattern 12: Using a Machine
Learning Model
o What? The idea is to train a model (often a
Machine Learning model), and then use it with the
Realtime pipeline to make decisions
o For example, you can build a model using R, export it as
PMML (Predictive Model Markup Language) and use it
within your realtime pipeline.
o Use Cases?
o Fraud Detection
o Segmentation
o Predict Churn
Predictive Analytics
o Build models and use
them with WSO2 CEP,
BAM and ESB using
upcoming WSO2
Machine Learner Product
( 2015 Q2)
o Build model using R,
export them as PMML,
and use within WSO2 CEP
o Call R Scripts from CEP
queries
In CEP (Siddhi)
PMML Model
from TransactionStream
#ml:applyModel('/path/logisticRegressionModel1.xml',
timestamp, amount, ip)
insert into PotentialFraudsStream;
Pattern 13: Online Control
o What? Control something Online. These would
involve problems like current situation awareness,
predicting next value(s), and deciding on corrective
actions.
o Use Cases?
o Autopilot
o Self-driving
o Robotics
Fraud Demo
Scaling & HA for Pattern
Implementations
So how do we scale a system?
o Vertical Scaling
o Horizontal Scaling
Vertical Scaling
Horizontal Scaling
E.g. Calculate Mean
How about scaling median ?
If & only if we can partition !
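The mean scales horizontally because each node can reduce its share of the data to a small summary (sum and count) that merges exactly. The plain-Java sketch below illustrates this; the class name PartialMean is hypothetical. Note that averaging the per-node means would be wrong: for node values {1, 2, 3} and {10} the true mean is 4, while the mean of the two node means is 6. The median has no such small mergeable summary, which is why it only scales when the data can be partitioned so that each node holds a complete group.

```java
final class PartialMean {
    final double sum;
    final long count;

    PartialMean(double sum, long count) {
        this.sum = sum;
        this.count = count;
    }

    /** Summary computed locally on one node from the values it received. */
    static PartialMean of(double... values) {
        double s = 0.0;
        for (double v : values) {
            s += v;
        }
        return new PartialMean(s, values.length);
    }

    /** Combines two node summaries; the result is exact, unlike averaging means. */
    PartialMean merge(PartialMean other) {
        return new PartialMean(sum + other.sum, count + other.count);
    }

    /** Mean of everything summarized so far, or NaN when no values were seen. */
    double mean() {
        return count == 0 ? Double.NaN : sum / count;
    }
}
```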
Scalable Realtime solutions ...
Spark Streaming
o Supports distributed processing
o Runs micro batches
o Does not support pattern & sequence detection
Apache Storm
o Supports distributed processing
o Stream processing engine
Why not use Apache Storm ?
Advantages
o Supports distributed processing
o Supports Partitioning
o Extendable
o Opensource
Disadvantages
o Need to write Java code
o Need to start from basic principles ( & data structures )
o Adapting to change is slow
o No support to govern artifacts
WSO2 CEP += Apache Storm
Advantages
o Supports distributed processing
o Supports Partitioning
o Extendable
o Opensource
Disadvantages (addressed)
o No need to write Java code (Supports SQL like query language)
o No need to start from basic principles (Supports high level
language)
o Adapting to change is fast
o Govern artifacts using Toolboxes
o etc ...
How do we scale?
Scaling with Storm
Siddhi QL
define stream StockStream
(symbol string, volume int, price double);
@name('Filter Query')
from StockStream[price > 75]
select *
insert into HighPriceStockStream ;
@name('Window Query')
from HighPriceStockStream#window.time(10 min)
select symbol, sum(volume) as sumVolume
insert into ResultStockStream ;
Siddhi QL - with partition
define stream StockStream
(symbol string, volume int, price double);
@name('Filter Query')
from StockStream[price > 75]
select *
insert into HighPriceStockStream ;
@name('Window Query')
partition with (symbol of HighPriceStockStream)
begin
from HighPriceStockStream#window.time(10 min)
select symbol, sum(volume) as sumVolume
insert into ResultStockStream ;
end;
Siddhi QL - distributed
define stream StockStream
(symbol string, volume int, price double);
@name('Filter Query')
@dist(parallel='3')
from StockStream[price > 75]
select *
insert into HighPriceStockStream ;
@name('Window Query')
@dist(parallel='2')
partition with (symbol of HighPriceStockStream)
begin
from HighPriceStockStream#window.time(10 min)
select symbol, sum(volume) as sumVolume
insert into ResultStockStream ;
end;
On Storm UI
High Availability
HA / Persistence
o Option 1: Side by side
o Recommended
o Takes 2X hardware
o Gives zero down time
o Option 2: Snapshot and restore
o Uses less HW
o Will lose events between snapshots
o Downtime while recovery
o ** In some scenarios you can use event tables to keep intermediate state
Siddhi Extensions
● Function extension
● Aggregator extension
● Window extension
● Transform extension
Siddhi Query : Function Extension
from TempStream
select deviceID, roomNo,
custom:toKelvin(temp) as tempInKelvin,
'K' as scale
insert into OutputStream ;
Siddhi Query : Aggregator Extension
from TempStream
select deviceID, roomNo, temp,
custom:stdev(temp) as stdevTemp,
'C' as scale
insert into OutputStream ;
Siddhi Query : Window Extension
from TempStream
#window.custom:lastUnique(roomNo,2 min)
select *
insert into OutputStream ;
Siddhi Query : Transform Extension
from XYZSpeedStream
#transform.custom:getVelocityVector(v,vx,vy,vz)
select velocity, direction
insert into SpeedStream ;
Contact us !

Few thoughts about Future of BlockchainFew thoughts about Future of Blockchain
Few thoughts about Future of Blockchain
 
A Visual Canvas for Judging New Technologies
A Visual Canvas for Judging New TechnologiesA Visual Canvas for Judging New Technologies
A Visual Canvas for Judging New Technologies
 
Privacy in Bigdata Era
Privacy in Bigdata  EraPrivacy in Bigdata  Era
Privacy in Bigdata Era
 
Blockchain, Impact, Challenges, and Risks
Blockchain, Impact, Challenges, and RisksBlockchain, Impact, Challenges, and Risks
Blockchain, Impact, Challenges, and Risks
 
Today's Technology and Emerging Technology Landscape
Today's Technology and Emerging Technology LandscapeToday's Technology and Emerging Technology Landscape
Today's Technology and Emerging Technology Landscape
 
An Emerging Technologies Timeline
An Emerging Technologies TimelineAn Emerging Technologies Timeline
An Emerging Technologies Timeline
 
The Rise of Streaming SQL and Evolution of Streaming Applications
The Rise of Streaming SQL and Evolution of Streaming ApplicationsThe Rise of Streaming SQL and Evolution of Streaming Applications
The Rise of Streaming SQL and Evolution of Streaming Applications
 
Analytics and AI: The Good, the Bad and the Ugly
Analytics and AI: The Good, the Bad and the UglyAnalytics and AI: The Good, the Bad and the Ugly
Analytics and AI: The Good, the Bad and the Ugly
 
Transforming a Business Through Analytics
Transforming a Business Through AnalyticsTransforming a Business Through Analytics
Transforming a Business Through Analytics
 
SoC Keynote:The State of the Art in Integration Technology
SoC Keynote:The State of the Art in Integration TechnologySoC Keynote:The State of the Art in Integration Technology
SoC Keynote:The State of the Art in Integration Technology
 

Recently uploaded

Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
q6pzkpark
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
vexqp
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
wsppdmt
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Abortion pills in Riyadh +966572737505 get cytotec
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 

Recently uploaded (20)

DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Data Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdfData Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdf
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 
Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 

ACM DEBS 2015: Realtime Streaming Analytics Patterns

  • 1. ACM DEBS 2015: Realtime Streaming Analytics Patterns. Srinath Perera, Sriskandarajah Suhothayan. WSO2 Inc.
  • 2. Data Analytics (Big Data)
      o Scientists have been doing this for 25 years, with MPI (1991) and special hardware.
      o It took off with Google’s MapReduce paper (2004), Apache Hadoop, Hive, and the whole ecosystem that grew around them.
      o Later Spark emerged, and it is faster.
      o But processing still takes time.
  • 3. Value of Some Insights Degrades Fast!
      o For some use cases (e.g. stock markets, traffic, surveillance, patient monitoring), the value of insights degrades very quickly with time.
      o E.g. stock markets and the speed of light.
      o We need technology that can produce outputs fast:
          o Static queries that need very fast output (alerts, realtime control)
          o Dynamic and interactive queries (data exploration)
  • 4. History
      ▪ Realtime analytics is not new either!
          - Active Databases (2000+)
          - Stream processing (Aurora, Borealis (2005+), and later Storm)
          - Distributed streaming operators (e.g. a database research topic around 2005)
          - CEP Vendor Roadmap (from http://www.complexevents.com/2014/12/03/cep-tooling-market-survey-2014/)
  • 6. Realtime Interactive Analytics
      o Usually done to support interactive queries.
      o Index the data to make it readily accessible, so you can respond to queries fast (e.g. Apache Drill).
      o Tools like Druid, VoltDB, and SAP HANA can do this with all data in memory to make things really fast.
  • 7. Realtime Streaming Analytics
      o Process data as it comes in, without storing it first.
      o Queries are fixed (static).
      o Triggers when given conditions are met.
      o Technologies
          o Stream Processing (Apache Storm, Apache Samza)
          o Complex Event Processing/CEP (WSO2 CEP, Esper, StreamBase)
          o Micro-batches (Spark Streaming)
  • 8. Realtime Football Analytics
      ● Video: https://www.youtube.com/watch?v=nRI6buQ0NOM
      ● More info: http://www.slideshare.net/hemapani/strata-2014-talktracking-a-soccer-game-with-big-data
  • 9. Why Realtime Streaming Analytics Patterns?
      o Reason 1: The usual advantages of patterns
          o They give us a better understanding.
          o They give us a better vocabulary to teach and communicate.
          o Tools can implement them.
          o ..
      o Reason 2: Under the theme of realtime analytics, a lot of people get carried away with the word count example. Patterns show that word count is just the tip of the iceberg.
  • 10. Earlier Work on Patterns
      o Patterns from SQL (project, join, filter, etc.)
      o Event Processing Technical Society’s (EPTS) reference architecture
          o Higher-level patterns such as tracking, prediction, and learning, in addition to the low-level operators that come from SQL-like languages.
      o Esper’s Solution Patterns document (50 patterns)
      o Coral8 white paper
  • 11. Basic Patterns
      o Pattern 1: Preprocessing (filter, transform, enrich, project, ..)
      o Pattern 2: Alerts and Thresholds
      o Pattern 3: Simple Counting and Counting with Windows
      o Pattern 4: Joining Event Streams
      o Pattern 5: Data Correlation, Missing Events, and Erroneous Data
  • 12. Patterns for Handling Trends
      o Pattern 7: Detecting Temporal Event Sequence Patterns
      o Pattern 8: Tracking (track something over space or time)
      o Pattern 9: Detecting Trends (rise, fall, turn, triple bottom)
      o Pattern 13: Online Control
  • 13. Mixed Patterns
      o Pattern 6: Interacting with Databases
      o Pattern 10: Running the Same Query in Batch and Realtime Pipelines
      o Pattern 11: Detecting and Switching to Detailed Analysis
      o Pattern 12: Using a Machine Learning Model
  • 14. Earlier Work on Patterns
  • 16. Implementing Realtime Analytics
      o It is tempting to write custom code, because a filter looks very easy. It gets too complex. Don’t!
      o Option 1: Stream Processing (e.g. Storm). Kind of works. It is like MapReduce: you have to write code.
      o Option 2: Spark Streaming. More compact than Storm, but cannot do some stateful operations.
      o Option 3: Complex Event Processing. Compact, SQL-like language, fast.
  • 17. Stream Processing
      o Program a set of processors and wire them up; data flows through the graph.
      o A middleware framework handles data flow, distribution, and fault tolerance (e.g. Apache Storm, Samza).
      o Processors may be on the same machine or on multiple machines.
  • 18. Writing a Storm Program
      o Write Spout(s)
      o Write Bolt(s)
      o Wire them up
      o Run
  • 19. Write Bolts
      We will use a shorthand like the one on the left to explain.
          public static class WordCount extends BaseBasicBolt {
              @Override
              public void execute(Tuple tuple, BasicOutputCollector collector) {
                  // .. do something …
                  collector.emit(new Values(word, count));
              }

              @Override
              public void declareOutputFields(OutputFieldsDeclarer declarer) {
                  declarer.declare(new Fields("word", "count"));
              }
          }
  • 20. Wire up and Run
          TopologyBuilder builder = new TopologyBuilder();
          builder.setSpout("spout", new RandomSentenceSpout(), 5);
          builder.setBolt("split", new SplitSentence(), 8)
                 .shuffleGrouping("spout");
          builder.setBolt("count", new WordCount(), 12)
                 .fieldsGrouping("split", new Fields("word"));

          Config conf = new Config();
          if (args != null && args.length > 0) {
              conf.setNumWorkers(3);
              StormSubmitter.submitTopologyWithProgressBar(
                  args[0], conf, builder.createTopology());
          } else {
              conf.setMaxTaskParallelism(3);
              LocalCluster cluster = new LocalCluster();
              cluster.submitTopology("word-count", conf, builder.createTopology());
              ...
          }
      }
  • 22. Micro Batches (e.g. Spark Streaming)
      o Process data in small batches, then combine the batch results into the final result (e.g. Spark).
      o Works for simple aggregates, but is tricky for complex operations (e.g. event sequences).
      o Can be done with MapReduce as well if the deadlines are not too tight.
  • 23. SQL Like Query Languages
      o SQL-like data processing languages (e.g. Apache Hive)
      o Since many understand SQL, Hive made large-scale (Big Data) processing accessible to many.
      o Expressive, short, and sweet.
      o Define core operations that cover 90% of problems.
      o Let experts dig in when they like!
  • 24. CEP = SQL for Realtime Analytics
      o Easy to follow from SQL.
      o Expressive, short, and sweet.
      o Define core operations that cover 90% of problems.
      o Let experts dig in when they like!
  • 26. Code and other details
      o Sample code: https://github.com/suhothayan/DEBS-2015-Realtime-Analytics-Patterns
      o WSO2 CEP
          o Pack: http://svn.wso2.org/repos/wso2/people/suho/packs/cep/4.0.0/debs2015/wso2cep-4.0.0-SNAPSHOT.zip
          o Docs: https://docs.wso2.com/display/CEP400/WSO2+Complex+Event+Processor+Documentation
      o Apache Storm: https://storm.apache.org/
      o We have packs on a pen drive.
  • 27. Pattern 1: Preprocessing
      o What? Clean up and prepare data via operations like filter, project, enrich, split, and transform.
      o Use cases?
          o From a Twitter data stream, we extract the author, timestamp, and location fields, and then filter on the location of the author.
          o From a temperature stream, we extract the temperature and room number of the sensor and filter by them.
  • 28. Filter
      In Storm: [diagram not in transcript]
      In CEP (Siddhi):
          from TempStream[roomNo > 245 and roomNo <= 365]
          select roomNo, temp
          insert into ServerRoomTempStream;
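The Storm side of this slide did not survive in the transcript, so here is a minimal plain-Java sketch of the same filter and projection with no Storm or Siddhi dependency. The class names `TempEvent`, `RoomTemp`, and `Preprocess` and the method `serverRoomTemps` are hypothetical; only the field names and the room range come from the query above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event type mirroring TempStream (deviceID long, roomNo int, temp double).
class TempEvent {
    final long deviceID;
    final int roomNo;
    final double temp;

    TempEvent(long deviceID, int roomNo, double temp) {
        this.deviceID = deviceID;
        this.roomNo = roomNo;
        this.temp = temp;
    }
}

// Hypothetical projected event: only roomNo and temp survive the projection.
class RoomTemp {
    final int roomNo;
    final double temp;

    RoomTemp(int roomNo, double temp) {
        this.roomNo = roomNo;
        this.temp = temp;
    }
}

class Preprocess {
    // Filter: keep rooms with 245 < roomNo <= 365. Project: drop deviceID.
    static List<RoomTemp> serverRoomTemps(List<TempEvent> events) {
        List<RoomTemp> out = new ArrayList<>();
        for (TempEvent e : events) {
            if (e.roomNo > 245 && e.roomNo <= 365) {
                out.add(new RoomTemp(e.roomNo, e.temp));
            }
        }
        return out;
    }
}
```

In a Storm bolt the same predicate would sit inside `execute`, emitting one tuple per surviving event instead of building a list.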
  • 30. CEP Event Adapters
      Support for several transports (network access)
      ● SOAP
      ● HTTP
      ● JMS
      ● SMTP
      ● SMS
      ● Thrift
      ● Kafka
      ● Websocket
      ● MQTT
      Supports database writes using Map messages
      ● Cassandra
      ● RDBMS
      Supports custom event adaptors via its pluggable architecture!
  • 31. Stream Definition (Data Model)
          {
            'name': 'soft.drink.coop.sales',
            'version': '1.0.0',
            'nickName': 'Soft_Drink_Sales',
            'description': 'Soft drink sales',
            'metaData': [
              {'name': 'region', 'type': 'STRING'}
            ],
            'correlationData': [
              {'name': 'transactionID', 'type': 'STRING'}
            ],
            'payloadData': [
              {'name': 'brand', 'type': 'STRING'},
              {'name': 'quantity', 'type': 'INT'},
              {'name': 'total', 'type': 'INT'},
              {'name': 'user', 'type': 'STRING'}
            ]
          }
  • 32. Projection
          define stream TempStream (deviceID long, roomNo int, temp double);

          from TempStream
          select roomNo, temp
          insert into OutputStream;
  • 33. Inferred Streams
          from TempStream
          select roomNo, temp
          insert into OutputStream;

          define stream OutputStream (roomNo int, temp double);
  • 34. Enrich
          from TempStream
          select roomNo, temp, 'C' as scale
          insert into OutputStream;

          define stream OutputStream (roomNo int, temp double, scale string);

          from TempStream
          select deviceID, roomNo, avg(temp) as avgTemp
          insert into OutputStream;
  • 35. Transformation
          from TempStream
          select concat(deviceID, '-', roomNo) as uid, toFahrenheit(temp) as tempInF, 'F' as scale
          insert into OutputStream;
  • 36. Split
          from TempStream
          select roomNo, temp
          insert into RoomTempStream;

          from TempStream
          select deviceID, temp
          insert into DeviceTempStream;
  • 37. Pattern 2: Alerts and Thresholds
      o What? Detect a condition and generate an alert when it holds (e.g. an alarm on high temperature).
      o Alerts can be based on a simple value or on more complex conditions such as the rate of increase.
      o Use cases?
          o Raise an alert when a vehicle is going too fast.
          o Alert when a room is too hot.
  • 38. Filter Alert
          from TempStream[roomNo > 245 and roomNo <= 365 and temp > 40]
          select roomNo, temp
          insert into AlertServerRoomTempStream;
  • 39. Pattern 3: Simple Counting and Counting with Windows
      o What? Aggregate functions like Min, Max, Percentiles, etc.
      o Often they can be computed without storing any data.
      o Most useful when used with a window.
      o Use cases?
          o Most metrics need a time bound so we can compare them (errors per day, transactions per second).
          o The Linux load average gives us an idea of the overall trend by reporting the mean over the last 1, 5, and 15 minutes.
  • 40. Types of windows
      o Sliding windows vs. batch (tumbling) windows
      o Time vs. length windows
      Also supports
      o Unique window
      o First unique window
      o External time window
  • 42. Aggregation in CEP (Siddhi)
          from TempStream
          select roomNo, avg(temp) as avgTemp
          insert into HotRoomsStream;
  • 43. Sliding Time Window
          from TempStream#window.time(1 min)
          select roomNo, avg(temp) as avgTemp
          insert all events into AvgRoomTempStream;
  • 44. Group By
          from TempStream#window.time(1 min)
          select roomNo, avg(temp) as avgTemp
          group by roomNo
          insert all events into HotRoomsStream;
  • 45. Batch Time Window
          from TempStream#window.timeBatch(5 min)
          select roomNo, avg(temp) as avgTemp
          group by roomNo
          insert all events into HotRoomsStream;
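To make the sliding time window with `group by roomNo` concrete, the following plain-Java sketch keeps a per-room queue of readings and a running sum, and expires readings that fall out of the window each time a new one arrives. The class `SlidingWindowAvg` and its `add` method are hypothetical names, and the sketch assumes timestamps arrive in non-decreasing order; it is not how Siddhi implements its windows.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: per-room average over a sliding time window.
class SlidingWindowAvg {
    private static final class Reading {
        final long timestampMs;
        final double temp;

        Reading(long timestampMs, double temp) {
            this.timestampMs = timestampMs;
            this.temp = temp;
        }
    }

    private final long windowMs;
    // Readings still inside the window, grouped by room (the "group by roomNo").
    private final Map<Integer, Deque<Reading>> readings = new HashMap<>();
    // Running sum per room, so the average needs no rescan of the window.
    private final Map<Integer, Double> sums = new HashMap<>();

    SlidingWindowAvg(long windowMs) {
        if (windowMs <= 0) {
            throw new IllegalArgumentException("windowMs must be positive");
        }
        this.windowMs = windowMs;
    }

    // Adds one reading and returns the room's average over the last windowMs.
    // Assumes timestamps arrive in non-decreasing order.
    double add(int roomNo, double temp, long timestampMs) {
        Deque<Reading> window = readings.computeIfAbsent(roomNo, k -> new ArrayDeque<>());
        double sum = sums.getOrDefault(roomNo, 0.0) + temp;
        window.addLast(new Reading(timestampMs, temp));
        // Expire readings that have slid out of the window.
        while (window.peekFirst().timestampMs <= timestampMs - windowMs) {
            sum -= window.removeFirst().temp;
        }
        sums.put(roomNo, sum);
        return sum / window.size();
    }
}
```

A batch (tumbling) window would instead clear the whole queue at each window boundary and emit one result per room per window.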
  • 46. Pattern 4: Joining Event Streams
      o What? Create a new event stream by joining multiple streams.
      o Time complicates the join, so it needs at least one window and is often used with one.
      o Use cases?
          o To detect when a player has kicked the ball in a football game.
          o To correlate TempStream with the state of the regulator and trigger control commands.
  • 48. Join in CEP (Siddhi)
          define stream TempStream (deviceID long, roomNo int, temp double);
          define stream RegulatorStream (deviceID long, roomNo int, isOn bool);
  • 49. Join in CEP (Siddhi)
          define stream TempStream (deviceID long, roomNo int, temp double);
          define stream RegulatorStream (deviceID long, roomNo int, isOn bool);

          from TempStream[temp > 30.0]#window.time(1 min) as T
            join RegulatorStream[isOn == false]#window.length(1) as R
            on T.roomNo == R.roomNo
          select T.roomNo, R.deviceID, 'start' as action
          insert into RegulatorActionStream;
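The join above can be read as two small buffers that are probed whenever an event arrives on either side. The plain-Java sketch below follows that reading: hot temperature readings are kept for one minute, and only the latest "off" regulator event is kept, mirroring `window.length(1)`. The class `RegulatorJoin`, its method names, and the action string format are hypothetical, and the exact emission semantics of the real Siddhi engine may differ.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical windowed join of hot temperature readings with the latest "off" regulator event.
class RegulatorJoin {
    private static final long TEMP_WINDOW_MS = 60_000L;

    // Hot readings still inside the 1 min window, stored as {timestampMs, roomNo}.
    private final Deque<long[]> hotReadings = new ArrayDeque<>();
    // Latest "off" regulator event as {deviceID, roomNo}; null until one arrives.
    private long[] lastOffRegulator;

    List<String> onTemp(int roomNo, double temp, long timestampMs) {
        List<String> actions = new ArrayList<>();
        if (temp <= 30.0) {
            return actions; // filtered out before it reaches the window
        }
        expire(timestampMs);
        hotReadings.addLast(new long[] {timestampMs, roomNo});
        if (lastOffRegulator != null && lastOffRegulator[1] == roomNo) {
            actions.add(start(roomNo, lastOffRegulator[0]));
        }
        return actions;
    }

    List<String> onRegulator(long deviceID, int roomNo, boolean isOn, long timestampMs) {
        List<String> actions = new ArrayList<>();
        if (isOn) {
            return actions; // only "off" regulators take part in the join
        }
        expire(timestampMs);
        lastOffRegulator = new long[] {deviceID, roomNo};
        for (long[] reading : hotReadings) {
            if (reading[1] == roomNo) {
                actions.add(start(roomNo, deviceID));
            }
        }
        return actions;
    }

    // Drops hot readings older than the time window.
    private void expire(long nowMs) {
        while (!hotReadings.isEmpty() && hotReadings.peekFirst()[0] <= nowMs - TEMP_WINDOW_MS) {
            hotReadings.removeFirst();
        }
    }

    private static String start(int roomNo, long deviceID) {
        return "start room=" + roomNo + " device=" + deviceID;
    }
}
```

Note that an "on" event does not clear the remembered "off" event here, because the filter runs before the window; a real deployment would probably want to handle that case explicitly.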
  • 50. Pattern 5: Data Correlation, Missing Events, and Erroneous Data
      o What? Find correlations and use them to detect and handle missing and erroneous data.
      o Use cases?
          o Detecting a missing event (e.g. a customer request that has not been answered within 1 hour of its reception).
          o Detecting erroneous data (e.g. detecting failed sensors using a set of sensors that monitor overlapping regions; the redundant data lets us find the erroneous sensors and remove their data from further processing).
  • 52. Missing Event in CEP (Siddhi)
          from RequestStream#window.time(1 hour)
          insert expired events into ExpiryStream;

          from r1=RequestStream -> r2=Response[id == r1.id] or r3=ExpiryStream[id == r1.id]
          select r1.id as id ...
          having r2.id == null
          insert into AlertStream;
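The same missing-event idea can be sketched in plain Java: remember each request until its response arrives, and report requests that have waited past the timeout. The class `MissingResponseDetector` and its methods are hypothetical names, and the sketch assumes unique request ids and non-decreasing request timestamps.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical detector for requests that received no response within a timeout.
class MissingResponseDetector {
    private final long timeoutMs;
    // Pending requests in arrival order: request id -> request timestamp.
    private final Map<String, Long> pending = new LinkedHashMap<>();

    MissingResponseDetector(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    void onRequest(String id, long timestampMs) {
        pending.put(id, timestampMs);
    }

    void onResponse(String id) {
        pending.remove(id); // answered in time, nothing to alert on
    }

    // Returns ids of requests that have waited at least timeoutMs, and forgets them.
    List<String> expire(long nowMs) {
        List<String> alerts = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMs - entry.getValue() < timeoutMs) {
                break; // entries are in arrival order, so the rest are newer
            }
            alerts.add(entry.getKey());
            it.remove();
        }
        return alerts;
    }
}
```

Calling `expire` from a periodic timer plays the role of the `ExpiryStream` in the query above.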
  • 53. Pattern 6: Interacting with Databases
      o What? Combine realtime data with historical data.
      o Use cases?
          o On a transaction, look up the customer’s age by ID in the customer database to detect fraud (enrichment).
          o Check a transaction against blacklists and whitelists in the database.
          o Receive an input from the user (e.g. the daily discount amount may be updated in the database, and the query then picks it up automatically without human intervention).
  • 55. Event Table in CEP (Siddhi)
          define table CardUserTable (name string, cardNum long);

          @from(eventtable = 'rdbms', datasource.name = 'CardDataSource', table.name = 'UserTable', caching.algorithm = 'LRU')
          define table CardUserTable (name string, cardNum long);
      Cache types supported
      ● Basic: a size-based algorithm based on FIFO.
      ● LRU (Least Recently Used): the least recently used event is dropped when the cache is full.
      ● LFU (Least Frequently Used): the least frequently used event is dropped when the cache is full.
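As an illustration of the LRU policy named above, here is a minimal cache built on the JDK's `LinkedHashMap` in access order. This is only a sketch of the eviction rule, not the cache WSO2 CEP uses; the class name `LruCache` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache: evicts the least recently used entry once capacity is exceeded.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the "recent" end
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true drops the least recently used entry.
        return size() > capacity;
    }
}
```

A FIFO ("Basic") cache is the same class with `accessOrder` set to `false`, so that reads no longer refresh an entry's position.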
  • 56. Join: Event Table
          define stream Purchase (price double, cardNo long, place string);
          define table CardUserTable (name string, cardNum long);

          from Purchase#window.length(1) join CardUserTable
            on Purchase.cardNo == CardUserTable.cardNum
          select Purchase.cardNo as cardNo, CardUserTable.name as name, Purchase.price as price
          insert into PurchaseUserStream;
  • 57. Insert: Event Table
          define stream FraudStream (price double, cardNo long, userName string);
          define table BlacklistedUserTable (name string, cardNum long);

          from FraudStream
          select userName as name, cardNo as cardNum
          insert into BlacklistedUserTable;
  • 58. Update: Event Table
          define stream LoginStream (userID string, islogin bool, loginTime long);
          define table LastLoginTable (userID string, time long);

          from LoginStream
          select userID, loginTime as time
          update LastLoginTable
            on LoginStream.userID == LastLoginTable.userID;
  • 59. Pattern 7: Detecting Temporal Event Sequence Patterns
      o What? Detect a temporal sequence of events or conditions arranged in time.
      o Use cases?
          o Detect suspicious activities like a small transaction immediately followed by a large transaction.
          o Detect ball possession in a football game.
          o Detect suspicious financial patterns like large buy and sell behaviour within a small time period.
  • 61. Pattern in CEP (Siddhi)
          define stream Purchase (price double, cardNo long, place string);

          from every (a1 = Purchase[price < 100] -> a3= ..)
            -> a2 = Purchase[price > 10000 and a1.cardNo == a2.cardNo]
            within 1 day
          select a1.cardNo as cardNo, a2.price as price, a2.place as place
          insert into PotentialFraud;
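A reduced version of this pattern, a small purchase followed by a large one on the same card within a day, can be written as a per-card state machine. The sketch below leaves out the elided `a3` step from the slide entirely; the class `FraudSequence` and its `onPurchase` method are hypothetical, and timestamps are assumed to be non-decreasing.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical detector: small purchase (< 100) followed by a large one (> 10000)
// on the same card within one day.
class FraudSequence {
    private static final long DAY_MS = 24L * 60 * 60 * 1000;

    // Per card: timestamps of small purchases still waiting for a large follow-up.
    private final Map<Long, Deque<Long>> waiting = new HashMap<>();

    // Returns how many pending small purchases this event completes into a match.
    int onPurchase(long cardNo, double price, long timestampMs) {
        Deque<Long> smalls = waiting.computeIfAbsent(cardNo, k -> new ArrayDeque<>());
        // Forget small purchases that are more than a day old ("within 1 day").
        while (!smalls.isEmpty() && timestampMs - smalls.peekFirst() > DAY_MS) {
            smalls.removeFirst();
        }
        int matches = 0;
        if (price > 10000) {
            // Like "every": each waiting small purchase started its own match.
            matches = smalls.size();
            smalls.clear();
        } else if (price < 100) {
            smalls.addLast(timestampMs);
        }
        return matches;
    }
}
```

Keeping the state keyed by card number is what makes this easy to partition across machines later.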
  • 62. Pattern 8: Tracking
      o What? Track something over space or time and detect given conditions.
      o Use cases?
          o Tracking a fleet of vehicles, making sure that they adhere to speed limits, routes, and geo-fences.
          o Tracking wildlife, making sure the animals are alive (they will not move if they are dead) and do not leave the reservation.
          o Tracking airline luggage and making sure it has not been sent to the wrong destination.
          o Tracking a logistics network and figuring out bottlenecks and unexpected conditions.
  • 63. TFL: Traffic Analytics Built using TFL ( Transport for London) open data feeds. http://goo.gl/9xNiCm http://goo.gl/04tX6k
  • 64. Pattern 9: Detecting Trends
      o What? Detect an overall trend over time.
      o Useful in stock markets, SLA enforcement, auto scaling, and predictive maintenance.
      o Use cases?
          o Rise and fall of values, and turns (a switch from a rise to a fall).
          o Outliers: values that deviate from the current trend by a large amount.
          o Complex trends like “Triple Bottom” and “Cup and Handle” [17].
  • 65. Trend in Storm
      Build and apply a state machine.
  • 66. Sequence in CEP (Siddhi)
          from t1=TempStream,
               t2=TempStream[(isNull(t2[last].temp) and t1.temp < temp) or
                             (t2[last].temp < temp and not(isNull(t2[last].temp)))]+
            within 5 min
          select t1.temp as initialTemp, t2[last].temp as finalTemp, t1.deviceID, t1.roomNo
          insert into IncreasingHotRoomsStream;
  • 67. Partition in CEP (Siddhi)
          partition with (roomNo of TempStream)
          begin
            from t1=TempStream,
                 t2=TempStream[(isNull(t2[last].temp) and t1.temp < temp) or
                               (t2[last].temp < temp and not(isNull(t2[last].temp)))]+
              within 5 min
            select t1.temp as initialTemp, t2[last].temp as finalTemp, t1.deviceID, t1.roomNo
            insert into IncreasingHotRoomsStream;
          end;
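The "build and apply a state machine" advice for Storm can be sketched in plain Java as a per-room run tracker: each room remembers the start and latest value of its current rising run, and reports the run when it ends. The class `RisingTrend` and its `onTemp` method are hypothetical, and the point at which a result is emitted (here, when the run breaks) is an assumption that may not match what the Siddhi sequence query emits.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-room state machine for strictly rising temperature runs.
class RisingTrend {
    private static final long WITHIN_MS = 5 * 60 * 1000L;

    private static final class Run {
        final double first;
        final long startMs;
        double last;
        int length = 1;

        Run(double first, long startMs) {
            this.first = first;
            this.last = first;
            this.startMs = startMs;
        }
    }

    // One run per room, which mirrors partitioning the stream by roomNo.
    private final Map<Integer, Run> runs = new HashMap<>();

    // Returns {initialTemp, finalTemp} when a rising run of 2+ readings ends, else null.
    double[] onTemp(int roomNo, double temp, long timestampMs) {
        Run run = runs.get(roomNo);
        boolean extendsRun = run != null
                && temp > run.last
                && timestampMs - run.startMs <= WITHIN_MS;
        if (extendsRun) {
            run.last = temp;
            run.length++;
            return null;
        }
        double[] finished = (run != null && run.length >= 2)
                ? new double[] {run.first, run.last}
                : null;
        runs.put(roomNo, new Run(temp, timestampMs)); // the breaking reading starts a new run
        return finished;
    }
}
```

In a Storm topology this object would live inside a bolt fed through `fieldsGrouping` on the room number, so that each room's readings always reach the same instance.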
  • 68. Detecting Trends in Real Life
      o The paper “A Complex Event Processing Toolkit for Detecting Technical Chart Patterns” (HPBC 2015) used this idea to identify stock chart patterns.
      o It used kernel regression for smoothing and then detected maxima and minima.
      o After that, any pattern can be written as a temporal event sequence.
  • 69. Pattern 10: Lambda Architecture
      o What? Run the same query in both the realtime and batch pipelines. Realtime analytics fills the lag in the batch analytics results.
      o Also called “Lambda Architecture”, a term from Nathan Marz. See also Jay Kreps’s “Questioning the Lambda Architecture”.
      o Use cases?
          o For example, if batch processing takes 15 minutes, batch results always lag 15 minutes behind the current data. Realtime processing fills the gap.
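One simple way to picture the serving side of this pattern is a merge of two views: counts the batch pipeline has finished, plus counts the realtime pipeline has seen since the batch cutoff. The sketch below assumes additive counts keyed by string and a clean cutoff with no overlap between the two views; the class `LambdaView` and its `merge` method are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical serving-layer merge for additive counts.
class LambdaView {
    // batchCounts: totals up to the last completed batch run.
    // realtimeCounts: totals for events after that batch's cutoff.
    static Map<String, Long> merge(Map<String, Long> batchCounts, Map<String, Long> realtimeCounts) {
        Map<String, Long> merged = new HashMap<>(batchCounts);
        for (Map.Entry<String, Long> entry : realtimeCounts.entrySet()) {
            merged.merge(entry.getKey(), entry.getValue(), Long::sum);
        }
        return merged;
    }
}
```

When the next batch run completes, the realtime view is reset to cover only events after the new cutoff.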
  • 71. Pattern 11: Detecting and Switching to Detailed Analysis
      o What? Detect a condition that suggests some anomaly, and analyze it further using historical data.
      o Use cases?
          o Use basic rules to detect fraud (e.g. a large transaction), then pull all transactions made with that credit card over a longer period (e.g. 3 months of data) from the batch pipeline and run a detailed analysis.
          o While monitoring weather, detect conditions like high temperature or low pressure in a given region, and then start a high-resolution localized forecast for that region.
          o Detect good customers (e.g. through expenditure of more than $1000 within a month), and then run a detailed model to decide whether to offer a deal.
  • 73. Pattern 12: Using a Machine Learning Model
      o What? Train a model (often a machine learning model), and then use it in the realtime pipeline to make decisions.
      o For example, you can build a model using R, export it as PMML (Predictive Model Markup Language), and use it within your realtime pipeline.
      o Use cases?
          o Fraud detection
          o Segmentation
          o Churn prediction
  • 74. Predictive Analytics
      o Build models and use them with WSO2 CEP, BAM, and ESB using the upcoming WSO2 Machine Learner product (2015 Q2).
      o Build models using R, export them as PMML, and use them within WSO2 CEP.
      o Call R scripts from CEP queries.
  • 75. PMML Model in CEP (Siddhi)
          from TransactionStream#ml:applyModel('/path/logisticRegressionModel1.xml', timestamp, amount, ip)
          insert into PotentialFraudsStream;
  • 76. Pattern 13: Online Control
      o What? Control something online. This involves problems like current situation awareness, predicting the next value(s), and deciding on corrective actions.
      o Use cases?
          o Autopilot
          o Self-driving
          o Robotics
  • 78. Scaling & HA for Pattern Implementations
  • 79. So how do we scale a system?
      o Vertical scaling
      o Horizontal scaling
  • 82. Horizontal Scaling ... E.g. Calculate Mean
  • 83. Horizontal Scaling ... E.g. Calculate Mean
  • 84. Horizontal Scaling ... How about scaling the median?
  • 85. Horizontal Scaling ... How about scaling the median? If and only if we can partition!
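The diagrams for these slides are not in the transcript, so the following is a sketch of one common way to scale a mean horizontally: each node keeps a partial (sum, count) pair, and the pairs are merged at the end. The class `PartialMean` is a hypothetical name. The median has no such small mergeable summary, which is why it only scales when the data can be partitioned so that each median is computed entirely on one node.

```java
// Hypothetical mergeable summary for computing a mean across nodes.
class PartialMean {
    final double sum;
    final long count;

    PartialMean(double sum, long count) {
        this.sum = sum;
        this.count = count;
    }

    // Summarizes the values seen by one node.
    static PartialMean of(double... values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return new PartialMean(total, values.length);
    }

    // Combines two nodes' summaries; order and grouping do not matter.
    PartialMean merge(PartialMean other) {
        return new PartialMean(sum + other.sum, count + other.count);
    }

    double mean() {
        return sum / count;
    }
}
```

Merging (sum, count) pairs rather than averaging the per-node means matters whenever nodes see different numbers of events.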
  • 86. Scalable Realtime Solutions ...
      Spark Streaming
      o Supports distributed processing
      o Runs micro batches
      o Does not support pattern and sequence detection
  • 87. Scalable Realtime Solutions ...
      Spark Streaming
      o Supports distributed processing
      o Runs micro batches
      o Does not support pattern and sequence detection
      Apache Storm
      o Supports distributed processing
      o Stream processing engine
  • 88. Why not use Apache Storm?
      Advantages
      o Supports distributed processing
      o Supports partitioning
      o Extendable
      o Open source
      Disadvantages
      o Need to write Java code
      o Need to start from basic principles (and data structures)
      o Adapting to change is slow
      o No support for governing artifacts
  • 89. WSO2 CEP += Apache Storm
      Advantages
      o Supports distributed processing
      o Supports partitioning
      o Extendable
      o Open source
      Storm’s disadvantages, addressed
      o No need to write Java code (supports a SQL-like query language)
      o No need to start from basic principles (supports a high-level language)
      o Adapting to change is fast
      o Govern artifacts using Toolboxes
      o etc ...
  • 93. Siddhi QL
          define stream StockStream (symbol string, volume int, price double);

          @name('Filter Query')
          from StockStream[price > 75]
          select *
          insert into HighPriceStockStream;

          @name('Window Query')
          from HighPriceStockStream#window.time(10 min)
          select symbol, sum(volume) as sumVolume
          insert into ResultStockStream;
  • 94. Siddhi QL - with partition
          define stream StockStream (symbol string, volume int, price double);

          @name('Filter Query')
          from StockStream[price > 75]
          select *
          insert into HighPriceStockStream;

          @name('Window Query')
          partition with (symbol of HighPriceStockStream)
          begin
            from HighPriceStockStream#window.time(10 min)
            select symbol, sum(volume) as sumVolume
            insert into ResultStockStream;
          end;
  • 95. Siddhi QL - distributed
          define stream StockStream (symbol string, volume int, price double);

          @name('Filter Query')
          @dist(parallel='3')
          from StockStream[price > 75]
          select *
          insert into HighPriceStockStream;

          @name('Window Query')
          @dist(parallel='2')
          partition with (symbol of HighPriceStockStream)
          begin
            from HighPriceStockStream#window.time(10 min)
            select symbol, sum(volume) as sumVolume
            insert into ResultStockStream;
          end;
  • 99. HA / Persistence
      o Option 1: Side by side
          o Recommended
          o Takes 2X hardware
          o Gives zero downtime
      o Option 2: Snapshot and restore
          o Uses less hardware
          o Will lose events between snapshots
          o Downtime during recovery
      o ** In some scenarios you can use event tables to keep intermediate state.
  • 100. Siddhi Extensions
      ● Function extension
      ● Aggregator extension
      ● Window extension
      ● Transform extension
  • 101. Siddhi Query: Function Extension
          from TempStream
          select deviceID, roomNo, custom:toKelvin(temp) as tempInKelvin, 'K' as scale
          insert into OutputStream;
  • 102. Siddhi Query: Aggregator Extension
          from TempStream
          select deviceID, roomNo, temp, custom:stdev(temp) as stdevTemp, 'C' as scale
          insert into OutputStream;
  • 103. Siddhi Query: Window Extension
          from TempStream#window.custom:lastUnique(roomNo, 2 min)
          select *
          insert into OutputStream;
  • 104. Siddhi Query: Transform Extension
          from XYZSpeedStream#transform.custom:getVelocityVector(v, vx, vy, vz)
          select velocity, direction
          insert into SpeedStream;