ACM DEBS 2015: Realtime Streaming Analytics Patterns
Srinath Perera
Sriskandarajah Suhothayan
WSO2 Inc.
Data Analytics (Big Data)
o Scientists have been doing this for 25 years with MPI (1991) using special hardware
o Took off with Google's MapReduce paper (2004); Apache Hadoop, Hive, and a whole ecosystem followed.
o Later, Spark emerged, and it is faster.
o But processing takes time.
Value of Some Insights Degrades Fast!
o For some use cases (e.g. stock markets, traffic, surveillance, patient monitoring), the value of insights degrades very quickly with time.
o E.g. stock markets and speed of light
o We need technology that can produce outputs fast
o Static queries, but need very fast output (alerts, realtime control)
o Dynamic and interactive queries (data exploration)
History
▪ Realtime analytics is not new either!
- Active Databases (2000+)
- Stream processing (Aurora, Borealis (2005+), and later Storm)
- Distributed Streaming Operators (e.g. a database research topic around 2005)
- CEP Vendor Roadmap (from http://www.complexevents.com/2014/12/03/cep-tooling-market-survey-2014/)
Data Analytics Landscape
Realtime Interactive Analytics
o Usually done to support
interactive queries
o Index data to make it readily accessible so you can respond to queries fast (e.g. Apache Drill).
o Tools like Druid, VoltDB, and SAP HANA can do this with all data in memory to make things really fast.
Realtime Streaming Analytics
o Process data without storing it (as data comes in)
o Queries are fixed ( Static)
o Triggers when given conditions are met.
o Technologies
o Stream Processing ( Apache Storm, Apache Samza)
o Complex Event Processing/CEP (WSO2 CEP, Esper,
StreamBase)
o MicroBatches ( Spark Streaming)
Realtime Football Analytics
● Video: https://www.youtube.com/watch?v=nRI6buQ0NOM
● More Info: http://www.slideshare.net/hemapani/strata-2014-talktracking-a-soccer-game-with-big-data
Why Realtime Streaming Analytics Patterns?
o Reason 1: Usual advantages
o Give us better understanding
o Give us better vocabulary to teach and communicate
o Tools can implement them
o ...
o Reason 2: Under the theme of realtime analytics, a lot of people get carried away with the word count example. Patterns show that word count is just the tip of the iceberg.
Earlier Work on Patterns
o Patterns from SQL ( project, join, filter etc)
o Event Processing Technical Society’s (EPTS)
reference architecture
o Higher-level patterns such as tracking, prediction, and learning, in addition to low-level operators that come from SQL-like languages.
o Esper’s Solution Patterns Document (50 patterns)
o Coral8 White Paper
Basic Patterns
o Pattern 1: Preprocessing (filter, transform, enrich, project, ...)
o Pattern 2: Alerts and Thresholds
o Pattern 3: Simple Counting and Counting with
Windows
o Pattern 4: Joining Event Streams
o Pattern 5: Data Correlation, Missing Events, and
Erroneous Data
Patterns for Handling Trends
o Pattern 7: Detecting Temporal Event Sequence
Patterns
o Pattern 8: Tracking ( track something over space or
time)
o Pattern 9: Detecting Trends (rise, fall, turn, triple bottom)
o Pattern 13: Online Control
Mixed Patterns
o Pattern 6: Interacting with Databases
o Pattern 10: Running the same Query in Batch and
Realtime Pipelines
o Pattern 11: Detecting and switching to Detailed
Analysis
o Pattern 12: Using a Machine Learning Model
Earlier Work on Patterns
Realtime Streaming Analytics Tools
Implementing Realtime Analytics
o It is tempting to write custom code; a filter looks very easy. But it quickly gets too complex!! Don't!
o Option 1: Stream Processing (e.g. Storm). Kind of works. It is like MapReduce: you have to write code.
o Option 2: Spark Streaming - more compact than Storm, but cannot do some stateful operations.
o Option 3: Complex Event Processing - compact, SQL-like language, fast
Stream Processing
o Program a set of processors and wire them up; data flows through the graph.
o A middleware framework handles data flow,
distribution, and fault tolerance (e.g. Apache Storm,
Samza)
o Processors may be in the same machine or multiple
machines
Writing a Storm Program
o Write Spout(s)
o Write Bolt(s)
o Wire them up
o Run
Write Bolts
We will use a shorthand
like on the left to explain
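// Needs java.util.{Map, HashMap} and Storm's BaseBasicBolt, Tuple, BasicOutputCollector,
// OutputFieldsDeclarer, Values, and Fields imports
public static class WordCount extends BaseBasicBolt {
    // Per-task running counts; fieldsGrouping on "word" sends each word to one task
    Map<String, Integer> counts = new HashMap<String, Integer>();

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        // e.g., as in storm-starter's WordCount: read the word and bump its count
        String word = tuple.getString(0);
        Integer count = counts.get(word);
        count = (count == null) ? 1 : count + 1;
        counts.put(word, count);
        collector.emit(new Values(word, count));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Downstream bolts receive tuples with fields ("word", "count")
        declarer.declare(new Fields("word", "count"));
    }
}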
Wire up and Run
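public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();
    // 5 spout tasks emit sentences
    builder.setSpout("spout", new RandomSentenceSpout(), 5);
    // 8 tasks split sentences into words; tuples are distributed randomly
    builder.setBolt("split", new SplitSentence(), 8)
           .shuffleGrouping("spout");
    // 12 counting tasks; the same word always goes to the same task
    builder.setBolt("count", new WordCount(), 12)
           .fieldsGrouping("split", new Fields("word"));

    Config conf = new Config();
    if (args != null && args.length > 0) {
        // Submit to a real cluster under the topology name given on the command line
        conf.setNumWorkers(3);
        StormSubmitter.submitTopologyWithProgressBar(
            args[0], conf, builder.createTopology());
    } else {
        // No arguments: run in-process for local testing
        conf.setMaxTaskParallelism(3);
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("word-count", conf,
            builder.createTopology());
        // ...
    }
}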
Complex Event Processing
Micro Batches ( e.g. Spark
Streaming)
o Process data in small batches, and then combine the results into final results (e.g. Spark)
o Works for simple aggregates,
but tricky to do this for complex
operations (e.g. Event
Sequences)
o Can do it with MapReduce as
well if the deadlines are not too
tight.
o SQL-like data processing languages (e.g. Apache Hive)
o Since many people understand SQL, Hive made large-scale (Big Data) processing accessible to many
o Expressive, short, and sweet.
o Define core operations that cover 90% of problems
o Let experts dig in when they like!
SQL Like Query Languages
o Easy to follow from SQL
o Expressive, short, and sweet.
o Define core operations that cover 90% of problems
o Let experts dig in when they like!
CEP = SQL for Realtime Analytics
Pattern Implementations
Code and other details
o Sample code - https://github.com/suhothayan/DEBS-2015-Realtime-Analytics-Patterns
o WSO2 CEP
o pack - http://svn.wso2.org/repos/wso2/people/suho/packs/cep/4.0.0/debs2015/wso2cep-4.0.0-SNAPSHOT.zip
o docs - https://docs.wso2.com/display/CEP400/WSO2+Complex+Event+Processor+Documentation
o Apache Storm - https://storm.apache.org/
o We have packs in a pendrive
Pattern 1: Preprocessing
o What? Clean up and prepare data via operations like filter, project, enrich, split, and transformations
o Use cases?
o From a Twitter data stream: extract the author, timestamp, and location fields, and then filter them based on the author's location.
o From a temperature stream: extract the temperature and the sensor's room number, and filter by them.
Filter
from TempStream [ roomNo > 245 and roomNo <= 365]
select roomNo, temp
insert into ServerRoomTempStream ;
In Storm
In CEP ( Siddhi)
Architecture of WSO2 CEP
CEP Event Adapters
Support for several transports (network access)
● SOAP
● HTTP
● JMS
● SMTP
● SMS
● Thrift
● Kafka
● Websocket
● MQTT
Supports database writes using Map messages
● Cassandra
● RDBMS
Supports custom event adaptors via its pluggable architecture!
Stream Definition (Data Model)
{
  'name': 'soft.drink.coop.sales', 'version': '1.0.0',
  'nickName': 'Soft_Drink_Sales', 'description': 'Soft drink sales',
  'metaData': [
    {'name': 'region', 'type': 'STRING'}
  ],
  'correlationData': [
    {'name': 'transactionID', 'type': 'STRING'}
  ],
  'payloadData': [
    {'name': 'brand', 'type': 'STRING'},
    {'name': 'quantity', 'type': 'INT'},
    {'name': 'total', 'type': 'INT'},
    {'name': 'user', 'type': 'STRING'}
  ]
}
Projection
define stream TempStream
(deviceID long, roomNo int, temp double);
from TempStream
select roomNo, temp
insert into OutputStream ;
Inferred Streams
from TempStream
select roomNo, temp
insert into OutputStream ;
define stream OutputStream
(roomNo int, temp double);
Enrich
from TempStream
select roomNo, temp, 'C' as scale
insert into OutputStream ;
define stream OutputStream
(roomNo int, temp double, scale string);
from TempStream
select deviceID, roomNo, avg(temp) as avgTemp
insert into OutputStream ;
Transformation
from TempStream
select concat(deviceID, '-', roomNo) as uid,
toFahrenheit(temp) as tempInF,
'F' as scale
insert into OutputStream ;
Split
from TempStream
select roomNo, temp
insert into RoomTempStream ;
from TempStream
select deviceID, temp
insert into DeviceTempStream ;
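The preprocessing operations above map onto ordinary per-event code. The following is a minimal plain-Java sketch (not the Storm or Siddhi API) that combines the server-room filter with the transformation query; `TempEvent` and `Preprocess` are hypothetical names, and `toFahrenheit` is assumed to convert from Celsius.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event type mirroring: define stream TempStream (deviceID long, roomNo int, temp double)
final class TempEvent {
    final long deviceID;
    final int roomNo;
    final double temp;

    TempEvent(long deviceID, int roomNo, double temp) {
        this.deviceID = deviceID;
        this.roomNo = roomNo;
        this.temp = temp;
    }
}

final class Preprocess {
    // Filter: TempStream[roomNo > 245 and roomNo <= 365]
    static boolean isServerRoom(TempEvent e) {
        return e.roomNo > 245 && e.roomNo <= 365;
    }

    // Transformation helper; assumes temp is in Celsius
    static double toFahrenheit(double celsius) {
        return celsius * 9.0 / 5.0 + 32.0;
    }

    // Filter, then transform each surviving event into "uid,tempInF,scale"
    static List<String> run(List<TempEvent> input) {
        List<String> out = new ArrayList<>();
        for (TempEvent e : input) {
            if (!isServerRoom(e)) {
                continue; // dropped by the filter
            }
            String uid = e.deviceID + "-" + e.roomNo; // concat(deviceID, '-', roomNo)
            out.add(uid + "," + toFahrenheit(e.temp) + ",F");
        }
        return out;
    }
}
```

Projection and enrichment are the same shape: the loop body picks fields or appends constants such as the scale.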
Pattern 2: Alerts and Thresholds
o What? Detect a condition and generate an alert when it holds (e.g. alarm on high temperature).
o These alerts can be based on a simple value or on more complex conditions, such as the rate of increase.
o Use cases?
o Raise an alert when a vehicle is going too fast
o Alert when a room is too hot
Filter Alert
from TempStream [ roomNo > 245 and roomNo <= 365
and temp > 40 ]
select roomNo, temp
insert into AlertServerRoomTempStream ;
Pattern 3: Simple Counting and Counting with Windows
o What? Aggregate functions like Min, Max, Percentiles, etc.
o Often they can be computed without storing any data
o Most useful when used with a window
o Use cases?
o Most metrics need a time bound so we can compare (errors per day, transactions per second)
o Linux load average gives us an idea of the overall trend by reporting the 1, 5, and 15 minute averages.
Types of windows
o Sliding windows vs. Batch (tumbling) windows
o Time vs. Length windows
Also supports
o Unique window
o First unique window
o External time window
Window
In Storm
Aggregation
In CEP (Siddhi)
from TempStream
select roomNo, avg(temp) as avgTemp
insert into HotRoomsStream ;
Sliding Time Window
from TempStream#window.time(1 min)
select roomNo, avg(temp) as avgTemp
insert all events into AvgRoomTempStream ;
Group By
from TempStream#window.time(1 min)
select roomNo, avg(temp) as avgTemp
group by roomNo
insert all events into HotRoomsStream ;
Batch Time Window
from TempStream#window.timeBatch(5 min)
select roomNo, avg(temp) as avgTemp
group by roomNo
insert all events into HotRoomsStream ;
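To make the window semantics concrete, here is a minimal plain-Java sketch of a sliding 1-minute time window with a per-room average, roughly what `#window.time(1 min)` plus `group by roomNo` computes. It assumes events arrive in timestamp order (milliseconds) and, unlike `insert all events`, it only emits on arrivals, not on expiry; the class name is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sliding time window + group-by average; assumes in-order timestamps.
final class SlidingTimeWindowAvg {
    private static final class Entry {
        final long ts;
        final int roomNo;
        final double temp;

        Entry(long ts, int roomNo, double temp) {
            this.ts = ts;
            this.roomNo = roomNo;
            this.temp = temp;
        }
    }

    private final long windowMillis;
    private final Deque<Entry> window = new ArrayDeque<>();
    private final Map<Integer, double[]> sumAndCount = new HashMap<>(); // roomNo -> {sum, count}

    SlidingTimeWindowAvg(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Adds an event and returns the current average temperature for its room.
    double onEvent(long ts, int roomNo, double temp) {
        // Expire events whose timestamp is at least windowMillis in the past
        while (!window.isEmpty() && window.peekFirst().ts <= ts - windowMillis) {
            Entry old = window.pollFirst();
            double[] sc = sumAndCount.get(old.roomNo);
            sc[0] -= old.temp;
            sc[1] -= 1;
            if (sc[1] == 0) {
                sumAndCount.remove(old.roomNo);
            }
        }
        window.addLast(new Entry(ts, roomNo, temp));
        double[] sc = sumAndCount.computeIfAbsent(roomNo, k -> new double[2]);
        sc[0] += temp;
        sc[1] += 1;
        return sc[0] / sc[1];
    }
}
```

A batch (tumbling) window would instead clear the deque and the sums at each 5-minute boundary and emit once per batch.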
Pattern 4: Joining Event Streams
o What? Create a new event stream by joining multiple streams
o The complication comes with time, so at least one window is needed
o Use cases?
o To detect when a player has kicked the ball in a football game.
o To correlate TempStream and the state of the regulator and trigger control commands
Join with Storm
Join
define stream TempStream
(deviceID long, roomNo int, temp double);
define stream RegulatorStream
(deviceID long, roomNo int, isOn bool);
In CEP (Siddhi)
from TempStream[temp > 30.0]#window.time(1 min) as T
join RegulatorStream[isOn == false]#window.length(1) as R
on T.roomNo == R.roomNo
select T.roomNo, R.deviceID, 'start' as action
insert into RegulatorActionStream ;
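A join needs state on both sides. The sketch below mimics the regulator query in plain Java: filtered hot readings live in a 1-minute time window, the latest `isOn == false` regulator event plays the role of the `length(1)` window, and an arrival on either stream is matched against the other side's window. Class and method names are hypothetical, and in-order timestamps are assumed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Windowed two-stream join sketch; outputs "roomNo,deviceID,start" actions.
final class RegulatorJoin {
    private static final class Temp {
        final long ts;
        final int roomNo;

        Temp(long ts, int roomNo) {
            this.ts = ts;
            this.roomNo = roomNo;
        }
    }

    private final long windowMillis;
    private final Deque<Temp> hotReadings = new ArrayDeque<>(); // time window of hot readings
    private Integer lastOffRoom = null; // length(1) window: newest "off" regulator event only
    private long lastOffDevice;

    RegulatorJoin(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    private void expire(long now) {
        while (!hotReadings.isEmpty() && hotReadings.peekFirst().ts <= now - windowMillis) {
            hotReadings.pollFirst();
        }
    }

    // TempStream arrival: join against the current regulator window
    List<String> onTemp(long ts, int roomNo, double temp) {
        List<String> actions = new ArrayList<>();
        expire(ts);
        if (temp <= 30.0) {
            return actions; // filter [temp > 30.0]
        }
        hotReadings.addLast(new Temp(ts, roomNo));
        if (lastOffRoom != null && lastOffRoom == roomNo) {
            actions.add(roomNo + "," + lastOffDevice + ",start");
        }
        return actions;
    }

    // RegulatorStream arrival: join against all hot readings still in the window
    List<String> onRegulator(long ts, long deviceID, int roomNo, boolean isOn) {
        List<String> actions = new ArrayList<>();
        expire(ts);
        if (isOn) {
            return actions; // filter [isOn == false]
        }
        lastOffRoom = roomNo;
        lastOffDevice = deviceID;
        for (Temp t : hotReadings) {
            if (t.roomNo == roomNo) {
                actions.add(roomNo + "," + deviceID + ",start");
            }
        }
        return actions;
    }
}
```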
Pattern 5: Data Correlation, Missing Events, and Erroneous Data
o What? Find correlations and use them to detect and handle missing and erroneous data
o Use Cases?
o Detecting a missing event (e.g., detect a customer request that has not been responded to within 1 hour of its reception)
o Detecting erroneous data (e.g., Detecting failed
sensors using a set of sensors that monitor
overlapping regions. We can use those
redundant data to find erroneous sensors and
remove those data from further processing)
Missing Event in Storm
Missing Event in CEP
In CEP (Siddhi)
from RequestStream#window.time(1h)
insert expired events into ExpiryStream ;
from r1=RequestStream->r2=Response[id == r1.id] or
r3=ExpiryStream[id == r1.id]
select r1.id as id ...
having r2.id == null
insert into AlertStream ;
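The missing-event query works by racing the response against an expiry. A hedged plain-Java sketch of the same idea: pending requests wait in insertion order, a response removes its request, and a clock tick reports requests older than the timeout. `MissingResponseDetector` is a hypothetical name, and it assumes unique request ids and non-decreasing request timestamps (milliseconds).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Alerts on requests that received no response within timeoutMillis.
final class MissingResponseDetector {
    private final long timeoutMillis;
    // id -> request time; insertion order equals expiry order under the assumptions above
    private final LinkedHashMap<String, Long> pending = new LinkedHashMap<>();

    MissingResponseDetector(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void onRequest(String id, long ts) {
        pending.put(id, ts);
    }

    // A matching response clears the request (the r2 branch of the pattern)
    void onResponse(String id) {
        pending.remove(id);
    }

    // Clock tick: report requests whose timeout has passed (the r3 expiry branch)
    List<String> advanceTo(long now) {
        List<String> alerts = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (e.getValue() + timeoutMillis > now) {
                break; // every remaining request is newer
            }
            alerts.add(e.getKey());
            it.remove();
        }
        return alerts;
    }
}
```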
Pattern 6: Interacting with Databases
o What? Combine realtime data with historical data
o Use Cases?
o On a transaction, looking up the customer's age using the ID from the customer database to detect fraud (enrichment)
o Checking a transaction against blacklists and whitelists in the database
o Receiving an input from the user (e.g., the daily discount amount may be updated in the database, and the query will then pick it up automatically without human intervention).
In Storm
Querying Databases
In CEP (Siddhi)
Event Table
define table CardUserTable (name string, cardNum long) ;
@from(eventtable = 'rdbms', datasource.name = 'CardDataSource',
table.name = 'UserTable', caching.algorithm = 'LRU')
define table CardUserTable (name string, cardNum long) ;
Cache types supported
● Basic: A size-based algorithm based on FIFO.
● LRU (Least Recently Used): The least recently used event is dropped
when cache is full.
● LFU (Least Frequently Used): The least frequently used event is dropped
when cache is full.
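For the LRU policy, Java's `LinkedHashMap` in access order already provides the required eviction behaviour. This is a generic sketch of such a cache (for example, for `cardNum` to `name` lookups in front of an event table), not WSO2 CEP's internal implementation; passing `false` for access order would give the FIFO behaviour of the Basic policy instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Size-bounded LRU cache built on LinkedHashMap's access-order mode.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the most-recent end
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently used entry once over capacity
    }
}
```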
Join : Event Table
define stream Purchase (price double, cardNo long, place string);
define table CardUserTable (name string, cardNum long) ;
from Purchase#window.length(1) join CardUserTable
on Purchase.cardNo == CardUserTable.cardNum
select Purchase.cardNo as cardNo,
CardUserTable.name as name,
Purchase.price as price
insert into PurchaseUserStream ;
Insert : Event Table
define stream FraudStream (price double, cardNo long, userName string);
define table BlacklistedUserTable (name string, cardNum long) ;
from FraudStream
select userName as name, cardNo as cardNum
insert into BlacklistedUserTable ;
Update : Event Table
define stream LoginStream (userID string,
islogin bool, loginTime long);
define table LastLoginTable (userID string, time long) ;
from LoginStream
select userID, loginTime as time
update LastLoginTable
on LoginStream.userID == LastLoginTable.userID ;
Pattern 7: Detecting Temporal Event Sequence Patterns
o What? Detect a temporal sequence of events or conditions arranged in time
o Use Cases?
o Detect suspicious activities like small transaction
immediately followed by a large transaction
o Detect ball possession in a football game
o Detect suspicious financial patterns like large buy
and sell behaviour within a small time period
In Storm
Pattern
In CEP (Siddhi)
Pattern
define stream Purchase (price double, cardNo long,place string);
from every (a1 = Purchase[price < 100] -> a3= ..) ->
a2 = Purchase[price >10000 and a1.cardNo == a2.cardNo]
within 1 day
select a1.cardNo as cardNo, a2.price as price, a2.place as place
insert into PotentialFraud ;
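The `every ... -> ... within 1 day` pattern can be read as a small per-card state machine. Below is a minimal plain-Java sketch under stated assumptions (events in timestamp order, timestamps in milliseconds, hypothetical class name): each purchase under 100 opens a partial match, a purchase over 10000 on the same card completes every open match younger than a day, and unrelated purchases in between are ignored, as in a pattern (unlike a strict sequence).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// "Small purchase followed by a large purchase on the same card within 1 day" sketch.
final class SmallThenLargeDetector {
    static final long DAY_MILLIS = 24L * 60 * 60 * 1000;
    private final Map<Long, Deque<Long>> openMatches = new HashMap<>(); // cardNo -> small purchase times

    // Returns one "cardNo,price,place" alert per completed partial match
    List<String> onPurchase(long ts, long cardNo, double price, String place) {
        List<String> alerts = new ArrayList<>();
        Deque<Long> starts = openMatches.computeIfAbsent(cardNo, k -> new ArrayDeque<>());
        // Drop partial matches that can no longer complete within 1 day
        while (!starts.isEmpty() && ts - starts.peekFirst() > DAY_MILLIS) {
            starts.pollFirst();
        }
        if (price > 10000) {
            for (int i = 0; i < starts.size(); i++) {
                alerts.add(cardNo + "," + price + "," + place); // a2's price and place
            }
            starts.clear(); // completed matches are consumed
        } else if (price < 100) {
            starts.addLast(ts); // "every" starts a new partial match for each small purchase
        }
        return alerts;
    }
}
```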
Pattern 8: Tracking
o What? Track something over space and time and detect given conditions.
o Use Cases?
o Tracking a fleet of vehicles, making sure that they adhere to speed limits, routes, and geofences.
o Tracking wildlife, making sure they are alive (they will not move if they are dead) and that they do not go out of the reservation.
o Tracking airline luggage and making sure it has not been sent to the wrong destination.
o Tracking a logistics network and figuring out bottlenecks and unexpected conditions.
TFL: Traffic Analytics
Built using TFL (Transport for London) open data feeds.
http://goo.gl/9xNiCm http://goo.gl/04tX6k
Pattern 9: Detecting Trends
o What? Detect an overall trend over time.
o Useful in stock markets, SLA enforcement, auto scaling, predictive maintenance
o Use Cases?
o Rise, Fall of values and Turn (switch from rise to
a fall)
o Outliers - deviate from the current trend by a
large value
o Complex trends like “Triple Bottom” and “Cup
and Handle” [17].
Trend in Storm
Build and apply a state machine
In CEP (Siddhi)
Sequence
from t1=TempStream,
t2=TempStream [(isNull(t2[last].temp) and t1.temp<temp) or
(t2[last].temp < temp and not(isNull(t2[last].temp)))]+
within 5 min
select t1.temp as initialTemp,
t2[last].temp as finalTemp,
t1.deviceID,
t1.roomNo
insert into IncreasingHotRoomsStream ;
In CEP (Siddhi)
Partition
partition with (roomNo of TempStream)
begin
from t1=TempStream,
t2=TempStream [(isNull(t2[last].temp) and t1.temp<temp)
or (t2[last].temp < temp and not(isNull(t2[last].temp)))]+
within 5 min
select t1.temp as initialTemp,
t2[last].temp as finalTemp,
t1.deviceID,
t1.roomNo
insert into IncreasingHotRoomsStream ;
end;
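Behind the partitioned sequence query is per-key state: one running "rise" per room. A hedged plain-Java sketch of that state machine follows; it reports `{initialTemp, finalTemp}` on every further rise within 5 minutes of the run's first event and restarts the run on a drop or timeout. The reporting policy and names are assumptions for illustration, not Siddhi's exact output semantics.

```java
import java.util.HashMap;
import java.util.Map;

// Per-room (partitioned) detector for strictly rising temperatures within a time bound.
final class RisingTempDetector {
    private static final class Run {
        final long startTs;
        final double initialTemp; // t1.temp
        double lastTemp;          // t2[last].temp

        Run(long ts, double temp) {
            startTs = ts;
            initialTemp = temp;
            lastTemp = temp;
        }
    }

    private final long withinMillis;
    private final Map<Integer, Run> runs = new HashMap<>(); // partition key: roomNo

    RisingTempDetector(long withinMillis) {
        this.withinMillis = withinMillis;
    }

    // Returns {initialTemp, finalTemp} while the run keeps rising, otherwise null.
    double[] onEvent(long ts, int roomNo, double temp) {
        Run run = runs.get(roomNo);
        if (run != null && temp > run.lastTemp && ts - run.startTs <= withinMillis) {
            run.lastTemp = temp;
            return new double[] {run.initialTemp, run.lastTemp};
        }
        runs.put(roomNo, new Run(ts, temp)); // this event becomes the new t1
        return null;
    }
}
```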
Detecting Trends in Real Life
o Paper “A Complex Event Processing
Toolkit for Detecting Technical Chart
Patterns” (HPBC 2015) used the idea to
identify stock chart patterns
o Used kernel regression for smoothing and detected maxima and minima.
o Then any pattern can be written as a
temporal event sequence.
Pattern 10: Lambda Architecture
o What? Run the same query in both realtime and batch pipelines. This uses realtime analytics to fill the lag in batch analytics results.
o Also called the "Lambda Architecture" (a term coined by Nathan Marz). See Jay Kreps' "Questioning the Lambda Architecture".
o Use Cases?
o For example, if batch processing takes 15 minutes, results always lag 15 minutes behind the current data. Here, realtime processing fills the gap.
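One way to picture the gap-filling: the batch view covers events up to its cutoff, and the realtime layer counts only newer events, so a query adds the two. The plain-Java sketch below assumes a count query, a batch job that reports the timestamp cutoff it covered, and hypothetical names; a production system would keep realtime state in time buckets rather than a raw event list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Query-time merge of a batch view and a realtime view (count per key).
final class LambdaCounter {
    private static final class Event {
        final String key;
        final long ts;

        Event(String key, long ts) {
            this.key = key;
            this.ts = ts;
        }
    }

    private Map<String, Long> batchView = new HashMap<>(); // counts for events with ts <= batchCutoff
    private long batchCutoff = Long.MIN_VALUE;
    private final List<Event> recent = new ArrayList<>();   // realtime layer: events after the cutoff

    // Realtime pipeline: keep only events the batch view does not cover yet
    void onEvent(String key, long ts) {
        if (ts > batchCutoff) {
            recent.add(new Event(key, ts));
        }
    }

    // Batch pipeline finished over all events up to cutoff: swap views, drop covered events
    void onBatchResult(Map<String, Long> counts, long cutoff) {
        batchView = new HashMap<>(counts);
        batchCutoff = cutoff;
        recent.removeIf(e -> e.ts <= cutoff);
    }

    // Query: batch result plus whatever arrived after the batch cutoff
    long count(String key) {
        long n = batchView.getOrDefault(key, 0L);
        for (Event e : recent) {
            if (e.key.equals(key)) {
                n++;
            }
        }
        return n;
    }
}
```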
Lambda Architecture. How?
Pattern 11: Detecting and switching
to Detailed Analysis
o What? detect a condition that suggests some
anomaly, and further analyze it using historical data.
o Use Cases?
o Use basic rules to detect Fraud (e.g., large transaction),
then pull out all transactions done against that credit
card for a larger time period (e.g., 3 months data) from
batch pipeline and run a detailed analysis
o While monitoring weather, detect conditions like high
temperature or low pressure in a given region, and then
start a high resolution localized forecast for that region.
o Detect good customers (e.g., through expenditure of more than $1000 within a month), and then run a detailed model to decide the potential of offering a deal.
Pattern 11: How?
Pattern 12: Using a Machine
Learning Model
o What? The idea is to train a model (often a
Machine Learning model), and then use it with the
Realtime pipeline to make decisions
o For example, you can build a model using R, export it as
PMML (Predictive Model Markup Language) and use it
within your realtime pipeline.
o Use Cases?
o Fraud Detection
o Segmentation
o Predict Churn
Predictive Analytics
o Build models and use
them with WSO2 CEP,
BAM and ESB using
upcoming WSO2
Machine Learner Product
( 2015 Q2)
o Build model using R,
export them as PMML,
and use within WSO2 CEP
o Call R Scripts from CEP
queries
In CEP (Siddhi)
PMML Model
from TransactionStream
#ml:applyModel('/path/logisticRegressionModel1.xml',
timestamp, amount, ip)
insert into PotentialFraudsStream;
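Applying a trained model per event is usually a cheap scoring step. As a hedged sketch, the plain Java below scores a logistic regression model; the weights and bias are hypothetical stand-ins for what a PMML export would contain, and this is not the `ml:applyModel` extension's implementation.

```java
// Scores events with a (hypothetical) trained logistic regression model.
final class LogisticScorer {
    private final double[] weights;
    private final double bias;

    LogisticScorer(double[] weights, double bias) {
        this.weights = weights.clone();
        this.bias = bias;
    }

    // Probability of the positive class (e.g., fraud): sigmoid(w . x + b)
    double probability(double[] features) {
        if (features.length != weights.length) {
            throw new IllegalArgumentException("expected " + weights.length + " features");
        }
        double z = bias;
        for (int i = 0; i < weights.length; i++) {
            z += weights[i] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Route the event to a "potential fraud" output when the score reaches the threshold
    boolean isPotentialFraud(double[] features, double threshold) {
        return probability(features) >= threshold;
    }
}
```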
Pattern 13: Online Control
o What? Control something Online. These would
involve problems like current situation awareness,
predicting next value(s), and deciding on corrective
actions.
o Use Cases?
o Autopilot
o Self-driving
o Robotics
Fraud Demo
Scaling & HA for Pattern
Implementations
So how do we scale a system?
o Vertical Scaling
o Horizontal Scaling
Vertical Scaling
Horizontal Scaling
E.g. Calculate Mean
How about scaling the median?
If and only if we can partition!
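The mean/median contrast can be stated in code. Partial `(sum, count)` pairs from each node combine exactly into the global mean, while per-node medians do not combine into the global median; that is why a median only scales when the data can be partitioned so each key's values land on one node. A minimal plain-Java sketch (names hypothetical; `median` assumes a non-empty list):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class PartialAggregates {
    // Each node reports (sum, count); these add up exactly across nodes.
    static double combinedMean(double[] sums, long[] counts) {
        double sum = 0;
        long count = 0;
        for (int i = 0; i < sums.length; i++) {
            sum += sums[i];
            count += counts[i];
        }
        return sum / count;
    }

    // Exact median needs all the values (or a partition that keeps a key's values together).
    static double median(List<Double> values) {
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        return n % 2 == 1
            ? sorted.get(n / 2)
            : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }
}
```

For example, nodes holding {1, 2, 3} and {10} report (6, 3) and (10, 1), giving the exact mean 4.0, but the median of their medians (2 and 10) is 6.0 while the true median is 2.5.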
Scalable Realtime solutions ...
Spark Streaming
o Supports distributed processing
o Runs micro batches
o Does not support pattern and sequence detection
Apache Storm
o Supports distributed processing
o Stream processing engine
Why not use Apache Storm ?
Advantages
o Supports distributed processing
o Supports Partitioning
o Extendable
o Opensource
Disadvantages
o Need to write Java code
o Need to start from basic principles (and data structures)
o Slow to adapt to changes
o No support to govern artifacts
WSO2 CEP += Apache Storm
Advantages
o Supports distributed processing
o Supports Partitioning
o Extendable
o Opensource
Disadvantages addressed
o No need to write Java code (supports an SQL-like query language)
o No need to start from basic principles (supports a high-level language)
o Fast to adapt to changes
o Govern artifacts using Toolboxes
o etc ...
How do we scale?
Scaling with Storm
Siddhi QL
define stream StockStream
(symbol string, volume int, price double);
@name('Filter Query')
from StockStream[price > 75]
select *
insert into HighPriceStockStream ;
@name('Window Query')
from HighPriceStockStream#window.time(10 min)
select symbol, sum(volume) as sumVolume
insert into ResultStockStream ;
Siddhi QL - with partition
define stream StockStream
(symbol string, volume int, price double);
@name('Filter Query')
from StockStream[price > 75]
select *
insert into HighPriceStockStream ;
@name('Window Query')
partition with (symbol of HighPriceStockStream)
begin
from HighPriceStockStream#window.time(10 min)
select symbol, sum(volume) as sumVolume
insert into ResultStockStream ;
end;
Siddhi QL - distributed
define stream StockStream
(symbol string, volume int, price double);
@name('Filter Query')
@dist(parallel='3')
from StockStream[price > 75]
select *
insert into HighPriceStockStream ;
@name('Window Query')
@dist(parallel='2')
partition with (symbol of HighPriceStockStream)
begin
from HighPriceStockStream#window.time(10 min)
select symbol, sum(volume) as sumVolume
insert into ResultStockStream ;
end;
On Storm UI
High Availability
HA / Persistence
o Option 1: Side by side
o Recommended
o Takes 2X hardware
o Gives zero downtime
o Option 2: Snapshot and restore
o Uses less hardware
o Will lose events between snapshots
o Downtime during recovery
o ** In some scenarios, you can use event tables to keep intermediate state
Siddhi Extensions
● Function extension
● Aggregator extension
● Window extension
● Transform extension
Siddhi Query : Function Extension
from TempStream
select deviceID, roomNo,
custom:toKelvin(temp) as tempInKelvin,
'K' as scale
insert into OutputStream ;
Siddhi Query : Aggregator Extension
from TempStream
select deviceID, roomNo, temp,
custom:stdev(temp) as stdevTemp,
'C' as scale
insert into OutputStream ;
Siddhi Query : Window Extension
from TempStream
#window.custom:lastUnique(roomNo,2 min)
select *
insert into OutputStream ;
Siddhi Query : Transform Extension
from XYZSpeedStream
#transform.custom:getVelocityVector(v,vx,vy,vz)
select velocity, direction
insert into SpeedStream ;
Contact us !

Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das Databricks
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014P. Taylor Goetz
 
Storm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationStorm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationnathanmarz
 
Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopDataWorks Summit
 

En vedette (20)

Realtime Data Analytics
Realtime Data AnalyticsRealtime Data Analytics
Realtime Data Analytics
 
Patterns for Deploying Analytics in the Real World
Patterns for Deploying Analytics in the Real WorldPatterns for Deploying Analytics in the Real World
Patterns for Deploying Analytics in the Real World
 
Dataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data Processing
 
Temporal Operators For Spark Streaming And Its Application For Office365 Serv...
Temporal Operators For Spark Streaming And Its Application For Office365 Serv...Temporal Operators For Spark Streaming And Its Application For Office365 Serv...
Temporal Operators For Spark Streaming And Its Application For Office365 Serv...
 
Scaling Gilt: from monolith ruby app to micro service scala service architecture
Scaling Gilt: from monolith ruby app to micro service scala service architectureScaling Gilt: from monolith ruby app to micro service scala service architecture
Scaling Gilt: from monolith ruby app to micro service scala service architecture
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQL
 
Sensing the world with Data of Things
Sensing the world with Data of ThingsSensing the world with Data of Things
Sensing the world with Data of Things
 
WSO2 Product Release Webinar: WSO2 Complex Event Processor 4.0
WSO2 Product Release Webinar: WSO2 Complex Event Processor 4.0WSO2 Product Release Webinar: WSO2 Complex Event Processor 4.0
WSO2 Product Release Webinar: WSO2 Complex Event Processor 4.0
 
Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data...
Kenneth Knowles -  Apache Beam - A Unified Model for Batch and Streaming Data...Kenneth Knowles -  Apache Beam - A Unified Model for Batch and Streaming Data...
Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data...
 
Esperwhispering
EsperwhisperingEsperwhispering
Esperwhispering
 
Role of Analytics in Digital Business
Role of Analytics in Digital BusinessRole of Analytics in Digital Business
Role of Analytics in Digital Business
 
Complex Event Processing with Esper
Complex Event Processing with EsperComplex Event Processing with Esper
Complex Event Processing with Esper
 
Complex Event Processing with Esper
Complex Event Processing with EsperComplex Event Processing with Esper
Complex Event Processing with Esper
 
Complex Event Processing with Esper
Complex Event Processing with EsperComplex Event Processing with Esper
Complex Event Processing with Esper
 
Complex Event Processing - A brief overview
Complex Event Processing - A brief overviewComplex Event Processing - A brief overview
Complex Event Processing - A brief overview
 
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das
 
Resource Aware Scheduling in Apache Storm
Resource Aware Scheduling in Apache StormResource Aware Scheduling in Apache Storm
Resource Aware Scheduling in Apache Storm
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014
 
Storm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationStorm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computation
 
Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and Hadoop
 


DEBS 2015 Tutorial : Patterns for Realtime Streaming Analytics

  • 1. ACM DEBS 2015: Realtime Streaming Analytics Patterns. Srinath Perera, Sriskandarajah Suhothayan (WSO2 Inc.)
  • 2. Data Analytics (Big Data) o Scientists have been doing this for 25 years with MPI (1991), using special hardware o It took off with Google's MapReduce paper (2004), after which Apache Hadoop, Hive, and a whole ecosystem were created. o Later, Spark emerged, and it is faster. o But processing still takes time.
  • 3. The Value of Some Insights Degrades Fast! o For some use cases (e.g., stock markets, traffic, surveillance, patient monitoring), the value of insights degrades very quickly with time. o E.g., stock markets and the speed of light o We need technology that can produce outputs fast: o Static queries, but with very fast output (alerts, realtime control) o Dynamic and interactive queries (data exploration)
  • 4. History ▪ Realtime analytics is not new either! - Active Databases (2000+) - Stream processing (Aurora, Borealis (2005+) and later Storm) - Distributed streaming operators (e.g., a database research topic around 2005) - CEP vendor roadmap (from http://www.complexevents.com/2014/12/03/cep-tooling-market-survey-2014/)
  • 6. Realtime Interactive Analytics o Usually done to support interactive queries o Index data to make it readily accessible so you can respond to queries fast (e.g., Apache Drill) o Tools like Druid, VoltDB, and SAP HANA can do this with all data in memory to make things really fast.
  • 7. Realtime Streaming Analytics o Process data as it comes in, without storing it first o Queries are fixed (static) o Trigger when given conditions are met o Technologies: o Stream Processing (Apache Storm, Apache Samza) o Complex Event Processing/CEP (WSO2 CEP, Esper, StreamBase) o Micro-batches (Spark Streaming)
  • 8. Realtime Football Analytics ● Video: https://www.youtube.com/watch?v=nRI6buQ0NOM ● More Info: http://www.slideshare.net/hemapani/strata-2014-talktracking-a-soccer-game-with-big-data
  • 9. Why Realtime Streaming Analytics Patterns? o Reason 1: the usual advantages o They give us a better understanding o They give us a better vocabulary to teach and communicate o Tools can implement them o .. o Reason 2: Under the theme of realtime analytics, a lot of people get carried away with the word count example. Patterns show that word count is just the tip of the iceberg.
  • 10. Earlier Work on Patterns o Patterns from SQL (project, join, filter, etc.) o Event Processing Technical Society's (EPTS) reference architecture o Higher-level patterns such as tracking, prediction, and learning, in addition to low-level operators that come from SQL-like languages o Esper's Solution Patterns Document (50 patterns) o Coral8 White Paper
  • 11. Basic Patterns o Pattern 1: Preprocessing (filter, transform, enrich, project, ..) o Pattern 2: Alerts and Thresholds o Pattern 3: Simple Counting and Counting with Windows o Pattern 4: Joining Event Streams o Pattern 5: Data Correlation, Missing Events, and Erroneous Data
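To make Patterns 1 to 3 concrete, here is a minimal plain-Java sketch (not WSO2 CEP, Esper, or Storm code) that filters readings, counts the high ones over a sliding time window, and raises an alert when that count crosses a limit. The `Reading` and `WindowedThresholdAlert` names, the "strictly above threshold" rule, and the assumption that events arrive in timestamp order are all illustrative choices, not part of the tutorial.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical event type (not from the tutorial): one sensor reading.
final class Reading {
    final double value;
    final long timestampMs;

    Reading(double value, long timestampMs) {
        this.value = value;
        this.timestampMs = timestampMs;
    }
}

// Combines Pattern 1 (filter), Pattern 3 (counting over a sliding time window)
// and Pattern 2 (threshold alert). Assumes events arrive in timestamp order.
final class WindowedThresholdAlert {
    private final double threshold;     // readings above this are "high"
    private final long windowMs;        // window length in milliseconds
    private final int maxHighReadings;  // alert when the count exceeds this
    // Timestamps of high readings inside the window, oldest first.
    private final Deque<Long> window = new ArrayDeque<>();

    WindowedThresholdAlert(double threshold, long windowMs, int maxHighReadings) {
        this.threshold = threshold;
        this.windowMs = windowMs;
        this.maxHighReadings = maxHighReadings;
    }

    /** Processes one event; returns true if it triggers an alert. */
    boolean onEvent(Reading r) {
        // Pattern 1: drop readings that are not interesting.
        if (r.value <= threshold) {
            return false;
        }
        // Pattern 3: add the event, then evict anything that fell out of the window.
        window.addLast(r.timestampMs);
        while (!window.isEmpty() && window.peekFirst() <= r.timestampMs - windowMs) {
            window.removeFirst();
        }
        // Pattern 2: alert when the windowed count crosses the limit.
        return window.size() > maxHighReadings;
    }
}
```

Keeping only timestamps in a deque makes each event cost amortized O(1), because every timestamp is added and evicted at most once.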
  • 12. Patterns for Handling Trends o Pattern 7: Detecting Temporal Event Sequence Patterns o Pattern 8: Tracking (track something over space or time) o Pattern 9: Detecting Trends (rise, fall, turn, triple bottom) o Pattern 13: Online Control
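As one illustration of Pattern 9, the sketch below detects a "rise" and a "turn" on a single stream of values. It assumes a deliberately simple definition that is not from the tutorial: a rise is `minRun` strictly increasing steps in a row, and a turn down is the first drop after such a rise. `TrendDetector` and its `Signal` values are hypothetical names.

```java
// Sketch of Pattern 9 (detecting trends) under an assumed, simple definition:
// a "rise" is `minRun` strictly increasing steps in a row, and a
// "turn down" is the first drop after such a rise.
final class TrendDetector {
    enum Signal { NONE, RISE, TURN_DOWN }

    private final int minRun;
    private boolean hasPrevious = false;
    private double previous;      // last value seen
    private int risingRun = 0;    // consecutive increases ending at `previous`

    TrendDetector(int minRun) {
        this.minRun = minRun;
    }

    Signal onValue(double v) {
        Signal signal = Signal.NONE;
        if (hasPrevious) {
            if (v > previous) {
                risingRun++;
                if (risingRun == minRun) {
                    signal = Signal.RISE;        // reported once per rising run
                }
            } else {
                if (v < previous && risingRun >= minRun) {
                    signal = Signal.TURN_DOWN;   // a confirmed rise just ended
                }
                risingRun = 0;                   // a flat or falling step resets the run
            }
        }
        previous = v;
        hasPrevious = true;
        return signal;
    }
}
```

Real feeds are noisy, so a practical detector would usually smooth the values first (for example with a moving average) before applying a rule like this.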
  • 13. Mixed Patterns o Pattern 6: Interacting with Databases o Pattern 10: Running the same Query in Batch and Realtime Pipelines o Pattern 11: Detecting and switching to Detailed Analysis o Pattern 12: Using a Machine Learning Model
  • 14. Earlier Work on Patterns
  • 16. Implementing Realtime Analytics o It is tempting to write custom code. A filter looks very easy, but the whole thing quickly becomes too complex! Don't! o Option 1: Stream Processing (e.g., Storm). Kind of works. It is like MapReduce: you have to write code. o Option 2: Spark Streaming: more compact than Storm, but cannot do some stateful operations. o Option 3: Complex Event Processing: compact, SQL-like language, fast
  • 17. Stream Processing o Program a set of processors and wire them up; data flows through the graph. o A middleware framework handles data flow, distribution, and fault tolerance (e.g., Apache Storm, Samza) o Processors may be on the same machine or on multiple machines
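The "wire processors into a graph" idea can be sketched in plain Java, single-threaded and without the distribution or fault tolerance that a framework such as Storm adds. `StreamProcessor`, `MiniPipeline`, `splitter`, and `counter` are hypothetical names used only for this illustration; they are not Storm's API. The graph here is the split-then-count chain used in the word count example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// A processor receives one item and may emit zero or more items downstream.
interface StreamProcessor<I, O> {
    void process(I input, Consumer<O> emit);
}

final class MiniPipeline {
    // Wires `a` to `b`: everything `a` emits flows straight into `b`.
    static <I, M, O> StreamProcessor<I, O> wire(StreamProcessor<I, M> a, StreamProcessor<M, O> b) {
        return (input, emit) -> a.process(input, mid -> b.process(mid, emit));
    }

    // "Split" processor: sentence -> lower-cased words.
    static StreamProcessor<String, String> splitter() {
        return (sentence, emit) -> {
            for (String w : sentence.toLowerCase().split("\\s+")) {
                if (!w.isEmpty()) {
                    emit.accept(w);
                }
            }
        };
    }

    // "Count" processor: keeps running counts (its state) and emits "word=count".
    static StreamProcessor<String, String> counter(Map<String, Integer> counts) {
        return (word, emit) -> emit.accept(word + "=" + counts.merge(word, 1, Integer::sum));
    }

    // Pushes each sentence through the split -> count graph and collects the output.
    static List<String> run(List<String> sentences) {
        Map<String, Integer> counts = new HashMap<>();
        StreamProcessor<String, String> graph = wire(splitter(), counter(counts));
        List<String> out = new ArrayList<>();
        for (String s : sentences) {
            graph.process(s, out::add);
        }
        return out;
    }
}
```

For example, `MiniPipeline.run(Arrays.asList("the cat", "The dog"))` emits a running count per word. In Storm, the split and count stages would be separate bolts, possibly on different machines, with the framework moving tuples between them; here a direct method call stands in for that hop.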
  • 18. Writing a Storm Program o Write Spout(s) o Write Bolt(s) o Wire them up o Run
  • 19. Write Bolts We will use a shorthand like on the left to explain
public static class WordCount extends BaseBasicBolt {
    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        // .. do something …
        collector.emit(new Values(word, count));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}
  • 20. Wire up and Run
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8)
        .shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12)
        .fieldsGrouping("split", new Fields("word"));
Config conf = new Config();
if (args != null && args.length > 0) {
    // Cluster mode: submit under the topology name given on the command line
    conf.setNumWorkers(3);
    StormSubmitter.submitTopologyWithProgressBar(
            args[0], conf, builder.createTopology());
} else {
    // Local mode: run the topology in-process
    conf.setMaxTaskParallelism(3);
    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("word-count", conf, builder.createTopology());
    ...
}
  • 22. Micro Batches (e.g. Spark Streaming) o Process data in small batches, and then combine the per-batch results into final results (e.g. Spark) o Works for simple aggregates, but tricky for complex operations (e.g. event sequences) o Can do it with MapReduce as well if the deadlines are not too tight.
  • 23. SQL Like Query Languages o SQL-like data processing languages (e.g. Apache Hive) o Since many people understand SQL, Hive made large-scale (Big Data) processing accessible to many o Expressive, short, and sweet. o Define core operations that cover 90% of problems o Let experts dig in when they like!
  • 24. CEP = SQL for Realtime Analytics o Easy to follow from SQL o Expressive, short, and sweet. o Define core operations that cover 90% of problems o Let experts dig in when they like!
  • 26. Code and other details o Sample code - https://github.com/suhothayan/DEBS-2015-Realtime-Analytics-Patterns o WSO2 CEP o pack http://svn.wso2.org/repos/wso2/people/suho/packs/cep/4.0.0/debs2015/wso2cep-4.0.0-SNAPSHOT.zip o docs - https://docs.wso2.com/display/CEP400/WSO2+Complex+Event+Processor+Documentation o Apache Storm - https://storm.apache.org/ o We have packs in a pendrive
  • 27. Pattern 1: Preprocessing o What? Clean up and prepare data via operations like filter, project, enrich, split, and transformations o Use cases? o From a Twitter data stream: extract the author, timestamp, and location fields, then filter them based on the location of the author. o From a temperature stream: extract the temperature and room number of the sensor, and filter by them.
  • 28. Filter o In Storm o In CEP (Siddhi): from TempStream[roomNo > 245 and roomNo <= 365] select roomNo, temp insert into ServerRoomTempStream ;
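For comparison with the Siddhi query, here is a minimal plain-Java sketch of the same filter-and-project step. It is not Storm or Siddhi code: the nested `TempEvent` and `RoomTemp` classes are hypothetical stand-ins for the TempStream and ServerRoomTempStream schemas, and a `List` replaces a live stream for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

final class ServerRoomFilter {
    // Hypothetical event mirroring TempStream (deviceID long, roomNo int, temp double).
    static final class TempEvent {
        final long deviceID;
        final int roomNo;
        final double temp;

        TempEvent(long deviceID, int roomNo, double temp) {
            this.deviceID = deviceID;
            this.roomNo = roomNo;
            this.temp = temp;
        }
    }

    // Hypothetical output event mirroring ServerRoomTempStream (roomNo int, temp double).
    static final class RoomTemp {
        final int roomNo;
        final double temp;

        RoomTemp(int roomNo, double temp) {
            this.roomNo = roomNo;
            this.temp = temp;
        }
    }

    // Filter on roomNo in (245, 365], then project away deviceID.
    static List<RoomTemp> process(List<TempEvent> in) {
        List<RoomTemp> out = new ArrayList<>();
        for (TempEvent e : in) {
            if (e.roomNo > 245 && e.roomNo <= 365) {
                out.add(new RoomTemp(e.roomNo, e.temp));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<TempEvent> events = List.of(
                new TempEvent(1L, 100, 21.5),
                new TempEvent(2L, 300, 38.0));
        for (RoomTemp r : process(events)) {
            System.out.println(r.roomNo + " " + r.temp);
        }
    }
}
```

A streaming engine applies the same predicate and projection to each event as it arrives, which is what the one-line query expresses declaratively.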
  • 30. CEP Event Adapters o Support for several transports (network access) ● SOAP ● HTTP ● JMS ● SMTP ● SMS ● Thrift ● Kafka ● WebSocket ● MQTT o Supports database writes using Map messages ● Cassandra ● RDBMSs o Supports custom event adapters via its pluggable architecture!
  • 31. Stream Definition (Data Model)
{
  'name':'soft.drink.coop.sales',
  'version':'1.0.0',
  'nickName':'Soft_Drink_Sales',
  'description':'Soft drink sales',
  'metaData':[
    {'name':'region','type':'STRING'}
  ],
  'correlationData':[
    {'name':'transactionID','type':'STRING'}
  ],
  'payloadData':[
    {'name':'brand','type':'STRING'},
    {'name':'quantity','type':'INT'},
    {'name':'total','type':'INT'},
    {'name':'user','type':'STRING'}
  ]
}
  • 32. Projection define stream TempStream (deviceID long, roomNo int, temp double); from TempStream select roomNo, temp insert into OutputStream ;
  • 33. Inferred Streams from TempStream select roomNo, temp insert into OutputStream ; define stream OutputStream (roomNo int, temp double);
  • 34. Enrich from TempStream select roomNo, temp, 'C' as scale insert into OutputStream ; define stream OutputStream (roomNo int, temp double, scale string); from TempStream select deviceID, roomNo, avg(temp) as avgTemp insert into OutputStream ;
  • 35. Transformation from TempStream select concat(deviceID, '-', roomNo) as uid, toFahrenheit(temp) as tempInF, 'F' as scale insert into OutputStream ;
  • 36. Split from TempStream select roomNo, temp insert into RoomTempStream ; from TempStream select deviceID, temp insert into DeviceTempStream ;
  • 37. Pattern 2: Alerts and Thresholds o What? Detect a condition and generate an alert when it holds (e.g. alarm on high temperature). o These alerts can be based on a simple value or on more complex conditions such as the rate of increase. o Use cases? o Raise an alert when a vehicle is going too fast o Alert when a room is too hot
  • 38. Filter Alert from TempStream [ roomNo > 245 and roomNo <= 365 and temp > 40 ] select roomNo, temp insert into AlertServerRoomTempStream ;
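The query above handles the simple-value case. Pattern 2 also mentions conditions such as the rate of increase, so here is a rough Java sketch of both checks. The 40-degree limit comes from the query; the 5-degrees-per-minute rise limit, the alert strings, and the per-room `last` map are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

final class TempAlerts {
    // 40.0 matches the slide's filter; the rise limit is an illustrative assumption.
    static final double MAX_TEMP = 40.0;
    static final double MAX_RISE_PER_MIN = 5.0;

    // Last reading seen per room, stored as {timestampMillis, temp}.
    private final Map<Integer, double[]> last = new HashMap<>();

    // Returns an alert message, or null if the reading looks normal.
    String onEvent(int roomNo, long timestampMillis, double temp) {
        String alert = null;
        if (temp > MAX_TEMP) {
            alert = "HIGH_TEMP room " + roomNo;
        } else {
            double[] prev = last.get(roomNo);
            if (prev != null && timestampMillis > prev[0]) {
                double minutes = (timestampMillis - prev[0]) / 60_000.0;
                double risePerMin = (temp - prev[1]) / minutes;
                if (risePerMin > MAX_RISE_PER_MIN) {
                    alert = "FAST_RISE room " + roomNo;
                }
            }
        }
        last.put(roomNo, new double[] {timestampMillis, temp});
        return alert;
    }

    public static void main(String[] args) {
        TempAlerts alerts = new TempAlerts();
        System.out.println(alerts.onEvent(300, 0L, 25.0));
        System.out.println(alerts.onEvent(300, 60_000L, 33.0));
    }
}
```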
  • 39. Pattern 3: Simple Counting and Counting with Windows o What? Aggregate functions like Min, Max, Percentiles, etc. o Often they can be computed without storing any data o Most useful when used with a window o Use cases? o Most metrics need a time bound so that we can compare them (errors per day, transactions per second) o The Linux load average gives an idea of the overall trend by reporting the mean over the last 1, 5, and 15 minutes.
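To illustrate why such aggregates need no stored events, the following Java sketch keeps only a few running values. Count, min, max, and mean update in constant time and memory; exact percentiles, in contrast, generally need the values themselves (or an approximate summary), so they are left out. The class name `RunningStats` is hypothetical.

```java
final class RunningStats {
    private long count;
    private double sum;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    // O(1) work per event; the event itself is not retained.
    void add(double x) {
        count++;
        sum += x;
        min = Math.min(min, x);
        max = Math.max(max, x);
    }

    long count() { return count; }
    double min() { return min; }
    double max() { return max; }
    double mean() { return count == 0 ? Double.NaN : sum / count; }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        for (double t : new double[] {21.0, 23.5, 19.0}) {
            stats.add(t);
        }
        System.out.println(stats.count() + " " + stats.min() + " " + stats.max() + " " + stats.mean());
    }
}
```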
  • 40. Types of windows o Sliding windows vs. Batch (tumbling) windows o Time vs. Length windows Also supports o Unique window o First unique window o External time window
  • 42. Aggregation In CEP (Siddhi) from TempStream select roomNo, avg(temp) as avgTemp insert into HotRoomsStream ;
  • 43. Sliding Time Window from TempStream#window.time(1 min) select roomNo, avg(temp) as avgTemp insert all events into AvgRoomTempStream ;
  • 44. Group By from TempStream#window.time(1 min) select roomNo, avg(temp) as avgTemp group by roomNo insert all events into HotRoomsStream ;
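Roughly, a sliding time window with group by keeps the events of the last minute, expires old ones as time advances, and maintains one running aggregate per group. The Java sketch below assumes millisecond timestamps that never go backwards and returns the current per-room average on every arriving event. It approximates, rather than reproduces, Siddhi's `#window.time` semantics (for example, exactly when an event at the window boundary expires, or the expired-event output of `insert all events`).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class SlidingAvgByRoom {
    private static final class Entry {
        final long ts;
        final int roomNo;
        final double temp;

        Entry(long ts, int roomNo, double temp) {
            this.ts = ts;
            this.roomNo = roomNo;
            this.temp = temp;
        }
    }

    private final long windowMillis;
    private final Deque<Entry> window = new ArrayDeque<>();
    // Per-room running {sum, count} over the events currently in the window.
    private final Map<Integer, double[]> sumCount = new HashMap<>();

    SlidingAvgByRoom(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Assumes non-decreasing timestamps; returns the room's average over the last window.
    double onEvent(long ts, int roomNo, double temp) {
        // Expire events that have fallen out of the time window.
        while (!window.isEmpty() && window.peekFirst().ts <= ts - windowMillis) {
            Entry old = window.pollFirst();
            double[] sc = sumCount.get(old.roomNo);
            sc[0] -= old.temp;
            sc[1] -= 1;
            if (sc[1] == 0) {
                sumCount.remove(old.roomNo);
            }
        }
        window.addLast(new Entry(ts, roomNo, temp));
        double[] sc = sumCount.computeIfAbsent(roomNo, k -> new double[2]);
        sc[0] += temp;
        sc[1] += 1;
        return sc[0] / sc[1];
    }

    public static void main(String[] args) {
        SlidingAvgByRoom avg = new SlidingAvgByRoom(60_000L); // 1 min, as in the query
        System.out.println(avg.onEvent(0L, 1, 20.0));
        System.out.println(avg.onEvent(30_000L, 1, 30.0));
        System.out.println(avg.onEvent(65_000L, 1, 40.0));
    }
}
```

Keeping a running sum per group avoids rescanning the window on every event; with floating-point temperatures the subtraction can accumulate tiny rounding errors over very long runs.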
  • 45. Batch Time Window from TempStream#window.timeBatch(5 min) select roomNo, avg(temp) as avgTemp group by roomNo insert all events into HotRoomsStream ;
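By contrast, a batch (tumbling) window emits one result per period and then starts empty. In this Java sketch the period boundary is detected when the first event of the next period arrives, whereas an engine would typically also fire on a timer; the class name and the non-negative millisecond timestamps are assumptions.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

final class TumblingAvgByRoom {
    private final long batchMillis;
    private long currentBatch = -1;
    // Per-room {sum, count} for the batch currently being filled.
    private final Map<Integer, double[]> sumCount = new TreeMap<>();

    TumblingAvgByRoom(long batchMillis) {
        this.batchMillis = batchMillis;
    }

    // Assumes non-negative, non-decreasing timestamps. Returns the per-room averages of
    // the batch that just closed, or an empty map while the current batch is still open.
    Map<Integer, Double> onEvent(long ts, int roomNo, double temp) {
        long batch = ts / batchMillis;
        Map<Integer, Double> emitted = Collections.emptyMap();
        if (currentBatch != -1 && batch != currentBatch) {
            emitted = flush();
        }
        currentBatch = batch;
        double[] sc = sumCount.computeIfAbsent(roomNo, k -> new double[2]);
        sc[0] += temp;
        sc[1] += 1;
        return emitted;
    }

    private Map<Integer, Double> flush() {
        Map<Integer, Double> result = new TreeMap<>();
        for (Map.Entry<Integer, double[]> e : sumCount.entrySet()) {
            result.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        sumCount.clear(); // tumbling: the next batch starts empty
        return result;
    }

    public static void main(String[] args) {
        TumblingAvgByRoom w = new TumblingAvgByRoom(300_000L); // 5 min, as in the query
        w.onEvent(0L, 1, 20.0);
        w.onEvent(100_000L, 1, 30.0);
        System.out.println(w.onEvent(300_000L, 1, 10.0));
    }
}
```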
  • 46. Pattern 4: Joining Event Streams o What? Create a new event stream by joining multiple streams o The complication comes with time, so we need at least one window o Use cases? o To detect when a player has kicked the ball in a football game o To correlate TempStream with the state of the regulator and trigger control commands
  • 49. Join In CEP (Siddhi) define stream TempStream (deviceID long, roomNo int, temp double); define stream RegulatorStream (deviceID long, roomNo int, isOn bool); from TempStream[temp > 30.0]#window.time(1 min) as T join RegulatorStream[isOn == false]#window.length(1) as R on T.roomNo == R.roomNo select T.roomNo, R.deviceID, 'start' as action insert into RegulatorActionStream ;
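Conceptually, each side of a join keeps its own window, and an event arriving on either side is matched against the other side's window. The Java sketch below mimics that for the regulator example under simplifying assumptions: it tracks each room's current regulator state instead of Siddhi's single-event `#window.length(1)`, it returns the regulator `deviceID` to start instead of emitting an output event, and the method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class RegulatorJoin {
    private final long windowMillis;
    // Hot readings (temp > 30.0) inside the time window, stored as {ts, roomNo}.
    private final Deque<long[]> hotTemps = new ArrayDeque<>();
    // Simplification: current "off" regulator per room.
    private final Map<Integer, Long> offRegulatorByRoom = new HashMap<>();

    RegulatorJoin(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    private void expire(long now) {
        while (!hotTemps.isEmpty() && hotTemps.peekFirst()[0] <= now - windowMillis) {
            hotTemps.pollFirst();
        }
    }

    // Temperature side of the join: returns the regulator to start, or null.
    Long onTemp(long ts, int roomNo, double temp) {
        expire(ts);
        if (temp <= 30.0) {
            return null;
        }
        hotTemps.addLast(new long[] {ts, roomNo});
        return offRegulatorByRoom.get(roomNo);
    }

    // Regulator side of the join: returns deviceID to start if the room was hot recently.
    Long onRegulator(long ts, long deviceID, int roomNo, boolean isOn) {
        expire(ts);
        if (isOn) {
            offRegulatorByRoom.remove(roomNo);
            return null;
        }
        offRegulatorByRoom.put(roomNo, deviceID);
        for (long[] t : hotTemps) {
            if (t[1] == roomNo) {
                return deviceID;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        RegulatorJoin join = new RegulatorJoin(60_000L); // 1 min, as in the query
        join.onRegulator(0L, 7L, 300, false);
        // Room 300 is now hot while its regulator 7 is off.
        System.out.println(join.onTemp(10_000L, 300, 35.0));
    }
}
```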
  • 50. Pattern 5: Data Correlation, Missing Events, and Erroneous Data o What? Find correlations and use them to detect and handle missing and erroneous data o Use cases? o Detecting a missing event (e.g., detect a customer request that has not been responded to within 1 hour of its reception) o Detecting erroneous data (e.g., detecting failed sensors using a set of sensors that monitor overlapping regions; we can use the redundant data to find erroneous sensors and remove their data from further processing)
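For the erroneous-data case, one simple approach (our choice for illustration, not necessarily the authors') is to compare each sensor in an overlapping group with the group median and flag sensors that deviate by more than a tolerance. The median is used rather than the mean so that a single faulty sensor does not drag the reference value. The tolerance and sensor IDs are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class RedundantSensorCheck {
    // Returns the IDs (sorted) of sensors whose reading is more than `tolerance`
    // away from the median of all sensors covering the same region.
    static List<String> suspectSensors(Map<String, Double> readings, double tolerance) {
        List<String> suspects = new ArrayList<>();
        if (readings.isEmpty()) {
            return suspects;
        }
        List<Double> values = new ArrayList<>(readings.values());
        Collections.sort(values);
        int n = values.size();
        double median = (n % 2 == 1)
                ? values.get(n / 2)
                : (values.get(n / 2 - 1) + values.get(n / 2)) / 2.0;
        for (Map.Entry<String, Double> e : readings.entrySet()) {
            if (Math.abs(e.getValue() - median) > tolerance) {
                suspects.add(e.getKey());
            }
        }
        Collections.sort(suspects);
        return suspects;
    }

    public static void main(String[] args) {
        Map<String, Double> readings = Map.of("s1", 22.0, "s2", 22.5, "s3", 21.8, "s4", 60.0);
        System.out.println(suspectSensors(readings, 3.0));
    }
}
```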
  • 52. Missing Event in CEP (Siddhi) from RequestStream#window.time(1h) insert expired events into ExpiryStream ; from r1=RequestStream->r2=Response[id == r1.id] or r3=ExpiryStream[id == r1.id] select r1.id as id ... having r2.id == null insert into AlertStream ;
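The query uses an expiry stream as a timer: if the one-hour window expires a request before its response arrives, an alert is raised. A plain-Java sketch of the same idea keeps pending request IDs with deadlines. It assumes unique request IDs, non-decreasing request timestamps, and a caller that advances the clock (for example, on each incoming event or on a periodic tick); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MissingResponseDetector {
    private final long timeoutMillis;
    // Pending request IDs in arrival order, mapped to their deadline.
    private final LinkedHashMap<String, Long> pending = new LinkedHashMap<>();

    MissingResponseDetector(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Assumes unique IDs and non-decreasing request timestamps.
    void onRequest(String id, long ts) {
        pending.put(id, ts + timeoutMillis);
    }

    void onResponse(String id) {
        pending.remove(id);
    }

    // Advances the clock; returns IDs whose deadline passed with no response.
    List<String> advanceTo(long now) {
        List<String> missing = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (e.getValue() > now) {
                break; // later requests have later deadlines
            }
            missing.add(e.getKey());
            it.remove();
        }
        return missing;
    }

    public static void main(String[] args) {
        MissingResponseDetector d = new MissingResponseDetector(3_600_000L); // 1 hour
        d.onRequest("r1", 0L);
        d.onRequest("r2", 1_000L);
        d.onResponse("r1");
        System.out.println(d.advanceTo(3_601_000L));
    }
}
```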
  • 53. Pattern 6: Interacting with Databases o What? Combine realtime data with historical data o Use cases? o On a transaction, looking up the customer’s age by ID in the customer database to detect fraud (enrichment) o Checking a transaction against blacklists and whitelists in the database o Receiving input from the user (e.g., the daily discount amount may be updated in the database, and the query will pick it up automatically without human intervention).
  • 55. In CEP (Siddhi) Event Table define table CardUserTable (name string, cardNum long) ; @from(eventtable = 'rdbms', datasource.name = 'CardDataSource', table.name = 'UserTable', caching.algorithm = 'LRU') define table CardUserTable (name string, cardNum long) ; Cache types supported ● Basic: a size-based algorithm based on FIFO. ● LRU (Least Recently Used): the least recently used event is dropped when the cache is full. ● LFU (Least Frequently Used): the least frequently used event is dropped when the cache is full.
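To illustrate the LRU option, this Java sketch puts a bounded cache in front of a database lookup using `LinkedHashMap` in access order, whose `removeEldestEntry` hook evicts the least recently used entry. The `database` function is a hypothetical stand-in for the RDBMS-backed table, and this is not how Siddhi itself implements its cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongFunction;

final class CardUserCache {
    private final LongFunction<String> database; // hypothetical lookup: cardNum -> name
    private final LinkedHashMap<Long, String> cache;
    int databaseLookups = 0;

    CardUserCache(int capacity, LongFunction<String> database) {
        this.database = database;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<Long, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, String> eldest) {
                return size() > capacity; // evict the least recently used entry
            }
        };
    }

    String nameFor(long cardNum) {
        String name = cache.get(cardNum);
        if (name == null) {
            name = database.apply(cardNum);
            databaseLookups++;
            if (name != null) {
                cache.put(cardNum, name);
            }
        }
        return name;
    }

    public static void main(String[] args) {
        Map<Long, String> table = Map.of(1L, "alice", 2L, "bob", 3L, "carol");
        CardUserCache users = new CardUserCache(2, cardNum -> table.get(cardNum));
        users.nameFor(1L);
        users.nameFor(1L); // served from the cache
        System.out.println(users.databaseLookups);
    }
}
```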
  • 56. Join : Event Table define stream Purchase (price double, cardNo long, place string); define table CardUserTable (name string, cardNum long) ; from Purchase#window.length(1) join CardUserTable on Purchase.cardNo == CardUserTable.cardNum select Purchase.cardNo as cardNo, CardUserTable.name as name, Purchase.price as price insert into PurchaseUserStream ;
  • 57. Insert : Event Table define stream FraudStream (price double, cardNo long, userName string); define table BlacklistedUserTable (name string, cardNum long) ; from FraudStream select userName as name, cardNo as cardNum insert into BlacklistedUserTable ;
  • 58. Update : Event Table define stream LoginStream (userID string, islogin bool, loginTime long); define table LastLoginTable (userID string, time long) ; from LoginStream select userID, loginTime as time update LastLoginTable on LoginStream.userID == LastLoginTable.userID ;
  • 59. Pattern 7: Detecting Temporal Event Sequence Patterns o What? Detect a temporal sequence of events or conditions arranged in time o Use cases? o Detect suspicious activities like a small transaction immediately followed by a large transaction o Detect ball possession in a football game o Detect suspicious financial patterns like large buy and sell behaviour within a small time period
  • 61. In CEP (Siddhi) Pattern define stream Purchase (price double, cardNo long, place string); from every (a1 = Purchase[price < 100] -> a3= ..) -> a2 = Purchase[price > 10000 and cardNo == a1.cardNo] within 1 day select a1.cardNo as cardNo, a2.price as price, a2.place as place insert into PotentialFraud ;
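Ignoring the elided `a3` step, the core of this pattern (a purchase under 100 followed, on the same card, by one over 10000 within a day) can be sketched in Java with a per-card queue of recent small purchases. As a simplification, one large purchase consumes all pending small purchases for its card and produces a single alert, whereas `every` in Siddhi may produce one match per starting event. The class name is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class SmallThenLargeDetector {
    static final long DAY_MILLIS = 24L * 60 * 60 * 1000;
    // cardNo -> timestamps of recent small purchases (price < 100), oldest first.
    private final Map<Long, Deque<Long>> smallByCard = new HashMap<>();

    // Assumes non-decreasing timestamps. Returns true when a purchase over 10000 follows
    // a purchase under 100 on the same card within one day.
    boolean onPurchase(long ts, long cardNo, double price) {
        Deque<Long> smalls = smallByCard.computeIfAbsent(cardNo, k -> new ArrayDeque<>());
        while (!smalls.isEmpty() && smalls.peekFirst() < ts - DAY_MILLIS) {
            smalls.pollFirst(); // outside the "within 1 day" bound
        }
        if (price < 100) {
            smalls.addLast(ts);
            return false;
        }
        if (price > 10000 && !smalls.isEmpty()) {
            smalls.clear(); // simplification: one alert consumes all pending matches
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        SmallThenLargeDetector d = new SmallThenLargeDetector();
        d.onPurchase(0L, 42L, 50.0);
        System.out.println(d.onPurchase(3_600_000L, 42L, 20_000.0));
    }
}
```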
  • 62. Pattern 8: Tracking o What? Track something over space and time and detect given conditions o Use cases? o Tracking a fleet of vehicles, making sure that they adhere to speed limits, routes, and geofences. o Tracking wildlife, making sure they are alive (they will not move if they are dead) and that they do not leave the reservation. o Tracking airline luggage and making sure it has not been sent to the wrong destination o Tracking a logistics network and figuring out bottlenecks and unexpected conditions.
  • 63. TfL: Traffic Analytics o Built using TfL (Transport for London) open data feeds. http://goo.gl/9xNiCm http://goo.gl/04tX6k
  • 64. Pattern 9: Detecting Trends o What? Detect an overall trend over time o Useful in stock markets, SLA enforcement, auto scaling, and predictive maintenance o Use cases? o Rise, fall, and turn (switch from a rise to a fall) of values o Outliers that deviate from the current trend by a large value o Complex trends like “Triple Bottom” and “Cup and Handle” [17].
  • 65. Trend in Storm o Build and apply a state machine
  • 66. In CEP (Siddhi) Sequence from t1=TempStream, t2=TempStream[(isNull(t2[last].temp) and t1.temp < temp) or (t2[last].temp < temp and not(isNull(t2[last].temp)))]+ within 5 min select t1.temp as initialTemp, t2[last].temp as finalTemp, t1.deviceID, t1.roomNo insert into IncreasingHotRoomsStream ;
  • 67. In CEP (Siddhi) Partition partition with (roomNo of TempStream) begin from t1=TempStream, t2=TempStream[(isNull(t2[last].temp) and t1.temp < temp) or (t2[last].temp < temp and not(isNull(t2[last].temp)))]+ within 5 min select t1.temp as initialTemp, t2[last].temp as finalTemp, t1.deviceID, t1.roomNo insert into IncreasingHotRoomsStream ; end;
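The Storm slide's advice to build a state machine can be illustrated with a small per-room Java sketch of the same rising-sequence idea; keyed state in a map plays the role of the partition. One difference to note: this sketch reports a run when it ends (on the first non-increasing reading, or when the 5-minute bound is exceeded), which is an assumption about output timing rather than Siddhi's exact behaviour. The class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class RisingTrendDetector {
    // State of one room's current run of strictly increasing readings.
    private static final class Run {
        final long startTs;
        final double initialTemp;
        double lastTemp;
        int length = 1;

        Run(long startTs, double temp) {
            this.startTs = startTs;
            this.initialTemp = temp;
            this.lastTemp = temp;
        }
    }

    private final long withinMillis;
    private final Map<Integer, Run> runs = new HashMap<>(); // one state machine per roomNo

    RisingTrendDetector(long withinMillis) {
        this.withinMillis = withinMillis;
    }

    // Returns {initialTemp, finalTemp} when a rising run of 2+ readings ends, else null.
    double[] onEvent(long ts, int roomNo, double temp) {
        Run run = runs.get(roomNo);
        if (run != null && temp > run.lastTemp && ts - run.startTs <= withinMillis) {
            run.lastTemp = temp; // still rising and inside the time bound: extend
            run.length++;
            return null;
        }
        double[] finished = null;
        if (run != null && run.length >= 2) {
            finished = new double[] {run.initialTemp, run.lastTemp};
        }
        runs.put(roomNo, new Run(ts, temp)); // this reading starts a new run
        return finished;
    }

    public static void main(String[] args) {
        RisingTrendDetector d = new RisingTrendDetector(5 * 60_000L); // "within 5 min"
        double[] temps = {20.0, 22.0, 25.0, 24.0};
        for (int i = 0; i < temps.length; i++) {
            double[] run = d.onEvent(i * 10_000L, 1, temps[i]);
            if (run != null) {
                System.out.println("room 1 rose from " + run[0] + " to " + run[1]);
            }
        }
    }
}
```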
  • 68. Detecting Trends in Real Life o The paper “A Complex Event Processing Toolkit for Detecting Technical Chart Patterns” (HPBC 2015) used this idea to identify stock chart patterns o It used kernel regression for smoothing and detected maxima and minima. o Then any pattern can be written as a temporal event sequence.
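The paper's exact method is not reproduced here. As a rough Java sketch of the idea, the code below smooths a series with Nadaraya-Watson kernel regression (Gaussian kernel, bandwidth chosen by the caller) and then marks local maxima and minima, which could then feed a temporal event sequence query. The O(n²) loop over the whole series is for clarity only; a streaming version would use a bounded window.

```java
final class ChartExtrema {
    // Nadaraya-Watson kernel regression with a Gaussian kernel over sample positions.
    static double[] smooth(double[] y, double bandwidth) {
        int n = y.length;
        double[] s = new double[n];
        for (int i = 0; i < n; i++) {
            double num = 0.0;
            double den = 0.0;
            for (int j = 0; j < n; j++) {
                double u = (i - j) / bandwidth;
                double w = Math.exp(-0.5 * u * u);
                num += w * y[j];
                den += w;
            }
            s[i] = num / den;
        }
        return s;
    }

    // Marks strict local maxima with +1 and strict local minima with -1 (endpoints stay 0).
    static int[] extrema(double[] s) {
        int[] kind = new int[s.length];
        for (int i = 1; i + 1 < s.length; i++) {
            if (s[i] > s[i - 1] && s[i] > s[i + 1]) {
                kind[i] = 1;
            } else if (s[i] < s[i - 1] && s[i] < s[i + 1]) {
                kind[i] = -1;
            }
        }
        return kind;
    }

    public static void main(String[] args) {
        double[] prices = {10, 12, 11, 13, 15, 14, 12, 13, 16, 15};
        int[] kind = extrema(smooth(prices, 1.0));
        System.out.println(java.util.Arrays.toString(kind));
    }
}
```

A larger bandwidth smooths more aggressively, trading away small wiggles (fewer spurious extrema) for later and flatter detection of real turns.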
  • 69. Pattern 10: Lambda Architecture o What? Run the same query in both realtime and batch pipelines, using realtime analytics to fill the lag in batch analytics results. o Also called the “Lambda Architecture” (a term coined by Nathan Marz). See Jay Kreps’s “Questioning the Lambda Architecture” o Use cases? o For example, if batch processing takes 15 minutes, results will always lag 15 minutes behind the current data. Here realtime processing fills the gap.
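A minimal Java sketch of the serving side of this pattern: a query answer combines the last batch view with realtime counts for events newer than the batch cutoff, and the realtime buffer is trimmed whenever a new batch view arrives. Counting by key, the "ts < cutoff" convention, and the class name are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class LambdaCounts {
    private Map<String, Long> batchView = new HashMap<>(); // output of the last batch run
    private long batchCutoff = Long.MIN_VALUE;             // batch covered events with ts < cutoff
    private final TreeMap<Long, List<String>> recent = new TreeMap<>(); // realtime buffer by ts

    // Realtime pipeline: buffer every incoming event.
    void onEvent(long ts, String key) {
        recent.computeIfAbsent(ts, t -> new ArrayList<>()).add(key);
    }

    // Batch pipeline finished: swap in its view and drop events it now covers.
    void onBatchComplete(Map<String, Long> view, long cutoff) {
        batchView = new HashMap<>(view);
        batchCutoff = cutoff;
        recent.headMap(cutoff).clear();
    }

    // Query: batch result plus realtime counts for events the batch has not seen.
    long count(String key) {
        long c = batchView.getOrDefault(key, 0L);
        for (List<String> keys : recent.tailMap(batchCutoff).values()) {
            for (String k : keys) {
                if (k.equals(key)) {
                    c++;
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        LambdaCounts counts = new LambdaCounts();
        counts.onEvent(1L, "error");
        counts.onBatchComplete(Map.of("error", 1L), 2L);
        counts.onEvent(5L, "error");
        System.out.println(counts.count("error"));
    }
}
```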
  • 71. Pattern 11: Detecting and switching to Detailed Analysis o What? Detect a condition that suggests an anomaly, and analyze it further using historical data. o Use cases? o Use basic rules to detect fraud (e.g., a large transaction), then pull out all transactions made with that credit card over a longer period (e.g., 3 months of data) from the batch pipeline and run a detailed analysis o While monitoring weather, detect conditions like high temperature or low pressure in a given region, and then start a high-resolution localized forecast for that region. o Detect good customers (e.g., through expenditure of more than $1000 within a month), and then run a detailed model to decide the potential of offering a deal.
  • 73. Pattern 12: Using a Machine Learning Model o What? Train a model (often a machine learning model), and then use it within the realtime pipeline to make decisions o For example, you can build a model using R, export it as PMML (Predictive Model Markup Language), and use it within your realtime pipeline. o Use cases? o Fraud detection o Segmentation o Churn prediction
  • 74. Predictive Analytics o Build models and use them with WSO2 CEP, BAM, and ESB using the upcoming WSO2 Machine Learner product (2015 Q2) o Build models using R, export them as PMML, and use them within WSO2 CEP o Call R scripts from CEP queries
  • 75. In CEP (Siddhi) PMML Model from TransactionStream#ml:applyModel('/path/logisticRegressionModel1.xml', timestamp, amount, ip) insert into PotentialFraudsStream;
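Whatever engine hosts the model, scoring a logistic regression is a weighted sum passed through the sigmoid. The Java sketch below shows only that step, not how `ml:applyModel` is implemented: PMML parsing is omitted, and the intercept, weights, feature encoding, and 0.5 threshold in the example are hypothetical.

```java
final class LogisticModel {
    private final double intercept;
    private final double[] weights;

    LogisticModel(double intercept, double[] weights) {
        this.intercept = intercept;
        this.weights = weights.clone();
    }

    // p = 1 / (1 + e^-(b0 + sum_i w_i * x_i))
    double probability(double[] features) {
        if (features.length != weights.length) {
            throw new IllegalArgumentException("expected " + weights.length + " features");
        }
        double z = intercept;
        for (int i = 0; i < weights.length; i++) {
            z += weights[i] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    public static void main(String[] args) {
        // Hypothetical coefficients for (amount in thousands, new-IP-address flag).
        LogisticModel model = new LogisticModel(-4.0, new double[] {0.5, 2.0});
        double p = model.probability(new double[] {6.0, 1.0});
        if (p > 0.5) {
            System.out.println("route to PotentialFraudsStream, p=" + p);
        }
    }
}
```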
  • 76. Pattern 13: Online Control o What? Control something online. This involves problems like current situation awareness, predicting the next value(s), and deciding on corrective actions. o Use cases? o Autopilot o Self-driving o Robotics
  • 78. Scaling & HA for Pattern Implementations
  • 79. So how do we scale a system? o Vertical scaling o Horizontal scaling
  • 82. Horizontal Scaling ... E.g. Calculate Mean
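The trick for a mean is to ship partial results rather than partial means: each node reports its (sum, count), and the partials are added together. Averaging the node means instead would be wrong when partitions differ in size; in the example below that gives (2 + 10) / 2 = 6 instead of 16 / 4 = 4. The class name is hypothetical.

```java
final class PartialMean {
    final double sum;
    final long count;

    PartialMean(double sum, long count) {
        this.sum = sum;
        this.count = count;
    }

    // Each node summarizes its own partition as (sum, count).
    static PartialMean of(double[] partition) {
        double s = 0.0;
        for (double x : partition) {
            s += x;
        }
        return new PartialMean(s, partition.length);
    }

    // Merging is associative and commutative (up to floating-point rounding),
    // so partials can be combined in any order, e.g. up a tree of nodes.
    PartialMean merge(PartialMean other) {
        return new PartialMean(sum + other.sum, count + other.count);
    }

    double mean() {
        return sum / count;
    }

    public static void main(String[] args) {
        PartialMean a = PartialMean.of(new double[] {1, 2, 3});
        PartialMean b = PartialMean.of(new double[] {10});
        System.out.println(a.merge(b).mean());
    }
}
```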
  • 85. Horizontal Scaling ... How about scaling median? If and only if we can partition!
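A median has no such small mergeable summary, so the median of per-node medians is generally not the global median. If the data can be partitioned by key (for example by symbol or roomNo), each node holds all values of its keys and can compute their medians exactly. This Java sketch contrasts the two; the node contents and keys are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MedianPartitioning {
    // Exact median of a non-empty list.
    static double median(List<Double> xs) {
        List<Double> s = new ArrayList<>(xs);
        Collections.sort(s);
        int n = s.size();
        return n % 2 == 1 ? s.get(n / 2) : (s.get(n / 2 - 1) + s.get(n / 2)) / 2.0;
    }

    // When partitioned by key, all of a key's values are on one node, so each
    // node can compute that key's median exactly with no cross-node merge.
    static Map<String, Double> perKeyMedians(Map<String, List<Double>> valuesByKey) {
        Map<String, Double> out = new TreeMap<>();
        for (Map.Entry<String, List<Double>> e : valuesByKey.entrySet()) {
            out.put(e.getKey(), median(e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Double> nodeA = List.of(1.0, 2.0, 3.0);
        List<Double> nodeB = List.of(4.0, 100.0, 200.0, 300.0);
        List<Double> all = new ArrayList<>(nodeA);
        all.addAll(nodeB);
        // Median of per-node medians (76.0) differs from the true median (4.0).
        System.out.println(median(List.of(median(nodeA), median(nodeB))) + " vs " + median(all));
    }
}
```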
  • 87. Scalable Realtime solutions ... Spark Streaming o Supports distributed processing o Runs micro batches o Does not support pattern and sequence detection Apache Storm o Supports distributed processing o Stream processing engine
  • 88. Why not use Apache Storm? Advantages o Supports distributed processing o Supports partitioning o Extendable o Open source Disadvantages o Need to write Java code o Need to start from basic principles (and data structures) o Adapting to change is slow o No support to govern artifacts
  • 89. WSO2 CEP += Apache Storm Advantages o Supports distributed processing o Supports partitioning o Extendable o Open source o No need to write Java code (supports an SQL-like query language) o No need to start from basic principles (supports a high-level language) o Adapting to change is fast o Govern artifacts using Toolboxes o etc ...
  • 93. Siddhi QL define stream StockStream (symbol string, volume int, price double); @name('Filter Query') from StockStream[price > 75] select * insert into HighPriceStockStream ; @name('Window Query') from HighPriceStockStream#window.time(10 min) select symbol, sum(volume) as sumVolume insert into ResultStockStream ;
  • 94. Siddhi QL - with partition define stream StockStream (symbol string, volume int, price double); @name('Filter Query') from StockStream[price > 75] select * insert into HighPriceStockStream ; @name('Window Query') partition with (symbol of HighPriceStockStream) begin from HighPriceStockStream#window.time(10 min) select symbol, sum(volume) as sumVolume insert into ResultStockStream ; end;
  • 95. Siddhi QL - distributed define stream StockStream (symbol string, volume int, price double); @name('Filter Query') @dist(parallel='3') from StockStream[price > 75] select * insert into HighPriceStockStream ; @name('Window Query') @dist(parallel='2') partition with (symbol of HighPriceStockStream) begin from HighPriceStockStream#window.time(10 min) select symbol, sum(volume) as sumVolume insert into ResultStockStream ; end;
  • 99. HA / Persistence o Option 1: Side by side o Recommended o Takes 2X hardware o Gives zero downtime o Option 2: Snapshot and restore o Uses less hardware o Will lose events between snapshots o Downtime during recovery o ** In some scenarios you can use event tables to keep intermediate state
  • 100. Siddhi Extensions ● Function extension ● Aggregator extension ● Window extension ● Transform extension
  • 101. Siddhi Query : Function Extension from TempStream select deviceID, roomNo, custom:toKelvin(temp) as tempInKelvin, 'K' as scale insert into OutputStream ;
  • 102. Siddhi Query : Aggregator Extension from TempStream select deviceID, roomNo, temp, custom:stdev(temp) as stdevTemp, 'C' as scale insert into OutputStream ;
  • 103. Siddhi Query : Window Extension from TempStream #window.custom:lastUnique(roomNo,2 min) select * insert into OutputStream ;
  • 104. Siddhi Query : Transform Extension from XYZSpeedStream #transform.custom:getVelocityVector(v,vx,vy,vz) select velocity, direction insert into SpeedStream ;