2. Outline
• Big Data
• Complex Event Processing
• Basic Constructs of Query Language
• CEP Solution Patterns
• Scale, HA and Performance
• Demo
6. Why is Big Data hard?
• How to store? Assuming 1 TB per computer, it takes 1000
computers to store 1 PB
• How to move? Assuming a 10 Gb/s network, it
takes about 13 minutes to copy 1 TB, or about 9 days to copy
1 PB
• How to search? Assuming each record is 1 KB
and one machine can process 1000 records
per sec, it needs about 278 CPU hours (11.6 CPU days) to process
1 TB and about 32 CPU years to process 1 PB
• How to process?
– How to convert algorithms to work at large scale
– How to create new algorithms
http://www.susanica.com/photo/9
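The storage, transfer and search estimates on this slide are simple divisions. A minimal sketch for rechecking them, assuming decimal units (1 TB = 10**12 bytes) and a link that runs at its full line rate; the function names are illustrative only:

```python
# Back-of-envelope estimates. Decimal units are an assumption of this sketch.
TB = 10**12
PB = 10**15

def machines_needed(total_bytes, bytes_per_machine=TB):
    """Machines required to hold total_bytes (ceiling division)."""
    return -(-total_bytes // bytes_per_machine)

def copy_seconds(total_bytes, link_bits_per_sec=10 * 10**9):
    """Seconds to push total_bytes through a link at full line rate."""
    return total_bytes * 8 / link_bits_per_sec

def scan_seconds(total_bytes, record_bytes=1000, records_per_sec=1000):
    """Seconds for one machine to process every record once."""
    return total_bytes / record_bytes / records_per_sec

for label, size in (("1 TB", TB), ("1 PB", PB)):
    print(label,
          "machines:", machines_needed(size),
          "copy hours: %.2f" % (copy_seconds(size) / 3600),
          "scan CPU days: %.1f" % (scan_seconds(size) / 86400))
```

Real transfers rarely reach line rate, so the copy times should be read as lower bounds.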
7. Why is it hard? (Contd.)
• Systems are built of many computers
• That handle lots of data
• Running complex logic
• This pushes us to the frontier of
Distributed Systems and Databases
• More data does not mean there is a
simple model
• Some models can be as complex as
the system
http://www.flickr.com/photos/mariachily/5250487136,
Licensed CC
10. CEP Is & Is NOT!
• Is NOT!
• Simple filters
• Simple Event Processing
• E.g. Is this a gold or platinum customer?
• Joining multiple event streams
• Event Stream Processing
• Is!
• Processing multiple event streams
• Identifying meaningful patterns among streams
• Using temporal windows
• E.g. Notify if there is a 10% increase in overall trading activity AND the
average price of commodities has fallen 2% in the last 4 hours
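The trading example can be sketched in plain Python to show what a temporal window adds over a simple filter. The slide does not say what the increase is measured against, so this sketch assumes the last 4 hours are compared with the 4 hours before them; `should_notify` and the `(timestamp, price)` layout are hypothetical:

```python
FOUR_HOURS = 4 * 60 * 60

def should_notify(trades, now, window=FOUR_HOURS):
    """trades: iterable of (timestamp_sec, price).

    Assumption: the window ending at `now` is compared with the
    window of the same length immediately before it.
    """
    trades = list(trades)
    current = [p for t, p in trades if now - window < t <= now]
    previous = [p for t, p in trades if now - 2 * window < t <= now - window]
    if not current or not previous:
        return False  # no baseline to compare against
    # At least 10% more trades than in the previous window.
    activity_up = len(current) * 100 >= len(previous) * 110
    # Average price at least 2% below the previous window's average.
    avg_current = sum(current) / len(current)
    avg_previous = sum(previous) / len(previous)
    price_down = avg_current <= 0.98 * avg_previous
    return activity_up and price_down
```

Neither condition can be evaluated from a single event, which is the point of the "Is" column: the answer depends on state accumulated over time.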
14. Event Streams
• Event stream is a sequence of events
• Event streams are defined by Stream Definitions
• Event streams have in-flows and out-flows
• Inflows can be from
• Event builders
Convert incoming XML, JSON, etc. events to an event
stream
• Execution plans
• Outflows are to
• Event formatters
Convert an event stream to XML, JSON, etc. events
• Execution plans
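The roles of an event builder and an event formatter can be illustrated with a small Python model. This is not the WSO2 API: the flat JSON layout, `STREAM_DEFINITION`, `build_event` and `format_event` are all assumptions made for the sketch.

```python
import json

# Hypothetical stream definition: ordered (attribute name, type) pairs.
STREAM_DEFINITION = (("deviceID", int), ("roomNo", int), ("temp", float))

def build_event(json_text, definition=STREAM_DEFINITION):
    """Builder side: turn an incoming JSON message into a typed event tuple."""
    data = json.loads(json_text)
    return tuple(cast(data[name]) for name, cast in definition)

def format_event(event, definition=STREAM_DEFINITION):
    """Formatter side: turn an event tuple back into a JSON message."""
    return json.dumps({name: value for (name, _), value in zip(definition, event)})

event = build_event('{"deviceID": 7, "roomNo": 2043, "temp": 31.5}')
print(event)
print(format_event(event))
```

Inside the engine only the typed tuple travels between execution plans; the wire format matters only at the edges.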
16. Event Format
• Standard event formats are available for
• XML
• JSON
• Text
• Map
• WSO2 Event
• If events adhere to the standard format,
they do not need data mapping.
• If events do not adhere to it,
custom event mapping should be configured in the
Event Builder and Event Formatter
as appropriate.
17. Event Format
Standard XML event format
<events>
<event>
<metaData>
<tenant_id>2</tenant_id>
</metaData>
<correlationData>
<activity_id>ID5</activity_id>
</correlationData>
<payloadData>
<clientPhoneNo>0771117673</clientPhoneNo>
<clientName>Mohanadarshan</clientName>
<clientResidenceAddress>15, Alexendra road,
California</clientResidenceAddress>
<clientAccountNo>ACT5673</clientAccountNo>
</payloadData>
</event>
</events>
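A message in this layout can be read with Python's standard `xml.etree.ElementTree`. The sketch below is only an illustration of the three-section structure (meta, correlation, payload); `parse_events` is a hypothetical helper and the sample is a shortened, well-formed copy of the slide's example.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<events>
  <event>
    <metaData><tenant_id>2</tenant_id></metaData>
    <correlationData><activity_id>ID5</activity_id></correlationData>
    <payloadData>
      <clientPhoneNo>0771117673</clientPhoneNo>
      <clientAccountNo>ACT5673</clientAccountNo>
    </payloadData>
  </event>
</events>"""

SECTIONS = ("metaData", "correlationData", "payloadData")

def parse_events(xml_text):
    """Return one dict per <event>, keyed by section, then by attribute name."""
    root = ET.fromstring(xml_text)
    events = []
    for event in root.findall("event"):
        parsed = {}
        for section in SECTIONS:
            node = event.find(section)
            # A missing section becomes an empty mapping.
            parsed[section] = {} if node is None else {c.tag: c.text for c in node}
        events.append(parsed)
    return events

print(parse_events(SAMPLE))
```

All values come back as strings; converting them to the stream's attribute types is left out of the sketch.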
18. CEP Execution Plan
● Is an isolated logical execution unit
● Each execution plan imports some of the event streams
available in CEP and defines the execution logic using queries
and exports the results as output event streams.
● Has a one-to-one relationship with the CEP Backend Runtime.
● Has a many-to-many relationship with Event Streams.
● Each execution plan spawns a Siddhi Engine Instance.
23. Siddhi Query : Enrich
from TempStream
select roomNo, temp, 'C' as scale
insert into OutputStream ;
define stream OutputStream
(roomNo int, temp double, scale string);
24. Siddhi Query : Enrich
from TempStream
select deviceID, roomNo, avg(temp) as avgTemp
insert into OutputStream ;
25. Siddhi Query : Transformation
from TempStream
select concat(deviceID, '-', roomNo) as uid,
toFahrenheit(temp) as tempInF,
'F' as scale
insert into OutputStream ;
26. Siddhi Query : Split
from TempStream
select roomNo, temp
insert into RoomTempStream ;
from TempStream
select deviceID, temp
insert into DeviceTempStream ;
27. Siddhi Query : Filter
from TempStream [temp > 30.0 and roomNo != 2043]
select roomNo, temp
insert into HotRoomsStream ;
28. Siddhi Query : Window
from TempStream
select roomNo, avg(temp) as avgTemp
insert into HotRoomsStream ;
29. Siddhi Query : Window
from TempStream#window.time(1 min)
select roomNo, avg(temp) as avgTemp
insert into HotRoomsStream ;
30. Siddhi Query : Window
from TempStream#window.time(1 min)
select roomNo, avg(temp) as avgTemp
group by roomNo
insert into HotRoomsStream ;
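To make the sliding behaviour of `#window.time(1 min)` with `group by roomNo` concrete, here is a plain Python model, not Siddhi itself. It assumes events arrive in timestamp order and only produces output on arrival, whereas a real engine may also react when events expire; `SlidingTimeAvg` is a hypothetical name.

```python
from collections import deque

class SlidingTimeAvg:
    """Average temperature per room over the last `window_sec` seconds."""

    def __init__(self, window_sec=60):
        self.window_sec = window_sec
        self.events = deque()  # (timestamp_sec, room_no, temp), oldest first

    def on_event(self, timestamp_sec, room_no, temp):
        self.events.append((timestamp_sec, room_no, temp))
        # Drop events that have slid out of the window.
        while self.events[0][0] <= timestamp_sec - self.window_sec:
            self.events.popleft()
        # Aggregate only over the arriving event's group.
        temps = [t for _, r, t in self.events if r == room_no]
        return room_no, sum(temps) / len(temps)

w = SlidingTimeAvg(60)
print(w.on_event(0, 1, 20.0))
print(w.on_event(30, 1, 30.0))
print(w.on_event(70, 1, 40.0))
```

Without the `group by`, the average would mix temperatures from every room while still reporting a single `roomNo`, which is why the earlier variant of this query is of limited use.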
31. Siddhi Query : Batch Window
from TempStream#window.timeBatch(5 min)
select roomNo, avg(temp) as avgTemp
group by roomNo
insert into HotRoomsStream ;
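A batch window differs from a sliding window in that it emits once per period and then starts empty. A minimal Python model of that idea, assuming in-order timestamps and batches aligned to multiples of the batch length (an assumption of this sketch; an engine may instead start the first batch at the first event):

```python
from collections import defaultdict

def time_batch_avg(events, batch_sec=300):
    """events: iterable of (timestamp_sec, room_no, temp) in timestamp order.

    Yields (batch_start_sec, {room_no: avg_temp}) once per non-empty batch.
    """
    batch_start = None
    temps = defaultdict(list)
    for ts, room_no, temp in events:
        start = ts - ts % batch_sec
        if batch_start is not None and start != batch_start:
            # The previous batch is complete: emit it and start afresh.
            yield batch_start, {r: sum(v) / len(v) for r, v in temps.items()}
            temps = defaultdict(list)
        batch_start = start
        temps[room_no].append(temp)
    if temps:
        # Flush the last, possibly incomplete, batch at end of input.
        yield batch_start, {r: sum(v) / len(v) for r, v in temps.items()}

readings = [(0, 1, 20.0), (100, 1, 30.0), (200, 2, 10.0), (300, 1, 40.0)]
for batch in time_batch_avg(readings):
    print(batch)
```

Each event contributes to exactly one output here, while in a sliding window it contributes to every output produced during the following minute.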
33. Siddhi Query : Join
define stream TempStream
(deviceID long, roomNo int, temp double);
define stream RegulatorStream
(deviceID long, roomNo int, isOn bool);
from TempStream[temp > 30.0]#window.time(1 min) as T
join RegulatorStream[isOn == false]#window.length(1) as R
on T.roomNo == R.roomNo
select T.roomNo, R.deviceID, 'start' as action
insert into RegulatorActionStream ;
34. Siddhi Query : Detect Trend
from t1=TempStream,
t2=TempStream [t1.temp < t2.temp and
t1.deviceID == t2.deviceID]+
within 5 min
select t1.temp as initialTemp,
t2.temp as finalTemp,
t1.deviceID,
t1.roomNo
insert into IncreaingHotRoomsStream ;
35. Siddhi Query : Partition
define partition Device by TempStream.deviceID ;
define partition Temp by
range TempStream.temp <= 0 as 'ICE',
range TempStream.temp > 0 and
TempStream.temp < 100 as 'WATER',
range TempStream.temp >= 100 as 'VAPOUR' ;
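The two partition styles, by attribute value and by range, both reduce to computing a key per event and keeping separate state per key. A small Python model of that idea; `temp_partition` and `split_by` are hypothetical helpers, and assigning exactly 100 to 'VAPOUR' is an assumption of this sketch:

```python
from collections import defaultdict

def temp_partition(temp):
    """Range partition key for a temperature in degrees Celsius."""
    if temp <= 0:
        return "ICE"
    if temp < 100:
        return "WATER"
    return "VAPOUR"  # assumption: exactly 100 is treated as vapour

def split_by(events, key_fn):
    """Group events by partition key, keeping arrival order within each group."""
    groups = defaultdict(list)
    for event in events:
        groups[key_fn(event)].append(event)
    return dict(groups)

# Events are (deviceID, roomNo, temp) tuples.
events = [(1, 10, -5.0), (2, 11, 25.0), (1, 10, 120.0)]
print(split_by(events, lambda e: e[0]))                  # value partition
print(split_by(events, lambda e: temp_partition(e[2])))  # range partition
```

A query that runs per partition then sees only the events of one group, so its windows and pattern state never mix devices or temperature bands.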
36. Siddhi Query : Detect Trend per Partition
define partition Device by TempStream.deviceID ;
from t1=TempStream,
t2=TempStream [t1.temp < t2.temp and
t1.deviceID == t2.deviceID]+
within 5 min
select t1.temp as initialTemp,
t2.temp as finalTemp,
t1.deviceID,
t1.roomNo
insert into IncreaingHotRoomsStream
partition by Device ;
37. Siddhi Query : Detect Pattern
define stream Purchase (price double, cardNo long,place string);
from every (a1 = Purchase[price < 10] -> a3= ..) ->
a2 = Purchase[price >10000 and a1.cardNo == a2.cardNo]
within 1 day
select a1.cardNo as cardNo, a2.price as price, a2.place as place
insert into PotentialFraud ;
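The intent of this pattern, a small purchase followed by a very large one on the same card within a day, can be modelled in a few lines of Python. This is a simplified sketch, not the engine's matching algorithm: the elided middle step (`a3= ..`) is ignored, events are assumed to arrive in time order, and `detect_fraud` is a hypothetical name.

```python
DAY = 24 * 60 * 60

def detect_fraud(purchases, within_sec=DAY):
    """purchases: iterable of (timestamp_sec, price, card_no, place), time-ordered.

    Returns one (card_no, price, place) alert per pending small purchase
    that is followed by a large purchase on the same card in time.
    """
    pending = {}  # card_no -> timestamps of small purchases awaiting a match
    alerts = []
    for ts, price, card_no, place in purchases:
        if price < 10:
            # "every": each small purchase starts a new candidate match.
            pending.setdefault(card_no, []).append(ts)
        elif price > 10000:
            # A large purchase completes all candidates for that card.
            for start in pending.pop(card_no, []):
                if ts - start <= within_sec:
                    alerts.append((card_no, price, place))
    return alerts

print(detect_fraud([(0, 5.0, 1, "A"), (100, 20000.0, 1, "B")]))
```

The per-card `pending` map is the state the pattern operator has to hold between events.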
38. Siddhi Query : Define Event Table
define table CardUserTable (name string, cardNum long) ;
define table CardUserTable (name string, cardNum long)
from ('datasource.name'='CardDataSource', 'table.name'=
'UserTable', 'caching.algorithm'='LRU') ;
Cache types supported
● Basic: A size-based algorithm based on FIFO.
● LRU (Least Recently Used): The least recently used event is dropped
when cache is full.
● LFU (Least Frequently Used): The least frequently used event is dropped
when cache is full.
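For readers unfamiliar with the LRU policy, a minimal generic sketch in Python follows. It illustrates the eviction rule only and says nothing about how the event table cache is actually implemented; `LRUCache` is a hypothetical class.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # oldest use first, newest use last

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # a read counts as a use
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" is now more recent than "b"
cache.put("c", 3)   # evicts "b"
print(list(cache.entries))
```

A FIFO ("Basic") cache would differ only in that `get` would not move the entry, so "a" would have been evicted instead.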
39. Siddhi Query : Query Event Table
define stream Purchase (price double, cardNo long, place string);
define table CardUserTable (name string, cardNum long) ;
from Purchase#window.length(1) join CardUserTable
on Purchase.cardNo == CardUserTable.cardNum
select Purchase.cardNo as cardNo,
CardUserTable.name as name,
Purchase.price as price
insert into PurchaseUserStream ;
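Joining a stream against an event table amounts to a lookup per arriving event. A plain Python model of the query above, assuming `cardNum` is unique in the table (so the table can be represented as a dict) and that purchases with no matching row are dropped, as in an inner join; `enrich_purchases` is a hypothetical name:

```python
def enrich_purchases(purchases, card_user_table):
    """purchases: iterable of dicts with keys price, cardNo, place.
    card_user_table: dict mapping cardNum -> name.
    """
    for purchase in purchases:
        name = card_user_table.get(purchase["cardNo"])
        if name is not None:  # inner join: skip purchases without a table row
            yield {"cardNo": purchase["cardNo"],
                   "name": name,
                   "price": purchase["price"]}

table = {1: "Ann"}
stream = [{"price": 12.5, "cardNo": 1, "place": "X"},
          {"price": 3.0, "cardNo": 9, "place": "Y"}]
print(list(enrich_purchases(stream, table)))
```

Unlike a stream-to-stream join, only the stream side triggers output here; the table is passive state.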
40. Siddhi Query : Insert into Event Table
define stream FraudStream (price double, cardNo long, userName
string);
define table BlacklistedUserTable (name string, cardNum long) ;
from FraudStream
select userName as name, cardNo as cardNum
insert into BlacklistedUserTable ;
41. Siddhi Query : Update into Event Table
define stream LoginStream (userID string,
islogin bool, loginTime long);
define table LastLoginTable (userID string, time long) ;
from LoginStream
select userID, loginTime as time
update LastLoginTable
on LoginStream.userID == LastLoginTable.userID ;
43. Siddhi Query : Function Extension
from TempStream
select deviceID, roomNo,
custom:toKelvin(temp) as tempInKelvin,
'K' as scale
insert into OutputStream ;
44. Siddhi Query : Aggregator Extension
from TempStream
select deviceID, roomNo, temp,
custom:stdev(temp) as stdevTemp,
'C' as scale
insert into OutputStream ;
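An aggregator such as `custom:stdev` has to update its result one event at a time. One common way to do that is Welford's online algorithm, sketched here in Python. This is an illustration of the technique, not the extension's actual code, and the choice of population (rather than sample) standard deviation is an assumption.

```python
import math

class RunningStdev:
    """Population standard deviation, updated incrementally (Welford)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared distances from the running mean

    def add(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return self.value()

    def value(self):
        return math.sqrt(self.m2 / self.count) if self.count else 0.0

agg = RunningStdev()
for temp in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
    latest = agg.add(temp)
print(latest)
```

The aggregator keeps three numbers regardless of how many events it has seen, which is what makes it usable on an unbounded stream. Supporting a window would additionally need a way to remove expired values.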
45. Siddhi Query : Window Extension
from TempStream
#window.custom:lastUnique(roomNo,2 min)
select *
insert into OutputStream ;
46. Siddhi Query : Transform Extension
from XYZSpeedStream
#transform.custom:getVelocityVector(v,vx,vy,vz)
select velocity, direction
insert into SpeedStream ;
47. CEP Event Adaptors
● For receiving and publishing events
● Has the configurations to connect to external endpoints
● Has a many-to-one relationship with Event Streams
48. CEP Event Adaptors
Support for several transports (network access)
● SOAP
● HTTP
● JMS
● SMTP
● SMS
● Thrift
● Kafka
Supported data formats
● XML
● JSON
● Map
● Text
● WSO2Event - WSO2 data format over Thrift for high-performance event transfer,
supporting Java/C/C++/C# via Thrift language bindings
49. CEP Event Adaptors
Supports database writes using Map messages
● Cassandra
● MySQL
● H2
Supports custom event adaptors
via its pluggable architecture!
50. Monitoring & Debugging : Event Flow
● Visualization of the Event Stream flow in CEP
● Helps to get the big picture
● Good for debugging
51. Monitoring & Debugging : Event Tracer
• Dump message traces in a textual format
• Before and after processing each stage of event flow
52. Monitoring & Debugging : Event Statistics
• Real-time statistics
• via visual illustrations & JMX
• Time based request & response counts
• Stats on all components of CEP server
53. Real Time Dashboard
• Provides tools to configure gadgets
• Currently supports RDBMS only
• Powered by WSO2 User Engagement Server (WSO2 UES)
54. Performance Results
• Same JVM Performance (Siddhi vs. Esper; M means million),
4-core machine
• Filters: 8M Events/Sec vs. Esper 2M
• Window: 2.5M Events/Sec vs. Esper 1M
• Patterns: 1.4M Events/Sec, about 10X faster than Esper
• Over the Network Performance (using Thrift-based WSO2 event
format), 8-core machine
• Filter: 0.25M (or 250K) Events/Sec
55. CEP High Availability
Execution plan in “RedundantNode” based distributed processing mode
<executionPlan name="RedundantNodeExecutionPlan" statistics="enable"
trace="enable" xmlns="http://wso2.org/carbon/eventprocessor">
...
<siddhiConfiguration>
<property name="siddhi.enable.distributed.processing">RedundantNode</property>
<property name="siddhi.persistence.snapshot.time.interval.minutes">0</property>
</siddhiConfiguration>
...
</executionPlan>
56. HA / Persistence
• Option 1: Side by side
• Recommended
• Takes 2X hardware
• Gives zero down time
• Option 2: Snapshot and restore
• Uses less HW
• Will lose events between snapshots
• Downtime during recovery
• ** In some scenarios you can use event tables to keep
intermediate state
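The trade-off of Option 2 can be seen in a toy Python model: whatever arrives after the last snapshot is gone after a restore. `CountingQuery` is a hypothetical stand-in for any stateful query, and `pickle` stands in for whatever persistence store is configured.

```python
import pickle

class CountingQuery:
    """Toy stateful query: counts the events it has seen."""

    def __init__(self):
        self.count = 0

    def on_event(self, event):
        self.count += 1
        return self.count

    def snapshot(self):
        """Serialize the current state."""
        return pickle.dumps(self.count)

    @classmethod
    def restore(cls, blob):
        """Rebuild a query from a previously taken snapshot."""
        query = cls()
        query.count = pickle.loads(blob)
        return query

query = CountingQuery()
for event in ("e1", "e2", "e3"):
    query.on_event(event)
blob = query.snapshot()         # periodic snapshot
query.on_event("e4")
query.on_event("e5")            # node fails after this event
recovered = CountingQuery.restore(blob)
print(query.count, recovered.count)
```

The recovered node resumes from the snapshot state, so the two events processed after the snapshot are not reflected. Shortening the snapshot interval narrows the gap at the cost of more persistence work.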
57. Scaling
• Vertical scaling
• Can be distributed as a pipeline
• Horizontal scaling
• Queries like windows, patterns, and joins have shared state
• Hard to distribute!
58. Scaling (Contd.)
• Currently users have to set up the pipeline manually (WSO2 team
can help)
• Work is underway to support the above pipeline and distributor
operators out of the box
65. Email Notification
Hi Alis Miranda
Your order for 1 L CHICKEN pizza will be delivered in 30 mins to
779 Burl Ave, Clovis, CA 93611.
The total cost of the order is $14.5.
If you didn't get the pizza within 30 min you will be eligible to have those pizzas for
free..!!
MyPizzaShop