In this talk, we will discuss the WSO2 Data Analytics platform, which brings all the analytics technologies together into one platform. It lets you collect data through a single Sensor API, process it using batch, realtime, or predictive technologies, and communicate your results, all within a single platform and user experience.
More details: https://iwringer.wordpress.com/2015/03/18/introducing-wso2-analytics-platform-note-for-architects/
2. Analytics is Growing Up
▪ It is no longer about doing your first analytics use case.
▪ It is about
▪ How to do it every day, efficiently?
▪ How to recover?
▪ How to make decisions?
▪ How to do other forms, like real-time, interactive, and predictive analytics?
3. Analytics 2.0 Platform
▪ One platform for all four forms of analytics
▪ Single consistent programming model
▪ One analytics archive format
▪ Support for the lifecycle of analytics apps
Integrates well with the rest of the enterprise!
5. Collect Data
▪ One Sensor API to publish events
- REST, Thrift, JMS, Kafka
- Java clients, JavaScript clients*
▪ First you define streams (think of a stream as an infinite table in a SQL DB)
▪ Then send events via the Sensor API
Events can be sent to the batch pipeline, the realtime pipeline, or both via configuration!
6. Collecting Data: Example
Java example: create and send events
Events are sent asynchronously
See the client given in http://goo.gl/vIJzqc for more info
// Initialize Agent
Agent agent = new Agent(agentConfiguration);
publisher = new AsyncDataPublisher("tcp://hostname:7612", .. );
// Define Stream
StreamDefinition definition = new StreamDefinition(STREAM_NAME, VERSION);
definition.addPayloadData("sid", STRING);
...
publisher.addStreamDefinition(definition);
...
// Send event
Event event = new Event();
event.setPayloadData(eventData);
publisher.publish(STREAM_NAME, VERSION, event);
9. Analytics logic with SQL-like Queries
▪ Both BAM and CEP provide a SQL-like data processing language
▪ Since many people understand SQL, these languages make large-scale (Big Data) processing accessible to many
▪ Expressive, short, and sweet.
▪ Define core operations that cover 90% of problems
▪ Let experts dig in when they like! (via user-defined functions)
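To make the idea of a short SQL-like query concrete, here is a minimal plain-Java sketch that hand-codes roughly what a filter plus a sliding length window with an average would compute. The query text in the comment is only an illustration in a Siddhi-like style; the stream name, attribute names, and the class WindowedAverage are assumptions made for this example, and the code does not use any WSO2 API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hand-written equivalent of an illustrative SQL-like streaming query such as:
//   from TempStream[temp > 30.0]#window.length(3)
//   select avg(temp) as avgTemp insert into HighTempStream;
// Stream and attribute names are hypothetical.
class WindowedAverage {
    private final double threshold;
    private final int windowLength;
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum = 0.0;

    WindowedAverage(double threshold, int windowLength) {
        this.threshold = threshold;
        this.windowLength = windowLength;
    }

    // Returns the average over the most recent events that passed the filter,
    // or null when the incoming event is filtered out.
    Double onEvent(double temp) {
        if (temp <= threshold) {
            return null; // filter: [temp > threshold]
        }
        window.addLast(temp);
        sum += temp;
        if (window.size() > windowLength) {
            sum -= window.removeFirst(); // oldest event leaves the window
        }
        return sum / window.size();
    }

    public static void main(String[] args) {
        WindowedAverage query = new WindowedAverage(30.0, 3);
        for (double t : new double[] {25.0, 32.0, 34.0, 29.0, 36.0, 40.0}) {
            Double avg = query.onEvent(t);
            if (avg != null) {
                System.out.println("average of recent hot readings: " + avg);
            }
        }
    }
}
```

The point of the comparison is that the two-line query carries the same logic as the whole class, which is why such languages make stream processing accessible to people who already know SQL.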
10. Scaling CEP Queries on top of Storm
▪ Accepts CEP queries with hints about how to partition streams
▪ Partitions the streams, builds an Apache Storm topology running CEP nodes as Storm Spouts, and runs it (see http://goo.gl/pP3kdX)
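The partitioning step can be illustrated with a small plain-Java sketch: events are routed to a worker by hashing a partition key, so all events for one key reach the same CEP node. This is only a sketch of the general idea (similar in spirit to grouping a stream by a field); the class StreamPartitioner, the key values, and the worker count are assumptions for the example, and no Storm or WSO2 API is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StreamPartitioner {
    // Picks a worker index for a partition key; the same key always maps
    // to the same worker, so per-key state can stay local to that worker.
    static int workerFor(String key, int workers) {
        return Math.floorMod(key.hashCode(), workers);
    }

    // Splits events (represented here only by their partition key)
    // into one list per worker.
    static List<List<String>> partition(List<String> keys, int workers) {
        List<List<String>> perWorker = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            perWorker.add(new ArrayList<>());
        }
        for (String key : keys) {
            perWorker.get(workerFor(key, workers)).add(key);
        }
        return perWorker;
    }

    public static void main(String[] args) {
        // Hypothetical partition keys, e.g. one per monitored house.
        List<String> keys = Arrays.asList("house-1", "house-2", "house-1", "house-3");
        List<List<String>> perWorker = partition(keys, 2);
        for (int i = 0; i < perWorker.size(); i++) {
            System.out.println("worker " + i + " receives " + perWorker.get(i));
        }
    }
}
```

Math.floorMod is used instead of % so that negative hash codes still produce a valid worker index.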
11. Predictive Analytics
▪ Predictive analytics learns a decision function (a model) using examples
▪ Is this fraud?
▪ How to drive?
▪ Handwritten text
▪ Build models and use them with WSO2 CEP, BAM, and ESB using the WSO2 Machine Learner product (2015 Q3)
▪ Build a model using R, export it as PMML, and use it within WSO2 CEP
12. WSO2 Machine Learner
▪ A wizard to sample, explore, and understand data through visualizations
▪ A wizard to configure and train machine learning models, and select the best model
▪ Find and use those models with WSO2 CEP, BAM, and ESB
▪ Powered by Apache Spark MLlib
13. Communicate: Dashboards
▪ The idea is to give an “overall idea” at a glance (e.g. a car dashboard)
▪ Support for personalization: you can build your own dashboard.
▪ Also the entry point for drill-down
▪ How to build?
- Dashboard via Google Gadget and content via HTML5 + JavaScript
- Use charting libraries like Vega or D3
14. Communicate: Alerts
▪ Detecting conditions can be done via CEP queries
▪ Key is the “Last Mile”
- Email
- SMS
- Push notifications to a UI
- Pager
- Trigger a physical alarm
▪ How?
- Select the email sender “Output Adaptor” from CEP, or send from CEP to ESB, which has many connectors
15. Communicate: APIs
▪ With mobile apps, most data is exposed and shared as APIs (REST/JSON) to end users.
▪ Need to expose analytics results as APIs
▪ Following are some challenges
- Security and permissions
- API discovery
- Billing, throttling, quotas & SLA
▪ How?
- Write data to a database from CEP event tables
- Build services via WSO2 Data Services Server
- Expose them as APIs via API Manager
16. Event Stream Store
▪ One-stop place for all event stream definitions
▪ Lets users
▪ Publish and consume through multiple protocols like REST, JMS, Thrift, WebSockets, etc.
▪ Discover event streams
▪ Enforce security and authorization
▪ Pay-per-use subscriptions
▪ Effectively an Event Stream Marketplace!
▪ This will automate the API creation discussed in the previous slide.
17. What is it good for?
▪ Batch Analytics
▪ Realtime Streaming analytics
▪ Realtime Interactive analytics
▪ Lambda Architecture
▪ Train and use a ML model
▪ Selective Detailed Analysis
http://tinybuddha.com/blog/a-simple-technique-to-solve-problems-before-they-get-bigger/
18. Selective Detailed Analysis
• Too expensive to do detailed analysis on all the data
• Instead, detect the condition and dig into the related data
• Fraud toolbox
• Other use cases
– Dynamic offers at a retail site
– Weather
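The pattern above can be sketched in plain Java as a cheap detector applied to every event, with the expensive analysis invoked only for the events that match. This is a minimal sketch under assumed names: SelectiveAnalysis, the amount threshold, and the "review" step are hypothetical and only stand in for a realtime detection query and a detailed follow-up analysis.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

class SelectiveAnalysis {
    // Runs the cheap detector on every event, and the expensive
    // detailed analysis only on events the detector flags.
    static <E, R> List<R> run(List<E> events,
                              Predicate<E> cheapDetector,
                              Function<E, R> detailedAnalysis) {
        List<R> results = new ArrayList<>();
        for (E event : events) {
            if (cheapDetector.test(event)) {
                results.add(detailedAnalysis.apply(event));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical transaction amounts; only large ones get a detailed look.
        List<Integer> amounts = Arrays.asList(50, 5000, 20, 7000);
        List<String> reviews = SelectiveAnalysis.<Integer, String>run(
                amounts,
                amount -> amount > 1000,            // cheap condition
                amount -> "review " + amount);      // stand-in for costly analysis
        System.out.println(reviews);
    }
}
```

The cost of the detailed step then scales with the number of flagged events rather than with the total event volume.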
19. Lambda Architecture
• Same code in both batch and realtime layers
• Idea is to fill the time between two batch runs
• Batch layer writes the data to a DB
• Realtime layer merges with batch data via Event Tables
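A minimal plain-Java sketch of the merge step, assuming a simple per-key count: the batch layer periodically publishes recomputed totals, the realtime layer counts events since the last batch run, and a query adds the two. The class LambdaView and its method names are hypothetical, in-memory maps stand in for the DB and event tables, and the sketch assumes each batch run covers every event seen up to the moment it is applied.

```java
import java.util.HashMap;
import java.util.Map;

class LambdaView {
    // Totals written by the last batch run (stands in for the DB table).
    private final Map<String, Long> batchCounts = new HashMap<>();
    // Counts seen by the realtime layer since the last batch run.
    private final Map<String, Long> realtimeCounts = new HashMap<>();

    // Realtime layer: count each incoming event as it arrives.
    void onRealtimeEvent(String key) {
        realtimeCounts.merge(key, 1L, Long::sum);
    }

    // Batch layer: replace the batch view with freshly recomputed totals
    // and reset the realtime delta, which the batch run now covers.
    void onBatchRun(Map<String, Long> recomputedTotals) {
        batchCounts.clear();
        batchCounts.putAll(recomputedTotals);
        realtimeCounts.clear();
    }

    // Query: batch result plus whatever arrived since the last batch run.
    long query(String key) {
        return batchCounts.getOrDefault(key, 0L)
                + realtimeCounts.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        LambdaView view = new LambdaView();
        view.onRealtimeEvent("api-1");
        view.onRealtimeEvent("api-1");

        Map<String, Long> batchResult = new HashMap<>();
        batchResult.put("api-1", 2L);
        view.onBatchRun(batchResult);

        view.onRealtimeEvent("api-1"); // arrives between two batch runs
        System.out.println("api-1 count: " + view.query("api-1"));
    }
}
```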
20. Real Life Use Cases
▪ Health, Smart Parking solutions
▪ Financial Monitoring
▪ Smart City project, Vehicle tracking, Building monitoring
▪ Railway monitoring
▪ Throttling and Anomaly Detection
▪ API Analytics (13+ customers)
▪ Connected Car
21. Case Study: DEBS Grand Challenges
▪ The DEBS (Distributed Event Based Systems) Grand Challenge is a yearly event processing challenge.
▪ 2014 Challenge:
▪ Smart Home electricity data: 2000 sensors, 40 houses, 4 billion events. We posted 400K events/sec, and close to one million events/sec distributed throughput with 4 nodes.
▪ One of the four finalists
▪ 2015 Challenge:
▪ Based on taxi activities collected from New York City over the year 2013: 14,144 taxis, 173 million taxi trip records. We posted 300K events/sec on a single node and were one of the finalists.
https://www.flickr.com/photos/shedboy/3681317392/
22. Case Study: Realtime Soccer Analysis
Watch at:
https://www.youtube.com/watch?v=nRI6buQ0NOM
23. Case Study: TFL Traffic Analysis
Built using TFL (Transport for London) open data feeds.
http://goo.gl/04tX6k
http://goo.gl/9xNiCm
24. Select the Product
Product | Features
WSO2 Data Analytics Server (DAS) | Everything: Batch, Realtime, Interactive, and Predictive Analytics
WSO2 Complex Event Processor (CEP) | Realtime Analytics only
WSO2 Machine Learner | Predictive Analytics only