WSO2 Data Analytics Server is a comprehensive enterprise data analytics platform; it fuses batch and real-time analytics of data from any source with predictive analytics via machine learning.
2. WSO2 Analytics Platform
WSO2 Analytics Platform uniquely combines simultaneous real-time and batch analysis with predictive analytics to turn data from IoT, mobile and Web apps into actionable insights.
4. Analytics Strategy
• We deliver a single platform to address all analytics styles. This was driven by the increasing market requirement to expand analytics in enterprises beyond pure BI and start exploiting big data in real time.
• We deliver together
• Batch Analytics: analysis of data at rest, typically running every hour or every day, focused on historical dashboards and reports.
• Real-time Analytics: analysis of event streams as they arrive, detecting patterns and conditions.
• Predictive Analytics: machine learning used to build a mathematical model that predicts future behavior.
• Interactive Analytics: ad-hoc queries executed on the fly on top of data at rest.
5. Analytics Strategy
• Focus on supporting high-level, SQL query-like languages across the analytics
platform
• No Java programming involved
• Lowest learning curve
• Client Applications are agnostic of the part of the platform being used, so
customers can increase their usage of the platform without changing their apps.
• Common set of receivers/publishers for all analytics types
• Common format for events
• Leverage leading open source projects such as Storm and Spark, and contribute our own back (such as Siddhi).
• Although packaged together, each component of the platform can scale independently
6. Key Differentiators
• Open Source, under Apache 2 license
• Integrated Batch, Streaming, Interactive and Predictive Analytics
• Rich, extensible, SQL-like configuration language
• Rich set of data connectors, which can be easily extended
• Events only need to be published once from applications to the platform, and can be consumed by the batch or real-time pipeline.
• Part of the overall WSO2 platform
7. Key Differentiators
• Rich set of data connectors, which can be easily extended
• Integrated with batch analytics (same receivers/publishers architecture)
• Events only need to be published once from applications to the platform, and can be consumed by the batch or real-time pipeline.
• Performance on a single node satisfies 90% of use cases
8. Market Recognition
• Named a Strong Performer in The Forrester Wave™: Big Data Streaming Analytics, Q1 2016.
• Highest possible score in the 'Acquisition and Pricing' criterion, and among the second-highest scores in the 'Ability to Execute' criterion
• The Forrester report notes:
“WSO2 is an open source middleware provider that includes a full spectrum of architected-as-one components such as application servers, message brokers, enterprise service bus, and many others.
Its streaming analytics solution follows the complex event processor architectural approach, so it provides very low-latency analytics. Enterprises that already use WSO2 middleware can add CEP seamlessly. Enterprises looking for a full middleware stack that includes streaming analytics will find a place for WSO2 on their shortlist as well.”
9. IoT / Edge Analytics
• We provide a solid foundation for an IoT analytics solution, whether for device manufacturers or device users
• Customers can today:
• React within a few hours, a few minutes or a few milliseconds to a condition, leveraging batch and streaming analytics.
• Implement closed loop control (autonomic
computing) leveraging Machine Learning.
• Embed streaming engine in IoT devices or gateways
• Use an SDK and data agent to directly publish events at the device hardware level.
Reference: https://iwringer.wordpress.com/2015/10/15/thinking-deeply-about-iot-analytics/
11. Smart Home
• DEBS (Distributed Event Based Systems) is a premier academic conference, which posts a yearly event-processing challenge (http://www.cse.iitb.ac.in/debs2014/?page_id=42)
• Smart Home electricity data: 2,000 sensors, 40 houses, 4 billion events
• We posted the fastest single-node solution measured (400K events/sec) and close to one million events/sec distributed throughput
• The WSO2 CEP-based solution was one of the four finalists (with Dresden University of Technology, Fraunhofer Institute, and Imperial College London)
• It was the only generic solution to become a finalist
12. Customer Stories
Experian delivers a digital marketing platform, where CEP plays a key role in analyzing customer behavior in real time and offering targeted promotions. CEP was chosen after careful analysis, primarily for its openness, its open source nature, the fact that support is driven by engineers, and the availability of a complete middleware stack, integrated with CEP, for additional use cases.
Eurecat is the Catalunya innovation center (in Spain). It uses CEP to analyze data from iBeacons deployed within department stores, to offer instant rebates to users or send them help if they appear “stuck” in an area of the shop. They chose WSO2 for its real-time processing, the variety of IoT connectors available, the extensible framework and the rich configuration language. They also use WSO2 ESB in conjunction with WSO2 CEP.
Pacific Controls is an innovative company delivering an IoT platform of platforms: Galaxy 2021. The platform manages all kinds of devices within a building and takes automated decisions, such as moving an elevator or starting the air conditioning, based on certain conditions. Within Galaxy 2021, CEP is used for monitoring alarms and specific conditions. Pacific Controls also uses other products from the WSO2 platform, such as WSO2 ESB and Identity Server.
A leading airline uses CEP to enhance customer experience by calculating the average time for passengers to reach their boarding gate (going through security, walking, etc.). They also want to track the time it takes to clean a plane, in order to better streamline the boarding process and notify both the airline and customers about potential delays. They evaluated WSO2 CEP first, as they were already using our platform, and decided to use it because it addressed all their requirements.
13. Cloud IDE Analytics
• Custom solution created in partnership with Codenvy to bring analytics to the Codenvy management team and its customers
• Developed in less than a month, with a custom plug-in for MongoDB.
• Deployed on the codenvy.com platform.
14. Healthcare Data Monitoring
• Allows searching, visualizing and analyzing healthcare records (HL7) across 20 hospitals in Italy
• Used in combination with WSO2 ESB
• Custom toolbox tailored to the customer's requirements (to replace an existing system)
15. Data Processing Pipeline
Collect Data
• Define the schema for the data.
• Send events to the batch and/or real-time pipeline.
• Publish events.
Analyze
• Spark SQL for batch analytics.
• Siddhi Query Language for real-time analytics.
• Predictive models for machine learning.
Communicate
• Alerts
• Dashboards
• APIs
19. Event Streams
• An event stream is a sequence of events
• Event streams are defined by Stream Definitions
• Event streams have inflows and outflows
• Inflows can be from
• Event Receivers
• Execution Plans
• Outflows are to
• Event Publishers
• Execution Plans
{
  'name': 'phone.retail.shop',
  'version': '1.0.0',
  'nickName': 'Phone_Retail_Shop',
  'description': 'Phone Sales',
  'metaData': [
    {'name': 'clientType', 'type': 'STRING'}
  ],
  'correlationData': [
    {'name': 'transactionID', 'type': 'STRING'}
  ],
  'payloadData': [
    {'name': 'brand', 'type': 'STRING'},
    {'name': 'quantity', 'type': 'INT'},
    {'name': 'total', 'type': 'INT'},
    {'name': 'user', 'type': 'STRING'}
  ]
}
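As an illustration of how a client could assemble an event against this definition, here is a small Python sketch. The dict layout simply mirrors the stream definition above; it is not the actual DAS wire format (an assumption, for illustration only).

```python
# Build an event matching the 'phone.retail.shop' stream definition above
# and check it against the declared schema. The dict layout mirrors the
# definition; the real DAS wire format may differ (assumption).

STREAM_DEF = {
    "metaData": [{"name": "clientType", "type": "STRING"}],
    "correlationData": [{"name": "transactionID", "type": "STRING"}],
    "payloadData": [
        {"name": "brand", "type": "STRING"},
        {"name": "quantity", "type": "INT"},
        {"name": "total", "type": "INT"},
        {"name": "user", "type": "STRING"},
    ],
}

PY_TYPES = {"STRING": str, "INT": int}

def validate(section, values):
    """Check values against the attribute order and types of one section."""
    attrs = STREAM_DEF[section]
    if len(values) != len(attrs):
        raise ValueError(f"{section}: expected {len(attrs)} values")
    for attr, value in zip(attrs, values):
        if not isinstance(value, PY_TYPES[attr["type"]]):
            raise TypeError(f"{attr['name']} must be {attr['type']}")
    return True

event = {
    "metaData": ["mobile"],
    "correlationData": ["txn-001"],
    "payloadData": ["Nokia", 2, 400, "alice"],
}

for section, values in event.items():
    validate(section, values)
```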
20. Data Connectors
• We provide a complete set of data connectors, which customers can enrich.
• The following connectors are available out of the box
• Source : Email, File, HTTP, JMS, Kafka, MQTT, SOAP, WebSocket, Thrift, Binary, Log and JMX
receiver
• Sink : RDBMS, Cassandra, SMS, Email, File, HTTP, JMS, Kafka, MQTT, SOAP, WebSocket,
Thrift, Binary
• Custom connectors can be written in Java. A sample connector's source is available as a starting point, and the source of the OOTB connectors can be used as a reference.
• Incoming/outgoing data can be mapped using XPath, regular expressions, or JSON paths.
• Data Connectors are common across the analytics platform.
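To illustrate the two mapping styles mentioned above outside any WSO2 API, here is a minimal Python sketch: a dotted-path lookup standing in for a full JSONPath engine, and a regular expression pulling fields out of a raw log line. The field names and log format are hypothetical.

```python
# Sketch of two mapping styles: a simplified dotted-path lookup
# (stand-in for JSONPath) and a regex extracting fields from a log line.
import re

def json_path(doc, path):
    """Resolve a simplified dotted path such as 'order.id'
    (a stand-in for a real JSONPath implementation)."""
    for key in path.split("."):
        doc = doc[key]
    return doc

event = {"order": {"id": "o-42", "total": 99}}
order_id = json_path(event, "order.id")

# Regex mapping over a hypothetical log line: severity, then message.
LOG_PATTERN = re.compile(r"(?P<level>[A-Z]+)\s+(?P<msg>.*)")
match = LOG_PATTERN.match("ERROR disk full")
level, msg = match.group("level"), match.group("msg")
```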
22. Batch Analytics
● Powered by Apache Spark, with up to 30x higher performance than Hadoop
● Parallel and distributed, with optimized in-memory processing
● Scalable script-based analytics written in an easy-to-learn, SQL-like query language powered by Spark SQL
● Built-in interactive web interface (Spark Console) for ad-hoc query execution
● Scheduled query script execution with HA/failover support
● Run Spark on a single node, on a Spark cluster embedded in a Carbon server cluster, or connect to an external Spark cluster
23. Batch Analytics with Spark SQL
create temporary table product_data using carbonanalytics
options (schema …)
create temporary table products using carbonanalytics
options (schema …)
insert into products select product_name from product_data
group by …
24. Interactive Analytics
• Full-text data indexing support powered by Apache Lucene
• Drill-down search support
• Distributed data indexing
• Designed to support scalability
• Index sharding with Lucene indices
• Near-real-time data indexing and retrieval
• Data indexed immediately as received
25. Data Indexing
• Full-text data indexing support powered by Apache Lucene.
• Drill-down search support.
• Distributed data indexing, designed to support scalability.
• Near-real-time data indexing and retrieval: data indexed immediately as received.
26. Realtime Analytics
• Process in streaming fashion (one event at a time)
• Execution logic written as Execution Plans
• Execution Plan
• An isolated logical execution unit
• Includes a set of queries, and relates to multiple input and output event
streams
• Executed using dedicated WSO2 Siddhi engine
27. CEP Operators with Siddhi
•Filter
from SoftDrinkSales[region == 'USA' and quantity > 99] select brand, price, quantity
•Window
from SoftDrinkSales#window.time(1 hour)
from SoftDrinkSales#window.timeBatch(15 min)
from SoftDrinkSales#window.length(100)
•Join
from PizzaOrder#window.time(1 hour) as o join PizzaDelivery as d
on o.id == d.id
select o.id as id, d.ts - o.ts as ts
insert into DeliveryTime
28. CEP Operators with Siddhi
•Event Table
define table CardUserTable (name string, cardNum long);
@From(eventtable = 'rdbms', datasource.name = 'CardDataSource', table.name = 'UserTable', caching.algorithm = 'LRU')
•Sequences
from every a1 = PizzaOrder -> a2 = PizzaOrder[custid == a1.custid]
•Custom Extensions
select brand, custom:toUSD(price, currency) as priceInUSD insert into OutputStream;
29. Operators Summary
Event Sequencing
We handle out-of-order events by using a variant of the K-Slack algorithm, a well-known solution to handling disorder in event streams that buffers data until order can be guaranteed. Compensation for missed events is not supported in the current version, but it is on the roadmap. Additionally, we can use filtering to reduce noisy events in a stream (based on a Kalman filter).
Enrichment
Enrichment is done in two ways: event tables to access historical data from any JDBC data source, and custom extensions to connect to custom sources of data, such as files.
Business Logic
Scripting can be used to add any business logic to any execution plan. JavaScript, Scala and R are supported out of the box. Additionally, customers can easily invoke custom logic through their own operators.
Transformation
The filter operator can be used to filter streams on a set of conditions, which can be combined via and/or. Conditions can be expressed using mathematical operators, regular expressions, string manipulation and logical operators. Additionally, queries allow selecting information from an input stream, projecting it to an output stream or a new stream, and replacing certain elements.
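The K-Slack idea mentioned under Event Sequencing can be sketched in a few lines: buffer events and release only those older than (latest timestamp seen − K), so events arriving up to K time units late are still emitted in order. This illustrates the buffering principle only, not WSO2's implementation.

```python
# Minimal K-Slack sketch: a min-heap buffers events by timestamp; an
# event is released once it is at least K older than the newest
# timestamp observed, so late arrivals within K are re-ordered.
import heapq

class KSlack:
    def __init__(self, k):
        self.k = k            # maximum tolerated lateness
        self.buffer = []      # min-heap ordered by event timestamp
        self.max_ts = 0       # largest timestamp observed so far

    def push(self, ts, event):
        """Accept one event; return the list of events now safe to emit."""
        heapq.heappush(self.buffer, (ts, event))
        self.max_ts = max(self.max_ts, ts)
        ready = []
        while self.buffer and self.buffer[0][0] <= self.max_ts - self.k:
            ready.append(heapq.heappop(self.buffer))
        return ready
```

For example, with k=2, an event timestamped 1 stays buffered until an event timestamped 3 or later arrives.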
30. Operators Summary
Time Windows
Siddhi provides very strong support for time windows, a domain where an SQL-like query language brings much simplicity compared to a programming language. Several types of windows are supported, including sliding and tumbling (batch) windows, time windows starting from a point in time, and CRON-based time windows. Additionally, we support applying stream processing to events based on the number of events (length window), the uniqueness of events, or the frequency of events.
Aggregation/Correlation
Using the join and pattern operators, we can aggregate and correlate two or more streams of data. Join allows joining events based on a condition, while pattern allows correlating multiple events based on time, logical relationships or event counting.
Pattern Matching
We detect patterns based on temporal order (arrival order), logical relationships (between two events), or counting (to limit the number of events matching the pattern). A pattern may or may not allow foreign events in between the events matching the condition. If no foreign event is allowed, the sequence operator must be used.
Custom
Developers can create their own functions, operators, time windows and processing operators. The extensions are written in Java. Once implemented, the operators can be used like any other out-of-the-box operator or function.
Libraries to support custom operators
Developers use the existing operators as a reference to develop their own; this is one of the key advantages of an open source distribution. We deliver dozens of extensions on GitHub which can be adapted by third parties. At the implementation level, implementing an extension just involves extending a well-defined interface.
Other operators
We support more than 100 custom operators on top of the list above, including geographical operators for location-based applications, time series, math, natural language processing, and integration with machine learning models created in PMML or with our own Machine Learner product.
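The tumbling (batch) window semantics described under Time Windows can be illustrated with a short sketch: events are grouped into fixed, non-overlapping intervals and an aggregate is computed per interval. This is a conceptual model only, not Siddhi's internals.

```python
# Tumbling-window sketch: bucket each (timestamp, value) event into a
# fixed-width, non-overlapping interval and sum values per interval.
from collections import defaultdict

def tumbling_sum(events, width):
    """events: iterable of (timestamp, value). Returns {window_start: sum}."""
    windows = defaultdict(int)
    for ts, value in events:
        window_start = (ts // width) * width  # bucket the event
        windows[window_start] += value
    return dict(windows)
```

A sliding window, by contrast, would re-evaluate the aggregate on every event over the trailing interval rather than once per bucket.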
31. Predictive Analytics (with WSO2 Machine Learner)
• Powered by Apache Spark MLlib
• Manage and explore your data
• Analyze the data using machine learning algorithms
• Build machine learning models
• Compare and manage generated machine learning models
• Predict using the built models
32. Manage Data Sets
• Supported data sources
• CSV/TSV files from local file systems.
• Files from HDFS.
• Tables from WSO2 Data Analytics Server
• Supports data set versioning.
• Version data collected over time from the same data set
• Generate models from the different versions.
• Manage data sets based on projects and users.
33. Pre-process & Explore Data
• Find key details of the feature set
• Scatter plots to understand relationships between features
• Supported graphs:
• Scatter plots, parallel sets, trellis charts, cluster diagrams, histograms
• Missing-value handling via mean imputation or discarding
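The two missing-value strategies above, mean imputation and discarding, amount to the following, sketched outside any WSO2 API:

```python
# Mean imputation: replace missing values (None) in a numeric feature
# column with the column mean. Discarding: drop rows with the missing
# feature instead. Illustration only.

def impute_mean(column):
    present = [v for v in column if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in column]

def discard_missing(rows, idx):
    """Drop rows whose idx-th feature is missing."""
    return [r for r in rows if r[idx] is not None]
```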
34. Analysis with ML Algorithm
• Supports deep learning
• Supports supervised and unsupervised learning
• Includes algorithms for numerical prediction, classification and clustering
• Supports anomaly detection
• Supports recommendation via a collaborative filtering algorithm
35. Analysis with ML Algorithm
• Includes algorithms for numerical prediction, classification and
clustering.
Numerical prediction: Linear Regression, Ridge Regression, Lasso Regression
Classification: Logistic Regression, Naive Bayes, Decision Tree, Random Forest, Support Vector Machines
Clustering: K-Means
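As a toy illustration of the K-Means algorithm listed under Clustering (the product itself runs these algorithms on Spark MLlib), a plain-Python sketch of the assignment/update loop:

```python
# Toy K-Means: assign each point to its nearest center, then move each
# center to the mean of its assigned points, for a fixed iteration count.
import math

def kmeans(points, k, iters=20):
    centers = list(points[:k])                 # naive initialisation
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                       # assignment step
            i = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[i].append(p)
        for i, members in enumerate(clusters): # update step
            if members:
                centers[i] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centers
```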
36. Model Evaluation & Comparison
• Evaluate generated models
based on metrics
• Accuracy
• Area under ROC curve
• Confusion Matrix
• Predicted vs. Actual graphs
• Feature importance
• Compare models generated from different analyses.
• Set the fraction of data used for training.
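Two of the metrics listed above, accuracy and the confusion matrix, reduce to simple computations over predicted vs. actual labels. A sketch for the binary case, as an illustration only:

```python
# Accuracy and a binary confusion matrix from actual vs. predicted labels.

def accuracy(actual, predicted):
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)

def confusion_matrix(actual, predicted):
    """Binary case: returns (tp, fp, fn, tn) for positive label 1."""
    tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
    fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
    tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
    return tp, fp, fn, tn
```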
37. Development Tools
• SiddhiTryIt
• Query Editor
• Query verification
• Wizard-like support to create an execution plan
• Event flow viewer
• Event tracer
• Event Simulator
41. Activating Statistics and Tracing
• Statistics and Tracing can be activated individually for
• Execution Plans
• Event receivers
• Event publishers
45. Queries Dynamic Behavior
• Developers can create dynamic queries leveraging template support
• Templates can be deployed from the Execution Manager by authorized personnel
51. Communicate: Alerts
• Detecting conditions can be done via CEP Queries
• Key is the “Last Mile”
• Email
• SMS
• Push notifications to a UI
• Pager
• Trigger physical Alarm
• How?
• Select the Email sender “Output Adaptor” from DAS (real-time profile), or send from DAS (real-time profile) to the ESB, which has many connectors
52. Communicate: APIs
• With mobile apps, most data is exposed and shared with end users as APIs (REST/JSON).
• Need to expose analytics results as APIs
• Some of the challenges:
• Security and permissions
• API discovery
• Billing, throttling, quotas & SLAs
• How?
• Write data to a database from DAS (real-time profile) event tables
• Build service via WSO2 Data Services
• Expose as API via API Manager
53. Securing WSO2 DAS
• User Management
• Users are managed through the administration console. Administrators
can create specific groups and assign them to new/existing users. Users
and groups can be stored in LDAP, Active Directory, a database or any
custom user store.
• Permissions are assigned to users to access all or parts of the DAS artifacts, either via the admin console or via APIs. For example, a user could have the right to use the simulation tools, view statistics, etc. but won't be able to deploy applications.
• Auditing
• All actions performed in the admin console or via CLI can be written to an
external audit log.
54. Securing WSO2 DAS
• Event Transmission
• HTTP-based, TCP-based, JMS and binary transports support encryption
(TLS and SSL) both at source and sink level. Receivers can be configured
so that they only accept secure connections.
60. Solutions…
• Pre-built solutions by 3rd party
• Apache Eagle: an open source monitoring solution, contributed by eBay Inc., to instantly identify access to sensitive data, recognize attacks and malicious activities in Hadoop, and take action in real time.
• OpenMRS: an open source project used to manage electronic health records.
• Pre-built solutions from us
• Fraud Detection solution, focused on credit card fraud
• GeoDashboard solution
• Auto-scaling manager for Apache Stratos
• Throttling manager for API Management
62. Fraud Detection
Customers can:
• Use or change the generic rules we provide and add as many rules as they like
• Change the weights of the Fraud Scoring Model to suit their business needs
• Use the Markov modelling and clustering capabilities to learn unknown fraud patterns in their domain
• Use the dashboard provided or plug the Fraud Detection Toolkit into their own fraud detection UI
http://wso2.com/library/webinars/2015/02/catch-them-in-
the-act-fraud-detection-with-wso2-cep-and-wso2-bam/
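The Markov-modelling idea above can be sketched as follows: learn transition probabilities between transaction states from historical sequences, then flag a new sequence whose probability is unusually low. This toy sketch illustrates the principle only; the state names and the unseen-transition floor are hypothetical, not the toolkit's behaviour.

```python
# Toy Markov-chain fraud scoring: estimate transition probabilities from
# historical state sequences, then score a new sequence; rare or unseen
# transitions drive the probability toward zero (a fraud signal).
from collections import Counter, defaultdict

def train(sequences):
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    # Normalise counts into per-state transition probabilities.
    return {
        a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
        for a, nxt in counts.items()
    }

def sequence_prob(model, seq, floor=1e-6):
    prob = 1.0
    for a, b in zip(seq, seq[1:]):
        prob *= model.get(a, {}).get(b, floor)  # unseen transition: near zero
    return prob
```

A sequence would then be flagged when `sequence_prob` falls below a threshold tuned on historical data.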
63. Fleet Management
• Updating the locations in real time and showing the route a device has travelled
• Showing visual indicators to represent the status and for alerts
• Displaying and plotting useful information, such as location, speed, etc
http://wso2.com/library/articles/2015/01/article-geo-
spatial-data-analysis-using-wso2-complex-event-
processor-0/
64. Football Game Analysis
• Measures each player's running speed and calculates how long he spent in different speed ranges
• Calculates the duration each player kept the ball in his possession throughout the match
• Detects hits on the ball and detects goals
• Calculates the duration each player has spent in a given position
http://www.slideshare.net/hemapani/analyzing-a-soccer-game-with-
wso2-cep