Stratio Streaming is the result of combining the power of Spark Streaming as a continuous computing framework and the Siddhi CEP engine as a complex event processing engine.
Spark Summit - Stratio Streaming
2. Stratio is the only Big Data platform able to combine, in one query, stored data with streaming data in real time (in less than 30 seconds).
We are polyglots as well: we use Spark over two NoSQL databases, Cassandra and MongoDB.
4. Spark Streaming provides a high-level abstraction called a discretized stream, or DStream, which represents a continuous stream of data. Internally, a DStream is represented as a sequence of RDDs, Spark's abstraction of an immutable, distributed dataset.
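To make the "sequence of RDDs" idea concrete, here is a minimal pure-Python sketch of discretization. It does not use the Spark API: the function name `discretize`, the `(timestamp, payload)` event shape and the batch interval are all assumptions chosen for illustration. Each immutable tuple stands in for one RDD holding the events of one batch interval.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (arrival time in seconds, payload).
Event = Tuple[float, str]


def discretize(events: Iterable[Event], batch_interval: float) -> List[Tuple[str, ...]]:
    """Group timestamped events into consecutive, immutable micro-batches.

    Each returned tuple plays the role of one RDD in a DStream: it holds the
    events that arrived during one batch interval and is never modified.
    """
    batches: Dict[int, List[str]] = {}
    for timestamp, payload in events:
        index = int(timestamp // batch_interval)  # interval the event falls in
        batches.setdefault(index, []).append(payload)
    if not batches:
        return []
    # Emit every interval in order, including empty ones, as immutable tuples.
    return [tuple(batches.get(i, ())) for i in range(min(batches), max(batches) + 1)]


stream = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (3.5, "d")]
print(discretize(stream, batch_interval=1.0))
```

In real Spark Streaming the batches are distributed across a cluster and produced continuously; the sketch only shows how a continuous stream becomes a series of fixed, immutable chunks.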
Spark stack: Shark (SQL), Spark Streaming, MLlib (machine learning), GraphX (graph)
6. •Complex event processing, or CEP, is event processing that combines data from multiple sources to infer events or patterns that suggest more complicated circumstances.
•CEP as a technique helps discover complex events by analyzing and correlating other events.
7. A CEP engine should provide operators over streams, keeping in mind that events and streams are first-class citizens in CEP. In CEP, we think in terms of event streams: an event stream is a sequence of events that arrive over time.
Users provide queries to the CEP engine, whose main mission is matching those queries against the events coming through the event streams.
A CEP engine thus has a notion of time, and it allows working with temporal queries that reason in terms of temporal concepts such as "time windows" or "before and after" event relationships, among others.
•Filter
•Join
•Aggregation (Avg, Sum, Min, Max, Custom)
•Group by
•Having
•Conditions and Expressions (and, or, not, true/false, ==, !=, >=, >, <=, <)
•Data types (boolean, string, int, long, float, double)
•Pattern processing
•Sequence processing (zero to many, one to many, and zero to one)
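As an illustration of a temporal query, the following pure-Python sketch matches a "before and after" relationship inside a time window: it reports when one event type is followed by another for the same key within a given number of seconds. It is not Siddhi code; the function `followed_by`, the event tuple layout and the event type names are hypothetical, and events are assumed to arrive in timestamp order.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp in seconds, event type, key).
Event = Tuple[float, str, str]


def followed_by(
    events: Iterable[Event], first: str, then: str, within: float
) -> Iterator[Tuple[str, float, float]]:
    """Yield (key, t_first, t_then) when `then` follows `first` within `within` seconds.

    Assumes events arrive ordered by timestamp.
    """
    pending: Deque[Tuple[float, str]] = deque()  # unmatched `first` events
    for ts, etype, key in events:
        # Expire `first` events that have fallen out of the time window.
        while pending and ts - pending[0][0] > within:
            pending.popleft()
        if etype == first:
            pending.append((ts, key))
        elif etype == then:
            # Match the oldest pending `first` event with the same key.
            for t0, k in list(pending):
                if k == key:
                    pending.remove((t0, k))
                    yield (key, t0, ts)
                    break


events = [
    (0.0, "login_failed", "alice"),
    (2.0, "login_failed", "bob"),
    (5.0, "password_reset", "alice"),
    (20.0, "password_reset", "bob"),  # too late: outside the 10-second window
]
print(list(followed_by(events, "login_failed", "password_reset", within=10.0)))
```

A CEP engine lets you declare this kind of rule as a query instead of hand-writing the buffering and expiry logic shown here.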
8. •You still have to integrate it in your code
•There is nothing like an interactive console
•If you want to do something with the streams, you guessed it, you have to code it!
•There is no way to remotely listen to a stream
•There are no solution patterns ready-to-use with the engine
•No statistics, no auditing
•Hard to integrate with other tools (dashboarding, log stream, batch processing)
10. With this solution you can use our API to send commands to the Stratio Streaming engine from your code.
You can also work with the interactive shell to test your queries or interact with the engine on demand.
Both tools, in fact, hide that you are sending messages to a complex engine built with Zookeeper, Kafka, Spark Streaming and the Siddhi CEP engine.
17. There are a lot of CEP operators that you can use in your queries:
Filtering
Projection
In-built functions
Windows (time and length)
Join
Event sequences
Event patterns
Output rate limiting
Custom windows, custom functions
from sensor_grid#window.length(10) select name, ind, avg(data) as data group by name insert into sensor_grid_avg for current-events
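One way to read the query above is sketched here in plain Python: keep the last N events in a sliding length window and, for each arriving event, emit the average of `data` over the events in the window that share its `name`. This is an interpretation of the query's semantics, not the engine's implementation; the function name `length_window_avg` and the dict-based event shape are assumptions, and the example uses a window of 2 instead of 10 to keep it short.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator


def length_window_avg(events: Iterable[Dict], length: int) -> Iterator[Dict]:
    """Emit one output event per input event, averaging `data` per `name`
    over a sliding window holding the most recent `length` events."""
    window: Deque[Dict] = deque(maxlen=length)  # oldest event drops out automatically
    for event in events:
        window.append(event)
        # "group by name": only events with the same name contribute to the average.
        same_name = [e["data"] for e in window if e["name"] == event["name"]]
        yield {
            "name": event["name"],
            "ind": event["ind"],
            "data": sum(same_name) / len(same_name),
        }


sensor_grid = [
    {"name": "temperature", "ind": 1, "data": 10.0},
    {"name": "temperature", "ind": 2, "data": 20.0},
    {"name": "humidity", "ind": 3, "data": 5.0},
    {"name": "temperature", "ind": 4, "data": 40.0},
]
for out in length_window_avg(sensor_grid, length=2):
    print(out)
```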
18. 1. >, <, ==, >=, <=, !=
2. contains, instanceof
3. and, or, not
1. sum, avg, max, min, count: when aggregated (group by, having)
2. Field type conversion
3. Coalesce: if a field is null, then take another field
4. IsMatch: true or false depending on whether the field matches a regex
from orders[price >= 20 and price < 100]…
from orders select * insert into ordersB…
from orders select client, price insert into ordersB…
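The filtering and projection examples above map onto very simple operations on a stream of records. The sketch below expresses them with Python generators; the helper names `filter_orders` and `project` and the sample order fields other than `client` and `price` are hypothetical, and the elided parts of the original queries are not reconstructed.

```python
from typing import Dict, Iterable, Iterator, Sequence


def filter_orders(orders: Iterable[Dict]) -> Iterator[Dict]:
    """Filtering: keep only orders with price >= 20 and price < 100."""
    return (o for o in orders if 20 <= o["price"] < 100)


def project(orders: Iterable[Dict], fields: Sequence[str]) -> Iterator[Dict]:
    """Projection: keep only the selected fields of each order."""
    return ({f: o[f] for f in fields} for o in orders)


orders = [
    {"client": "acme", "price": 15, "item": "cable"},
    {"client": "globex", "price": 20, "item": "router"},
    {"client": "initech", "price": 100, "item": "switch"},
]
orders_b = list(project(filter_orders(orders), ["client", "price"]))
print(orders_b)
```

Because both helpers are lazy generators, they can be chained over an unbounded stream without buffering it, which is the same composition the query language expresses declaratively.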
23. Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store.
Stratio Ingestion is an ETL product for Big Data, based on Flume.
Design your workflows (WYSIWYG) with useful and improved sources and sinks, and transform your data on the fly
•Create the stream if it doesn't exist
•It is possible to send only filtered event flows to the streaming engine
•Built on the Stratio Streaming API.
24. • Call-center Real-time monitoring
Real-time detection of client churn risk
Natural Language Processing Analysis to detect incidents in real-time
Anomaly detection in the service based on patterns
• IT services monitoring
DoS attack detection, hotlinking, etc. in real time
Warnings in monitoring of heterogeneous services
Preventive detection of downtime based on patterns
• Sensor grid monitoring
Alarms when thresholds are reached
Complex alarms involving several sensors
Real-time monitoring (landing support devices in an airport, for example)
25. SQL query example, mixing real-time data (coming from the Stratio Streaming engine) and batch data (stored in a NoSQL database):
SELECT sum(order.quantity), company_data.country
FROM streaming.order WITH WINDOW 15 minutes
INNER JOIN batch.company_data
ON order.company = company_data.company_name;
•With a powerful query planner
•Able to perform mixed queries with streaming and batch data
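To show what such a mixed query computes, here is a small pure-Python sketch that joins recent streaming orders with a stored company table and sums quantities per country. It is only an illustration under stated assumptions: the function `windowed_sum_by_country`, the dict layouts and the `timestamp` field are hypothetical, and the per-country grouping is my reading of the query's intent, since the query itself does not spell out a grouping clause.

```python
from collections import defaultdict
from typing import Dict, Iterable

WINDOW_SECONDS = 15 * 60  # the 15-minute window from the query


def windowed_sum_by_country(
    orders: Iterable[Dict],
    company_data: Iterable[Dict],
    now: float,
    window_seconds: float = WINDOW_SECONDS,
) -> Dict[str, int]:
    """Sum order quantities per country for orders inside the time window."""
    # Batch side: index the stored table by the join key.
    country_of = {row["company_name"]: row["country"] for row in company_data}
    totals: Dict[str, int] = defaultdict(int)
    for order in orders:
        if not 0 <= now - order["timestamp"] <= window_seconds:
            continue  # streaming side: event is outside the window
        country = country_of.get(order["company"])
        if country is None:
            continue  # inner join: drop orders with no matching company
        totals[country] += order["quantity"]
    return dict(totals)


company_data = [
    {"company_name": "acme", "country": "ES"},
    {"company_name": "globex", "country": "US"},
]
orders = [
    {"company": "acme", "quantity": 3, "timestamp": 1000.0},
    {"company": "acme", "quantity": 2, "timestamp": 1500.0},
    {"company": "globex", "quantity": 7, "timestamp": 100.0},   # older than 15 minutes
    {"company": "unknown", "quantity": 9, "timestamp": 1500.0},  # no match in batch data
]
print(windowed_sum_by_country(orders, company_data, now=1600.0))
```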
27. We are first going to use the Shell to create streams and queries.