In many database applications we first log data and then, a few hours or days later, we start analyzing it. But in a world that’s moving faster and faster, we sometimes need to analyze what is happening NOW.
Azure Stream Analytics is a new Azure service that lets you analyze streams of data. In this session you will see how to get started with it. From Event Hubs on the input side to temporal SQL queries, the demos in this session show you end to end how to get started with Azure Stream Analytics.
Azure Stream Analytics by Nico Jacobs
1. Azure Stream Analytics
Dr. Nico Jacobs, nico@ .be, @SQLWaldorf
Tweet and win an Ignite 2016 ticket #itproceed
2. Why
• Traditional Business Intelligence first collects data and
analyzes it afterwards
– Typically 1 day latency
• But we live in a fast-paced world
– Social media
– Internet of Things
– Just-in-time production
• We want to monitor and analyze streams of data in
near real time
– Typically a few seconds up to a few minutes latency
3. A different kind of query
• Traditional querying assumes the data doesn’t
change while you are querying it:
We query a fixed state
– If the data is changing: snapshots and transactions
‘freeze’ the data while we query it
– Since we query a finite state, our query should finish
in a finite amount of time
[Diagram: a query over a fixed table produces a result table]
4. A different kind of query
• When analyzing a stream of data, we deal with
a potentially infinite amount of data
• As a consequence our query will never end!
• To solve this problem most queries use
time windows, as in the sketch below
[Diagram: a temporal query turns an input stream into a result stream of timestamped values, e.g. 12:15:00 → 1, 12:15:10 → 3, 12:15:20 → 2, …]
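A minimal sketch of such a windowed temporal query, assuming an input simply named input (the 10-second window is illustrative, not from the talk):

    -- Count the events in each 10-second tumbling window;
    -- System.Timestamp marks the end of the current window.
    SELECT System.Timestamp AS WindowEnd, COUNT(*) AS EventCount
    FROM input
    GROUP BY TumblingWindow(second, 10)

Each window closes with exactly one result row, so the never-ending input stream yields a stream of finite, timestamped results.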
5. Azure Stream Analytics
• In Azure Stream Analytics we create, manage
and run jobs
• Every job has at least one input, one query and
one output
• But jobs can be more complex: a query can
read from different inputs and write to multiple
outputs
[Diagram: Input → Query → Output]
6. Inputs
• Currently two types of input are supported
– Data Stream: an Azure Event Hub or Azure Blob
through which we receive a stream of data
– Reference Data: an Azure Blob for static reference
data (lookup ‘table’)
• No support for Azure databases or other cloud
storage (yet)
7. Temporal query
• Query is written in SQL!
– No Java or .Net coding skills needed
• Mainly a subset of T-SQL
• A few extra keywords are added to deal
with temporal queries
8. Output
• Results are stored either in
– Azure Blob storage: creates log files with temporal query results
• Ideal for archiving
– SQL database: Stores results in Azure SQL Database table
• Ideal as source for traditional reporting and analysis
– Event hub: Sends an event to an event hub
• Ideal to generate actionable events such as alerts or notifications
– Azure Table storage:
• More structured than blob storage, easier to set up than SQL Database and
durable (in contrast to event hub)
– PowerBI.com:
• Ideal for near real time reporting!
9. Time for action!
• Online feedback on this talk
• Browse to itprofeed.azurewebsites.net
[Diagram: Event Hub → Azure Stream Analytics → PowerBI.com]
10. Demos
1. Create an Azure Service Bus Event Hub
2. Implement applications to send data into the
Event Hub
3. Create an Azure Stream Analytics job
4. Link the input
5. Create an output
6. Write and test a query
7. Start the job
11. Create Azure Event Hub
• Azure Event Hub is the newest component in
Azure Service Bus
• Typically used to collect sensor and app
data
• Event hub collects and temporarily stores
thousands of events per second
13. Create Azure Stream Analytics job
• Currently only available
in the old Azure portal
• Preferably put it in the
same region as Event
Hub and data storage
14. Link the input
• Event hub does not assume any data format
• But stream analytics needs to parse the data
• Three data formats supported: JSON, CSV and
Apache Avro (binary JSON)
• No columns need to be specified
15. Create an output
• Five output options: Azure Table or Blob, SQL
Database, Event Hub or PowerBI.com
• Blob and event hub do not require predefined
meta-data
– Again: CSV, JSON and Avro supported
• When storing information in a SQL Database or
Azure Table storage we need to create the table in
which we will store the results upfront
– Meta-data needed upfront
16. Create Query
• In a query window we can write two types of
statements:
– SELECT statement to extract a stream of results
from one or more input streams
• Required
• Can use a WITH clause to write more complex constructs
or increase parallelism (see the sketch below)
– CREATE TABLE statements to specify type
information on our input stream(s)
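A sketch combining both statement types; the input name SensorInput and its fields are illustrative assumptions:

    -- WITH defines a named intermediate step; the final SELECT is the required one.
    WITH Filtered AS (
        SELECT DeviceId, Temperature
        FROM SensorInput
        WHERE Temperature IS NOT NULL
    )
    SELECT DeviceId, AVG(Temperature) AS AvgTemp
    FROM Filtered
    GROUP BY DeviceId, TumblingWindow(minute, 1)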
17. Simple SELECT statement
• SELECT <fields> | * FROM <input> [WHERE
<condition>]
– This query simply produces a filtered output
stream based on the input stream (see the sketch below)
– In the SELECT statement and WHERE clause we
can use functions such as DATEDIFF
– But many functions from T-SQL are not available
• E.g. we can use CAST but not CONVERT
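A minimal sketch of such a filter, with hypothetical input and field names:

    -- Pass through only readings above a threshold;
    -- note the use of CAST (CONVERT is not available).
    SELECT DeviceId, CAST(Temperature AS float) AS Temperature
    FROM SensorInput
    WHERE CAST(Temperature AS float) > 30.0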
18. Testing a query
• Trial and error query development would be slow:
– Starting a Stream Analytics job takes a few minutes
– Inspecting the outcome of a job means checking
tables or blobs
– We cannot modify a query while it is running
• Luckily when a job is stopped, we can run a query
on data from a JSON text file and see the outcome
in the browser
– There is even a ‘sample input’ option
19. Data types
• Very simple type system:
– Bigint
– Float
– Nvarchar(max)
– Datetime
• Inputs will be cast into one of these types
• We can control these types with a CREATE TABLE
statement (see the sketch below):
– This does not create a table, but just a data type mapping
for the inputs
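A sketch of such a type mapping for a hypothetical input named SensorInput:

    -- Declares the types of the incoming fields; no actual table is created.
    CREATE TABLE SensorInput (
        DeviceId nvarchar(max),
        Temperature float,
        EventTime datetime
    );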
20. Group by
• Group by returns data aggregated over a certain subset of
data
• How to define a subset in a stream?
• Windowing functions! (see the sketch below)
– Each GROUP BY requires a windowing function
(from MSDN)
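For example, a tumbling-window aggregation (all names are illustrative assumptions):

    -- Average temperature per device over non-overlapping 1-minute windows.
    SELECT DeviceId, AVG(Temperature) AS AvgTemp, System.Timestamp AS WindowEnd
    FROM SensorInput
    GROUP BY DeviceId, TumblingWindow(minute, 1)

HoppingWindow and SlidingWindow fit in the same position when overlapping windows are needed.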
22. Timestamp by
• A record can have multiple timestamps associated with
it
– E.g. the time a phone call starts, ends, is submitted to the
event hub, is processed by Azure Stream Analytics, …
– By default the timestamp used in the temporal SQL queries
is System.Timestamp
• Event hub arrival time
• Blob last modified data
– But we can include an explicit timestamp in the data we
provide. In that case we must follow the FROM in our
temporal query with TIMESTAMP BY <fieldname>, as sketched below
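A sketch, assuming the payload carries an explicit EventTime field:

    -- Use the timestamp embedded in the event instead of the arrival time.
    SELECT DeviceId, COUNT(*) AS Readings
    FROM SensorInput TIMESTAMP BY EventTime
    GROUP BY DeviceId, TumblingWindow(minute, 5)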
23. JOIN
• We can combine multiple event streams or an event
stream with reference data via a join (inner join) or a left
outer join
• In the join clause we can specify the time window in
which we want the join to take place
– We use a special version of DATEDIFF for this (see the
sketch below)
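A sketch of such a time-bounded join, with hypothetical call-start and call-end streams:

    -- Match an end event arriving between 0 and 5 minutes after its start event.
    SELECT S.CallId, S.StartTime, E.EndTime
    FROM CallStart S TIMESTAMP BY StartTime
    JOIN CallEnd E TIMESTAMP BY EndTime
      ON S.CallId = E.CallId
      AND DATEDIFF(minute, S, E) BETWEEN 0 AND 5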
24. INTO clause
• We can have multiple outputs
• Without an INTO clause we write to the destination
named ‘output’
• With an INTO clause we can choose the appropriate
destination for every SELECT (see the sketch below)
– E.g. send events to blob storage for big data
analysis, but send special events to an event hub for
alerting
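A sketch with two hypothetical outputs, ArchiveBlob and AlertHub:

    -- Archive every event to blob storage...
    SELECT * INTO ArchiveBlob FROM SensorInput

    -- ...but route only extreme readings to an event hub for alerting.
    SELECT DeviceId, Temperature INTO AlertHub
    FROM SensorInput
    WHERE Temperature > 90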
25. Out of order inputs
• What if event 6:54:32 arrives after event
6:55:55?
– Trick: buffer your data for n minutes: all
events that arrive less than n minutes late
will be processed (tolerance window)
– What do we do with everything that arrives
more than n minutes late? Do we skip them
(drop) or do we pretend they happened just
now (adjust)?
26. Scaling
• By default every job consists of 1 streaming unit
• A streaming unit can process up to 1 MB/second
• When higher throughput is needed we can activate
up to 6 streaming units per regular query
• If your input is a partitioned event hub, we can
write partitioned queries and partitioned
subqueries (WITH clause), as sketched below
• A non-partitioned query with a 3-fold partitioned
subquery can have (1 + 3) × 6 = 24 streaming units!
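A sketch of a partitioned subquery feeding a non-partitioned final step (names are illustrative):

    -- The WITH step runs independently per event hub partition...
    WITH PerPartition AS (
        SELECT DeviceId, COUNT(*) AS Cnt
        FROM SensorInput PARTITION BY PartitionId
        GROUP BY DeviceId, PartitionId, TumblingWindow(minute, 1)
    )
    -- ...and the final step combines the partial counts.
    SELECT DeviceId, SUM(Cnt) AS TotalCnt
    FROM PerPartition
    GROUP BY DeviceId, TumblingWindow(minute, 1)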
27. Pricing
• Azure Stream Analytics
• 0.55 € per streaming unit per day (≈ 17 €/month)
• 0.0008 € per GB of throughput
• So, processing about 10 million events at a
max. rate of 1 MB/sec costs less than 18 € a
month (see the quick check below)
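A quick back-of-the-envelope check (the ~1 KB average event size is an assumption, not a figure from the talk): one streaming unit costs 0.55 € × 30 days ≈ 16.50 €/month; 10 million events of ~1 KB is roughly 10 GB, and 10 GB × 0.0008 €/GB ≈ 0.01 €; together that is about 16.51 €, comfortably under 18 € a month.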
28. Machine Learning
• Sensor thresholds are not always constant
• But Azure can ‘learn’ which values
preceded issues: Azure Machine Learning
30. Summary
• Azure Stream Analytics is a PaaS version of
StreamInsight
– Process stream of events via temporal queries
• Supports multiple input and output formats
• Scales to large volumes of events
• Temporal queries are written in a SQL variant
31. Give me (more) feedback and win a Lumia 635
• Feedback form will be sent to you by email