SlideShare une entreprise Scribd logo
1  sur  32
Télécharger pour lire hors ligne
Mehmet Bakkaloglu
Business Insights & Analytics, Hitachi Consulting
Team: Siddhartha Mohapatra, Vaughan Rees, John Shiangoli, Sidhartha Mahapatro
SQLBits Conference
7 April 2017
Near Real-Time IoT Analytics of Pumping
Stations in PowerBI
Contents
• Background
• Challenge
• Dashboards
• Solution
• Future improvements
• Comparison to other options
Waste Water Network
PS: Pumping Station
CSO: Combined Sewer
Overflow
WWTW: Waste Water
Treatment Works
Objective: Prevent Waste Water Spills
• Spills are damaging to the environment and may incur a fine from the
Environment Agency
The Data
• ~600 Waste Water Treatment Plants, ~900 Pumping Stations, ~250 CSOs
• Sites have different architectures, different sets of IoT sensors (SCADA
signals), different naming conventions for sensors, and some of them 100s of
sensors
• Analogue signals up to every 15 min
• Digital signals only when there is a change from 1 to 0 and 0 to 1 (therefore
there can be a long gap between receiving a signal)
• This particular solution currently has 45 million rows (100 sites, Aug 16 – Mar
17). With new sites added, it will be around 400 million rows (600 sites, 1
years’ data). For the purposes of this session
Site Names, Catchment Names,
Beach Names have been
masked, and postcodes mapped
to random Scottish postcodes
Pumping Stations
• Pumping Station sites can have multiple wet wells and/or storm water tanks
• Simplified diagrams showing the IoT sensors:
Challenge
• How to produce a generic set of dashboards for pumping stations that will:
• Show us the likelihood of a spill at a pumping station
• Help us investigate the cause of a spill, including going back in history
• Show the asset status
 With access to only flat files of IoT data (SCADA)
 Using Azure & PowerBI
 In near real-time (~15 minutes latency)
 With compelling visualisations
PowerBI Dashboards
Current Status
Dashboard showing the
likelihood of a spill at a
site
• Used by site
engineers and
operators
• Shows the logic
used to derive
likelihood of spill
• Can also filter by
beach to determine
whether spills are
affecting a particular
beach
• Can be used on
mobile devices as
well
Details (1)
Dashboard allowing users
to identify the cause of a
spill
Example:
• It might be that pump
was not running when
it should have been,
which would indicate
an issue with the pump
• It might be that when
there is more than one
pump running the flow
did not increase, which
might indicate a
blockage e.g. dead
animal or garbage
• etc.
Spilling
Heavy Rain
Both pumps on
Details (2)
Why is there no flow?
Why is pump 3 not running?
No signal from pump 3
Details (3)
Why is it above consented flow
even though level is not high?
Signal Status
• Shows whether there
was a signal in the last
day
• If there was not a
signal in the last day
this could mean there
is need for
maintenance
• Last signal value vs
signal average
Architecture
• How do we fill in the gap to meet the requirements?
Flat files
(SCADA
data)
?
On-premise
First set of requirements to be addressed
• Names of signals from different sites need to be
standardised
• In some cases, due to the architecture of sites,
signals from multiple sites need to be reported as one
site
• Not all sites have rainfall data, therefore we need to
map it from the nearest site
• For rainfall level use average of last one hour of
rainfall voltage
• Likelihood of a spill at a pumping station requires
complex logic
• Visualise Pump Status properly
Likelihood of a spill
Working with site engineers, analysing historical data,
and based on available sensors:
Around 100 sites classified into 6 types:
Around 300 rules defined
using various signals and
thresholds to determine RAG
Status and Risk Level
Some spills are
allowed e.g. if it is
raining heavily, but
other spills are not
allowed. RAG
Status identifies this.
• Is there a Flow signal?
• Flow vs Consented Flow?
• Rainfall vs Threshold?
• Wet Well Level vs Spill Level vs Thresholds?
• Pump(s) running?
• Is there a Tide Meter?
• Tide increasing/decreasing?
• Storm pump(s) running?
• Is there an inhibitor?
• Is there a Pumped Overflow signal?
Likelihood of a spill
We need a platform that
enables us to implement such
a rules engine in near real-time
Visualise Pump Status
• Pump Status is a digital signal
that comes only when there is a
change from 1 to 0 and 0 to 1
• It does not come at a regular
interval
• In order to visualise it and to be
able to filter properly, generate
data points at 5 minute intervals
and at end points
• While this is not difficult to do,
we need a platform that
enables us to generate data
points in near real-time
Original
data points
visualised
differently
With
generated
data points
2 pumps
(option 1)
2 pumps
(option 2)
Filtering on > 14:15 would show
no pump status until 15:00
Zeros are missing
Original
data points
Where to implement these requirements
 Easiest choice: Do it in a Data Warehouse – but what about latency?
a) Data Warehouse
b) Stream Analytics
c) Storm on HDInsight
d) Spark on HDInsight
e) Somewhere else?
Azure SQL
Data
Warehouse
Basic Architecture
Flat files
(SCADA
data)
Azure Data Factory
Azure
Blob
Storage
SSIS
Stored
Procs
Load into DW using PolyBase
• Fastest way of loading
• Rate increases when DWU
is increased
• This is not the case for other
options such as BCP, ADF,
SSIS
Use SSIS Azure Blob Upload Task
• Can do transformations, conditional
tasks, and connect to various sources
• Alternatives: AZCopy (command-line
utility), ADF Copy (transformation not
possible & Data Management Gateway
required), BCP (for small files)
1
2
3
3 issues
causing us
not to meet
latency
requirement
(~15 min)
On-premise Cloud
Improvement 1 – Azure Tabular Model
• PowerBI DirectQuery is unacceptably slow, has
functional limitations, creates load on DW
• PowerBI Import Mode even with Pro licence can
only be refreshed up to 8 times a day, has a size
limit, and no partitioning
 How can we make this near real-time?
• Use Azure Analysis Services Tabular Model – we
can process it as frequently as we wish
• Improve processing performance by partitioning
Tabular Model into current and historical
• Live Connection from PowerBI to Tabular Model
works super fast
(Azure Analysis Services came out in October 2016 & is still in preview)
Connection
Options
How to process partitions of Azure Tabular Model &
streamline with ETL process
using Microsoft.AnalysisServices.Tabular;
using static Microsoft.AnalysisServices.Tabular.Database;
...
public IDictionary<string, string> Execute(
IEnumerable<LinkedService> linkedServices,
IEnumerable<Dataset> datasets,
Activity activity,
IActivityLogger logger)
{
...
if (currentDateTime.Date == lastProcessDate.Date)
{
// If during the day, process only current partition
model.Tables[tableName[i]].Partitions[0].RequestRefresh(RefreshType.Full);
...
}
else
{
// If new day, process full
model.RequestRefresh(RefreshType.Full);
...
}
...
}
• No out-of-the-box way of automating the processing of Azure Tabular Model yet
• Do it in C# using Tabular Object Model libraries (TOM) and run from Azure Data Factory
using Azure Batch
How to run C# from ADF using Azure
Batch
1. Create a .NET Class
Library project with just the
Execute method of the
IDotNetActivity interface
2. Build it, create a zip file of the
binaries, and upload to Azure Blob
Storage
3. Create an Azure Batch account and
pool
4. Add a pipeline to Azure Data
Factory solution to run the C# code
using Azure Batch
https://docs.microsoft.com/en-
us/azure/data-factory/data-factory-use-
custom-activities
Process Tabular Model from Azure Data Factory
Add pipeline to ADF to process Tabular Model
Improvement 2 – Archive Blob Files
With data growing rapidly, even using PolyBase,
loading data into Azure Data Warehouse physical
table slows down
 Archive Blob Files that have been loaded into Data
Warehouse physical table, so the external table points
to only the new files
 Archiving is done from SSIS retrospectively
(Alternative would be to define PolyBase on a file (rather than a
folder) but it would have to be on the fly)
CREATE EXTERNAL TABLE [stg].[iSCADADigital]
(
[dv_id] [bigint] NULL,
[db_addr] [int] NULL,
[time] [datetime2](7) NULL,
[value] [bit] NULL,
...
...
)
WITH
( DATA_SOURCE = [iSCADAAzureStorage],
LOCATION = N'digital/',
FILE_FORMAT = [PipeDelimitedText],
REJECT_TYPE = VALUE,
REJECT_VALUE = 0)
PolyBase
External
Table
Fact
(Physical
Table)
Dimension
(Physical
Table)
File1
File 2
File N
Azure Blob
Storage
Azure Data Warehouse
Improvement 3 – Streamline SSIS & ADF
• Minimise latency between SSIS completion and ADF start
• Create a Stored Proc that will be the first task in ADF that will check for the
completion of SSIS upload of files into Azure Blob
CREATE PROC [ctl].[iSCADAStartProcess] AS
BEGIN
...
-- Check Upload Status
WHILE ISNULL(@ProcessStatus, '') = ''
BEGIN
SET @ProcessStatus = (SELECT TOP 1 ProcessStatus
FROM ctl.ProcessAudit
WHERE ProcessName = ‘SSISUpload'
AND ProcessStatus = 'END');
END
-- Now log the start of Azure Data Factory Phase
-- And load data into Azure Data Warehouse
...
END
Azure SQL
Data
Warehouse
Architecture
Flat files
(SCADA
data)
Azure Analysis
Services
Tabular Model
Azure Data Factory
Azure
Blob
Storage
SSIS
Stored
Procs
Load data
using
PolyBase
Live
Connection
Archive
historical blobs
for faster load
Streamline
by adding
handshake
Partition Tabular Model
• Current Partition
• Historical Partition
Process with Tabular Object
Model (TOM)
• Current Partition processed
at the end of each run
• Full process at midnight
Azure Blob
Upload Task
Every 15
minutes
C#
Script
Microsoft
recommended
minimum for ADF
CloudOn-premise
How to improve this solution
• As data grows, may need a more sophisticated partitioning scheme for Tabular
Model. Currently 45 million rows (Aug 16 – Mar 17). With new sites added, it will
be around 400 million rows (600 sites, 1 years’ data).
• Automatically scale up Azure Analysis Services when doing full processing
overnight, and then scale down
• Use machine learning to find the correlation between signals – this could help
improve the logic to predict the likelihood a spill (Utilising Azure Stream Analytics
& Azure Machine Learning)
• Store historical RAG Status and Risk Level, and use Machine Learning to
predict future RAG Status and Risk Level
• Other ideas:
• Detect pump blockages from Flow & Pump Status
• Pump energy use vs Flow rate – determine whether pumps are efficient or need to
be serviced
Comparison to alternative solutions
Solution Handle
complex rules
Handle out of
order events
Latency Visualisations Cope with
large
amounts of
data
Historical data
analysis
alongside near
real-time data
Scale up
& down
Cost
Azure Data Warehouse &
Analysis Services
Yes Yes ~ 15 min
(Microsoft
recommended
minimum for
ADF)
PowerBI Yes Yes Yes Pay-as-you-go
(PaaS)
Azure Stream Analytics May struggle
(SQL &
reference data
from Azure Blob
but no UDFs,
no extensible
code)
Yes Low PowerBI Yes Results need to
be stored in DW
Yes Pay-as-you-go
(PaaS)
Storm on HDInsight Code in Java or
C#
Has to be
implemented
Low PowerBI Yes (Very
Large)
Results need to
be stored
Yes Pay-as-you-go
(PaaS)
Spark on HDInsight (Spark
Streaming)
Scala or Java Yes (by
batching data)
Batching adds
some latency
PowerBI Yes (Very
Large)
Results need to
be stored
Yes Pay-as-you-go
(PaaS)
On-premises SQL Server &
Analysis Services
Yes Yes Low (Although
loading data
may be slow)
SSRS or with on-
premises gateway in
PowerBI but may need
ExpressRoute for
performance
Not as
powerful as
Azure DW
Yes No High initial
setup cost, and
later upgrade
cost
A lot more can be done with Waste Water IoT data
• Another similar but larger project we did was on Waste Water Treatment Works
• Common dashboards for all sites to analyse performance, identify issues, prevent failures
• Dashboard for each phase of treatment: Final Effluent  Inlet & PST  Filters & HST  ASP  FST etc.
Contact
LinkedIn: https://www.linkedin.com/in/mehmet-bakkaloglu-0a328010/
Near Real-Time IoT Analytics of Pumping Stations in PowerBI

Contenu connexe

Tendances

Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020Mayank Shrivastava
 
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Databricks
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowDataWorks Summit
 
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & KafkaSelf-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & KafkaGuido Schmutz
 
Spark and S3 with Ryan Blue
Spark and S3 with Ryan BlueSpark and S3 with Ryan Blue
Spark and S3 with Ryan BlueDatabricks
 
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergData Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergAnant Corporation
 
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a ServiceZeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a ServiceDatabricks
 
Teradata vs-exadata
Teradata vs-exadataTeradata vs-exadata
Teradata vs-exadataLouis liu
 
hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMiHBaseCon
 
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...Spark Summit
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxData
 
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageDatabricks
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMatei Zaharia
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Guido Schmutz
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...Chester Chen
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesFlink Forward
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudDatabricks
 
YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions Yugabyte
 

Tendances (20)

Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020
 
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
 
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & KafkaSelf-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka
 
Spark and S3 with Ryan Blue
Spark and S3 with Ryan BlueSpark and S3 with Ryan Blue
Spark and S3 with Ryan Blue
 
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergData Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
 
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a ServiceZeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
 
Teradata vs-exadata
Teradata vs-exadataTeradata vs-exadata
Teradata vs-exadata
 
hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMi
 
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
 
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineage
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
 
Apache Ranger
Apache RangerApache Ranger
Apache Ranger
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use cases
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the Cloud
 
YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions
 
KFServing and Feast
KFServing and FeastKFServing and Feast
KFServing and Feast
 

Similaire à Near Real-Time IoT Analytics of Pumping Stations in PowerBI

Azure stream analytics by Nico Jacobs
Azure stream analytics by Nico JacobsAzure stream analytics by Nico Jacobs
Azure stream analytics by Nico JacobsITProceed
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformApache Apex
 
CosmosDB for IoT Scenarios
CosmosDB for IoT ScenariosCosmosDB for IoT Scenarios
CosmosDB for IoT ScenariosIvo Andreev
 
Nebula Graph nMeetup in Shanghai - Meet with Graph Technology Enthusiasts
Nebula Graph nMeetup in Shanghai - Meet with Graph Technology EnthusiastsNebula Graph nMeetup in Shanghai - Meet with Graph Technology Enthusiasts
Nebula Graph nMeetup in Shanghai - Meet with Graph Technology EnthusiastsNebula Graph
 
MineDB Mineral Resource Evaluation White Paper
MineDB Mineral Resource Evaluation White PaperMineDB Mineral Resource Evaluation White Paper
MineDB Mineral Resource Evaluation White PaperDerek Diamond
 
AWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data AnalyticsAWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data AnalyticsKeeyong Han
 
Kusto (Azure Data Explorer) Training for R&D - January 2019
Kusto (Azure Data Explorer) Training for R&D - January 2019 Kusto (Azure Data Explorer) Training for R&D - January 2019
Kusto (Azure Data Explorer) Training for R&D - January 2019 Tal Bar-Zvi
 
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexHadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexApache Apex
 
CCI2017 - Considerations for Migrating Databases to Azure - Gianluca Sartori
CCI2017 - Considerations for Migrating Databases to Azure - Gianluca SartoriCCI2017 - Considerations for Migrating Databases to Azure - Gianluca Sartori
CCI2017 - Considerations for Migrating Databases to Azure - Gianluca Sartoriwalk2talk srl
 
GECon2017_High-volume data streaming in azure_ Aliaksandr Laisha
GECon2017_High-volume data streaming in azure_ Aliaksandr LaishaGECon2017_High-volume data streaming in azure_ Aliaksandr Laisha
GECon2017_High-volume data streaming in azure_ Aliaksandr LaishaGECon_Org Team
 
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...Lucas Jellema
 
Sql azure cluster dashboard public.ppt
Sql azure cluster dashboard public.pptSql azure cluster dashboard public.ppt
Sql azure cluster dashboard public.pptQingsong Yao
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQLWSO2
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Flink Forward
 
Azure SQL Database for the SQL Server DBA - Azure Bootcamp Athens 2018
Azure SQL Database for the SQL Server DBA - Azure Bootcamp Athens 2018 Azure SQL Database for the SQL Server DBA - Azure Bootcamp Athens 2018
Azure SQL Database for the SQL Server DBA - Azure Bootcamp Athens 2018 Antonios Chatzipavlis
 
Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP
Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSPDiscretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP
Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSPTathagata Das
 
Spark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike FreedmanSpark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike FreedmanSpark Summit
 

Similaire à Near Real-Time IoT Analytics of Pumping Stations in PowerBI (20)

Azure stream analytics by Nico Jacobs
Azure stream analytics by Nico JacobsAzure stream analytics by Nico Jacobs
Azure stream analytics by Nico Jacobs
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
CosmosDB for IoT Scenarios
CosmosDB for IoT ScenariosCosmosDB for IoT Scenarios
CosmosDB for IoT Scenarios
 
Nebula Graph nMeetup in Shanghai - Meet with Graph Technology Enthusiasts
Nebula Graph nMeetup in Shanghai - Meet with Graph Technology EnthusiastsNebula Graph nMeetup in Shanghai - Meet with Graph Technology Enthusiasts
Nebula Graph nMeetup in Shanghai - Meet with Graph Technology Enthusiasts
 
MineDB Mineral Resource Evaluation White Paper
MineDB Mineral Resource Evaluation White PaperMineDB Mineral Resource Evaluation White Paper
MineDB Mineral Resource Evaluation White Paper
 
AWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data AnalyticsAWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data Analytics
 
Kusto (Azure Data Explorer) Training for R&D - January 2019
Kusto (Azure Data Explorer) Training for R&D - January 2019 Kusto (Azure Data Explorer) Training for R&D - January 2019
Kusto (Azure Data Explorer) Training for R&D - January 2019
 
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexHadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
 
CCI2017 - Considerations for Migrating Databases to Azure - Gianluca Sartori
CCI2017 - Considerations for Migrating Databases to Azure - Gianluca SartoriCCI2017 - Considerations for Migrating Databases to Azure - Gianluca Sartori
CCI2017 - Considerations for Migrating Databases to Azure - Gianluca Sartori
 
Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex
 
GECon2017_High-volume data streaming in azure_ Aliaksandr Laisha
GECon2017_High-volume data streaming in azure_ Aliaksandr LaishaGECon2017_High-volume data streaming in azure_ Aliaksandr Laisha
GECon2017_High-volume data streaming in azure_ Aliaksandr Laisha
 
OOW13 Exadata and ODI with Parallel
OOW13 Exadata and ODI with ParallelOOW13 Exadata and ODI with Parallel
OOW13 Exadata and ODI with Parallel
 
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
 
Oow2016 review-db-dev-bigdata-BI
Oow2016 review-db-dev-bigdata-BIOow2016 review-db-dev-bigdata-BI
Oow2016 review-db-dev-bigdata-BI
 
Sql azure cluster dashboard public.ppt
Sql azure cluster dashboard public.pptSql azure cluster dashboard public.ppt
Sql azure cluster dashboard public.ppt
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
 
Azure SQL Database for the SQL Server DBA - Azure Bootcamp Athens 2018
Azure SQL Database for the SQL Server DBA - Azure Bootcamp Athens 2018 Azure SQL Database for the SQL Server DBA - Azure Bootcamp Athens 2018
Azure SQL Database for the SQL Server DBA - Azure Bootcamp Athens 2018
 
Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP
Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSPDiscretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP
Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP
 
Spark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike FreedmanSpark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike Freedman
 

Dernier

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 

Dernier (20)

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 

Near Real-Time IoT Analytics of Pumping Stations in PowerBI

  • 1. Mehmet Bakkaloglu Business Insights & Analytics, Hitachi Consulting Team: Siddhartha Mohapatra, Vaughan Rees, John Shiangoli, Sidhartha Mahapatro SQLBits Conference 7 April 2017 Near Real-Time IoT Analytics of Pumping Stations in PowerBI
  • 2. Contents • Background • Challenge • Dashboards • Solution • Future improvements • Comparison to other options
  • 3. Waste Water Network PS: Pumping Station CSO: Combined Sewer Overflow WWTW: Waste Water Treatment Works
  • 4. Objective: Prevent Waste Water Spills • Spills are damaging to the environment and may incur a fine from the Environment Agency
  • 5. The Data • ~600 Waste Water Treatment Plants, ~900 Pumping Stations, ~250 CSOs • Sites have different architectures, different sets of IoT sensors (SCADA signals), different naming conventions for sensors, and some of them 100s of sensors • Analogue signals up to every 15 min • Digital signals only when there is a change from 1 to 0 and 0 to 1 (therefore there can be a long gap between receiving a signal) • This particular solution currently has 45 million rows (100 sites, Aug 16 – Mar 17). With new sites added, it will be around 400 million rows (600 sites, 1 years’ data). For the purposes of this session Site Names, Catchment Names, Beach Names have been masked, and postcodes mapped to random Scottish postcodes
  • 6. Pumping Stations • Pumping Station sites can have multiple wet wells and/or storm water tanks • Simplified diagrams showing the IoT sensors:
  • 7. Challenge • How to produce a generic set of dashboards for pumping stations that will: • Show us the likelihood of a spill at a pumping station • Help us investigate the cause of a spill, including going back in history • Show the asset status  With access to only flat files of IoT data (SCADA)  Using Azure & PowerBI  In near real-time (~15 minutes latency)  With compelling visualisations
  • 9. Current Status Dashboard showing the likelihood of a spill at a site • Used by site engineers and operators • Shows the logic used to derive likelihood of spill • Can also filter by beach to determine whether spills are affecting a particular beach • Can be used on mobile devices as well
  • 10. Details (1) Dashboard allowing users to identify the cause of a spill Example: • It might be that pump was not running when it should have been, which would indicate an issue with the pump • It might be that when there is more than one pump running the flow did not increase, which might indicate a blockage e.g. dead animal or garbage • etc. Spilling Heavy Rain Both pumps on
  • 11. Details (2) Why is there no flow? Why is pump 3 not running? No signal from pump 3
  • 12. Details (3) Why is it above consented flow even though level is not high?
  • 13. Signal Status • Shows whether there was a signal in the last day • If there was not a signal in the last day this could mean there is need for maintenance • Last signal value vs signal average
  • 14. Architecture • How do we fill in the gap to meet the requirements? Flat files (SCADA data) ? On-premise
  • 15. First set of requirements to be addressed • Names of signals from different sites need to be standardised • In some cases, due to the architecture of sites, signals from multiple sites need to be reported as one site • Not all sites have rainfall data, therefore we need to map it from the nearest site • For rainfall level use average of last one hour of rainfall voltage • Likelihood of a spill at a pumping station requires complex logic • Visualise Pump Status properly
  • 16. Likelihood of a spill Working with site engineers, analysing historical data, and based on available sensors: Around 100 sites classified into 6 types: Around 300 rules defined using various signals and thresholds to determine RAG Status and Risk Level Some spills are allowed e.g. if it is raining heavily, but other spills are not allowed. RAG Status identifies this. • Is there a Flow signal? • Flow vs Consented Flow? • Rainfall vs Threshold? • Wet Well Level vs Spill Level vs Thresholds? • Pump(s) running? • Is there a Tide Meter? • Tide increasing/decreasing? • Storm pump(s) running? • Is there an inhibitor? • Is there a Pumped Overflow signal?
  • 17. Likelihood of a spill We need a platform that enables us to implement such a rules engine in near real-time
  • 18. Visualise Pump Status • Pump Status is a digital signal that comes only when there is a change from 1 to 0 and 0 to 1 • It does not come at a regular interval • In order to visualise it and to be able to filter properly, generate data points at 5 minute intervals and at end points • While this is not difficult to do, we need a platform that enables us to generate data points in near real-time Original data points visualised differently With generated data points 2 pumps (option 1) 2 pumps (option 2) Filtering on > 14:15 would show no pump status until 15:00 Zeros are missing Original data points
  • 19. Where to implement these requirements  Easiest choice: Do it in a Data Warehouse – but what about latency? a) Data Warehouse b) Stream Analytics c) Storm on HDInsight d) Spark on HDInsight e) Somewhere else?
  • 20. Azure SQL Data Warehouse Basic Architecture Flat files (SCADA data) Azure Data Factory Azure Blob Storage SSIS Stored Procs Load into DW using PolyBase • Fastest way of loading • Rate increases when DWU is increased • This is not the case for other options such as BCP, ADF, SSIS Use SSIS Azure Blob Upload Task • Can do transformations, conditional tasks, and connect to various sources • Alternatives: AZCopy (command-line utility), ADF Copy (transformation not possible & Data Management Gateway required), BCP (for small files) 1 2 3 3 issues causing us not to meet latency requirement (~15 min) On-premise Cloud
  • 21. Improvement 1 – Azure Tabular Model • PowerBI DirectQuery is unacceptably slow, has functional limitations, creates load on DW • PowerBI Import Mode even with Pro licence can only be refreshed up to 8 times a day, has a size limit, and no partitioning  How can we make this near real-time? • Use Azure Analysis Services Tabular Model – we can process it as frequently as we wish • Improve processing performance by partitioning Tabular Model into current and historical • Live Connection from PowerBI to Tabular Model works super fast (Azure Analysis Services came out in October 2016 & is still in preview) Connection Options
  • 22. How to process partitions of Azure Tabular Model & streamline with ETL process using Microsoft.AnalysisServices.Tabular; using static Microsoft.AnalysisServices.Tabular.Database; ... public IDictionary<string, string> Execute( IEnumerable<LinkedService> linkedServices, IEnumerable<Dataset> datasets, Activity activity, IActivityLogger logger) { ... if (currentDateTime.Date == lastProcessDate.Date) { // If during the day, process only current partition model.Tables[tableName[i]].Partitions[0].RequestRefresh(RefreshType.Full); ... } else { // If new day, process full model.RequestRefresh(RefreshType.Full); ... } ... } • No out-of-the-box way of automating the processing of Azure Tabular Model yet • Do it in C# using Tabular Object Model libraries (TOM) and run from Azure Data Factory using Azure Batch How to run C# from ADF using Azure Batch 1. Create a .NET Class Library project with just the Execute method of the IDotNetActivity interface 2. Build it, create a zip file of the binaries, and upload to Azure Blob Storage 3. Create an Azure Batch account and pool 4. Add a pipeline to Azure Data Factory solution to run the C# code using Azure Batch https://docs.microsoft.com/en- us/azure/data-factory/data-factory-use- custom-activities
  • 23. Process Tabular Model from Azure Data Factory
  • 24. Add pipeline to ADF to process Tabular Model
  • 25. Improvement 2 – Archive Blob Files With data growing rapidly, even using PolyBase, loading data into Azure Data Warehouse physical table slows down  Archive Blob Files that have been loaded into Data Warehouse physical table, so the external table points to only the new files  Archiving is done from SSIS retrospectively (Alternative would be to define PolyBase on a file (rather than a folder) but it would have to be on the fly) CREATE EXTERNAL TABLE [stg].[iSCADADigital] ( [dv_id] [bigint] NULL, [db_addr] [int] NULL, [time] [datetime2](7) NULL, [value] [bit] NULL, ... ... ) WITH ( DATA_SOURCE = [iSCADAAzureStorage], LOCATION = N'digital/', FILE_FORMAT = [PipeDelimitedText], REJECT_TYPE = VALUE, REJECT_VALUE = 0) PolyBase External Table Fact (Physical Table) Dimension (Physical Table) File1 File 2 File N Azure Blob Storage Azure Data Warehouse
  • 26. Improvement 3 – Streamline SSIS & ADF • Minimise latency between SSIS completion and ADF start • Create a Stored Proc that will be the first task in ADF that will check for the completion of SSIS upload of files into Azure Blob CREATE PROC [ctl].[iSCADAStartProcess] AS BEGIN ... -- Check Upload Status WHILE ISNULL(@ProcessStatus, '') = '' BEGIN SET @ProcessStatus = (SELECT TOP 1 ProcessStatus FROM ctl.ProcessAudit WHERE ProcessName = ‘SSISUpload' AND ProcessStatus = 'END'); END -- Now log the start of Azure Data Factory Phase -- And load data into Azure Data Warehouse ... END
  • 27. Azure SQL Data Warehouse Architecture Flat files (SCADA data) Azure Analysis Services Tabular Model Azure Data Factory Azure Blob Storage SSIS Stored Procs Load data using PolyBase Live Connection Archive historical blobs for faster load Streamline by adding handshake Partition Tabular Model • Current Partition • Historical Partition Process with Tabular Object Model (TOM) • Current Partition processed at the end of each run • Full process at midnight Azure Blob Upload Task Every 15 minutes C# Script Microsoft recommended minimum for ADF CloudOn-premise
  • 28. How to improve this solution • As data grows, may need a more sophisticated partitioning scheme for Tabular Model. Currently 45 million rows (Aug 16 – Mar 17). With new sites added, it will be around 400 million rows (600 sites, 1 years’ data). • Automatically scale up Azure Analysis Services when doing full processing overnight, and then scale down • Use machine learning to find the correlation between signals – this could help improve the logic to predict the likelihood a spill (Utilising Azure Stream Analytics & Azure Machine Learning) • Store historical RAG Status and Risk Level, and use Machine Learning to predict future RAG Status and Risk Level • Other ideas: • Detect pump blockages from Flow & Pump Status • Pump energy use vs Flow rate – determine whether pumps are efficient or need to be serviced
  • 29. Comparison to alternative solutions Solution Handle complex rules Handle out of order events Latency Visualisations Cope with large amounts of data Historical data analysis alongside near real-time data Scale up & down Cost Azure Data Warehouse & Analysis Services Yes Yes ~ 15 min (Microsoft recommended minimum for ADF) PowerBI Yes Yes Yes Pay-as-you-go (PaaS) Azure Stream Analytics May struggle (SQL & reference data from Azure Blob but no UDFs, no extensible code) Yes Low PowerBI Yes Results need to be stored in DW Yes Pay-as-you-go (PaaS) Storm on HDInsight Code in Java or C# Has to be implemented Low PowerBI Yes (Very Large) Results need to be stored Yes Pay-as-you-go (PaaS) Spark on HDInsight (Spark Streaming) Scala or Java Yes (by batching data) Batching adds some latency PowerBI Yes (Very Large) Results need to be stored Yes Pay-as-you-go (PaaS) On-premises SQL Server & Analysis Services Yes Yes Low (Although loading data may be slow) SSRS or with on- premises gateway in PowerBI but may need ExpressRoute for performance Not as powerful as Azure DW Yes No High initial setup cost, and later upgrade cost
  • 30. A lot more can be done with Waste Water IoT data • Another similar but larger project we did was on Waste Water Treatment Works • Common dashboards for all sites to analyse performance, identify issues, prevent failures • Dashboard for each phase of treatment: Final Effluent  Inlet & PST  Filters & HST  ASP  FST etc.