2. EDP Inovação 2
Agenda
1. Introduction to EDP
2. Motivation
3. Project PREDIS – Real time Load and Generation disaggregated forecast
4. EDP Future IT Architecture
5. Conclusions
3. EDP Inovação
EDP Group - from a local electricity incumbent to a global energy player with strong presence in
Europe, Brazil and considerable investments in USA
UK
USA
Canada
Portugal
Brazil
Angola
Spain
Italy
France
Belgium
Poland
Romania
China
中国
# Present in the
Electric Sector in Dow
Jones Sustainability
Indexes
#3 World wind
energy company
#1 Europe
hydro project
(+3,5 GW under
development)
#1 Portugal industrial
group
260 Employees
3 422 Installed Capacity (MW)
9 330 Net Generation (GWh)
100% Generation from renewable sources
USA/ Canada
2 635 Employees
2 831 651 Electricity Customers
1 874 Installed Capacity (MW)
8 043 Net Generation (GWh)
100% Generation from renewable sources
24 544 Electricity Distribution (GWh)
Brazil
7252 Employees
6 053 509 Electricity Customers
271 576 Gas Customers
10 992 Installed Capacity (MW)
34 364 Net Generation (GWh)
51% Generation from renewable sources
46 508 Electricity Distribution (GWh)
7 138 Gas Distribution (GWh)
Portugal
34 Employees
363 Installed Capacity (MW)
705 Net Generation (GWh)
100% Generation from renewable sources
France/ Belgium
14 Employees
Italy
21 Employees
United Kingdom
51 Employees
475 Installed Capacity (MW)
621 Net Generation (GWh)
100% Generation from renewable sources
Poland/ Romania
2 038 Employees
1 015 543 Electricity Customers
787 869 Gas Customers
6 087 Installed Capacity (MW)
15 331 Net Generation (GWh)
37% Generation from renewable s.
9 517 Electricity Distribution (GWh)
48 447 Gas Distribution (GWh)
Spain
Mexico
4. EDP Inovação 4
EDP Distribuição and EDP Inovação – facts and figures
245.000
Km
Percent of the electricity distribution
network owned in mainland Portugal
Distribution network
approximate length
6
Million
Approximate number
of customers served
EDP Distribuição is the EDP Group's company operating in the regulated distribution and
supply businesses in Portugal. EDP's distribution activity is regulated by the Portuguese
energy regulator ERSE (Entidade Reguladora dos Serviços Energéticos) which defines the
tariffs, parameters and prices for electricity and other services in Portugal.
EDP Inovação is the innovation arm of EDP Group, promoting value-adding innovation
within the Group by leading the adoption of new technological evolutions and practices.
Open innovation approach
Client-
focused
Solutions
Smarter
Grids
Cleaner
Energy
Data Leap
5 strategic innovation areas
Entrepreneurship & Venture Capital ecosystem
EDP Starter and EDP Ventures
Storage
5. EDP Inovação 5
Agenda
1. Introduction to EDP
2. Motivation
3. Project PREDIS – Real time Load and Generation disaggregated forecast
4. EDP Future IT Architecture
5. Conclusions
6. EDP Inovação
Smart Grid
The energy sector transformation is adding new challenges to the Distribution System Operator
(DSO), demanding new strategies for the Distribution Power Grid, that is becoming progressively
more intelligent
Quality of
Service
Operational
Efficiency
Historical Challenges New Challenges
Advanced
Metering
Infrastructure
Network
automation
& sensors
Energy efficiency
and new business
models
Electric
vehicles
Renewables
and
Distributed
Generation
6
7. EDP Inovação 7
To address those new challenges we need to increase the visibility over the LV network, reducing
the existing gap when compared with HV and MV networks.
HV: 9.000 km
412 HV/MV
Substation
HV/MV
Station
VHV/HV
HV network
Distribution Network
Secondary Substation
MV/LV
MV network LV network
Retailer/
Consumer/
Producer
140.000 km LV Lines
6.000.000 Users
MV: 74.000 km
MV/LV: 66.000
Network
Assets
Level of
Monitoring
and
Automation
HANLANWAN
EDP Box
The ability to collect information from different sources (internal and external, structured and
unstructured) that are mostly scattered, has a huge potential to improve the utility operational
activities
8. EDP Inovação
8
Preparing for the data deluge: in 2013 EDP start to address big data and advanced analytics, due
to a operational issue and upon benchmarking a conventional database with Hadoop
Load Curve Profiling + Aggregation Technology Time Notes
Current architecture Oracle Around 8h 4 Million points
SQL with Big Data Hive, Impala 1 to 4h Inadequate
Customized programming without Big Data Java Around 5min One machine (multi-core)
Customized Programing
with Big Data
Spark <5 min
Multi machines (PCs) with Big Data
higher resilience and parallelization
National Energy Consumption (with load curves) by voltage level*
System
Nodes
[#]
Cores
[#]
RAM
[GB]
Cluster Readings
[10^6]
Volume
[MB]
Processing Time
[h:min:sec]
BO (Oracle) 4 96 202 Local 12 x 6 72 3:45:00
Hadoop 21 42 157 Virtual / Cloud 96 x 6 576 00:09:37
*This Proof of Concept was done in the cloud payed with a credit card with a cost around $30.
Main conclusions
• The Hadoop cluster is by nature resilient and coped with nodes failure
• The processing times can be greatly reduced over traditional architecture
• There is a high need for customization
• The choice of the tool within the Hadoop ecosystem depends highly on the type of calculations/use-case to be made
9. EDP Inovação 9
Agenda
1. Introduction to EDP
2. Motivation
3. Project PREDIS – Real time Load and Generation disaggregated forecast
4. EDP Future IT Architecture
5. Conclusions
10. EDP Inovação 10
With the results obtained a project called PREDIS was set-up to obtain the load and generation
forecast at an disaggregated level and in near real time mode (with 15 minutes refreshment)
PREDIS requirements
• Inputs from different data sources from EDP
Distribuição (GIS, SCADA, Oracle, SAP)
• Development of a adequate machine-cluster to
perform all the computation
• Information integration on a data model to
support the forecast
• Develop analytic processes to compute the
information in adequate elapsed time Energy
Balance
Revenue
Assurance
Dispatch
Center
PREDIS
SGL
EI
Server
TC
TLP
EB
Estimate
Grid
Planning
Fraud
SIT
BI-
Scada
Power
On
SysGrid
PREDIS Project Goals:
Electrical Load Forecast for the next 72 hours
Disaggregated Renewable energy sources Forecast (Wind, Solar) for the next 72 hours
Manage the aprox. 6 million points asset universe (Substations, Distribution Transformers, LV clients, etc)
Forecast update every 15 minutes
Incorporate dynamic grid topology
11. EDP Inovação
Review of
existing load
forecast models
We defined some steps to find an adequate model that allowed us to forecast the load with
“good enough” accuracy
11
Test the model
over national
Load
Improve the
model with
additional
Explanatory
variables
Define models
for different
times of year
12. EDP Inovação 12
After choosing the model we identified a set of explanatory variables and tested the model over
National Demand
Explanatory variables:
• Year, month, day
• Day of week
• Public holiday
• Season (Spring, Summer, Autumn, Winter)
• Daylight save time (TRUE, FALSE)
• Time of year
• Time of day (48 1/2 hour intervals)
• Temperature From NOAA website
Dataset:
• Half-hourly electricity measurements
• National demand (mainland Portugal)
• From 2006 to 2011 – Data for calibration
• From 2012 to 2014 – Data for test
High temperature ~ demand peak
(2013 - 4th highest heat wave since 1981)
Low temperature ~ demand peak
(2012 European cold wave due Siberian High)
13. EDP Inovação 13
In order to increase the model’s accuracy and looking at the major residuals, we started a trial
and error process to identify the main causes that would decrease the model errors
August
Christmas and New Year period
Public holiday on Sunday
Gong storm
-500
500
0,65
0,7
0,75
0,8
0,85
0,9
0,95
1
Iteraction Variable
1 24h lagged load
2 temp. combined w. time of day
3 48h lagged load
4 day of week
5 public holidays
6 intra-day effect dependent on the day type
7 day of the year
8 24h lagged temp. + min and max temp. of last 24h
9 days offs before Christmas and Carnival
Devianceexplained
Iteration
Thehigherthebetter
Features added/combined
Model Accuracy
14. EDP Inovação 14
But there were still some issues with the forecast. After special days like Christmas the model
shouldn’t use the load values of the previous day to forecast
This lead to a new approach of using a weighted majority algorithm
15. EDP Inovação 15
In this approach we had several algorithms that were trained to certain conditions and the
model automatically choose the one that minimized the error for each period
Iteration Variable
1 General-purpose model
2 General-purpose model reviewed
3 Weekends' model
4 August's model
5 Public holidays' model
6 Spring and Summer's model
7 Autumn and Winter's mode
8 Christmas and New Year's model
9 Carnival's model
10 Easter's model
Jan Autumn Dec
Carnival
period
Easter
period
AugustSpring
Christmas and
New Year periodWeekends Other public holidays
Oneyear
2
2,1
2,2
2,3
2,4
2,5
2,6
2,7
Thelowerthebetter
MAPE(%)
Iteration
Models added/combined
We now have a working algorithm with ~2% error for an aggregated national load.
16. EDP Inovação
Meanwhile we also implemented a R wind generation forecast model based on wind velocity +
air pressure and also the energy supplied by the wind farm
Forecast D+1
- Forecast
- Actual
Forecast D+2
- Forecast
- Actual
Forecast D+3
- Forecast
- Actual
7% 8% 12%
NMAE
Normalized mean
absolute error
Test conditions:
• 9 months calibration data + 1 month validation data
• Hourly generation measurements and forecasts of wind velocity@10m and pressure@MSL (72h time horizon, 3h intervals)
17. EDP Inovação 17
New challenges on load forecasting emerge when we decrease the voltage level (substations and
distribution transformers)
August August
Christmas and
New year
Christmas and
New year
Network
reconfigurations?
Done so far:
Implemented 2 Big Data Clusters (Cloudera Hadoop)
Developed an architecture for the Project
Developed a Load forecast model with ~2% MAPE for
national load
Developed a Wind forecast model with ~10% error
Next Steps:
Improve existing models
Incorporate network configurations on the forecast module (state estimation, network status)
Cluster different types of load by voltage level, load tipification etc.
Wind farms state estimation
Photovoltaic model definition and implementation
Collect data from the source systems in a continuous way
18. EDP Inovação 18
Agenda
1. Introduction to EDP
2. Motivation
3. Project PREDIS – Real time Load and Generation disaggregated forecast
4. EDP Future IT Architecture
5. Conclusions
19. EDP Inovação 19
The PREDIS project and other use-cases revealed a series of limitations that currently exist in the
IT systems
• How can we expand analytics knowledge in business areas?
• How can we achieve massive data extractions without impacting the performance of existing
operational systems?
• How can we avoid a proliferation of interfaces each one with a specific function?
• How can we interpret the data that exists in current IT systems?
• How to “democratize” the access to data so that multi source analytics can be developed?
• How to “stream” the data needed to address some of the use-cases identified ?
These requirements were important to develop a new approach of IT systems
and lead to a specific analysis of the current Analytics architecture
20. EDP Inovação 20
Traditionally a utility has analytic solutions based on a silo oriented BI architecture that isn’t
prepared to deal with high volumes of data with structured and unstructured information
InformationUsage
Integration
Software
application
Operation
Software
Application
… Software
Application
Sap
Application
… SAP
Application
Software
Application
BW Redundancy of information
Lack of connectivity between the
different information “silos”
Little or non-existing
related information at a
disaggregated level
This was the vision of the IT architecture till 2010. Meanwhile IT world has changed, but the business
needs are still the same. It is necessary to have a Strategic, Tactic and Operational vision.
Analytic
level
Operational
Level
SAP ExtractorsETL / Active Data Guard / Golden Gate
21. EDP Inovação
Usage
21
Information
Integration
Application
Operation
Software
Application
… Software
Application
SAP
Application
… SAP
Application
Software
Application
External Sources
3
Nowadays operational systems create more data every day in the 3V’s that characterize Big Data
(Volume, Variety, Velocity). A conventional infrastructure cannot handle operational activities and
advanced analytics in due time.
Big Data comes as an option that allows data ingestion and advanced analytics of high
volumes of data oriented to one of the 3V’s (Volume, Variety, Velocity),
keeping operational systems with their normal activities.
SAP Extractors
ETL / Golden Gate
Interaction
BW“DataLake”
MDU GR
MDU GA
MDU GE
Analytic
Level
Opoerational
Level
22. EDP Inovação
The new analytic-oriented architecture is based on a Data Lake and on UDMs with the information
from the different IT/OT systems fed with CDC (Change Data Capture) interfaces leveraging new
analytics
22
Catalogue
DataWarehouse
CorpODS BW
OT
Events Engine
Virtual data warehouse
Real-time
Dashboard
DataGovernance
Business Solutions
Comercial Analytics Performace MgmtAsset Analytics
Asset Performance
Distributed Load Forecast Energy Balance
Fraud Predictive Maintenance
Data Sources
External
SourcesSAPSAP
Non-SAP
BW
Réplicas
Data Lake (“big data”)
MDU D
(Gestão de ativos, Gestão comercial, Gestão de energia, Gestão da rede)
HDFS
DM DM DM DM
HDFS HDFS HDFS
Reporting Alerts
Dashboards Discovery
Advanced Analytics Geo Analytics
Data exploration tools
API’s
Business functionalities
DataGovernance
OT
Data Exploration
Data Access
Data Processing + Data Repository
Data Ingestion
23. EDP Inovação
The new analytic-oriented architecture is based on a Data Lake and on UDMs with the information
from the different IT/OT systems fed with CDC (Change Data Capture) interfaces leveraging new
analytics
23
Catalogue
DataWarehouse
CorpODS BW
OT
Events Engine
Virtual data warehouse
Real-time
Dashboard
DataGovernance
Business Solutions
Commercial Analytics Performace MgmtAsset Analytics
Asset Performance
Distributed Load Forecast Energy Balance
Fraud Predictive Maintenance
Data Sources
External
SourcesSAPSAP
Non-SAP
BW
Replicas
Data Lake (“big data”)
Unified Data Model
(Asset Mgmt, Commercial Mgmt, Energy Mgmt, Grid Mgmt)
HDFS
DM DM DM DM
HDFS HDFS HDFS
Reporting Alerts
Dashboards Discovery
Advanced Analytics Geo Analytics
Data exploration tools
API’s
24. EDP Inovação
Big Data Platform/Cluster
HDFS (Storage)
Hadoop Distributed File System
Hbase
Columnar Store
Mahout
Machine
Learning
Hive
SQL Query
IMPALA/
SPARK
In-memory
Map Reduce/YARN (Resource Management)
Distributed Processing Framework
Web app
API – data access and data modeling
Externalaccess
todatadownload
System A
System B
Files
PREDIS
Forecast
Model 1
implemented on R
Model 2
implemented on R
SIT
EDM (SGL)
Ei-Server
Rede
Activa
SCADA-BI
External data
sources
Dataextractandloading
IPMA
SGL
SIT
New Model
implemented on R Deploy
Resultsofnew
modelsdeployed
Sqoop
Kafka
PREDIS instantiated in the new architecture
Files CDC
25. EDP Inovação 25
We are now building the data lake infrastructure whilst we acquire knowledge in this new type of
architecture supported in two Hadoop clusters: an Enterprise Grade and a Low Cost as “sand box”
Purpose: Internal enterprise level cluster
for projects support
Data confidentiality guaranteed
Hardware quality (Enterprise grade)
Prepared to scale horizontally
Cloudera Hadoop and R
7 nodes/servers (dimensioned for
PREDIS project)
ENTERPRISE GRADE DEVELOPMENT
CLUSTER
Purpose: Internal test and development
cluster assembly
Big Data platform knowledge
development
Low cost platform (desktop PCs)
Cloudera Hadoop and R
48 nodes/servers
LOW COST CLUSTER
Purpose: Enterprise DataLake cluster
Oracle Big Data Appliance
Seamless integration with Exadata
Prepared to scale horizontally
Cloudera Hadoop Enterprise
6 nodes/servers with superior
characteristics and hardware optimized
ENTERPRISE PRODUCTION DATALAKE
INFRASTRUCTURE
26. EDP Inovação 26
Findings & Conclusions
• Big Data Analytics is a continuous learning process and a cultural change. To overcome the lack of
knowledge in this subject a Advanced Analytics and Machine Learning training in R is being
lectured in EDP. Additionally a SAS Miner training is scheduled for the 2nd trimester of 2017
• Access to overloaded source systems’ data can be difficult. CDC (change data capture) extractors
seem the best way to extract data from the source systems and have the data available near real
time to analytics users
• The development of a Data Lake will decrease the number of interfaces between operational
systems
• The Unified Data Model, where information is organized and cataloged, allows a unique vision of
all available data (and can support the “single source of truth”). But we need Data Governance!
• Support of a experienced IT partner for several of the activities involved is essential to avoid
major pitfalls and to help detail architecture “sweet spots” for each use-case