1. A Data Lake and a Data Lab to Optimize Operations and Safety Within a Nuclear Fleet
Hadoop Summit 2016, San José, June 30th
Marie-Luce PICARD, EDF R&D – marie-luce.picard@edf.fr
Jean-Marc RANGOD, EDF-DPNT
Christophe SALPERWYCK, EDF R&D
Special thanks to Raphaël QUERCIA EDF-DTG, Carole MAI and Amandine PIERROT EDF R&D
2. Outline
1. A FEW WORDS ABOUT EDF
2. CONTEXT AND OBJECTIVES
3. A DATA LAKE FOR A NUCLEAR FLEET
4. DATA SCIENCE ALGORITHMS FOR OPTIMIZING OPERATIONS
5. A DATA LAB IN PROGRESS
6. AS A CONCLUSION
Brice Richard - Flickr
KC Tan Phoyography - Flickr
4. EDF: A GLOBAL LEADER IN ELECTRICITY
EDF: an efficient, responsible electricity company and the champion of low-carbon growth
All electricity-related activities
• Generation
• Transmission & Distribution
• Trading and Sales & Marketing
• Energy services
Key figures*
• Electricity generation: 623.5 TWh
• €72.9 billion in sales
• 38.5 million customers
• 158,161 employees worldwide
• 84.7% of generation does not emit CO2
2014 investments: €4.5 billion
*as of 2015
5. WORLD’S LEADING OPERATOR, EXCELLENT PERFORMANCE IN FRANCE
72.9 GW installed capacity, 54% of the Group’s net generation capacity
477.7 TWh generated, 77% of the Group’s output
58 reactors operated in France, 15 in the UK
3 EPR under construction:
— 1 in Flamanville (France)
— 2 in Taishan (China)
2 EPR in project phase
OSART safety audit
17 best practices identified by IAEA
France
Best generation performance for six years
UK
World record for safety in the workplace
China
Strengthened cooperation agreement with CNNC
NUCLEAR
8. Scientific partnerships with actors of Paris-Saclay
research departments
8
exceptional buildings
4 outstanding hall test
1 Unique equipment, innovative communication tools
Diverse areas of expertise
1500 work stations
Plenty of collaborative spaces
EDF LAB PARIS-SACLAY
9. Main Big Data related challenges for EDF
Power Generation
• Process monitoring and condition-based maintenance from sensors
• Power generation forecasting for renewables
Energy management
• Load forecasting
• Balancing and optimizing generation and consumption (using smart metering information, including renewables)
Electrical networks
• Smart Grid operations (local)
• Condition-based maintenance
Customers and sales
• New services to customers using smart-metering data
• Smart Homes, Smart Building, Smart Cities management related to energy
11. Operations and maintenance of the nuclear fleet
The maintenance policy of the EDF generation fleet is optimized to ensure the reliability and safety of equipment and systems while strengthening our competitiveness:
• Achieve better diagnosis, improved performance and availability
• Make better use of data and documents, so far stored in data silos
More broadly, the IT teams and projects aim to:
• Strengthen the performance of operations and maintenance through a global fleet approach
• Simplify the Industrial Information System architecture
• Improve and develop the way we use our data
• Accumulate and archive data over time
… while reducing costs
12. Voluminous and heterogeneous data … stored in data silos
Source: Wikipedia
One database per nuclear site, gathering data from sensors. Use of data historians.
Focus on data:
High volume:
• Data is stored for up to 40 to 60 years (the lifetime of the plant)
• SCADA data can be sampled every 20 to 40 ms (but mostly every few seconds)
• Around 10,000 sensors per plant
Variety:
• Data is heterogeneous
• Time series, images, documents
• Various data sources
The current systems (historians) do not support many concurrent accesses, and their SLAs are poor
16. Zoom on data
4 generations of plants, but a high level of normalization of data and sensors (for example, use of trigrams to identify elementary systems)
Two main types of sensors: ANA (analog values) and TOR (state events; from the French "tout ou rien", all-or-nothing)
Time series
Volume
• For the POC, 10 plants, 2 years: about 20 billion points
• Target (59 plants): 15 TB of data (all plants, whole lifecycle)
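As a rough sanity check, the POC volume can be turned into an average sampling rate per sensor. The sketch below only reuses figures quoted on the slides; it assumes (this is not stated in the deck) that all of the roughly 10,000 sensors of each plant were loaded for the POC, so the result is an order of magnitude, not a measured value.

```python
# Figures quoted on the slides (the sensor count is "around" 10,000 per plant).
POC_POINTS = 20e9
POC_PLANTS = 10
POC_YEARS = 2
SENSORS_PER_PLANT = 10_000  # assumption: every sensor is part of the POC

SECONDS_PER_YEAR = 365 * 24 * 3600

# Average number of stored points per sensor and per year.
points_per_sensor_year = POC_POINTS / (POC_PLANTS * POC_YEARS * SENSORS_PER_PLANT)
# Average spacing between two stored points of the same sensor.
avg_interval_s = SECONDS_PER_YEAR / points_per_sensor_year

print(f"{points_per_sensor_year:,.0f} points per sensor per year")
print(f"one point every {avg_interval_s / 60:.1f} minutes on average")
```

Under these assumptions the average is about one stored point every five minutes per sensor, far coarser than the raw SCADA sampling rate.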
Metric, global Date Value Quality
BU2ABP177MT- 2015-04-30T22:05:00.000Z 156.6 Good/M
BU2ABP177MT- 2015-04-30T22:06:00.000Z 156.4 Good/M
BU2ABP177MT- 2015-04-30T22:07:00.000Z 156.2 Good/M
BU2ABP177MT- 2015-04-30T22:08:00.000Z 156.0 Good
BU2ABP177MT- 2015-04-30T22:09:00.000Z 156.2 Good/M
BU2ABP177MT- 2015-04-30T22:10:00.000Z 156.4 Good/M
BU2ABP177MT- 2015-04-30T22:12:00.000Z 156.7 Good/M
BU2ABP177MT- 2015-04-30T22:14:00.000Z 157.1 Good
BU2ABP177MT- 2015-04-30T22:15:00.000Z 157.3 Good
BU2ABP177MT- 2015-04-30T22:16:00.000Z 157.5 Good
BU2ABP177MT- 2015-04-30T22:19:00.000Z 157.3 Good/M
BU2ABP177MT- 2015-04-30T22:20:00.000Z 157.1 Good/M
BU2ABP177MT- 2015-04-30T22:21:00.000Z 157.3 Good/M
BU2ABP177MT- 2015-04-30T22:22:00.000Z 157.1 Good/M
BU2ABP177MT- 2015-04-30T22:24:00.000Z 156.9 Good/M
BU2ABP177MT- 2015-04-30T22:27:00.000Z 157.1 Good/M
BU2ABP177MT- 2015-04-30T22:28:00.000Z 157.3 Good/M
BU2ABP177MT- 2015-04-30T22:29:00.000Z 157.5 Good/M
BU2ABP177MT- 2015-04-30T22:30:00.000Z 157.7 Good/M
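The sample above skips some minutes (for example 22:11 and 22:13). A minimal sketch of how such rows could be parsed and checked for missing timestamps follows. The one-minute grid is an assumption made for illustration: the slides do not say whether the historian stores values at a fixed rate or only on change, so a skipped minute is not necessarily lost data.

```python
from datetime import datetime, timedelta

# First rows of the sample shown on the slide.
SAMPLE = """\
BU2ABP177MT- 2015-04-30T22:05:00.000Z 156.6 Good/M
BU2ABP177MT- 2015-04-30T22:06:00.000Z 156.4 Good/M
BU2ABP177MT- 2015-04-30T22:07:00.000Z 156.2 Good/M
BU2ABP177MT- 2015-04-30T22:08:00.000Z 156.0 Good
BU2ABP177MT- 2015-04-30T22:09:00.000Z 156.2 Good/M
BU2ABP177MT- 2015-04-30T22:10:00.000Z 156.4 Good/M
BU2ABP177MT- 2015-04-30T22:12:00.000Z 156.7 Good/M
BU2ABP177MT- 2015-04-30T22:14:00.000Z 157.1 Good
"""


def parse_row(line):
    """Split one 'metric timestamp value quality' row into typed fields."""
    metric, ts, value, quality = line.split()
    when = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")
    return metric, when, float(value), quality


def missing_timestamps(rows, step=timedelta(minutes=1)):
    """List the grid timestamps absent between consecutive rows."""
    missing = []
    for (_, prev, _, _), (_, cur, _, _) in zip(rows, rows[1:]):
        t = prev + step
        while t < cur:
            missing.append(t)
            t += step
    return missing


rows = [parse_row(line) for line in SAMPLE.splitlines()]
gaps = missing_timestamps(rows)
print([t.strftime("%H:%M") for t in gaps])
```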
17. Data model
Use of HBase and Phoenix
• Distributed key/value store
• Allows the model to evolve (changing normalization requirements, new indicators, new plants …)
• Phoenix for SQL compliance and BI tools
Tables
• 3 tables: DDT, ANA, TOR
• Rowkey: <sensorid, timestamp> (queries mainly consider one or several sensors over a period of time)
• Sequential storage; split into HFiles and HRegions according to the plant unit
ANA table
Key                                ColumnFamily  Column  Value          Phoenix type
m (concat(metriqueid, timestamp))  0             v       H_ValeurANA    Float
                                                 q       H_QualitéANA   Char(10)
                                                 n       H_NiveauxANA   Varchar(10)
TOR table
Key                                ColumnFamily  Column  Value          Phoenix type
m (concat(metriqueid, timestamp))  0             v       H_ValeurTOR    Varchar(10)
                                                 q       H_QualiteTOR   Char(10)
                                                 n       H_NiveauxTOR   Varchar(10)
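The reason a <sensorid, timestamp> rowkey suits "one sensor over a period of time" queries is that a key-ordered store keeps all points of a sensor contiguous and in time order, so such a query becomes a single range scan. The sketch below illustrates the idea with an in-memory sorted list. The byte layout (ASCII id, zero-byte separator, 8-byte big-endian timestamp) and the sensor names are hypothetical; the slides do not describe the actual encoding.

```python
import bisect
import struct


def make_rowkey(sensor_id: str, ts_ms: int) -> bytes:
    # Hypothetical encoding: sensor id, a zero-byte separator, then the
    # timestamp as an 8-byte big-endian integer so byte order == time order.
    return sensor_id.encode("ascii") + b"\x00" + struct.pack(">Q", ts_ms)


def range_scan(sorted_keys, sensor_id, start_ms, end_ms):
    """Return the keys of one sensor with start_ms <= timestamp < end_ms."""
    lo = bisect.bisect_left(sorted_keys, make_rowkey(sensor_id, start_ms))
    hi = bisect.bisect_left(sorted_keys, make_rowkey(sensor_id, end_ms))
    return sorted_keys[lo:hi]


# A sorted list stands in for the ordered key space of the store.
keys = sorted(
    make_rowkey(sensor, ts)
    for sensor in ("SENSOR_A", "SENSOR_B")
    for ts in (1000, 2000, 3000, 4000)
)
hits = range_scan(keys, "SENSOR_A", 2000, 4000)
print([struct.unpack(">Q", k[-8:])[0] for k in hits])
```

The scan touches only the contiguous slice for SENSOR_A; keys of other sensors are never read.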
18. Validation and performance evaluation
POC validation
• Upload of historical data; queries / analyses
• Existing functions: visualization, reports, services
• Data injection: SCADA for the whole fleet, integration of other sources of data
Results
• 6 weeks (estimated) needed to upload historical data from 59 plants
• Queries for validating the model:
  - JMeter used to simulate load
  - With or without insertion workload
  - Under about 1 second to draw a curve for a selected month
• Integration of an existing GUI for visualization (done within a few days)
• Validation of specific calculations within reports
• ODBC link for a specific e-monitoring application
• Integration of various sources of (structured) data into the data lake
• ‘Real-time’ insertion of data (micro-batch):
  - Up to 2M points / s
  - Very low latency between insertion and availability (< 10 s)
Phoenix query (ANA)
SELECT
  MIN(v), MAX(v),
  FIRST_VALUE(v) WITHIN GROUP (ORDER BY ts ASC),
  LAST_VALUE(v) WITHIN GROUP (ORDER BY ts ASC),
  TO_CHAR(ts, 'dd') AS day,
  TO_CHAR(ts, 'HH') AS hour,
  TO_CHAR(ts, 'mm') AS minute,
  COUNT(*) AS cnt
FROM
  ORLI_ANA
WHERE
  m = ? AND
  ts > CURRENT_TIME() - 1 AND  -- last 24 h
  ts < CURRENT_TIME()
GROUP BY
  day, hour, minute
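For readers less familiar with Phoenix aggregates, the query above reduces the last 24 hours of one metric to one row per minute holding the minimum, maximum, first value, last value and point count. The following sketch reproduces that reduction in plain Python on in-memory data. It is an illustration of the computation, not the authors' code, and the sub-minute sample values are made up.

```python
from datetime import datetime


def aggregate_per_minute(points):
    """points: iterable of (datetime, value). Returns {minute: stats}."""
    buckets = {}
    for ts, v in sorted(points):  # sort by time so first/last are well defined
        minute = ts.replace(second=0, microsecond=0)
        stats = buckets.get(minute)
        if stats is None:
            buckets[minute] = {"min": v, "max": v, "first": v, "last": v, "cnt": 1}
        else:
            stats["min"] = min(stats["min"], v)
            stats["max"] = max(stats["max"], v)
            stats["last"] = v
            stats["cnt"] += 1
    return buckets


# Hypothetical points for a single metric.
points = [
    (datetime(2015, 4, 30, 22, 5, 0), 156.6),
    (datetime(2015, 4, 30, 22, 5, 20), 156.9),
    (datetime(2015, 4, 30, 22, 5, 40), 156.4),
    (datetime(2015, 4, 30, 22, 6, 10), 156.2),
]
result = aggregate_per_minute(points)
for minute, stats in result.items():
    print(minute.strftime("%d %H:%M"), stats)
```

Keeping min, max, first and last per bucket is what lets a plotting front end draw a faithful curve from far fewer points than the raw series.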
20. Added value of data science algorithms on heterogeneous data
Operations and maintenance can be better optimized through data analytics run on data coming from the whole fleet
Active and reactive power are indicators of the stress placed on alternators, and therefore of their wear
• ~ 50 plants
• 20 years of data
• 10 min interval data
• Phoenix queries make it possible to select plants and periods of time
• Compute and show reactive power per day or per hour of the day
• More detailed analysis
• Fleet level analysis
• Interactive queries
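The "reactive power per hour of the day" view listed above amounts to grouping 10-minute samples by hour and averaging them. A minimal sketch, assuming the samples have already been selected (for instance by a Phoenix query) and loaded as (timestamp, value) pairs; the values below are made up and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def mean_by_hour_of_day(samples):
    """samples: iterable of (datetime, reactive_power). Returns {hour: mean}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ts, q in samples:
        totals[ts.hour] += q
        counts[ts.hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in sorted(totals)}


# Hypothetical 10-minute samples over two hours (values are made up).
start = datetime(2015, 4, 30, 0, 0)
values = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0]
samples = [(start + timedelta(minutes=10 * i), v) for i, v in enumerate(values)]

profile = mean_by_hour_of_day(samples)
print(profile)
```

Run over several years and plants, the same grouping gives a daily profile per plant that can be compared across the fleet.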
21. Added value of data science algorithms on heterogeneous data
Operations and maintenance can be better optimized through data analytics run on data coming from the whole fleet
Monitoring and control of contractual agreements when network frequency varies (plants have to contribute to the global balance)
• Pattern matching
• Response time for different plants
• Different levels of analysis: by plant, by generation, global
• Generic approach implemented for any kind of pattern
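One simple way to picture the "pattern matching" and "response time" items is a threshold rule: detect each moment the grid frequency drops below a limit, then count how long the plant takes to raise its output by a given amount. The sketch below is only an illustration of that idea; the thresholds, the sample values and the function name are hypothetical, and the deck does not describe the pattern-matching method actually implemented.

```python
def response_times(freq, power, freq_threshold, power_delta):
    """For each frequency drop below the threshold, count the samples until
    power has risen by power_delta relative to its value at the drop."""
    results = []
    below = False
    for i, f in enumerate(freq):
        if f < freq_threshold and not below:
            # Start of a new excursion: look for the plant's response.
            base = power[i]
            delay = next(
                (j - i for j in range(i, len(power)) if power[j] - base >= power_delta),
                None,  # no response found within the data
            )
            results.append((i, delay))
        below = f < freq_threshold
    return results


# Hypothetical 1-second samples: grid frequency (Hz) and plant output (MW).
freq = [50.00, 50.00, 49.90, 49.85, 49.90, 49.95, 50.00, 50.00]
power = [900.0, 900.0, 900.0, 905.0, 912.0, 915.0, 915.0, 915.0]

events = response_times(freq, power, freq_threshold=49.95, power_delta=10.0)
print(events)
```

Each result pairs the index where an excursion starts with the delay in samples; aggregating these delays by plant or by plant generation gives the levels of analysis listed above.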
22. Added value of data science algorithms on heterogeneous data
Prediction of plant cooling according to the quality of the water entering the plants
• Correlations?
• According to the plants
• Use of generalized additive models (GAM)
• Integration of two internal sources + external data
• Better understanding
• Work in progress
23. Integration of data science and visualization: architecture
Diagram: Hadoop cluster, REST web service (VM), browser
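To make the three tiers concrete, here is a minimal sketch of what the middle tier could look like: a small HTTP service that a browser calls and that would, in a real deployment, query the cluster. Everything specific in it is an assumption for illustration: the /series route, the query parameters, and the in-memory FAKE_STORE that stands in for a query against the Hadoop cluster.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Stand-in for the result of a query against the cluster (made-up values).
FAKE_STORE = {"SENSOR_A": [[1000, 156.6], [2000, 156.4], [3000, 156.2]]}


def build_response(path):
    """Map a path like /series?metric=SENSOR_A&start=0&end=9 to (status, JSON)."""
    url = urlparse(path)
    params = parse_qs(url.query)
    metric = params.get("metric", [None])[0]
    if url.path != "/series" or metric not in FAKE_STORE:
        return 404, json.dumps({"error": "unknown route or metric"})
    start = int(params.get("start", ["0"])[0])
    end = int(params.get("end", [str(2**63 - 1)])[0])
    points = [p for p in FAKE_STORE[metric] if start <= p[0] < end]
    return 200, json.dumps({"metric": metric, "points": points})


class SeriesHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = build_response(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def make_server(port=8080):
    """Create (but do not start) the server; call serve_forever() to run it."""
    return HTTPServer(("127.0.0.1", port), SeriesHandler)
```

Calling `make_server().serve_forever()` would expose the route locally; the browser side then only needs to fetch JSON and draw it.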
24. Integration of data science: a global approach
Pre-processing
• Data quality
• Sampling
• Synchronization
• …
Selection and queries
• Threshold
• Pattern matching
• Period of time
• …
Analysis and data science
• Reporting
• Exploratory analysis (distribution …)
• Modelling
• …
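As an illustration of the "Synchronization" step, the sketch below aligns two sensors sampled at different instants onto a common time grid by carrying the last observed value forward. This is one common choice among several (interpolation is another) and is not presented in the deck as the method used; the data and the function name are hypothetical.

```python
import bisect


def align_to_grid(series, grid):
    """Sample a time-sorted [(t, value), ...] series on a grid of timestamps,
    carrying the last observed value forward (None before the first point)."""
    times = [t for t, _ in series]
    aligned = []
    for g in grid:
        i = bisect.bisect_right(times, g)  # number of observations at or before g
        aligned.append(series[i - 1][1] if i else None)
    return aligned


# Hypothetical sensors sampled at different instants: (seconds, value).
sensor_a = [(0, 1.0), (60, 1.5), (180, 2.0)]
sensor_b = [(30, 10.0), (90, 11.0), (150, 12.0)]
grid = [0, 60, 120, 180]

synced = list(zip(grid, align_to_grid(sensor_a, grid), align_to_grid(sensor_b, grid)))
print(synced)
```

Once both series share the same grid they can be compared row by row, which is a prerequisite for the thresholding, pattern matching and modelling steps listed above.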
26. A Data Lab in progress: a team, an approach … and some questions
Objectives:
• Bring value from data analytics
Issues:
• Skills and organization (between entities)
• Architecture:
  - Operational Hadoop cluster and loads (use of a multitenant enterprise cluster)
  - Other loads (data science)
  - Data preparation within Hadoop + edge machine for data science (Spark, R, Python)
• How to quantify value
• Development and maintenance costs
• How to industrialize
Source: Xebia
28. Takeaways
A Data Lake for our nuclear fleet
• In progress: industrialization and decommissioning of Historian applications
• Large reduction in licensing costs
A Data Lab under construction
• POCs showing the added value of data science algorithms (predictive maintenance)
• In the context of fleet renovation for plant life extension (major overhaul program): operations & maintenance, optimization of generation costs
• Remaining issues: skills, organization, technical architecture, quantifying value
Outlook and technical issues:
• Data lakes and labs for other fleets (thermal plants, hydro, renewables)
• Scalable time-series analytics (synchronization, missing data …)
• Handling heterogeneous data (textual, images, graphs …)
• IoT platform
Editor's notes
Nuclear energy supplies competitive, carbon-free electricity that we generate in the best possible safety conditions.
In 2014, the International Atomic Energy Agency conducted an audit on how nuclear safety is integrated into the organisation and processes of our central departments: the IAEA found no departure from its standards and identified 17 best practices.
→ In France, we achieved our best performance in six years thanks to our management of scheduled shutdowns: the average length of extensions was halved. Wintertime fleet availability topped 90%. Our annual output was up 3% (415.9 TWh).
• The principle of the “Grand Carénage” maintenance programme was approved. The programme involves renovating the French nuclear fleet over a 10-year period in order to extend its operating life beyond 40 years if all conditions are met. The investment is put at €55 billion for the entire fleet.
• Work continues at the Flamanville EPR site, the first nuclear plant to be built in France in 15 years.
→ In the UK, output was good (56.3 TWh) despite the unscheduled shutdown of two plants. EDF Energy established a world record for safety in the workplace (0.98 accidents requiring more than one day of lost time per million hours worked by employees and subcontractors).
• The Hinkley Point C project to build two EPR in Somerset took a major step forward: in October, the European Commission approved the main terms of the agreements concluded with the British government.
→ In China, through partnerships, we are taking good advantage of the expertise we have acquired in the design, construction, operation and maintenance of our nuclear fleet.
• Construction of two 1,750 MW EPR in Taishan (EDF 30% in partnership with CGN) is ongoing.
• We signed an agreement to strengthen cooperation in engineering, operation and maintenance with CNNC, China’s largest state-owned nuclear company.