SlideShare une entreprise Scribd logo
1  sur  29
A Data Lake and a Data Lab to Optimize
Operations and Safety Within a Nuclear Fleet
Hadoop Summit 2016, San José, June 30th
Marie-Luce PICARD, EDF R&D – marie-luce.picard@edf.fr
Jean-Marc RANGOD, EDF-DPNT
Christophe SALPERWYCK, EDF R&D
Special thanks to Raphaël QUERCIA EDF-DTG, Carole MAI and Amandine PIERROT EDF R&D
2
Outline
1. A FEW WORDS ABOUT EDF
2. CONTEXT AND OBJECTIVES
3. A DATA LAKE FOR A NUCLEAR FLEET
4. DATA SCIENCE ALGORITHMS FOR OPTIMIZING OPERATIONS
5. A DATA LAB IN PROGRESS
6. AS A CONCLUSION
Brice Richard - Flickr
KC Tan Phoyography - Flickr
3
Outline
1. A FEW WORDS ABOUT EDF
2. CONTEXT AND OBJECTIVES
3. A DATA LAKE FOR A NUCLEAR FLEET
4. DATA SCIENCE ALGORITHMS FOR OPTIMIZING OPERATIONS
5. A DATA LAB IN PROGRESS
6. AS A CONCLUSION
Brice Richard - Flickr
KC Tan Phoyography - Flickr
4
ELECTRICITY GENERATION
623.5 TWH
All electricity-related activities
Generation
Transmission & Distribution
Trading and Sales & Marketing
Energy services
Key figures*
€72.9 billion in sales
38.5 million customers
158,161 employees worldwide
84.7% of generation does not emit CO2
2014 INVESTMENTS
€4.5 BILLION
EDF: A GLOBAL LEADER IN ELECTRICITY
*as of 2015
EDF :
AN EFFICIENT,
RESPONSIBLE
ELECTRICITY COMPANY
AND THE CHAMPION
OF LOW-CARBON
GROWTH
WORLD’S LEADING OPERATOR, EXCELLENT
PERFORMANCE IN FRANCE
72.9 GW installed capacity, 54% of the Group’s net generation
capacity
477.7 TWh generated, 77% of the Group’s output
58 reactors operated in France,
15 in the UK
3 EPR under construction:
— 1 in Flamanville (France)
— 2 in Taishan (China)
2 EPR in project phase
 OSART safety audit
17 best practices identified by IAEA
 France
Best generation performance for six years
 UK
World record for safety in the workplace
 China
Strengthened cooperation agreement with CNNC
NUCLEAR
EDF 2015 I P.5
R&D KEY FIGURES
Scientific
partnerships with
actors of Paris-
Saclay
research departments
8
exceptional buildings
4 outstanding hall test
1 Unique equipment,
innovative
communication
tools
Diverse areas of
expertise
1500
work stations
Plenty of
collaborative
spaces
EDF LAB PARIS-SACLAY
9
Main Big Data related challenges for EDF
Power Generation
 Process monitoring and condition-based maintenance
from sensors
 Power generation forecasting for renewables
Energy management
 Load forecasting
 Balancing and optimizing generation and consumption
(using smart metering information, including
renewables)
 Electrical networks
 Smart Grid operations (local)
 Condition-based maintenance
Customers and sales
 New services to customers using smart-metering data
 Smart Homes, Smart Building, Smart Cities management
related to energy
10
Outline
1. A FEW WORDS ABOUT EDF
2. CONTEXT AND OBJECTIVES
3. A DATA LAKE FOR A NUCLEAR FLEET
4. DATA SCIENCE ALGORITHMS FOR OPTIMIZING OPERATIONS
5. A DATA LAB IN PROGRESS
6. AS A CONCLUSION
Brice Richard - Flickr
KC Tan Phoyography - Flickr
11
Operations and maintenance of the nuclear fleet
 The maintenance policy of EDF generation fleet is optimized to ensure reliability and safety of
equipment and systems while strengthening our competitiveness:
 Have better diagnosis, improved performance and availability
 Make a better use of data and documents, so far stored into Data silos
 More globally, the IT teams and projects aim at:
 Strengthen performance of operations and maintenance through a global fleet approach
 Simplify the Industrial Information System architecture
 Improve and develop the way we use our data
 Accumulate and archive data through time
… while reducing costs
12
Voluminous and heterogeneous data …. stored in data silos
Source : Wikipedia
One DB by nuclear site, gathering data from
sensors. Use of Data Historians.
 Focus on data:
 High volume:
 data is stored up to 40-60 years (lifetime of the plant)
 SCADA data can be sampled every 20 to 40 ms (but mainly a few
seconds)
 Around 10.000 sensors per plant
 Variety:
 Data is heterogeneous
 Time series, images, documents
 Various data sources
 The actual systems (historians) don’t allow
too many concurrent access, and their SLA are
quite bad
13
A Data Lake for the nuclear fleet
ESPADON : the Data Lake for the nuclear
fleet
One DB by nuclear site, gathering data from
sensors. Use of Data Historians.
Source : Wikipedia
© M. Caraveo, Hadoop cluster NOE data center
14
Outline
1. A FEW WORDS ABOUT EDF
2. CONTEXT AND OBJECTIVES
3. A DATA LAKE FOR A NUCLEAR FLEET
4. DATA SCIENCE ALGORITHMS FOR OPTIMIZING OPERATIONS
5. A DATA LAB IN PROGRESS
6. AS A CONCLUSION
Brice Richard - Flickr
KC Tan Phoyography - Flickr
| 15
A data lake for the nuclear fleet: big picture
….
Files
(chemical
information)
Historian -
SCADA
Files
(dosimetry)
E-monitoring
application
Viz
Interactive
queries and
reporting
Web Service
Hadoop cluster – ESPADON
Data Lake
Reports
© M. Caraveo, Hadoop cluster NOE data
center
16
Zoom on data
 4 generations of plants, but high level of normalization of data and sensors (for
example, use of trigrams for identification of elementary systems)
 Two main types of sensors : ANA (for analogic) and TOR (for state events)
 Time series
 Volume
 For the POC, 10 plants, 2 years: about 20 billions of points
 Target (59 plants) : 15 To of data (all plants, whole lifecycle)
Metric, global Date Value Quality
BU2ABP177MT- 2015-04-30T22:05:00.000Z 156.6 Good/M
BU2ABP177MT- 2015-04-30T22:06:00.000Z 156.4 Good/M
BU2ABP177MT- 2015-04-30T22:07:00.000Z 156.2 Good/M
BU2ABP177MT- 2015-04-30T22:08:00.000Z 156.0 Good
BU2ABP177MT- 2015-04-30T22:09:00.000Z 156.2 Good/M
BU2ABP177MT- 2015-04-30T22:10:00.000Z 156.4 Good/M
BU2ABP177MT- 2015-04-30T22:12:00.000Z 156.7 Good/M
BU2ABP177MT- 2015-04-30T22:14:00.000Z 157.1 Good
BU2ABP177MT- 2015-04-30T22:15:00.000Z 157.3 Good
BU2ABP177MT- 2015-04-30T22:16:00.000Z 157.5 Good
BU2ABP177MT- 2015-04-30T22:19:00.000Z 157.3 Good/M
BU2ABP177MT- 2015-04-30T22:20:00.000Z 157.1 Good/M
BU2ABP177MT- 2015-04-30T22:21:00.000Z 157.3 Good/M
BU2ABP177MT- 2015-04-30T22:22:00.000Z 157.1 Good/M
BU2ABP177MT- 2015-04-30T22:24:00.000Z 156.9 Good/M
BU2ABP177MT- 2015-04-30T22:27:00.000Z 157.1 Good/M
BU2ABP177MT- 2015-04-30T22:28:00.000Z 157.3 Good/M
BU2ABP177MT- 2015-04-30T22:29:00.000Z 157.5 Good/M
BU2ABP177MT- 2015-04-30T22:30:00.000Z 157.7 Good/M
17
Data model
 Use of HBASE and PHOENIX
 Distributed key/values store
 Allows models update (normalization requirements evolution, new indicators… new plants)
 Phoenix for SQL compliance + BI tools
 Tables
 3 tables : DDT, ANA, TOR
 Rowkey : <sensorid, timestamp> (queries mainly consider one or several sensors for a period of time)
 Sequential storage ; split into Hfiles and Hregion according to the plant unit
Clé ColumnFamily Colonne Valeur Phoenix type
m
(concat(metriquei
d, timestamp))
0 v H_ValeurANA Float
q H_QualitéANA Char(10)
n H_NiveauxANA varchar(10)
Clé ColumnFamily Colonne Valeur Phoenix type
m
(concat(metriquei
d, timestamp))
0 v H_ValeurTOR Varchar(10)
q H_QualiteTOR Char(10)
n H_NiveauxTOR Varchar(10)
18
Validation and performances evaluation
 POC validation
 Upload of historical data; queries / analyses
 Existing functions: viz, reports, services
 Data injection: SCADA for the whole fleet,
integration of other sources of data
 Results
 6 weeks (estimated) needed to upload historical data
from 59 plants
 Queries for validating the model :
 Use of Jmeter for simulating load
 With or without insertion workload
 ~ < 1 second for drawing a curve for a selected month
 Integration of an existing GUI for viz (realized within a
few days)
 Validation of specific calculation within reports
 ODBC link for specific e-monitoring application
 Integration of various sources of (structured) data into
the data lake
 ‘Real-time’ insertion of data (micro-batch):
 Up to 2M points / s
 Very low latency between insertion and availability (< 10s)
SELECT
MIN(v), MAX(v),
FIRST_VALUE(v) WITHIN GROUP (ORDER BY ts ASC),
LAST_VALUE(v) WITHIN GROUP (ORDER BY ts ASC),
TO_CHAR(ts, 'dd') as day,
TO_CHAR(ts, 'HH') as hour,
TO_CHAR(ts, 'mm') as minute,
count(*) as cnt
FROM
ORLI_ANA
WHERE
m = ? AND
ts > current_time()-1 AND //last 24h
ts < current_time()
GROUP BY
day, hour, minute
Phoenix query (ANA)
19
Outline
1. A FEW WORDS ABOUT EDF
2. CONTEXT AND OBJECTIVES
3. A DATA LAKE FOR A NUCLEAR FLEET
4. DATA SCIENCE ALGORITHMS FOR OPTIMIZING OPERATIONS
5. A DATA LAB IN PROGRESS
6. AS A CONCLUSION
Brice Richard - Flickr
KC Tan Phoyography - Flickr
20
Added value of data science algorithms on heterogeneous data:
Operations and maintenance can be better optimized through data analytics run on
data coming from the whole fleet
 Active and reactive power are indicators of constraints on alternators: effect on
their wears
• ~ 50 plants
• 20 years of data
• 10 min interval data
• Phoenix queries allow to select plants and periods of time
• Compute and show reactive power per day or per hour of the
day
• More detailed analysis
• Fleet level analysis
• Interactive queries
21
Added value of data science algorithms on heterogeneous data:
Operations and maintenance can be better optimized through data analytics run on
data coming from the whole fleet
Monitoring and control of contractual agreements when network frequency
varies (plants have to contribute to the global balance)
• Pattern matching
• Response time for different plants
• Different levels of analysis : by plant, by
generation, global
• Generic approach implemented for any
kind of patterns
22
Added value of data science algorithms on heterogeneous data
Prediction of plants cooling according to the quality of incoming water in the
plants
• Correlations?
• According to the plants
• Use of GAM models
• Integration of two internal sources +
external data
• Better understanding
• // Work in progress //
23
Integration of data science and visualization: architecture
Hadoop Cluster Web Service REST
(VM)
Browser
24
Integration of data science: a global approach
Pre-processing
Data quality
Sampling
Synchronization
…
Selection and queries
Threshold
Pattern matching
Period of time
…
Analysis and data science
Reporting
Exploratory analysis
(distribution …)
Modelling
…
25
Outline
1. A FEW WORDS ABOUT EDF
2. CONTEXT AND OBJECTIVES
3. A DATA LAKE FOR A NUCLEAR FLEET
4. DATA SCIENCE ALGORITHMS FOR OPTIMIZING OPERATIONS
5. A DATA LAB IN PROGRESS
6. AS A CONCLUSION
Brice Richard - Flickr
KC Tan Phoyography - Flickr
26
A Data Lab in progress: a team, an approach …
… and some questions
Objectives:
Bring value from data analytics
Issues:
 Skills and organization (between entities)
 Architecture :
 Operational Hadoop cluster and loads (use of a multitenant
enterprise cluster)
 Other loads (data science)
 Data prep within Hadoop + edge machine for data science (Spark, R,
Python)
 How to quantify value
 Developments costs and maintenance
 How to industrialize
Source: Xebia
27
Outline
1. A FEW WORDS ABOUT EDF
2. CONTEXT AND OBJECTIVES
3. A DATA LAKE FOR A NUCLEAR FLEET
4. DATA SCIENCE ALGORITHMS FOR OPTIMIZING OPERATIONS
5. A DATA LAB IN PROGRESS
6. AS A CONCLUSION
Brice Richard - Flickr
KC Tan Phoyography - Flickr
28
Takeaways
 A Data Lake for our nuclear fleet
 In progress : industrialization and decommissioning of Historian applications
 Great reduction of licensing costs
 A Data Lab under construction
 POCs showing the added value of data science algorithms
 predictive maintenance
 In the context of fleet renovation for plant life extension (major overhaul program): operations & maintenance, generation
costs optimization
 Issues remaining : skills, organization, technical architecture, quantify value
 Perspectives and technical issues:
 Data lakes and labs for other fleets (thermal plants, hydro, renewables)
 Scalable time-series analytics (synchronization, missing data …)
 Handling heterogeneous data (textual, images, graphs …)
 IoT platform
References
A proof of concept with Hadoop: storage and analytics of electrical time-series.
Marie-Luce Picard, Bruno Jacquin, Hadoop Summit 2012, Californie, USA, June 2012: http://www.slideshare.net/Hadoop_Summit/proof-of-
concent-with-hadoop
Massive Smart Meter Data Storage and Processing on top of Hadoop.
Leeley D. P. dos Santos, Alzennyr G. da Silva, Bruno Jacquin, Marie-Luce Picard, David Worms,Charles Bernard. Workshop Big Data 2012,
Conférence VLDB (Very Large Data Bases), Istanbul, Turquie, 2012: http://www.cse.buffalo.edu/faculty/tkosar/bigdata2012/program.php
Searching time-series with Hadoop in an electric power company.
Alice Bérard, Georges Hébrail, BigMine Workshop, KDD2013, Chicago, August 2013: http://bigdata-mining.org/
Real-time energy data-analytics with Storm.
Rémy Saissy, Marie-Luce Picard, Charles Bernard, Bruno Jacquin, Simon Maby, Benoît Grossin, Hadoop Summit 2014, Californie, USA, June
2014: http://fr.slideshare.net/Hadoop_Summit/t-525p212picard
Computing Data Quality Indicators on Big Data Stream Using a CEP
Wenlu Yang, Alzennyr Gomes Da Silva, Marie-Luce Picard, IEEE Xplore - IWCIM 2015, Prague, Novembre 2015.
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Network
Guillaume Germaine, Thomas Vial, Hadoop Summit Europe 2016, Dublin
http://www.slideshare.net/HadoopSummit/exploring-titan-and-spark-graphx-for-analyzing-timevarying-electrical-networks

Contenu connexe

Tendances

Data Modeling is Data Governance
Data Modeling is Data GovernanceData Modeling is Data Governance
Data Modeling is Data GovernanceDATAVERSITY
 
La Gouvernance des Données
La Gouvernance des DonnéesLa Gouvernance des Données
La Gouvernance des DonnéesSoft Computing
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureDmitry Anoshin
 
Fuzzy Matching on Apache Spark with Jennifer Shin
Fuzzy Matching on Apache Spark with Jennifer ShinFuzzy Matching on Apache Spark with Jennifer Shin
Fuzzy Matching on Apache Spark with Jennifer ShinDatabricks
 
Session découverte de la Logical Data Fabric soutenue par la Data Virtualization
Session découverte de la Logical Data Fabric soutenue par la Data VirtualizationSession découverte de la Logical Data Fabric soutenue par la Data Virtualization
Session découverte de la Logical Data Fabric soutenue par la Data VirtualizationDenodo
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseJames Serra
 
Road to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache HopRoad to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache HopNeo4j
 
Azure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationAzure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationMatthew W. Bowers
 
Lakehouse Analytics with Dremio
Lakehouse Analytics with DremioLakehouse Analytics with Dremio
Lakehouse Analytics with DremioDimitarMitov4
 
La Big Data et ses applications
La Big Data et ses applicationsLa Big Data et ses applications
La Big Data et ses applicationsAffinity Engine
 
You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?Precisely
 
Bases de données no sql.pdf
Bases de données no sql.pdfBases de données no sql.pdf
Bases de données no sql.pdfZkSadrati
 

Tendances (20)

Data Modeling is Data Governance
Data Modeling is Data GovernanceData Modeling is Data Governance
Data Modeling is Data Governance
 
Model Factory at ING Bank
Model Factory at ING BankModel Factory at ING Bank
Model Factory at ING Bank
 
La Gouvernance des Données
La Gouvernance des DonnéesLa Gouvernance des Données
La Gouvernance des Données
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
 
Fuzzy Matching on Apache Spark with Jennifer Shin
Fuzzy Matching on Apache Spark with Jennifer ShinFuzzy Matching on Apache Spark with Jennifer Shin
Fuzzy Matching on Apache Spark with Jennifer Shin
 
Session découverte de la Logical Data Fabric soutenue par la Data Virtualization
Session découverte de la Logical Data Fabric soutenue par la Data VirtualizationSession découverte de la Logical Data Fabric soutenue par la Data Virtualization
Session découverte de la Logical Data Fabric soutenue par la Data Virtualization
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data Warehouse
 
Road to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache HopRoad to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache Hop
 
Data warehouse proposal
Data warehouse proposalData warehouse proposal
Data warehouse proposal
 
Presentation cassandra
Presentation cassandraPresentation cassandra
Presentation cassandra
 
Chapitre i-intro
Chapitre i-introChapitre i-intro
Chapitre i-intro
 
Azure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationAzure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar Presentation
 
Lakehouse Analytics with Dremio
Lakehouse Analytics with DremioLakehouse Analytics with Dremio
Lakehouse Analytics with Dremio
 
Les BD NoSQL
Les BD NoSQLLes BD NoSQL
Les BD NoSQL
 
Guide talend
Guide talendGuide talend
Guide talend
 
Introducing Splunk – The Big Data Engine
Introducing Splunk – The Big Data EngineIntroducing Splunk – The Big Data Engine
Introducing Splunk – The Big Data Engine
 
La Big Data et ses applications
La Big Data et ses applicationsLa Big Data et ses applications
La Big Data et ses applications
 
Hadoop
HadoopHadoop
Hadoop
 
You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?
 
Bases de données no sql.pdf
Bases de données no sql.pdfBases de données no sql.pdf
Bases de données no sql.pdf
 

En vedette

Building the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architectureBuilding the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architecturemark madsen
 
Processing and retrieval of geotagged unmanned aerial system telemetry
Processing and retrieval of geotagged unmanned aerial system telemetry Processing and retrieval of geotagged unmanned aerial system telemetry
Processing and retrieval of geotagged unmanned aerial system telemetry DataWorks Summit/Hadoop Summit
 
Data Aggregation, Curation and analytics for security and situational awareness
Data Aggregation, Curation and analytics for security and situational awarenessData Aggregation, Curation and analytics for security and situational awareness
Data Aggregation, Curation and analytics for security and situational awarenessDataWorks Summit/Hadoop Summit
 
10 Amazing Things To Do With a Hadoop-Based Data Lake
10 Amazing Things To Do With a Hadoop-Based Data Lake10 Amazing Things To Do With a Hadoop-Based Data Lake
10 Amazing Things To Do With a Hadoop-Based Data LakeVMware Tanzu
 
A3RT - the details and actual use cases of "Analytics & Artificial intelligen...
A3RT - the details and actual use cases of "Analytics & Artificial intelligen...A3RT - the details and actual use cases of "Analytics & Artificial intelligen...
A3RT - the details and actual use cases of "Analytics & Artificial intelligen...DataWorks Summit/Hadoop Summit
 
Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success DataWorks Summit/Hadoop Summit
 
Using Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch dataUsing Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch dataDataWorks Summit/Hadoop Summit
 
Security and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasSecurity and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasDataWorks Summit/Hadoop Summit
 
Logistique : Le transport dans le commerce
Logistique : Le transport dans le commerceLogistique : Le transport dans le commerce
Logistique : Le transport dans le commerceThomas Malice
 
Big Data Analytics in light of Financial Industry
Big Data Analytics in light of Financial Industry Big Data Analytics in light of Financial Industry
Big Data Analytics in light of Financial Industry Capgemini
 
Creative Capital, Information & Communication Technologies, & Economic Growth...
Creative Capital, Information & Communication Technologies, & Economic Growth...Creative Capital, Information & Communication Technologies, & Economic Growth...
Creative Capital, Information & Communication Technologies, & Economic Growth...Regional Science Academy
 

En vedette (20)

Building the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architectureBuilding the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architecture
 
Building a Data Analytics PaaS for Smart Cities
Building a Data Analytics PaaS for Smart CitiesBuilding a Data Analytics PaaS for Smart Cities
Building a Data Analytics PaaS for Smart Cities
 
Processing and retrieval of geotagged unmanned aerial system telemetry
Processing and retrieval of geotagged unmanned aerial system telemetry Processing and retrieval of geotagged unmanned aerial system telemetry
Processing and retrieval of geotagged unmanned aerial system telemetry
 
Data Aggregation, Curation and analytics for security and situational awareness
Data Aggregation, Curation and analytics for security and situational awarenessData Aggregation, Curation and analytics for security and situational awareness
Data Aggregation, Curation and analytics for security and situational awareness
 
Why is my Hadoop* job slow?
Why is my Hadoop* job slow?Why is my Hadoop* job slow?
Why is my Hadoop* job slow?
 
Enterprise Data Classification and Provenance
Enterprise Data Classification and ProvenanceEnterprise Data Classification and Provenance
Enterprise Data Classification and Provenance
 
Modernise your EDW - Data Lake
Modernise your EDW - Data LakeModernise your EDW - Data Lake
Modernise your EDW - Data Lake
 
10 Amazing Things To Do With a Hadoop-Based Data Lake
10 Amazing Things To Do With a Hadoop-Based Data Lake10 Amazing Things To Do With a Hadoop-Based Data Lake
10 Amazing Things To Do With a Hadoop-Based Data Lake
 
A3RT - the details and actual use cases of "Analytics & Artificial intelligen...
A3RT - the details and actual use cases of "Analytics & Artificial intelligen...A3RT - the details and actual use cases of "Analytics & Artificial intelligen...
A3RT - the details and actual use cases of "Analytics & Artificial intelligen...
 
Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success
 
Using Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch dataUsing Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch data
 
Filling the Data Lake
Filling the Data LakeFilling the Data Lake
Filling the Data Lake
 
Security and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasSecurity and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache Atlas
 
Logistique : Le transport dans le commerce
Logistique : Le transport dans le commerceLogistique : Le transport dans le commerce
Logistique : Le transport dans le commerce
 
The real world use of Big Data to change business
The real world use of Big Data to change businessThe real world use of Big Data to change business
The real world use of Big Data to change business
 
What's new in Hadoop Common and HDFS
What's new in Hadoop Common and HDFS What's new in Hadoop Common and HDFS
What's new in Hadoop Common and HDFS
 
Scalable OCR with NiFi and Tesseract
Scalable OCR with NiFi and TesseractScalable OCR with NiFi and Tesseract
Scalable OCR with NiFi and Tesseract
 
Big Data Analytics in light of Financial Industry
Big Data Analytics in light of Financial Industry Big Data Analytics in light of Financial Industry
Big Data Analytics in light of Financial Industry
 
Creative Capital, Information & Communication Technologies, & Economic Growth...
Creative Capital, Information & Communication Technologies, & Economic Growth...Creative Capital, Information & Communication Technologies, & Economic Growth...
Creative Capital, Information & Communication Technologies, & Economic Growth...
 
Apache Hive ACID Project
Apache Hive ACID ProjectApache Hive ACID Project
Apache Hive ACID Project
 

Similaire à A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet

How HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental scienceHow HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental scienceinside-BigData.com
 
The Pacific Research Platform
The Pacific Research PlatformThe Pacific Research Platform
The Pacific Research PlatformLarry Smarr
 
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...confluent
 
PosterPresentation
PosterPresentationPosterPresentation
PosterPresentationRaj Shekhar
 
A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...
A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...
A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...Larry Smarr
 
Accelerators at ORNL - Application Readiness, Early Science, and Industry Impact
Accelerators at ORNL - Application Readiness, Early Science, and Industry ImpactAccelerators at ORNL - Application Readiness, Early Science, and Industry Impact
Accelerators at ORNL - Application Readiness, Early Science, and Industry Impactinside-BigData.com
 
Louise McCluskey, Kx Engineer at Kx Systems
Louise McCluskey, Kx Engineer at Kx SystemsLouise McCluskey, Kx Engineer at Kx Systems
Louise McCluskey, Kx Engineer at Kx SystemsDataconomy Media
 
DuraMat CO1 Central Data Resource: How it started, how it’s going …
DuraMat CO1 Central Data Resource: How it started, how it’s going …DuraMat CO1 Central Data Resource: How it started, how it’s going …
DuraMat CO1 Central Data Resource: How it started, how it’s going …Anubhav Jain
 
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...Flink Forward
 
big_data_casestudies_2.ppt
big_data_casestudies_2.pptbig_data_casestudies_2.ppt
big_data_casestudies_2.pptvishal choudhary
 
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...Databricks
 
Weather Station Data Publication at Irstea: an implementation Report.
Weather Station Data Publication at Irstea: an implementation Report.  Weather Station Data Publication at Irstea: an implementation Report.
Weather Station Data Publication at Irstea: an implementation Report. catherine roussey
 
Linked Sensor Data cube
Linked Sensor Data cubeLinked Sensor Data cube
Linked Sensor Data cubeLaurent Lefort
 
TeraGrid Communication and Computation
TeraGrid Communication and ComputationTeraGrid Communication and Computation
TeraGrid Communication and ComputationTal Lavian Ph.D.
 
SplunkLive! Customer Presentation – Harris
SplunkLive! Customer Presentation – HarrisSplunkLive! Customer Presentation – Harris
SplunkLive! Customer Presentation – HarrisSplunk
 
Blue Waters and Resource Management - Now and in the Future
 Blue Waters and Resource Management - Now and in the Future Blue Waters and Resource Management - Now and in the Future
Blue Waters and Resource Management - Now and in the Futureinside-BigData.com
 
Private Cloud Delivers Big Data in Oil & Gas v4
Private Cloud Delivers Big Data in Oil & Gas v4Private Cloud Delivers Big Data in Oil & Gas v4
Private Cloud Delivers Big Data in Oil & Gas v4Andy Moore
 
CloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facilityinside-BigData.com
 

Similaire à A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet (20)

How HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental scienceHow HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental science
 
The Pacific Research Platform
The Pacific Research PlatformThe Pacific Research Platform
The Pacific Research Platform
 
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
 
PosterPresentation
PosterPresentationPosterPresentation
PosterPresentation
 
A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...
A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...
A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...
 
Accelerators at ORNL - Application Readiness, Early Science, and Industry Impact
Accelerators at ORNL - Application Readiness, Early Science, and Industry ImpactAccelerators at ORNL - Application Readiness, Early Science, and Industry Impact
Accelerators at ORNL - Application Readiness, Early Science, and Industry Impact
 
Louise McCluskey, Kx Engineer at Kx Systems
Louise McCluskey, Kx Engineer at Kx SystemsLouise McCluskey, Kx Engineer at Kx Systems
Louise McCluskey, Kx Engineer at Kx Systems
 
DuraMat CO1 Central Data Resource: How it started, how it’s going …
DuraMat CO1 Central Data Resource: How it started, how it’s going …DuraMat CO1 Central Data Resource: How it started, how it’s going …
DuraMat CO1 Central Data Resource: How it started, how it’s going …
 
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
 
big_data_casestudies_2.ppt
big_data_casestudies_2.pptbig_data_casestudies_2.ppt
big_data_casestudies_2.ppt
 
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
 
Weather Station Data Publication at Irstea: an implementation Report.
Weather Station Data Publication at Irstea: an implementation Report.  Weather Station Data Publication at Irstea: an implementation Report.
Weather Station Data Publication at Irstea: an implementation Report.
 
Linked Sensor Data cube
Linked Sensor Data cubeLinked Sensor Data cube
Linked Sensor Data cube
 
TeraGrid Communication and Computation
TeraGrid Communication and ComputationTeraGrid Communication and Computation
TeraGrid Communication and Computation
 
SplunkLive! Customer Presentation – Harris
SplunkLive! Customer Presentation – HarrisSplunkLive! Customer Presentation – Harris
SplunkLive! Customer Presentation – Harris
 
Blue Waters and Resource Management - Now and in the Future
 Blue Waters and Resource Management - Now and in the Future Blue Waters and Resource Management - Now and in the Future
Blue Waters and Resource Management - Now and in the Future
 
Private Cloud Delivers Big Data in Oil & Gas v4
Private Cloud Delivers Big Data in Oil & Gas v4Private Cloud Delivers Big Data in Oil & Gas v4
Private Cloud Delivers Big Data in Oil & Gas v4
 
CloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use Case
 
Nfcis2009
Nfcis2009Nfcis2009
Nfcis2009
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
 

Plus de DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 

Plus de DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 

Dernier

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 

Dernier (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 

A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet

  • 1. A Data Lake and a Data Lab to Optimize Operations and Safety Within a Nuclear Fleet Hadoop Summit 2016, San José, June 30th Marie-Luce PICARD, EDF R&D – marie-luce.picard@edf.fr Jean-Marc RANGOD, EDF-DPNT Christophe SALPERWYCK, EDF R&D Special thanks to Raphaël QUERCIA EDF-DTG, Carole MAI and Amandine PIERROT EDF R&D
  • 2. 2 Outline 1. A FEW WORDS ABOUT EDF 2. CONTEXT AND OBJECTIVES 3. A DATA LAKE FOR A NUCLEAR FLEET 4. DATA SCIENCE ALGORITHMS FOR OPTIMIZING OPERATIONS 5. A DATA LAB IN PROGRESS 6. AS A CONCLUSION Brice Richard - Flickr KC Tan Phoyography - Flickr
  • 3. 3 Outline 1. A FEW WORDS ABOUT EDF 2. CONTEXT AND OBJECTIVES 3. A DATA LAKE FOR A NUCLEAR FLEET 4. DATA SCIENCE ALGORITHMS FOR OPTIMIZING OPERATIONS 5. A DATA LAB IN PROGRESS 6. AS A CONCLUSION Brice Richard - Flickr KC Tan Phoyography - Flickr
  • 4. 4 ELECTRICITY GENERATION 623.5 TWH All electricity-related activities Generation Transmission & Distribution Trading and Sales & Marketing Energy services Key figures* €72.9 billion in sales 38.5 million customers 158,161 employees worldwide 84.7% of generation does not emit CO2 2014 INVESTMENTS €4.5 BILLION EDF: A GLOBAL LEADER IN ELECTRICITY *as of 2015 EDF : AN EFFICIENT, RESPONSIBLE ELECTRICITY COMPANY AND THE CHAMPION OF LOW-CARBON GROWTH
  • 5. WORLD’S LEADING OPERATOR, EXCELLENT PERFORMANCE IN FRANCE 72.9 GW installed capacity, 54% of the Group’s net generation capacity 477.7 TWh generated, 77% of the Group’s output 58 reactors operated in France, 15 in the UK 3 EPR under construction: — 1 in Flamanville (France) — 2 in Taishan (China) 2 EPR in project phase  OSART safety audit 17 best practices identified by IAEA  France Best generation performance for six years  UK World record for safety in the workplace  China Strengthened cooperation agreement with CNNC NUCLEAR EDF 2015 I P.5
  • 6.
  • 8. Scientific partnerships with actors of Paris- Saclay research departments 8 exceptional buildings 4 outstanding hall test 1 Unique equipment, innovative communication tools Diverse areas of expertise 1500 work stations Plenty of collaborative spaces EDF LAB PARIS-SACLAY
  • 9. 9 Main Big Data related challenges for EDF Power Generation  Process monitoring and condition-based maintenance from sensors  Power generation forecasting for renewables Energy management  Load forecasting  Balancing and optimizing generation and consumption (using smart metering information, including renewables)  Electrical networks  Smart Grid operations (local)  Condition-based maintenance Customers and sales  New services to customers using smart-metering data  Smart Homes, Smart Building, Smart Cities management related to energy
  • 10. 10 Outline 1. A FEW WORDS ABOUT EDF 2. CONTEXT AND OBJECTIVES 3. A DATA LAKE FOR A NUCLEAR FLEET 4. DATA SCIENCE ALGORITHMS FOR OPTIMIZING OPERATIONS 5. A DATA LAB IN PROGRESS 6. AS A CONCLUSION Brice Richard - Flickr KC Tan Phoyography - Flickr
  • 11. 11 Operations and maintenance of the nuclear fleet  The maintenance policy of EDF generation fleet is optimized to ensure reliability and safety of equipment and systems while strengthening our competitiveness:  Have better diagnosis, improved performance and availability  Make a better use of data and documents, so far stored into Data silos  More globally, the IT teams and projects aim at:  Strengthen performance of operations and maintenance through a global fleet approach  Simplify the Industrial Information System architecture  Improve and develop the way we use our data  Accumulate and archive data through time … while reducing costs
  • 12. 12 Voluminous and heterogeneous data …. stored in data silos Source : Wikipedia One DB by nuclear site, gathering data from sensors. Use of Data Historians.  Focus on data:  High volume:  data is stored up to 40-60 years (lifetime of the plant)  SCADA data can be sampled every 20 to 40 ms (but mainly a few seconds)  Around 10.000 sensors per plant  Variety:  Data is heterogeneous  Time series, images, documents  Various data sources  The actual systems (historians) don’t allow too many concurrent access, and their SLA are quite bad
  • 13. 13 A Data Lake for the nuclear fleet ESPADON : the Data Lake for the nuclear fleet One DB by nuclear site, gathering data from sensors. Use of Data Historians. Source : Wikipedia © M. Caraveo, Hadoop cluster NOE data center
  • 14. 14 Outline 1. A FEW WORDS ABOUT EDF 2. CONTEXT AND OBJECTIVES 3. A DATA LAKE FOR A NUCLEAR FLEET 4. DATA SCIENCE ALGORITHMS FOR OPTIMIZING OPERATIONS 5. A DATA LAB IN PROGRESS 6. AS A CONCLUSION Brice Richard - Flickr KC Tan Phoyography - Flickr
  • 15. | 15 A data lake for the nuclear fleet: big picture …. Files (chemical information) Historian - SCADA Files (dosimetry) E-monitoring application Viz Interactive queries and reporting Web Service Hadoop cluster – ESPADON Data Lake Reports © M. Caraveo, Hadoop cluster NOE data center
  • 16. 16 Zoom on data  4 generations of plants, but high level of normalization of data and sensors (for example, use of trigrams for identification of elementary systems)  Two main types of sensors : ANA (for analogic) and TOR (for state events)  Time series  Volume  For the POC, 10 plants, 2 years: about 20 billions of points  Target (59 plants) : 15 To of data (all plants, whole lifecycle) Metric, global Date Value Quality BU2ABP177MT- 2015-04-30T22:05:00.000Z 156.6 Good/M BU2ABP177MT- 2015-04-30T22:06:00.000Z 156.4 Good/M BU2ABP177MT- 2015-04-30T22:07:00.000Z 156.2 Good/M BU2ABP177MT- 2015-04-30T22:08:00.000Z 156.0 Good BU2ABP177MT- 2015-04-30T22:09:00.000Z 156.2 Good/M BU2ABP177MT- 2015-04-30T22:10:00.000Z 156.4 Good/M BU2ABP177MT- 2015-04-30T22:12:00.000Z 156.7 Good/M BU2ABP177MT- 2015-04-30T22:14:00.000Z 157.1 Good BU2ABP177MT- 2015-04-30T22:15:00.000Z 157.3 Good BU2ABP177MT- 2015-04-30T22:16:00.000Z 157.5 Good BU2ABP177MT- 2015-04-30T22:19:00.000Z 157.3 Good/M BU2ABP177MT- 2015-04-30T22:20:00.000Z 157.1 Good/M BU2ABP177MT- 2015-04-30T22:21:00.000Z 157.3 Good/M BU2ABP177MT- 2015-04-30T22:22:00.000Z 157.1 Good/M BU2ABP177MT- 2015-04-30T22:24:00.000Z 156.9 Good/M BU2ABP177MT- 2015-04-30T22:27:00.000Z 157.1 Good/M BU2ABP177MT- 2015-04-30T22:28:00.000Z 157.3 Good/M BU2ABP177MT- 2015-04-30T22:29:00.000Z 157.5 Good/M BU2ABP177MT- 2015-04-30T22:30:00.000Z 157.7 Good/M
  • 17. 17 Data model  Use of HBASE and PHOENIX  Distributed key/values store  Allows models update (normalization requirements evolution, new indicators… new plants)  Phoenix for SQL compliance + BI tools  Tables  3 tables : DDT, ANA, TOR  Rowkey : <sensorid, timestamp> (queries mainly consider one or several sensors for a period of time)  Sequential storage ; split into Hfiles and Hregion according to the plant unit Clé ColumnFamily Colonne Valeur Phoenix type m (concat(metriquei d, timestamp)) 0 v H_ValeurANA Float q H_QualitéANA Char(10) n H_NiveauxANA varchar(10) Clé ColumnFamily Colonne Valeur Phoenix type m (concat(metriquei d, timestamp)) 0 v H_ValeurTOR Varchar(10) q H_QualiteTOR Char(10) n H_NiveauxTOR Varchar(10)
  • 18. 18 Validation and performances evaluation  POC validation  Upload of historical data; queries / analyses  Existing functions: viz, reports, services  Data injection: SCADA for the whole fleet, integration of other sources of data  Results  6 weeks (estimated) needed to upload historical data from 59 plants  Queries for validating the model :  Use of Jmeter for simulating load  With or without insertion workload  ~ < 1 second for drawing a curve for a selected month  Integration of an existing GUI for viz (realized within a few days)  Validation of specific calculation within reports  ODBC link for specific e-monitoring application  Integration of various sources of (structured) data into the data lake  ‘Real-time’ insertion of data (micro-batch):  Up to 2M points / s  Very low latency between insertion and availability (< 10s) SELECT MIN(v), MAX(v), FIRST_VALUE(v) WITHIN GROUP (ORDER BY ts ASC), LAST_VALUE(v) WITHIN GROUP (ORDER BY ts ASC), TO_CHAR(ts, 'dd') as day, TO_CHAR(ts, 'HH') as hour, TO_CHAR(ts, 'mm') as minute, count(*) as cnt FROM ORLI_ANA WHERE m = ? AND ts > current_time()-1 AND //last 24h ts < current_time() GROUP BY day, hour, minute Phoenix query (ANA)
  • 19. 19 Outline 1. A FEW WORDS ABOUT EDF 2. CONTEXT AND OBJECTIVES 3. A DATA LAKE FOR A NUCLEAR FLEET 4. DATA SCIENCE ALGORITHMS FOR OPTIMIZING OPERATIONS 5. A DATA LAB IN PROGRESS 6. AS A CONCLUSION Brice Richard - Flickr KC Tan Phoyography - Flickr
  • 20. 20 Added value of data science algorithms on heterogeneous data: Operations and maintenance can be better optimized through data analytics run on data coming from the whole fleet  Active and reactive power are indicators of constraints on alternators: effect on their wears • ~ 50 plants • 20 years of data • 10 min interval data • Phoenix queries allow to select plants and periods of time • Compute and show reactive power per day or per hour of the day • More detailed analysis • Fleet level analysis • Interactive queries
  • 21. 21 Added value of data science algorithms on heterogeneous data: Operations and maintenance can be better optimized through data analytics run on data coming from the whole fleet Monitoring and control of contractual agreements when network frequency varies (plants have to contribute to the global balance) • Pattern matching • Response time for different plants • Different levels of analysis : by plant, by generation, global • Generic approach implemented for any kind of patterns
  • 22. 22 Added value of data science algorithms on heterogeneous data Prediction of plants cooling according to the quality of incoming water in the plants • Correlations? • According to the plants • Use of GAM models • Integration of two internal sources + external data • Better understanding • // Work in progress //
  • 23. 23 Integration of data science and visualization: architecture Hadoop Cluster Web Service REST (VM) Browser
  • 24. 24 Integration of data science: a global approach Pre-processing Data quality Sampling Synchronization … Selection and queries Threshold Pattern matching Period of time … Analysis and data science Reporting Exploratory analysis (distribution …) Modelling …
  • 25. 25 Outline 1. A FEW WORDS ABOUT EDF 2. CONTEXT AND OBJECTIVES 3. A DATA LAKE FOR A NUCLEAR FLEET 4. DATA SCIENCE ALGORITHMS FOR OPTIMIZING OPERATIONS 5. A DATA LAB IN PROGRESS 6. AS A CONCLUSION Brice Richard - Flickr KC Tan Phoyography - Flickr
  • 26. 26 A Data Lab in progress: a team, an approach … … and some questions Objectives: Bring value from data analytics Issues:  Skills and organization (between entities)  Architecture :  Operational Hadoop cluster and loads (use of a multitenant enterprise cluster)  Other loads (data science)  Data prep within Hadoop + edge machine for data science (Spark, R, Python)  How to quantify value  Developments costs and maintenance  How to industrialize Source: Xebia
  • 27. 27 Outline 1. A FEW WORDS ABOUT EDF 2. CONTEXT AND OBJECTIVES 3. A DATA LAKE FOR A NUCLEAR FLEET 4. DATA SCIENCE ALGORITHMS FOR OPTIMIZING OPERATIONS 5. A DATA LAB IN PROGRESS 6. AS A CONCLUSION Brice Richard - Flickr KC Tan Phoyography - Flickr
  • 28. 28 Takeaways  A Data Lake for our nuclear fleet  In progress : industrialization and decommissioning of Historian applications  Great reduction of licensing costs  A Data Lab under construction  POCs showing the added value of data science algorithms  predictive maintenance  In the context of fleet renovation for plant life extension (major overhaul program): operations & maintenance, generation costs optimization  Issues remaining : skills, organization, technical architecture, quantify value  Perspectives and technical issues:  Data lakes and labs for other fleets (thermal plants, hydro, renewables)  Scalable time-series analytics (synchronization, missing data …)  Handling heterogeneous data (textual, images, graphs …)  IoT platform
  • 29. References A proof of concept with Hadoop: storage and analytics of electrical time-series. Marie-Luce Picard, Bruno Jacquin, Hadoop Summit 2012, Californie, USA, June 2012: http://www.slideshare.net/Hadoop_Summit/proof-of- concent-with-hadoop Massive Smart Meter Data Storage and Processing on top of Hadoop. Leeley D. P. dos Santos, Alzennyr G. da Silva, Bruno Jacquin, Marie-Luce Picard, David Worms,Charles Bernard. Workshop Big Data 2012, Conférence VLDB (Very Large Data Bases), Istanbul, Turquie, 2012: http://www.cse.buffalo.edu/faculty/tkosar/bigdata2012/program.php Searching time-series with Hadoop in an electric power company. Alice Bérard, Georges Hébrail, BigMine Workshop, KDD2013, Chicago, August 2013: http://bigdata-mining.org/ Real-time energy data-analytics with Storm. Rémy Saissy, Marie-Luce Picard, Charles Bernard, Bruno Jacquin, Simon Maby, Benoît Grossin, Hadoop Summit 2014, Californie, USA, June 2014: http://fr.slideshare.net/Hadoop_Summit/t-525p212picard Computing Data Quality Indicators on Big Data Stream Using a CEP Wenlu Yang, Alzennyr Gomes Da Silva, Marie-Luce Picard, IEEE Xplore - IWCIM 2015, Prague, Novembre 2015. Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Network Guillaume Germaine, Thomas Vial, Hadoop Summit Europe 2016, Dublin http://www.slideshare.net/HadoopSummit/exploring-titan-and-spark-graphx-for-analyzing-timevarying-electrical-networks

Notes de l'éditeur

  1. Nuclear energy supplies competitive, carbon-free electricity that we generate in the best possible safety conditions. In 2014, the International Atomic Energy Agency conducted an audit on how nuclear safety is integrated into the organisation and processes of our central departments: the IAEA found no departure from its standards and identified 17 best practices. → In France, we achieved our best performance in six years thanks to our management of scheduled shutdowns: the average length of extensions was halved. Wintertime fleet availability topped 90%. Our annual output was up 3% (415.9 TWh). • The principle of the “Grand Carénage” maintenance programme was approved. The programme involves renovating the French nuclear fleet over a 10-year period in order to extend its operating life beyond 40 years if all conditions are met. The investment is put at €55 billion for the entire fleet. • The Flamanville EPR worksite is continuing, the first nuclear plant to be built in France for 15 years. → In the UK, output was good (56.3 TWh) despite the unscheduled shutdown of two plants. EDF Energy established a world record for safety in the workplace (0.98 accidents requiring more than one day of lost time per million hours worked by employees and subcontractors). • The Hinkley Point C project to build two EPR in Somerset took a major step forward: in October, the European Commission approved the main terms of the agreements concluded with the British government. → In China, through partnerships, we are taking good advantage of the expertise we have acquired in the design, construction, operation and maintenance of our nuclear fleet. • Construction of two 1,750 MW EPR in Taishan (EDF 30% in partnership with CGN) is ongoing. • We signed an agreement to strengthen cooperation in engineering, operation and maintenance with CNNC, China’s largest state-owned nuclear company.
  2. 29