NAPE ICE
2019
Outline
● Why Process Petrophysical Big Data?
● What Big Data processing challenges?
● ETL Workflow
● Conclusion
● References
Why Process Petrophysical Big Data?
● Re-evaluate old well logs for opportunities
● Conduct pre-drill analysis of offset wells
● Overcome the inability to effectively assess well / field reserves
● Overcome challenges in inferring geological features
What Big Data Processing Challenges?
● For 1 to 10 well log files?
- Copying each link and pasting it into the browser is straightforward
- Log data downloads quickly
- ETL is easy to perform on this amount of data
What Big Data Processing Challenges?
● For 1 to 10 well log files?
● For 1,000 well log files?
What Big Data Processing Challenges?
● Links to ~1,000 well log files from 5 fields, stored in Excel sheets
Extract, Transform, Load (ETL)
● Download each well log file individually from the web
● Read the log data from each file
● Enrich the metadata and actual data, and save them in Apache Arrow format before loading them into an AWS S3 bucket
● GOAL: Make the data ready for an Apache Spark ML and TensorFlow deep learning pipeline
ETL Workflow
● Links to ~1,000 well log files from 5 fields, stored in Excel sheets
● Download each well log file individually from the web
- get the links to the files
- append all the extracted links to a list
- handle errors
- save each file
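The download step can be sketched in Python with only the standard library. This is a minimal sketch, not the author's code: it assumes the spreadsheet cells have already been loaded (for example with pandas.read_excel), and the helper names collect_links and download_all, plus the fallback file name unnamed.las, are hypothetical. Failed downloads are recorded and skipped instead of stopping the whole batch.

```python
import os
import urllib.error
import urllib.request
from urllib.parse import urlparse


def collect_links(cell_values):
    """Append every cell value that looks like an HTTP(S) URL to a list.

    `cell_values` is any iterable of spreadsheet cells (hypothetical input,
    e.g. the flattened values of a sheet read with pandas.read_excel).
    """
    links = []
    for value in cell_values:
        if isinstance(value, str) and value.strip().lower().startswith(("http://", "https://")):
            links.append(value.strip())
    return links


def download_all(links, out_dir, timeout=30):
    """Download each link individually and save it under out_dir.

    Errors are collected instead of raised, so one bad link does not
    abort the batch. Returns (saved_paths, failures).
    """
    os.makedirs(out_dir, exist_ok=True)
    saved, failed = [], []
    for url in links:
        # Use the last path segment as the file name (assumes names are unique).
        name = os.path.basename(urlparse(url).path) or "unnamed.las"
        target = os.path.join(out_dir, name)
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                payload = resp.read()
        except (urllib.error.URLError, OSError, ValueError) as exc:
            failed.append((url, str(exc)))
            continue
        with open(target, "wb") as fh:
            fh.write(payload)
        saved.append(target)
    return saved, failed
```

In use, download_all(collect_links(cells), "raw_logs") returns the saved paths plus a list of (url, error) pairs that can be retried later, which matters when a few of ~1,000 links are dead or time out.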
ETL Workflow
● Links to ~1,000 well log files from 5 fields, stored in Excel sheets
● Download each well log file individually from the web
● Read the log data from each file
- extract the actual data and the metadata / header data
- handle errors
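Assuming the downloaded well logs are LAS (Log ASCII Standard) 2.0 files, the read step can be sketched as below. The deck does not say which reader was used; the lasio library (lasio.read) is a common choice, but to stay dependency-free this is a simplified stand-in parser that handles unwrapped data only. The names read_las and read_las_files are hypothetical, and -999.25 is an assumed default NULL value used only when the header does not declare one.

```python
def _parse_header_line(line):
    """Split 'MNEM.UNIT  VALUE : DESCRIPTION' into (mnemonic, unit, value)."""
    mnem, _, rest = line.partition(".")
    unit, _, after = rest.partition(" ")  # the unit runs up to the first space
    value, sep, _desc = after.rpartition(":")
    if not sep:  # no description part on this line
        value = after
    return mnem.strip(), unit.strip(), value.strip()


def read_las(text, default_null=-999.25):
    """Return (metadata, curves, rows) from unwrapped LAS 2.0 text.

    metadata: header mnemonic -> value (version, well and parameter sections)
    curves:   curve mnemonics in column order
    rows:     lists of floats, with NULL values replaced by None
    """
    metadata, curves, rows = {}, [], []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("~"):
            section = line[1:2].upper()  # ~V, ~W, ~P, ~C, ~O, ~A
            continue
        if section in ("V", "W", "P"):
            mnem, _unit, value = _parse_header_line(line)
            metadata[mnem] = value
        elif section == "C":
            curves.append(_parse_header_line(line)[0])
        elif section == "A":
            null = float(metadata.get("NULL", default_null))
            values = [float(tok) for tok in line.split()]
            if len(values) != len(curves):
                raise ValueError(f"expected {len(curves)} values, got {len(values)}: {line!r}")
            rows.append([None if v == null else v for v in values])
    return metadata, curves, rows


def read_las_files(paths):
    """Parse many files, collecting per-file errors instead of stopping."""
    parsed, errors = {}, {}
    for path in paths:
        try:
            with open(path, encoding="utf-8", errors="replace") as fh:
                parsed[path] = read_las(fh.read())
        except (OSError, ValueError) as exc:
            errors[path] = str(exc)
    return parsed, errors
```

Keeping header values (such as the API number) separate from the curve rows makes the later enrichment step a simple lookup.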
ETL Workflow
● Links to ~1,000 well log files from 5 fields, stored in Excel sheets
● Download each well log file individually from the web
● Read the log data from each file
● Enrich the metadata and actual data, and save them in Apache Arrow format before loading them into an AWS S3 bucket
ETL Workflow
Why Apache Arrow?
ETL Workflow
● Links to ~1,000 well log files from 5 fields, stored in Excel sheets
● Download each well log file individually from the web
● Read the log data from each file
● Enrich the metadata and actual data, and save them in Apache Arrow format before loading them into an AWS S3 bucket
● Make the data ready for an Apache Spark ML / Keras deep learning pipeline
- drop columns (152 down to 13), drop duplicates, drop null / NA values, and account for missing values
- split-apply-combine on data grouped by field and API number: @pandas_udf
- cache the DataFrame
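The preparation step can be sketched as pandas functions that Spark applies per group. Several assumptions apply: KEEP is a hypothetical 5-column stand-in for the 13 columns the deck keeps; the per-group function (linear interpolation of gaps along depth within each well) is an illustrative choice, since the deck only names the split-apply-combine pattern; and the Spark wiring uses groupBy().applyInPandas (Spark 3.x), which superseded the 2019-era @pandas_udf(..., PandasUDFType.GROUPED_MAP) decorator form.

```python
import pandas as pd

# Hypothetical subset of columns to keep (the deck reduces 152 columns to 13).
KEY_COLS = ["FIELD", "API", "DEPT"]
KEEP = KEY_COLS + ["GR", "RHOB"]


def clean(pdf, keep=KEEP):
    """Drop unused columns, duplicate rows and rows with no curve values."""
    curves = [c for c in keep if c not in KEY_COLS]
    return pdf[keep].drop_duplicates().dropna(subset=curves, how="all")


def fill_gaps(group):
    """Per-well 'apply' step: linearly interpolate missing curve values along depth."""
    group = group.sort_values("DEPT").copy()
    curves = [c for c in group.columns if c not in KEY_COLS]
    group[curves] = group[curves].interpolate(limit_direction="both")
    return group


def combine_local(pdf):
    """Pandas-only split-apply-combine, useful for testing without a cluster."""
    parts = [fill_gaps(g) for _, g in pdf.groupby(["FIELD", "API"])]
    return pd.concat(parts)


def prepare_on_spark(sdf, keep=KEEP):
    """Same steps on a Spark DataFrame; the result is cached for reuse in training."""
    curves = [c for c in keep if c not in KEY_COLS]
    slim = sdf.select(*keep).dropDuplicates().dropna(how="all", subset=curves)
    result = slim.groupBy("FIELD", "API").applyInPandas(fill_gaps, schema=slim.schema)
    return result.cache()
```

Dropping columns and duplicates before grouping keeps each per-well pandas batch small, and cache() keeps the prepared DataFrame in memory after the first action so repeated training passes do not re-run the whole ETL lineage.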
Conclusion
● Apache Airflow to orchestrate the ETL process
● Moving towards real-time data processing:
- WITSML data processing
● Apache Kafka, Apache Flink, Apache Storm, Apache Spark
References
● http://www.searchanddiscovery.com/pdfz/documents/2018/42234ejimuda/ndx_ejimuda.pdf.html
● https://www.slideshare.net/wesm/high-performance-python-on-apache-spark
● https://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html
● https://airflow.apache.org/installation.html
Questions?
Thank you!
NAPE 2019 Presentation
Chijioke “CJ” Ejimuda