NAPE ICE 2019
Outline
● Why Process Petrophysical Big Data?
● What Big Data processing challenges?
● ETL Workflow
● Conclusion
● References
Why Process Petrophysical Big Data?
● Re-evaluate old well logs for missed opportunities
● Conduct pre-drill analysis of offset wells
● Assess well / field reserves more effectively
● Infer geological features from the logs
What Big Data Processing Challenges?
● For 1 to 10 well log files?
- copying each link and pasting it into the browser is straightforward
- log data downloads quickly
- ETL is easy to perform on this amount of data
● For 1000 well log files ???
● Only entry point: Excel sheets with links to ~1000 well log files from 5 fields
Extract Transform Load
● Download each well log file individually from the web
● Read the log data from each file
● Enrich the metadata and actual data, and save in the Apache Arrow format before loading to an AWS S3 bucket
● GOAL: make the data ready for an Apache Spark ML and TensorFlow Deep Learning pipeline
ETL Workflow
● Link to ~1000 well log files from 5 fields in Excel sheets
● Download each well log file individually from the web
- get the links to the files
- append all the extracted links to a list
- account for errors
- save each file
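The download step above can be sketched in plain Python. This is a minimal version assuming the Excel sheets have already been read into rows of cells; the `download_dir` name and file layout are hypothetical:

```python
import os
import urllib.error
import urllib.request


def collect_links(rows):
    """Append every well-log URL found in the spreadsheet rows to a list."""
    links = []
    for row in rows:
        for cell in row:
            if isinstance(cell, str) and cell.startswith("http"):
                links.append(cell)
    return links


def download_all(links, download_dir="logs"):
    """Download each file, accounting for errors, and save it locally."""
    os.makedirs(download_dir, exist_ok=True)
    failed = []
    for url in links:
        name = url.rsplit("/", 1)[-1] or "unnamed.las"
        try:
            urllib.request.urlretrieve(url, os.path.join(download_dir, name))
        except (urllib.error.URLError, OSError):
            failed.append(url)  # record the failure and continue, don't abort
    return failed
```

Collecting failures in a list rather than raising keeps one dead link from stopping a ~1000-file batch.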
● Read the log data from each file
- extract the actual data and the metadata / header data
- account for errors
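The read step can be sketched as a minimal pure-Python parser for LAS-style log files (in practice a library such as lasio would do this); the section handling here is deliberately simplified:

```python
def parse_las(text):
    """Split a LAS file into sections, then pull out header metadata and data rows."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                       # skip blanks and comments
        if line.startswith("~"):
            current = line[1:2].upper()    # section flag: V, W, C, A, ...
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)

    # Header lines in the ~Well section look like "MNEM.UNIT  VALUE : DESCRIPTION"
    meta = {}
    for line in sections.get("W", []):
        if "." not in line or ":" not in line:
            continue                       # not a standard header line
        mnem, _, rest = line.partition(".")
        head = rest.partition(":")[0]
        if head[:1].isspace():
            value = head.strip()           # no unit directly after the dot
        else:
            parts = head.split(None, 1)    # first token is the unit
            value = parts[1].strip() if len(parts) > 1 else ""
        meta[mnem.strip()] = value

    # Numeric rows live in the ~ASCII section
    data = []
    for line in sections.get("A", []):
        try:
            data.append([float(x) for x in line.split()])
        except ValueError:
            pass                           # account for malformed rows, don't abort
    return meta, data
```

Returning metadata and data separately mirrors the enrichment step that follows, where header fields get joined back onto the curve data.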
● Enrich the metadata and actual data, and save in the Apache Arrow format before loading to an AWS S3 bucket
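The enrichment step can be sketched as tagging each data record with its well's header metadata before serialisation. The `FLD` / `API` mnemonics are assumptions, and the Arrow/S3 calls are shown only as comments since they need `pyarrow` and `boto3`:

```python
def enrich(records, meta):
    """Attach header metadata (field name, API number) to every data record."""
    tagged = []
    for rec in records:
        row = dict(rec)
        row["field"] = meta.get("FLD", "UNKNOWN")  # FLD/API mnemonics assumed
        row["api"] = meta.get("API", "UNKNOWN")
        tagged.append(row)
    return tagged


# With pyarrow and boto3 installed, the enriched rows would then be written
# out and uploaded, roughly (sketch, bucket/key names hypothetical):
#   import pyarrow as pa, pyarrow.parquet as pq, boto3
#   table = pa.Table.from_pylist(tagged)
#   pq.write_table(table, "well.parquet")
#   boto3.client("s3").upload_file("well.parquet", "my-bucket", "well.parquet")
```

Carrying field and API on every row is what makes the later group-by-field-and-API step possible once the files are merged.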
● Make the data ready for an Apache Spark ML / Keras Deep Learning pipeline
- drop columns: 152 down to 13; drop duplicates; handle null / NA and missing values
- split-apply-combine on data grouped by field and API: @pandas_udf
- cache the DataFrame
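In pandas terms, the clean-up above can be sketched as follows; in Spark the same operations run on a cached DataFrame with the group-wise step wrapped in a `@pandas_udf`. The `api` / `GR` column names and the per-well normalisation are illustrative assumptions:

```python
import pandas as pd


def prepare(df, keep_cols):
    """Reduce the curve set, drop duplicates, and handle missing values."""
    df = df[keep_cols].copy()              # e.g. 152 columns down to 13
    df = df.drop_duplicates()
    df = df.dropna(how="all")              # drop rows with no readings at all
    df = df.fillna(df.mean(numeric_only=True))  # fill remaining gaps
    # Split-apply-combine on data grouped per well (in Spark this body would
    # sit inside a @pandas_udf): min-max normalise gamma ray within each well.
    df["GR_norm"] = df.groupby("api")["GR"].transform(
        lambda s: (s - s.min()) / (s.max() - s.min())
    )
    return df
```

Mean-filling and min-max scaling are just one reasonable choice here; the point is that per-group work stays vectorised so the same code moves into a Spark `@pandas_udf` unchanged.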
Conclusion
● Apache Airflow to orchestrate the ETL process
● Moving towards real-time data processing:
- WITSML data processing
- candidate streaming frameworks: Apache Kafka, Apache Flink, Apache Storm, Apache Spark