AAPG Geoscience Technology Workshop 2019:
Boosting Reserves and Recovery Using ML and Analytics
January 15-17, 2019
Marathon Oil Tower - Houston, TX
Challenges Faced with Processing
Petrophysical Big Data for Assessing
Viable Opportunities
CJ Ejimuda, MS; Emenike Ejimuda, PhD
Hybrid Data Solutions, Los Angeles, CA, USA
Web: https://hybridata.us
A little about us

CJ Ejimuda
Full Stack Data Scientist / Principal, Hybrid Data Solutions
Mine more value leveraging AI, IIoT, and Big Data
Domain expertise in reservoir and production engineering
ExxonMobil, Aera Energy

AAPG GTW: Boosting Reserves and Recovery Using ML and Analytics
January 16, 2019 - Houston, TX
Outline
● Why process petrophysical Big Data?
● What are the Big Data processing challenges?
● ETL Workflow
● ETL Automation
● Conclusion
● References
Why process petrophysical Big Data?
● Re-evaluating old well logs for missed opportunities
● Conducting pre-drill analysis of offset wells
● Assessing well and field reserves effectively
● Inferring geological features with confidence
What are the Big Data processing challenges?
● For 1 to 10 well log files:
- copying a link and pasting it into the browser is straightforward
- the log data downloads quickly
- ETL is easy to perform on such a small amount of data
What are the Big Data processing challenges?
● For 1 to 10 well log files, manual handling works
● But for ~1,000 well log files?
● Links to ~1,000 well log files from 5 fields, stored in Excel sheets
ETL Workflow
● Download each well log file individually from the web
● Read the log data from each file
● Enrich the metadata and data files, then save them in the Apache Arrow format before loading to an AWS S3 bucket
● GOAL: make the data ready for the Apache Spark ML and TensorFlow Deep Learning pipeline
ETL Automation
● Links to ~1,000 well log files from 5 fields, stored in Excel sheets
● Download each well log file individually from the web
- get the links to the files
- append all the extracted links to a list
- account for errors
- save the file
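The download step above can be sketched with the standard library alone. This is a minimal sketch, not the talk's original code (the slides' code screenshots did not survive the export); the directory name and URL shape are hypothetical, and extracting the links from the Excel sheets (e.g. with pandas.read_excel) is assumed to have already populated `links`:

```python
import os
from urllib.error import HTTPError, URLError
from urllib.parse import urlparse
from urllib.request import urlopen

def filename_from_url(url):
    """Derive a local file name from the last path segment of a URL."""
    return os.path.basename(urlparse(url).path)

def download_all(links, out_dir="las_files"):
    """Download each well-log file, collecting failures instead of stopping."""
    os.makedirs(out_dir, exist_ok=True)
    failed = []
    for url in links:
        try:
            with urlopen(url, timeout=30) as resp:
                payload = resp.read()
            # save the file under its original name
            with open(os.path.join(out_dir, filename_from_url(url)), "wb") as f:
                f.write(payload)
        except (HTTPError, URLError, OSError) as exc:
            failed.append((url, str(exc)))  # account for errors
    return failed
```

Returning the failed (url, error) pairs instead of raising lets a 1,000-file run finish and be retried selectively.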
ETL Automation
● Links to ~1,000 well log files from 5 fields, stored in Excel sheets
● Download each well log file individually from the web
● Read the log data from each file
- extract the actual data and the metadata / header data
- account for errors
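A minimal sketch of that reading step, splitting a LAS file's ~-delimited sections into header metadata and data rows with the standard library only. The section handling is deliberately simplified; a production pipeline would more likely use a dedicated reader such as the lasio package:

```python
def parse_las(text):
    """Split a LAS file into header metadata and the ~A (ASCII data) rows."""
    header, rows, section = {}, [], None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        if line.startswith("~"):
            # Section marker: ~V(ersion), ~W(ell), ~C(urve), ~P(arameter), ~A(SCII)
            section = line[1].upper() if len(line) > 1 else None
            continue
        if section == "A":
            try:
                rows.append([float(v) for v in line.split()])
            except ValueError:
                continue  # account for errors: skip malformed data rows
        elif section in ("V", "W", "C", "P") and "." in line:
            # Header lines look like "MNEM.UNIT  VALUE : DESCRIPTION";
            # this sketch keeps everything between the first "." and the ":".
            mnemonic, rest = line.split(".", 1)
            value = rest.split(":", 1)[0].strip()
            header.setdefault(section, {})[mnemonic.strip()] = value
    return header, rows
```

For comparison, lasio's `lasio.read()` returns the same information as structured objects (well/curve headers plus a DataFrame of the data).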
ETL Automation
● Links to ~1,000 well log files from 5 fields, stored in Excel sheets
● Download each well log file individually from the web
● Read the log data from each file
● Enrich the metadata and data files, then save them in the Apache Arrow format before loading to an AWS S3 bucket
ETL Automation
Why Apache Arrow?
ETL Automation
● Links to ~1,000 well log files from 5 fields, stored in Excel sheets
● Download each well log file individually from the web
● Read the log data from each file
● Enrich the metadata and data files, then save them in the Apache Arrow format before loading to an AWS S3 bucket
● Make the data ready for the Apache Spark ML / Keras Deep Learning pipeline
- drop columns (152 down to 13), drop duplicates and null / NA values, account for missing values
- split-apply-combine on data grouped by field and API: @pandas_udf
- cache the DataFrame
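The grouped split-apply-combine step can be illustrated in plain pandas (the toy columns and values below are hypothetical); in Spark the same per-group function is handed each (field, API) group as a pandas DataFrame via a `@pandas_udf` grouped-map / `applyInPandas` registration:

```python
import pandas as pd

# Toy log data keyed by field and well API number (hypothetical values).
df = pd.DataFrame({
    "FIELD": ["A", "A", "B", "B"],
    "API":   ["001", "001", "002", "002"],
    "GR":    [55.0, None, 60.0, 62.0],
})

def fill_group(g):
    """Per-group transform: fill missing GR values with the group mean."""
    g = g.copy()
    g["GR"] = g["GR"].fillna(g["GR"].mean())
    return g

# Split by (FIELD, API), apply fill_group to each piece, combine the results.
clean = df.groupby(["FIELD", "API"], group_keys=False).apply(fill_group)
```

In PySpark the equivalent is `df.groupBy("FIELD", "API").applyInPandas(fill_group, schema=...)`, and `df.cache()` keeps the cleaned DataFrame in memory for the downstream ML stages.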
ETL Automation - Potential Next Steps
● Do not Repeat Yourself (DRY)
● Use Apache Airflow to orchestrate the ETL process
- define a DAG
- use Dummy, Sensor, and Python operators (with XCom where needed)
- use AWS services (S3, EMR, ...) or Azure / GCP services
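An orchestration sketch of those steps as an Airflow DAG. The DAG id, task names, and callables are illustrative stand-ins for the earlier download / read / enrich functions, and import paths vary by Airflow version:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import PythonOperator

def extract(**context):
    """Download the well log files; the return value is pushed to XCom."""
    ...

def transform(**context):
    """Read, enrich, and convert the files to Apache Arrow."""
    ...

def load(**context):
    """Upload the Arrow files to the S3 bucket."""
    ...

with DAG(dag_id="well_log_etl",
         start_date=datetime(2019, 1, 1),
         schedule=None,
         catchup=False) as dag:
    start = EmptyOperator(task_id="start")
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Dependencies: start -> extract -> transform -> load
    start >> t_extract >> t_transform >> t_load
```

In Airflow 1.x (current at the time of the talk) the operators lived at `airflow.operators.python_operator.PythonOperator` and `airflow.operators.dummy_operator.DummyOperator`, and the scheduling argument was `schedule_interval`. A sensor (e.g. an S3 key sensor) could gate the load step.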
Conclusion
● Moving toward real-time data processing:
- WITSML data processing
- candidate tools: Apache Kafka, Apache Flink, Apache Storm, Apache Spark