AAPG Geoscience Technical Workshop:
Boosting reserves and recovery using ML and Analytics
January 15-17, 2019
Marathon Oil Tower - Houston, TX
Challenges Faced with Processing
Petrophysical Big Data for Assessing
Viable Opportunities
CJ Ejimuda-MS, Emenike Ejimuda-PhD
Hybrid Data Solutions, Los Angeles, CA, USA
web:https://hybridata.us
A little about us
CJ Ejimuda
Full Stack Data Scientist / Principal
Hybrid Data Solutions
Mine more value leveraging AI, IIoT, Big Data
Domain Expertise in Reservoir and Production Engineering
ExxonMobil, Aera Energy
AAPG GTW:Boosting Reserves and Recovery Using ML and Analytics
January 16, 2019 - Houston, TX
Outline
● Why process petrophysical Big Data?
● What are the Big Data processing challenges?
● ETL Workflow
● ETL Automation
● Conclusion
● References
Why process petrophysical Big Data?
● Re-evaluate old well logs for opportunities
● Conduct pre-drill analysis of offset wells
● Overcome the inability to assess well / field reserves effectively
● Overcome the difficulty of inferring geological features
What are the Big Data processing challenges?
● For 1 to 10 well log files?
- Copying each link and pasting it into the browser is straightforward
- Log data downloads quickly
- ETL is easier with this amount of data
● For ~1000 well log files?
● Links to ~1000 well log files from 5 fields, stored in Excel sheets
ETL Workflow
● Download each well log file individually from the web
● Read log data from each file
● Enrich the metadata and log data, and save them in Apache Arrow format before loading to an AWS S3 bucket
● GOAL: make the data ready for the Apache Spark ML and TensorFlow deep learning pipeline
ETL Automation
● Links to ~1000 well log files from 5 fields, stored in Excel sheets
● Download each well log file individually from the web
- get the links to the files
- append all the extracted links to a list
- account for errors
- save the file
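The download step above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the presenters' code: it assumes the links have already been read out of the Excel sheets into a plain list (for example with pandas.read_excel), and the names download_logs, fetch, and the fallback file name are hypothetical. Failed downloads are recorded and the loop continues, which is one way to "account for errors".

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def download_logs(links, dest_dir, fetch=None):
    """Download every link into dest_dir; return (saved_paths, failures)."""
    if fetch is None:
        # Default fetcher: a plain HTTP GET with a timeout.
        def fetch(url):
            with urllib.request.urlopen(url, timeout=30) as resp:
                return resp.read()

    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved, failures = [], []
    for url in links:
        # Hypothetical naming rule: reuse the last path segment of the URL.
        name = Path(urlparse(url).path).name or "unnamed.las"
        try:
            data = fetch(url)
            (dest / name).write_bytes(data)  # save the file
            saved.append(dest / name)
        except (OSError, ValueError) as exc:
            # Account for errors: record the failure and keep going.
            failures.append((url, str(exc)))
    return saved, failures
```

Passing fetch as a parameter keeps the loop testable without network access; the list of failures can be retried or reviewed after the run.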
● Read log data from each file
- extract the log data and the metadata / header data
- account for errors
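The read step above might look like the sketch below. The slides do not name the file format, so this assumes the logs are LAS 2.0-style ASCII files, where sections start with "~" and header lines follow the "MNEM.UNIT VALUE : DESCRIPTION" layout. The function name parse_las is hypothetical, and a real pipeline would more likely use a dedicated LAS reader. Malformed data rows are skipped instead of aborting the file.

```python
def parse_las(text):
    """Return (header, curves, rows) from LAS-2.0-style text (assumed format)."""
    header, curves, rows = {}, [], []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("~"):
            section = line[1:2].upper()  # e.g. "W", "C", "A"
            continue
        if section == "A":
            try:
                rows.append([float(tok) for tok in line.split()])
            except ValueError:
                continue  # account for errors: skip malformed data rows
        elif section in ("W", "C"):
            mnemonic, _, rest = line.partition(".")
            _unit, _, rest = rest.partition(" ")
            value, _, _description = rest.rpartition(":")
            if section == "W":
                header[mnemonic.strip()] = value.strip()  # well metadata
            else:
                curves.append(mnemonic.strip())  # curve (column) names

    # Replace the file's declared null marker (if any) with None.
    try:
        null = float(header.get("NULL", ""))
    except ValueError:
        null = None
    if null is not None:
        rows = [[None if v == null else v for v in row] for row in rows]
    return header, curves, rows
```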
● Enrich the metadata and log data, and save them in Apache Arrow format before loading to an AWS S3 bucket
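A minimal sketch of the enrich-and-store step follows. The slides do not say what enrichment involved, so this assumes it means attaching well-level metadata (field, API number, well name, source link) to every data row. The function names, the metadata keys "API" and "WELL", and the bucket and key arguments are all hypothetical. pyarrow and boto3 are third-party packages and are imported inside the function so the enrichment logic can be used without them.

```python
def enrich(header, curves, rows, field, source_url):
    """Build a column-oriented dict: one list per curve plus metadata columns."""
    # Keep only rows that have a value for every curve.
    rows = [row for row in rows if len(row) == len(curves)]
    n = len(rows)
    columns = {name: [row[i] for row in rows] for i, name in enumerate(curves)}
    # Assumed enrichment: repeat well-level metadata on every row.
    columns["field"] = [field] * n
    columns["api"] = [header.get("API", "")] * n
    columns["well"] = [header.get("WELL", "")] * n
    columns["source_url"] = [source_url] * n
    return columns


def save_and_upload(columns, local_path, bucket, key):
    """Write an Arrow (Feather) file locally, then upload it to S3."""
    import pyarrow as pa  # third-party, imported lazily
    import pyarrow.feather as feather
    import boto3  # third-party, needs AWS credentials configured

    table = pa.Table.from_pydict(columns)
    feather.write_feather(table, local_path)
    boto3.client("s3").upload_file(local_path, bucket, key)
```

A column-oriented dict is used because it maps directly onto an Arrow table, which is itself columnar.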
Why Apache Arrow?
● Make the data ready for the Apache Spark ML / Keras deep learning pipeline
- reduce columns from 152 to 13; drop duplicates and null / NA values; account for missing values
- split-apply-combine on data grouped by field and API: @pandas_udf
- Caching dataframe
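The preparation steps above could be sketched in PySpark as shown below. This is an illustration under assumptions, not the presenters' pipeline: the column names field, api, and depth, the helper names, and the choice of interpolation as the per-well step are hypothetical, and "API" is assumed to mean the API well number. The slide names @pandas_udf; in Spark 3.x the equivalent grouped-map operation is groupBy(...).applyInPandas(...), which is what this sketch uses.

```python
def check_columns(available, keep):
    """Return the columns to keep, failing early if any are missing."""
    missing = [c for c in keep if c not in available]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return list(keep)


def fill_gaps(pdf):
    """Per-well step: receives one (field, api) group as a pandas DataFrame."""
    pdf = pdf.sort_values("depth")
    curve_cols = [c for c in pdf.columns if c not in ("field", "api", "depth")]
    # Hypothetical treatment of missing values: interpolate along depth.
    pdf[curve_cols] = pdf[curve_cols].interpolate(limit_direction="both")
    return pdf


def prepare(df, keep):
    """df is a pyspark.sql.DataFrame; returns a cleaned, cached DataFrame."""
    cleaned = (
        df.select(*check_columns(df.columns, keep))   # drop unneeded columns
          .dropDuplicates()                           # drop duplicate rows
          .na.drop(subset=["field", "api", "depth"])  # drop rows missing keys
    )
    # Split-apply-combine: one pandas DataFrame per (field, api) group.
    per_well = cleaned.groupBy("field", "api").applyInPandas(
        fill_gaps, schema=cleaned.schema
    )
    return per_well.cache()  # keep the result in memory for reuse
```

The per-group function is handed to Spark as a pandas DataFrame via Apache Arrow, which ties this step back to the Arrow storage choice earlier in the workflow.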
ETL Automation - Potential Next Steps
● Do not repeat yourself (DRY)
● Use Apache Airflow to orchestrate the ETL process
- Define a DAG
- You may use Dummy, Sensor and Python operators (* with XCom)
- Use AWS services (S3, EMR…) / Azure / GCP services
Conclusion
● Moving towards real-time data processing:
- WITSML data processing
● Apache Kafka, Apache Flink, Apache Storm, Apache Spark
References
● https://www.slideshare.net/wesm/high-performance-python-on-apache-spark
● https://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html
● https://airflow.apache.org/installation.html
Questions?
Thank you!
email: cj@hybriData.us
web: https://hybriData.us
AAPG GTW:Boosting Reserves and Recovery Using ML and Analytics
January 16, 2019 - Houston, TX
21
Back Up
AAPG GTW:Boosting Reserves and Recovery Using ML and Analytics
January 16, 2019 - Houston, TX

Contenu connexe

Tendances

Bigdata 2016- projects list
Bigdata  2016- projects listBigdata  2016- projects list
Bigdata 2016- projects listNEWZEN INFOTECH
 
Leveraging big data to maximize value from rail and power infrastructure assets.
Leveraging big data to maximize value from rail and power infrastructure assets.Leveraging big data to maximize value from rail and power infrastructure assets.
Leveraging big data to maximize value from rail and power infrastructure assets.Chijioke “CJ” Ejimuda
 
Railroad Modeling at Hadoop Scale
Railroad Modeling at Hadoop ScaleRailroad Modeling at Hadoop Scale
Railroad Modeling at Hadoop ScaleDataWorks Summit
 
Alan Crosswell Canarie20090304
Alan Crosswell  Canarie20090304Alan Crosswell  Canarie20090304
Alan Crosswell Canarie20090304Bill St. Arnaud
 
HTML Flight Scraper
HTML Flight Scraper HTML Flight Scraper
HTML Flight Scraper Anthony Kilde
 
SC7 Webinar 5 13/12/2017 NCSR "Demokritos" Presentation "Event Detection"
SC7 Webinar 5 13/12/2017 NCSR "Demokritos" Presentation "Event Detection"SC7 Webinar 5 13/12/2017 NCSR "Demokritos" Presentation "Event Detection"
SC7 Webinar 5 13/12/2017 NCSR "Demokritos" Presentation "Event Detection"BigData_Europe
 
Managing Data Synchronization Between ArcSDE and POSTGIS using FME
Managing Data Synchronization Between ArcSDE and POSTGIS using FMEManaging Data Synchronization Between ArcSDE and POSTGIS using FME
Managing Data Synchronization Between ArcSDE and POSTGIS using FMESafe Software
 
Final Project Presentation
Final Project PresentationFinal Project Presentation
Final Project PresentationM Zubair Iqbal
 
Weather exploratory data analysis
Weather   exploratory data analysisWeather   exploratory data analysis
Weather exploratory data analysismadhucharis
 

Tendances (20)

Data Sources
Data SourcesData Sources
Data Sources
 
AWS_ac_ra_loganalysis_11
AWS_ac_ra_loganalysis_11AWS_ac_ra_loganalysis_11
AWS_ac_ra_loganalysis_11
 
Bigdata 2016- projects list
Bigdata  2016- projects listBigdata  2016- projects list
Bigdata 2016- projects list
 
Advait kulkarni
Advait kulkarniAdvait kulkarni
Advait kulkarni
 
Airline Analysis of Data Using Hadoop
Airline Analysis of Data Using HadoopAirline Analysis of Data Using Hadoop
Airline Analysis of Data Using Hadoop
 
Leveraging big data to maximize value from rail and power infrastructure assets.
Leveraging big data to maximize value from rail and power infrastructure assets.Leveraging big data to maximize value from rail and power infrastructure assets.
Leveraging big data to maximize value from rail and power infrastructure assets.
 
Railroad Modeling at Hadoop Scale
Railroad Modeling at Hadoop ScaleRailroad Modeling at Hadoop Scale
Railroad Modeling at Hadoop Scale
 
Resume (kaushik shakkari)
Resume (kaushik shakkari)Resume (kaushik shakkari)
Resume (kaushik shakkari)
 
Alan Crosswell Canarie20090304
Alan Crosswell  Canarie20090304Alan Crosswell  Canarie20090304
Alan Crosswell Canarie20090304
 
HTML Flight Scraper
HTML Flight Scraper HTML Flight Scraper
HTML Flight Scraper
 
SC7 Webinar 5 13/12/2017 NCSR "Demokritos" Presentation "Event Detection"
SC7 Webinar 5 13/12/2017 NCSR "Demokritos" Presentation "Event Detection"SC7 Webinar 5 13/12/2017 NCSR "Demokritos" Presentation "Event Detection"
SC7 Webinar 5 13/12/2017 NCSR "Demokritos" Presentation "Event Detection"
 
Managing Data Synchronization Between ArcSDE and POSTGIS using FME
Managing Data Synchronization Between ArcSDE and POSTGIS using FMEManaging Data Synchronization Between ArcSDE and POSTGIS using FME
Managing Data Synchronization Between ArcSDE and POSTGIS using FME
 
A-ONE consultants
A-ONE consultantsA-ONE consultants
A-ONE consultants
 
Shenoy resume
Shenoy resumeShenoy resume
Shenoy resume
 
DataAnalysis
DataAnalysisDataAnalysis
DataAnalysis
 
GIS #7
GIS #7GIS #7
GIS #7
 
Data Analyst Track
Data Analyst TrackData Analyst Track
Data Analyst Track
 
Time_Series_Assignment
Time_Series_AssignmentTime_Series_Assignment
Time_Series_Assignment
 
Final Project Presentation
Final Project PresentationFinal Project Presentation
Final Project Presentation
 
Weather exploratory data analysis
Weather   exploratory data analysisWeather   exploratory data analysis
Weather exploratory data analysis
 

Similaire à AAPG Geoscience Technology Workshop 2019

Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...DataBench
 
Exploratory Analysis of Spark Structured Streaming
Exploratory Analysis of Spark Structured StreamingExploratory Analysis of Spark Structured Streaming
Exploratory Analysis of Spark Structured Streamingt_ivanov
 
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...Amazon Web Services
 
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning MetadataArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning MetadataArangoDB Database
 
Introduction to Apache Spark 2.0
Introduction to Apache Spark 2.0Introduction to Apache Spark 2.0
Introduction to Apache Spark 2.0Knoldus Inc.
 
Big Data Driven At Eway
Big Data Driven At Eway Big Data Driven At Eway
Big Data Driven At Eway Tu Pham
 
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...Databricks
 
Make your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSMake your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSKimmo Kantojärvi
 
Measuring, Quantifying, & Predicting the Cost-Accuracy Tradeoff
Measuring, Quantifying, & Predicting the Cost-Accuracy TradeoffMeasuring, Quantifying, & Predicting the Cost-Accuracy Tradeoff
Measuring, Quantifying, & Predicting the Cost-Accuracy TradeoffHong-Linh Truong
 
UTOUG Training Days 2019 APEX Interactive Grids: API Essentials, the Stuff Yo...
UTOUG Training Days 2019 APEX Interactive Grids: API Essentials, the Stuff Yo...UTOUG Training Days 2019 APEX Interactive Grids: API Essentials, the Stuff Yo...
UTOUG Training Days 2019 APEX Interactive Grids: API Essentials, the Stuff Yo...Karen Cannell
 
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019javier ramirez
 
Google cloud platform
Google cloud platformGoogle cloud platform
Google cloud platformrajdeep
 
Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...
Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...
Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...Databricks
 
Processing genetic data at scale
Processing genetic data at scaleProcessing genetic data at scale
Processing genetic data at scaleMark Schroering
 
Presto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 UpdatesPresto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 UpdatesTaro L. Saito
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Yohei Onishi
 
Syngenta's Predictive Analytics Platform for Seeds R&D
Syngenta's Predictive Analytics Platform for Seeds R&DSyngenta's Predictive Analytics Platform for Seeds R&D
Syngenta's Predictive Analytics Platform for Seeds R&DMichael Swanson
 
Evaluation of TPC-H on Spark and Spark SQL in ALOJA
Evaluation of TPC-H on Spark and Spark SQL in ALOJAEvaluation of TPC-H on Spark and Spark SQL in ALOJA
Evaluation of TPC-H on Spark and Spark SQL in ALOJADataWorks Summit
 

Similaire à AAPG Geoscience Technology Workshop 2019 (20)

NAPE 2019 Presentation
NAPE 2019 PresentationNAPE 2019 Presentation
NAPE 2019 Presentation
 
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
 
Exploratory Analysis of Spark Structured Streaming
Exploratory Analysis of Spark Structured StreamingExploratory Analysis of Spark Structured Streaming
Exploratory Analysis of Spark Structured Streaming
 
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
 
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning MetadataArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
 
Introduction to Apache Spark 2.0
Introduction to Apache Spark 2.0Introduction to Apache Spark 2.0
Introduction to Apache Spark 2.0
 
Big Data Driven At Eway
Big Data Driven At Eway Big Data Driven At Eway
Big Data Driven At Eway
 
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...
 
Os Lonergan
Os LonerganOs Lonergan
Os Lonergan
 
Make your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSMake your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWS
 
Measuring, Quantifying, & Predicting the Cost-Accuracy Tradeoff
Measuring, Quantifying, & Predicting the Cost-Accuracy TradeoffMeasuring, Quantifying, & Predicting the Cost-Accuracy Tradeoff
Measuring, Quantifying, & Predicting the Cost-Accuracy Tradeoff
 
UTOUG Training Days 2019 APEX Interactive Grids: API Essentials, the Stuff Yo...
UTOUG Training Days 2019 APEX Interactive Grids: API Essentials, the Stuff Yo...UTOUG Training Days 2019 APEX Interactive Grids: API Essentials, the Stuff Yo...
UTOUG Training Days 2019 APEX Interactive Grids: API Essentials, the Stuff Yo...
 
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
 
Google cloud platform
Google cloud platformGoogle cloud platform
Google cloud platform
 
Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...
Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...
Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...
 
Processing genetic data at scale
Processing genetic data at scaleProcessing genetic data at scale
Processing genetic data at scale
 
Presto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 UpdatesPresto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 Updates
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
 
Syngenta's Predictive Analytics Platform for Seeds R&D
Syngenta's Predictive Analytics Platform for Seeds R&DSyngenta's Predictive Analytics Platform for Seeds R&D
Syngenta's Predictive Analytics Platform for Seeds R&D
 
Evaluation of TPC-H on Spark and Spark SQL in ALOJA
Evaluation of TPC-H on Spark and Spark SQL in ALOJAEvaluation of TPC-H on Spark and Spark SQL in ALOJA
Evaluation of TPC-H on Spark and Spark SQL in ALOJA
 

Plus de Chijioke “CJ” Ejimuda

Revolutionizing Crtitical Infrastructure Connectivity
Revolutionizing Crtitical Infrastructure ConnectivityRevolutionizing Crtitical Infrastructure Connectivity
Revolutionizing Crtitical Infrastructure ConnectivityChijioke “CJ” Ejimuda
 
Internet of Energy: "Can python prevent California wildfires?"
Internet of Energy: "Can python prevent California wildfires?"Internet of Energy: "Can python prevent California wildfires?"
Internet of Energy: "Can python prevent California wildfires?"Chijioke “CJ” Ejimuda
 
Using Deep Learning and Computer Vision to improve Corrosion risk assessments
Using Deep Learning and Computer Vision to improve Corrosion risk assessmentsUsing Deep Learning and Computer Vision to improve Corrosion risk assessments
Using Deep Learning and Computer Vision to improve Corrosion risk assessmentsChijioke “CJ” Ejimuda
 
Could Edge Computing Lead to the End of Real Time Operating Centers?
Could Edge Computing Lead to the End of Real Time Operating Centers?Could Edge Computing Lead to the End of Real Time Operating Centers?
Could Edge Computing Lead to the End of Real Time Operating Centers?Chijioke “CJ” Ejimuda
 
Optimizing PV energy yield with Elasticsearch and graphQL
Optimizing PV energy yield with Elasticsearch and graphQLOptimizing PV energy yield with Elasticsearch and graphQL
Optimizing PV energy yield with Elasticsearch and graphQLChijioke “CJ” Ejimuda
 
Self Driving Directional Drilling on the Edge
Self Driving Directional Drilling on the EdgeSelf Driving Directional Drilling on the Edge
Self Driving Directional Drilling on the EdgeChijioke “CJ” Ejimuda
 
hybriData IIoT Workshop for AAPG Short Course
hybriData IIoT Workshop for AAPG Short CoursehybriData IIoT Workshop for AAPG Short Course
hybriData IIoT Workshop for AAPG Short CourseChijioke “CJ” Ejimuda
 
IIoT: The Whole Gamut - Exploration --> Drilling --> Production --> Facility
IIoT: The Whole Gamut - Exploration --> Drilling --> Production --> FacilityIIoT: The Whole Gamut - Exploration --> Drilling --> Production --> Facility
IIoT: The Whole Gamut - Exploration --> Drilling --> Production --> FacilityChijioke “CJ” Ejimuda
 

Plus de Chijioke “CJ” Ejimuda (11)

Revolutionizing Crtitical Infrastructure Connectivity
Revolutionizing Crtitical Infrastructure ConnectivityRevolutionizing Crtitical Infrastructure Connectivity
Revolutionizing Crtitical Infrastructure Connectivity
 
Internet of Energy: "Can python prevent California wildfires?"
Internet of Energy: "Can python prevent California wildfires?"Internet of Energy: "Can python prevent California wildfires?"
Internet of Energy: "Can python prevent California wildfires?"
 
Using Deep Learning and Computer Vision to improve Corrosion risk assessments
Using Deep Learning and Computer Vision to improve Corrosion risk assessmentsUsing Deep Learning and Computer Vision to improve Corrosion risk assessments
Using Deep Learning and Computer Vision to improve Corrosion risk assessments
 
Learning from Autonomous Vehicle Industry
Learning from Autonomous Vehicle IndustryLearning from Autonomous Vehicle Industry
Learning from Autonomous Vehicle Industry
 
Could Edge Computing Lead to the End of Real Time Operating Centers?
Could Edge Computing Lead to the End of Real Time Operating Centers?Could Edge Computing Lead to the End of Real Time Operating Centers?
Could Edge Computing Lead to the End of Real Time Operating Centers?
 
Optimizing PV energy yield with Elasticsearch and graphQL
Optimizing PV energy yield with Elasticsearch and graphQLOptimizing PV energy yield with Elasticsearch and graphQL
Optimizing PV energy yield with Elasticsearch and graphQL
 
Self Driving Directional Drilling on the Edge
Self Driving Directional Drilling on the EdgeSelf Driving Directional Drilling on the Edge
Self Driving Directional Drilling on the Edge
 
hybriData Energy Services and Data Products
hybriData Energy Services and Data ProductshybriData Energy Services and Data Products
hybriData Energy Services and Data Products
 
hybriData IIoT Workshop for AAPG Short Course
hybriData IIoT Workshop for AAPG Short CoursehybriData IIoT Workshop for AAPG Short Course
hybriData IIoT Workshop for AAPG Short Course
 
IIoT: The Whole Gamut - Exploration --> Drilling --> Production --> Facility
IIoT: The Whole Gamut - Exploration --> Drilling --> Production --> FacilityIIoT: The Whole Gamut - Exploration --> Drilling --> Production --> Facility
IIoT: The Whole Gamut - Exploration --> Drilling --> Production --> Facility
 
elasticsearch X react
elasticsearch X reactelasticsearch X react
elasticsearch X react
 

Dernier

Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdfKamal Acharya
 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...tanu pandey
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Arindam Chakraborty, Ph.D., P.E. (CA, TX)
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...SUHANI PANDEY
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756dollysharma2066
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfJiananWang21
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...soginsider
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayEpec Engineered Technologies
 

Dernier (20)

Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756

AAPG Geoscience Technology Workshop 2019

  • 1. AAPG Geoscience Technical Workshop: Boosting Reserves and Recovery Using ML and Analytics
      January 15-17, 2019, Marathon Oil Tower - Houston, TX
      Challenges Faced with Processing Petrophysical Big Data for Assessing Viable Opportunities
      CJ Ejimuda, MS; Emenike Ejimuda, PhD
      Hybrid Data Solutions, Los Angeles, CA, USA
      web: https://hybridata.us
  • 2. A little about us
      CJ Ejimuda, Full Stack Data Scientist / Principal, Hybrid Data Solutions
      Mine more value leveraging AI, IIoT, Big Data
      Domain expertise in Reservoir and Production Engineering
      ExxonMobil, Aera Energy
      AAPG GTW: Boosting Reserves and Recovery Using ML and Analytics, January 16, 2019 - Houston, TX
  • 3. Outline
      ● Why process petrophysical Big Data?
      ● What Big Data processing challenges?
      ● ETL Workflow
      ● ETL Automation
      ● Conclusion
      ● References
  • 4. Why process petrophysical Big Data?
      ● Re-evaluate old well logs for opportunities
      ● Conduct pre-drill analysis of offset wells
      ● Unable to effectively assess well / field reserves
      ● Challenge with inferring geological features
  • 5. What Big Data processing challenges?
      ● For 1 to 10 well log files?
        - Copying each link and pasting it into the browser is straightforward
        - Log data downloads quickly
        - ETL is easier to perform with this amount of data
  • 6. What Big Data processing challenges?
      ● For 1 to 10 well log files?
      ● For 1000 well log files???
      ● Links to ~1000 well log files from 5 fields, stored in Excel sheets
  • 7. ETL Workflow
      ● Download each well log file individually from the web
      ● Read log data from each file
      ● Enrich the metadata and data files and save them in Apache Arrow format before loading to an AWS S3 bucket
      ● GOAL: Make data ready for Apache Spark ML and TensorFlow Deep Learning pipelines
  • 8. ETL Automation
      ● Links to ~1000 well log files from 5 fields, stored in Excel sheets
      ● Download each well log file individually from the web
        - Get the links to the files
        - Append all the extracted links to a list
        - Account for errors
        - Save the file
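
The download steps listed on slide 8 could be sketched roughly as follows. This is a minimal illustration rather than the authors' code: the names `fetch_bytes` and `download_all`, the file-naming scheme, and the assumption that the links have already been read out of the Excel sheets into a Python list are all hypothetical.

```python
import os
import urllib.request
from urllib.parse import urlparse


def fetch_bytes(url, timeout=30):
    """Fetch one URL with the standard library (assumes plain HTTP(S) links)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_all(links, out_dir, fetch=fetch_bytes):
    """Download every link, saving each file and recording failures.

    Returns (saved_paths, errors) so failed links can be retried later
    instead of stopping the whole run.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved, errors = [], []
    for i, url in enumerate(links):
        # Hypothetical naming scheme: index prefix plus the URL's file name.
        name = os.path.basename(urlparse(url).path) or "unnamed"
        path = os.path.join(out_dir, f"{i:04d}_{name}")
        try:
            data = fetch(url)
        except Exception as exc:  # network errors, bad links, timeouts
            errors.append((url, repr(exc)))
            continue
        with open(path, "wb") as fh:
            fh.write(data)
        saved.append(path)
    return saved, errors
```

Passing the fetch function as a parameter keeps the loop testable without network access, and collecting errors in a list means one bad link among ~1000 does not abort the batch.
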
  • 9. ETL Automation
  • 10. ETL Automation
  • 11. ETL Automation
      ● Links to ~1000 well log files from 5 fields, stored in Excel sheets
      ● Download each well log file individually from the web
      ● Read log data from each file
        - Extract the actual data and the metadata / header data
        - Account for errors
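
Separating header metadata from the actual data, as slide 11 describes, might look like the sketch below. The slides do not name the file format; this example assumes LAS 2.0-style text (sections introduced by `~`, header lines shaped like `MNEM.UNIT VALUE : DESCRIPTION`, numeric rows under `~A`). The function name `parse_las_text` is hypothetical, and the parser is deliberately simplified (no wrapped files, no null-value handling).

```python
def parse_las_text(text):
    """Split LAS-style text into header metadata and numeric data rows.

    Returns (metadata, curves, rows, bad_lines), where bad_lines holds the
    1-based line numbers that could not be parsed.
    """
    metadata, curves, rows, bad_lines = {}, [], [], []
    section = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if line.startswith("~"):
            section = line[1:2].upper()  # e.g. 'W' for ~Well, 'A' for data
            continue
        if section == "A":
            try:
                rows.append([float(tok) for tok in line.split()])
            except ValueError:
                bad_lines.append(lineno)  # record the error, keep going
            continue
        if "." not in line or ":" not in line:
            bad_lines.append(lineno)
            continue
        mnemonic, rest = line.split(".", 1)
        body, description = rest.rsplit(":", 1)
        # The unit, if present, runs from the dot to the first space.
        unit, _, value = body.partition(" ")
        mnemonic = mnemonic.strip()
        if section == "C":
            curves.append(mnemonic)
        metadata.setdefault(section, {})[mnemonic] = {
            "unit": unit.strip(),
            "value": value.strip(),
            "description": description.strip(),
        }
    return metadata, curves, rows, bad_lines
```

Returning the unparseable line numbers instead of raising lets a batch run over many files log problems per file and continue.
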
  • 12. ETL Automation
  • 13. ETL Automation
      ● Links to ~1000 well log files from 5 fields, stored in Excel sheets
      ● Download each well log file individually from the web
      ● Read log data from each file
      ● Enrich the metadata and data files and save them in Apache Arrow format before loading to an AWS S3 bucket
  • 14. ETL Automation: Why Apache Arrow?
  • 15. ETL Automation
      ● Links to ~1000 well log files from 5 fields, stored in Excel sheets
      ● Download each well log file individually from the web
      ● Read log data from each file
      ● Enrich the metadata and data files and save them in Apache Arrow format before loading to an AWS S3 bucket
      ● Make data ready for the Apache Spark ML / Keras Deep Learning pipeline
        - Drop columns (152 to 13), drop duplicates and null / NA values, account for missing values
        - Split-apply-combine on data grouped by field and API: @pandas_udf
        - Cache the DataFrame
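
The cleaning and split-apply-combine steps on slide 15 can be illustrated locally with plain pandas. This is a sketch under stated assumptions, not the authors' Spark code: the column names, the min-max scaling of `GR`, and the helper names are hypothetical, "API" is assumed to mean the well's API number, and dropping rows with nulls is only one way to account for missing values. In Spark, a per-group function like `normalize_gr` is the kind of logic a grouped `@pandas_udf` would wrap.

```python
import pandas as pd

KEEP = ["field", "api", "DEPT", "GR"]  # hypothetical subset of columns


def clean(df, keep=KEEP):
    """Keep selected columns, drop exact duplicates and rows with nulls."""
    return df.loc[:, keep].drop_duplicates().dropna().reset_index(drop=True)


def normalize_gr(group):
    """Per-well function: min-max scale GR within one (field, api) group."""
    out = group.copy()
    span = out["GR"].max() - out["GR"].min()
    out["GR_norm"] = (out["GR"] - out["GR"].min()) / span if span else 0.0
    return out


def split_apply_combine(df, func, keys=("field", "api")):
    """Split by keys, apply func to each group, concatenate the results."""
    parts = [func(group) for _, group in df.groupby(list(keys), sort=True)]
    return pd.concat(parts, ignore_index=True)
```

A typical call would be `split_apply_combine(clean(raw), normalize_gr)`, where `raw` is a DataFrame holding the logs of several wells.
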
  • 16. ETL Automation - Potential Next Steps
      ● Don't Repeat Yourself (DRY)
      ● Use Apache Airflow to orchestrate the ETL process
        - Define a DAG
        - You may use Dummy, Sensor and Python operators (* with XCom)
        - Use AWS services (S3, EMR…) / Azure / GCP services
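
The idea behind defining a DAG, that each ETL task runs only after its upstream tasks finish, can be sketched with the Python standard library alone. This is not Airflow code and uses none of its API; the task names and the `run_dag` helper are hypothetical, and the `results` dictionary only loosely mirrors how XCom passes small values between tasks.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it runs.
ETL_DAG = {
    "extract_links": set(),
    "download_files": {"extract_links"},
    "read_logs": {"download_files"},
    "write_arrow": {"read_logs"},
    "upload_to_s3": {"write_arrow"},
}


def run_dag(dag, tasks):
    """Run callables in dependency order, passing upstream results along."""
    results = {}
    for name in TopologicalSorter(dag).static_order():
        upstream = {dep: results[dep] for dep in dag[name]}
        results[name] = tasks[name](upstream)
    return results
```

An orchestrator such as Airflow adds scheduling, retries, and monitoring on top of this ordering, which is where the "Don't Repeat Yourself" benefit comes from.
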
  • 17. Conclusion
      ● Moving towards real-time data processing:
        - WITSML data processing
      ● Apache Kafka, Apache Flink, Apache Storm, Apache Spark
  • 18. References
      ● https://www.slideshare.net/wesm/high-performance-python-on-apache-spark
      ● https://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html
      ● https://airflow.apache.org/installation.html
  • 19. Questions?
  • 20. Thank you!
      email: cj@hybriData.us
      web: https://hybriData.us
  • 21. Back Up