AAPG Geoscience Technology Workshop 2019:
Boosting Reserves and Recovery Using ML and Analytics
January 15-17, 2019
Marathon Oil Tower - Houston, TX
Challenges Faced with Processing
Petrophysical Big Data for Assessing
Viable Opportunities
CJ Ejimuda, MS; Emenike Ejimuda, PhD
Hybrid Data Solutions, Los Angeles, CA, USA
Web: https://hybridata.us
A little about us

CJ Ejimuda
Full Stack Data Scientist / Principal, Hybrid Data Solutions
Mine more value leveraging AI, IIoT, and Big Data
Domain expertise in reservoir and production engineering
ExxonMobil, Aera Energy

AAPG GTW: Boosting Reserves and Recovery Using ML and Analytics
January 16, 2019 - Houston, TX
Outline
● Why process petrophysical Big Data?
● What are the Big Data processing challenges?
● ETL Workflow
● ETL Automation
● Conclusion
● References
Why process petrophysical Big Data?
● Re-evaluating old well logs for missed opportunities
● Conducting pre-drill analysis of offset wells
● Assessing well and field reserves effectively
● Inferring geological features with confidence
What are the Big Data processing challenges?
● For 1 to 10 well log files:
- copying a link and pasting it into the browser is straightforward
- the log data downloads quickly
- ETL is easy to perform on such a small amount of data
What are the Big Data processing challenges?
● For 1 to 10 well log files, manual handling works
● But for ~1,000 well log files?
● Links to ~1,000 well log files from 5 fields, stored in Excel sheets
ETL Workflow
● Download each well log file individually from the web
● Read the log data from each file
● Enrich the metadata and data files, then save them in the Apache Arrow format before loading to an AWS S3 bucket
● GOAL: make the data ready for the Apache Spark ML and TensorFlow Deep Learning pipeline
ETL Automation
● Links to ~1,000 well log files from 5 fields, stored in Excel sheets
● Download each well log file individually from the web
- get the links to the files
- append all the extracted links to a list
- account for errors
- save the file
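The download step above can be sketched with the standard library alone. This is a minimal sketch, not the talk's original code (the slides' code screenshots did not survive the export); the directory name and URL shape are hypothetical, and extracting the links from the Excel sheets (e.g. with pandas.read_excel) is assumed to have already populated `links`:

```python
import os
from urllib.error import HTTPError, URLError
from urllib.parse import urlparse
from urllib.request import urlopen

def filename_from_url(url):
    """Derive a local file name from the last path segment of a URL."""
    return os.path.basename(urlparse(url).path)

def download_all(links, out_dir="las_files"):
    """Download each well-log file, collecting failures instead of stopping."""
    os.makedirs(out_dir, exist_ok=True)
    failed = []
    for url in links:
        try:
            with urlopen(url, timeout=30) as resp:
                payload = resp.read()
            # save the file under its original name
            with open(os.path.join(out_dir, filename_from_url(url)), "wb") as f:
                f.write(payload)
        except (HTTPError, URLError, OSError) as exc:
            failed.append((url, str(exc)))  # account for errors
    return failed
```

Returning the failed (url, error) pairs instead of raising lets a 1,000-file run finish and be retried selectively.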
ETL Automation
● Links to ~1,000 well log files from 5 fields, stored in Excel sheets
● Download each well log file individually from the web
● Read the log data from each file
- extract the actual data and the metadata / header data
- account for errors
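A minimal sketch of that reading step, splitting a LAS file's ~-delimited sections into header metadata and data rows with the standard library only. The section handling is deliberately simplified; a production pipeline would more likely use a dedicated reader such as the lasio package:

```python
def parse_las(text):
    """Split a LAS file into header metadata and the ~A (ASCII data) rows."""
    header, rows, section = {}, [], None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        if line.startswith("~"):
            # Section marker: ~V(ersion), ~W(ell), ~C(urve), ~P(arameter), ~A(SCII)
            section = line[1].upper() if len(line) > 1 else None
            continue
        if section == "A":
            try:
                rows.append([float(v) for v in line.split()])
            except ValueError:
                continue  # account for errors: skip malformed data rows
        elif section in ("V", "W", "C", "P") and "." in line:
            # Header lines look like "MNEM.UNIT  VALUE : DESCRIPTION";
            # this sketch keeps everything between the first "." and the ":".
            mnemonic, rest = line.split(".", 1)
            value = rest.split(":", 1)[0].strip()
            header.setdefault(section, {})[mnemonic.strip()] = value
    return header, rows
```

For comparison, lasio's `lasio.read()` returns the same information as structured objects (well/curve headers plus a DataFrame of the data).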
ETL Automation
● Links to ~1,000 well log files from 5 fields, stored in Excel sheets
● Download each well log file individually from the web
● Read the log data from each file
● Enrich the metadata and data files, then save them in the Apache Arrow format before loading to an AWS S3 bucket
ETL Automation
Why Apache Arrow?
ETL Automation
● Links to ~1,000 well log files from 5 fields, stored in Excel sheets
● Download each well log file individually from the web
● Read the log data from each file
● Enrich the metadata and data files, then save them in the Apache Arrow format before loading to an AWS S3 bucket
● Make the data ready for the Apache Spark ML / Keras Deep Learning pipeline
- drop columns (152 down to 13), drop duplicates and null / NA values, account for missing values
- split-apply-combine on data grouped by field and API: @pandas_udf
- cache the DataFrame
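The grouped split-apply-combine step can be illustrated in plain pandas (the toy columns and values below are hypothetical); in Spark the same per-group function is handed each (field, API) group as a pandas DataFrame via a `@pandas_udf` grouped-map / `applyInPandas` registration:

```python
import pandas as pd

# Toy log data keyed by field and well API number (hypothetical values).
df = pd.DataFrame({
    "FIELD": ["A", "A", "B", "B"],
    "API":   ["001", "001", "002", "002"],
    "GR":    [55.0, None, 60.0, 62.0],
})

def fill_group(g):
    """Per-group transform: fill missing GR values with the group mean."""
    g = g.copy()
    g["GR"] = g["GR"].fillna(g["GR"].mean())
    return g

# Split by (FIELD, API), apply fill_group to each piece, combine the results.
clean = df.groupby(["FIELD", "API"], group_keys=False).apply(fill_group)
```

In PySpark the equivalent is `df.groupBy("FIELD", "API").applyInPandas(fill_group, schema=...)`, and `df.cache()` keeps the cleaned DataFrame in memory for the downstream ML stages.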
ETL Automation - Potential Next Steps
● Do not Repeat Yourself (DRY)
● Use Apache Airflow to orchestrate the ETL process
- define a DAG
- use Dummy, Sensor, and Python operators (with XCom where needed)
- use AWS services (S3, EMR, ...) or Azure / GCP services
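An orchestration sketch of those steps as an Airflow DAG. The DAG id, task names, and callables are illustrative stand-ins for the earlier download / read / enrich functions, and import paths vary by Airflow version:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import PythonOperator

def extract(**context):
    """Download the well log files; the return value is pushed to XCom."""
    ...

def transform(**context):
    """Read, enrich, and convert the files to Apache Arrow."""
    ...

def load(**context):
    """Upload the Arrow files to the S3 bucket."""
    ...

with DAG(dag_id="well_log_etl",
         start_date=datetime(2019, 1, 1),
         schedule=None,
         catchup=False) as dag:
    start = EmptyOperator(task_id="start")
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Dependencies: start -> extract -> transform -> load
    start >> t_extract >> t_transform >> t_load
```

In Airflow 1.x (current at the time of the talk) the operators lived at `airflow.operators.python_operator.PythonOperator` and `airflow.operators.dummy_operator.DummyOperator`, and the scheduling argument was `schedule_interval`. A sensor (e.g. an S3 key sensor) could gate the load step.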
Conclusion
● Moving toward real-time data processing:
- WITSML data processing
- candidate tools: Apache Kafka, Apache Flink, Apache Storm, Apache Spark