ExxonMobil used Databricks to apply machine learning at scale, extracting insights from equipment maintenance logs to improve operations. The logs, which hold both structured and unstructured text for a global fleet, are maintained in legacy systems, which limits traditional analysis. By ingesting and enriching over 60 million records with natural language processing, the system identifies outliers, supports capacity planning, and prioritizes maintenance tasks. The resulting reliability and maintenance guidance is projected to save millions of dollars annually.
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
1. NLP focused applied ML at scale for global fleet analytics at ExxonMobil
Data-driven guidance for operations impact
Deliver insights from text-heavy unstructured data to answer the questions “what, when and why it happened”.
2. NLP focused applied ML at scale for global fleet analytics at ExxonMobil
Technology team‡:
Hans Brende†, Liz Curry-Logan*, Ricardo Ceslinski*, Jijo Jose*, Colby Lopez*, Chris Marchini*, Gaurav Nair*, Harsha Namburi*, Kevin Pauli†, Sandeep Sihag† and Sumeet Trehan*
‡ Team as of Dec. 2020; * ExxonMobil; † Contractor at ExxonMobil
3. Agenda
Build and ship a product (Equipment Lifecycle Optimization, or ELO) that leverages data to make smart, data-driven decisions.
1. Business problem
2. Architecture, tech stack and impact
3. Results (one specific example)
4. Conclusion
4. Business driver: Can we use the maintenance/service log of each piece of equipment to answer “What, when and why”? This contextual information can provide insights.
Insights: outlier identification, capacity planning and prioritization of maintenance tasks.
5. Leveraging global data to enhance maintenance effectiveness and reliability is complicated by several factors.
Challenges
1. Legacy systems limit the ability to do ML at scale
• The equipment maintenance log of our global fleet is maintained using legacy infrastructure and data models.
• Legacy systems limit the ability to extract insights at scale.
6. Challenges
2. Ingest and enrich global data
• Analysis at a local level may produce inaccurate results.
• It is critical to ingest and enrich global fleet data.
• “Big data” is needed for honest insights.
7. Challenges
3. Data quality
• Inconsistent data quality: data input is not comparable. For example, there is large variability in how information is entered in the maintenance/service logs, as in “Replace the TX – it is corrorde”.
• Data is disconnected.
8. Solution
NLP-focused applied ML product:
• Ingests batch and streaming data (operational ML pipeline) from legacy systems.
• Sifts through 60 million+ records (growing nonlinearly) to extract insights using NLP.
• Example: given a maintenance log entry such as “Replace the TX – it is corrorde”, answers questions such as what happened, why it happened and when it happened.
9. Architecture
[Architecture diagram. Stages: Ingest, Store, Prep and train, Serve, Frontend. Inputs: batch data and streaming data.]
Components shown:
• Azure Data Factory (batch pipeline orchestration)
• Azure Event Hubs
• Azure Data Explorer (real-time analysis)
• Azure Databricks (data engineering)
• Azure Databricks (data science & machine learning)
• Azure ML
• Model repository & deployment
• Model serving
• Qlik (frontend)
10. ELO architecture
Model development, ML pipeline setup and pipeline runtime.
• Model development
  • Applied ML scientists use notebooks and common utilities to train and publish models to the MLflow model registry.
• ML pipeline development
  • ML engineers create building blocks (discrete steps) that transform source data to target data, utilizing common utilities as well as the models published by the data scientists.
  • ML engineers develop common utilities to perform data and model I/O, to reduce boilerplate and promote standardization and reusability.
• Pipeline runtime
  • The entire ELO pipeline is represented in Azure Data Factory (ADF) as a DAG of pipeline steps.
  • The ADF pipeline is triggered on a daily schedule.
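To make the "DAG of pipeline steps" idea concrete, here is a minimal sketch in plain Python rather than ADF itself. The step names are hypothetical (the deck does not list the actual ELO steps); the sketch only shows how a dependency graph determines a valid run order.

```python
from graphlib import TopologicalSorter

# Hypothetical step names, for illustration only.
# Each key maps to the set of steps that must finish before it can run.
pipeline = {
    "ingest_logs": set(),
    "clean_and_tokenize": {"ingest_logs"},
    "embed_tokens": {"clean_and_tokenize"},
    "classify": {"embed_tokens"},
    "publish_insights": {"classify"},
}


def execution_order(dag):
    """Return one valid order in which the steps of the DAG can run."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(pipeline)
for step in order:
    print(step)
```

An orchestrator such as ADF plays the same role at runtime: a step starts only once the steps it depends on have completed.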
15. Input data
1. The xyz pump has failed
2. P-1234 to the seal is down
3. Replace the TX – it is corrorde
4. t/s/r old rod
5. Look broke – maybe fix
6. c/o old seal on v/v
7. 2 seal on psv-123 fail
….
….
REGEX Cleanup & Tokenization
1. [the, xyz, pump, has, failed]
2. [p, to, the, seal, is, down]
3. [replace, the, tx, it, is, corroded]
4. [tsr, old, rod]
5. [look, broke, maybe, fix]
6. [co, old, seal, on, vv]
7. [2, seal, on, psv, fail]
….
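The cleanup and tokenization step can be sketched as follows. The rules are inferred from the examples on this slide and are an assumption, not the team's actual regular expressions. The sketch does not attempt spelling correction, so “corrorde” stays as typed rather than becoming “corroded”.

```python
import re


def clean_and_tokenize(text: str) -> list[str]:
    """Lowercase a log entry, apply simple cleanup rules and split into tokens."""
    text = text.lower()
    # Collapse slash shorthand: "t/s/r" -> "tsr", "v/v" -> "vv".
    text = text.replace("/", "")
    # Drop numeric equipment tags: "p-1234" -> "p", "psv-123" -> "psv".
    text = re.sub(r"-\d+", " ", text)
    # Keep runs of letters and digits; everything else acts as a separator.
    return re.findall(r"[a-z0-9]+", text)


for entry in ["P-1234 to the seal is down", "c/o old seal on v/v"]:
    print(clean_and_tokenize(entry))
```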
NLP pipeline stages: ingestion, REGEX cleanup and tokenization, FastText.
A hybrid of unsupervised and supervised learning. The pipeline involves data cleaning, tokenization and feature vector generation (using FastText), followed by a deep learning classifier.
[Figure: FastText model with inputs x1, x2, ..., xN, hidden layers and an output.]
Feature vector generation using FastText for a sentence with N n-gram features (x1, x2, x3, ..., xN-1, xN). The features are embedded and averaged to form the hidden variable.
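The "embed and average" idea can be sketched in plain Python. This is not the fastText library and not the team's model: the embedding table is random (a trained model would learn it), and the dimension, bucket count and use of word bigrams are illustrative assumptions.

```python
import random
import zlib

DIM = 8         # embedding size (illustrative)
BUCKETS = 1000  # number of hash buckets (illustrative)

rng = random.Random(0)
# Randomly initialised embedding table; in a trained model these rows are learned.
EMBEDDINGS = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(BUCKETS)]


def ngram_features(tokens, n=2):
    """Return the word unigrams followed by the word n-grams of a token list."""
    grams = list(tokens)
    grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def sentence_vector(tokens):
    """Look up each feature by hashing and average the rows into one hidden vector."""
    feats = ngram_features(tokens)
    if not feats:
        return [0.0] * DIM
    rows = [EMBEDDINGS[zlib.crc32(f.encode()) % BUCKETS] for f in feats]
    return [sum(col) / len(rows) for col in zip(*rows)]


hidden = sentence_vector(["replace", "the", "tx"])
print(len(hidden))
```

Because the features are averaged, the hidden vector has the same size regardless of how long the log entry is.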
16. NLP Workflow
Step Overview
1. Generate word embeddings for the input text by appending the feature vectors of each token. Inputs of different lengths are padded with zeros.
2. Perform multiclass classification using a deep neural network.
3. Switch to the linguistic (unsupervised) model if the predictions do not have enough confidence.
4. If step 7 is initiated, the predictions are used for reinforcement learning to update the training of the deep neural net.
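Zero padding for inputs of different lengths, as described in the step overview, can be sketched like this. The function name, and the choice to truncate entries longer than the maximum length, are assumptions made for illustration.

```python
def pad_embeddings(token_vectors, max_len, dim):
    """Return exactly max_len vectors of size dim.

    Short inputs are padded with zero vectors; long inputs are truncated
    (truncation is an assumption, the deck only mentions padding).
    """
    padded = [list(v) for v in token_vectors[:max_len]]
    padded += [[0.0] * dim for _ in range(max_len - len(padded))]
    return padded


# A two-token entry padded to a fixed length of four.
print(pad_embeddings([[0.1, 0.2], [0.3, 0.4]], max_len=4, dim=2))
```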
[Flowchart of the NLP workflow, Steps 1 to 7. Boxes: Work Order Input; FastText Training; FastText Word Embeddings; Deep Neural Net Training; Deep Neural Net for Predictions; decision “Confidence > 95% or unidentified prediction?”; Display Output from Deep Neural Net; Display Output from Linguistic Model; Update Training.]
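One possible reading of the decision box is sketched below: use the deep neural net's output when its confidence exceeds 95% and the label is identified, and otherwise fall back to the linguistic model. The function name, the "unidentified" label string and the callable interface are hypothetical.

```python
CONFIDENCE_THRESHOLD = 0.95  # taken from the "Confidence > 95%" decision box


def route_prediction(dnn_label, dnn_confidence, linguistic_model, text):
    """Return (label, source) for one work order.

    Falls back to the linguistic model when the deep neural net is not
    confident enough or could not identify the failure item.
    """
    if dnn_label != "unidentified" and dnn_confidence > CONFIDENCE_THRESHOLD:
        return dnn_label, "deep_neural_net"
    return linguistic_model(text), "linguistic_model"


# Stand-in for the linguistic model, for demonstration only.
def fallback(text):
    return "pump"


print(route_prediction("seal", 0.99, fallback, "c/o old seal on v/v"))
print(route_prediction("seal", 0.60, fallback, "the xyz pump has failed"))
```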
17. Linguistic (Unsupervised) Model
The linguistic model attempts to understand failure items like a human.
• It learns what words actually mean from seeing them used in the past (such as TX and P-1234).
• It understands the subject of a sentence based on parts of speech (verbs, adjectives, etc.).
• It understands dependencies (how positions of words in a sentence relate to each other).
• It understands which verbs indicate a failure item; it also understands misspellings and shorthand notation.
Simple Example
Input text: The TX on the P-1234 has failed and so has the motor
Prediction: Pump Transmitter, Motor
1. Semantics: it knows that TX means transmitter because it has seen both words used in similar contexts. It knows that P-1234 means pump for the same reason.
2. Context: the linguistic model identifies nouns, prepositions (which link two parts of speech), verbs (action taken on a noun) and conjunctions, which identify two nouns that are talked about in the same manner.
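The "semantics" point, that shorthand such as TX resolves to transmitter because both appear in similar contexts, can be illustrated with a nearest-neighbour lookup over word vectors. The three-dimensional vectors below are invented toy values, not learned embeddings, and the function names are hypothetical.

```python
import math

# Toy vectors made up for illustration; a real model learns these from the logs.
VECTORS = {
    "tx": [0.9, 0.1, 0.0],
    "transmitter": [0.8, 0.2, 0.1],
    "pump": [0.1, 0.9, 0.2],
    "motor": [0.0, 0.2, 0.9],
}
CANONICAL = ["transmitter", "pump", "motor"]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def resolve(term):
    """Map a term to the canonical equipment name with the most similar vector."""
    return max(CANONICAL, key=lambda name: cosine(VECTORS[term], VECTORS[name]))


print(resolve("tx"))
```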
18. Conclusion
1. Leveraged Databricks to build and ship an operational ML pipeline and overcome the limitations of legacy infrastructure and data models.
• Scaled the application horizontally using Databricks.
• ML model training and serving are done using MLflow.
2. The product extracts contextual information (what, when and why) from structured and unstructured text. Together, this contextual information generates insights.
3. The extracted insights enabled outlier identification, capacity planning, maintenance prioritization, etc. The data-driven guidance is projected to help save millions of dollars annually.
19. Abstract/Summary
Equipment maintenance logs of the global fleet are traditionally maintained using legacy infrastructure and data models, which limits the ability to extract insights at scale. However, to impact the bottom line, it is critical to ingest and enrich global fleet data to generate data-driven guidance for operations. The impact of such insights is projected to be millions of dollars per annum.
To this end, we leverage Databricks to perform machine learning at scale, including ingesting structured and unstructured data from legacy systems and then sifting through millions of nonlinearly growing records to extract insights using NLP. The insights enable outlier identification, capacity planning, prioritization of cost reduction opportunities, and the discovery process for cross-functional teams.