NLP focused applied ML at scale for global fleet analytics at ExxonMobil
Data Driven Guidance for Operations Impact
Deliver insights by using text-heavy unstructured data to answer the questions - “What, when and why it happened”
Technology team‡:
Hans Brende†, Liz Curry-Logan*, Ricardo Ceslinski*, Jijo Jose*, Colby Lopez*, Chris Marchini*, Gaurav Nair*, Harsha Namburi*, Kevin Pauli†, Sandeep Sihag† and Sumeet Trehan*
‡Team as of Dec. 2020; * ExxonMobil; † Contractor at ExxonMobil
Agenda
Build and ship a product (equipment lifecycle optimization, or ELO) that leverages data to make smart, data-driven decisions.
1. Business problem
2. Architecture, tech stack and impact
3. Results (one specific example)
4. Conclusion
Business driver: Can we use the maintenance/service log of each piece of equipment to answer “What, when and why”? This contextual information can provide insights.
Insights - Outlier identification, capacity planning and prioritization of maintenance tasks.
Leveraging global data to enhance maintenance effectiveness and reliability is complicated by several factors.
Challenges
Legacy systems limit the ability to do ML at scale:
• The equipment maintenance log of our global fleet is maintained using legacy infrastructure and data models.
• Legacy systems limit the ability to extract insights at scale.
Ingest and enrich global data:
• Analysis at a local level may produce inaccurate results.
• It is critical to ingest and enrich global fleet data.
• “Big data” is needed for honest insights.
Data quality:
• Inconsistent data quality. Data input is not comparable. Example: large variability in how we enter information in the maintenance/service logs (“Replace the TX – it is corrorde”).
• Data is disconnected.
Solution
NLP focused applied ML product:
• Ingests batch and streaming data (operational ML pipeline) from legacy systems.
• Sifts through 60 MM+ records (growing nonlinearly) to extract insights using NLP.
• Example: Given a maintenance log entry such as “Replace the TX – it is corrorde”, answer questions such as what happened, why it happened and when it happened.
Architecture
Pipeline stages: Ingest, Store, Prep and train, Serve, Frontend.
Inputs: batch data and streaming data.
Components shown in the diagram:
• Azure Data Factory: batch pipeline orchestration
• Azure Event Hubs
• Azure Databricks: data engineering
• Azure Databricks: data science and machine learning
• Azure ML
• Model repository and deployment
• Model serving
• Azure Data Explorer: real-time analysis
• Qlik: frontend
ELO architecture
Model development, ML pipeline setup and pipeline runtime.
• Model development
  • Applied ML scientists use notebooks and common utilities to train and publish models to the MLflow model registry.
• ML pipeline development
  • ML engineers create building blocks (discrete steps) that transform source data to target data, utilizing common utilities as well as the models published by the data scientists.
  • ML engineers develop common utilities to perform data and model I/O, to reduce boilerplate and promote standardization and reusability.
• Pipeline runtime
  • The entire ELO pipeline is represented in Azure Data Factory (ADF) as a DAG of pipeline steps.
  • The ADF pipeline is triggered on a daily schedule.
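To make the "DAG of pipeline steps" idea concrete, here is a minimal sketch in plain Python. It is not Azure Data Factory code and does not reflect the actual ELO pipeline: the step names and dependencies are hypothetical, and the standard-library graphlib module stands in for the orchestrator purely to show how a dependency graph determines a valid run order.

```python
from graphlib import TopologicalSorter

# Hypothetical step names (assumptions for illustration only).
# Each key maps to the set of steps that must finish before it can run.
PIPELINE = {
    "ingest": set(),
    "clean_and_tokenize": {"ingest"},
    "embed": {"clean_and_tokenize"},
    "classify": {"embed"},
    "linguistic_fallback": {"clean_and_tokenize"},
    "publish_results": {"classify", "linguistic_fallback"},
}


def run_order(dag):
    """Return one valid execution order for the steps in the DAG."""
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    for step in run_order(PIPELINE):
        print("running", step)
```

Any order returned respects the dependencies, so independent branches (here, classify and linguistic_fallback) may appear in either order.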
Model development
ML pipeline development
Operational ML pipeline at runtime
Input data
1. The xyz pump has failed
2. P-1234 to the seal is down
3. Replace the TX – it is corrorde
4. t/s/r old rod
5. Look broke – maybe fix
6. c/o old seal on v/v
7. 2 seal on psv-123 fail
….
REGEX Cleanup & Tokenization
1. [the, xyz, pump, has, failed]
2. [p, to, the, seal, is, down]
3. [replace, the, tx, it, is, corroded]
4. [tsr, old, rod]
5. [look, broke, maybe, fix]
6. [co, old, seal, on, vv]
7. [2, seal, on, psv, fail]
….
FastText
Ingestion
NLP
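The cleanup and tokenization step can be sketched as follows. The rules are inferred from the seven example rows on this slide (lowercase, drop tag numbers such as "-1234", drop slashes, turn other punctuation into spaces) and are an assumption, not the team's actual regular expressions. The function name is hypothetical. Spelling correction (the slide shows "corrorde" becoming "corroded") is outside what a regex can do and is not covered here.

```python
import re
from typing import List


def clean_and_tokenize(text: str) -> List[str]:
    """Lowercase, strip tag numbers and punctuation, then split on whitespace."""
    text = text.lower()
    text = re.sub(r"-\d+", "", text)          # "p-1234" -> "p", "psv-123" -> "psv"
    text = text.replace("/", "")              # "t/s/r" -> "tsr", "v/v" -> "vv"
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # remaining punctuation becomes a space
    return text.split()


if __name__ == "__main__":
    for line in ["P-1234 to the seal is down", "c/o old seal on v/v"]:
        print(clean_and_tokenize(line))
```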
Hybrid of unsupervised and supervised learning. The pipeline involves data cleaning, tokenization and feature vector generation (using FastText), followed by a deep learning classifier.
Figure: Feature vector generation using FastText for a sentence with N n-gram features (x1, x2, x3, …, xN-1, xN). The features are embedded and averaged to form the hidden variable.
Diagram labels: inputs x1, x2, …, xN; hidden layers; output.
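The "embed and average" idea in the figure caption can be illustrated with a toy sketch. This is not the FastText library and not the team's model: real FastText learns its embedding table during training, whereas here a hash-derived vector stands in for a learned embedding so that the example runs without any dependencies. The dimension, the choice of character trigrams as features and all function names are assumptions.

```python
import hashlib
from typing import List

DIM = 4  # toy embedding size (assumption; trained models use far larger vectors)


def char_ngrams(word: str, n: int = 3) -> List[str]:
    """Character n-grams of a word wrapped in boundary markers '<' and '>'."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def toy_embedding(feature: str) -> List[float]:
    """Deterministic stand-in for a learned embedding lookup."""
    digest = hashlib.md5(feature.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:DIM]]


def sentence_vector(tokens: List[str]) -> List[float]:
    """Embed every feature (word plus its n-grams) and average the vectors."""
    features = []
    for tok in tokens:
        features.append(tok)
        features.extend(char_ngrams(tok))
    if not features:
        return [0.0] * DIM
    vectors = [toy_embedding(f) for f in features]
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(DIM)]


if __name__ == "__main__":
    print(sentence_vector(["tsr", "old", "rod"]))
```

Because the result is an average, the sentence vector has the same size regardless of how many tokens the log entry contains.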
NLP Workflow
Step Overview
1. Generate word embeddings for the input text by appending the feature vectors for each token. Zero-padding is used to handle input texts of different lengths.
2. Perform multiclass classification using a deep neural network.
3. Switch to the linguistic (unsupervised) model if the predictions do not have enough confidence.
4. If step 7 is initiated, the predictions are used for reinforcement learning to update training steps on the deep neural net.
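Zero-padding from step 1 can be sketched in a few lines. This is a generic illustration, not the team's code: the function name, the maximum length and the choice to truncate overly long inputs are assumptions.

```python
from typing import List


def pad_sequence(vectors: List[List[float]], max_len: int, dim: int) -> List[List[float]]:
    """Return exactly max_len vectors: truncate long inputs, zero-pad short ones."""
    padded = [list(v) for v in vectors[:max_len]]
    padded.extend([0.0] * dim for _ in range(max_len - len(padded)))
    return padded


if __name__ == "__main__":
    # One token vector of size 2, padded to a fixed length of 3.
    print(pad_sequence([[1.0, 2.0]], max_len=3, dim=2))
```

Every work order then yields a matrix of the same shape, which is what a fixed-size network input requires.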
NLP workflow (flowchart, Steps 1 to 7)
• Work Order Input
• FastText Training
• FastText Word Embeddings
• Deep Neural Net training
• Deep Neural Net for Predictions
• Decision: Confidence > 95% or Unidentified prediction?
• Display Output from Deep Neural Net
• Display Output from Linguistic Model
• Update Training
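The confidence check in the workflow could look roughly like the sketch below. It reflects one reading of the slides: a prediction is taken from the deep neural net only when its softmax confidence exceeds 95% and the predicted class is not the "unidentified" class; otherwise the work order goes to the linguistic model. The function names, the label strings and the exact handling of the "unidentified" case are assumptions.

```python
import math
from typing import List, Optional, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_prediction(
    logits: List[float],
    labels: List[str],
    threshold: float = 0.95,
    unidentified: str = "unidentified",  # hypothetical catch-all class name
) -> Tuple[str, Optional[str], float]:
    """Decide which model's output to display for one work order."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label, confidence = labels[best], probs[best]
    if confidence > threshold and label != unidentified:
        return ("deep_neural_net", label, confidence)
    # Low confidence or unidentified class: defer to the linguistic model.
    return ("linguistic_model", None, confidence)


if __name__ == "__main__":
    classes = ["pump", "seal", "unidentified"]
    print(route_prediction([10.0, 0.0, 0.0], classes))
    print(route_prediction([1.0, 0.9, 0.8], classes))
```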
Linguistic (Unsupervised) Model
The linguistic model attempts to understand failure items like a human.
• It learns what words actually mean from seeing them used in the past (such as TX and P-1234).
• It understands the subject of a sentence based on parts of speech (verbs, adjectives, etc.).
• It understands dependencies (how the positions of words in a sentence relate to each other).
• It understands which verbs indicate a failure item; it also understands misspellings and shorthand notation.
Simple Example
Input text: The TX on the P-1234 has failed and so has the motor
Prediction: Pump Transmitter, Motor
1. Semantics: it knows that TX means transmitter as it has seen both words used in similar contexts. It knows P-1234 means pump as it has seen both words used in similar contexts.
2. Context: the linguistic model identifies nouns, prepositions (which link two parts of speech), verbs (action taken on a noun) and conjunctions, which identify two nouns that are talked about in the same manner.
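The "Semantics" point (TX is understood as transmitter because both appear in similar contexts) is commonly implemented as a nearest-neighbor lookup in embedding space. The sketch below illustrates that idea only. The three-dimensional vectors are made up by hand so that the example runs; they are not learned from any data, and the names are hypothetical. In a real system the vectors would come from a trained embedding model.

```python
import math
from typing import Dict, List

# Hand-written toy vectors (assumption): words used in similar contexts
# are given similar vectors, as a trained embedding model would produce.
TOY_VECTORS: Dict[str, List[float]] = {
    "tx": [0.9, 0.1, 0.0],
    "transmitter": [0.8, 0.2, 0.1],
    "pump": [0.1, 0.9, 0.1],
    "motor": [0.0, 0.2, 0.9],
}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_known_term(token: str, vocabulary: List[str]) -> str:
    """Return the vocabulary word whose vector is closest to the token's."""
    return max(vocabulary, key=lambda w: cosine(TOY_VECTORS[token], TOY_VECTORS[w]))


if __name__ == "__main__":
    print(nearest_known_term("tx", ["transmitter", "pump", "motor"]))
```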
Conclusion
1. Leveraged Databricks to build and ship an operational ML pipeline and overcome the limitations of legacy infrastructure and data models.
• Scaled the application horizontally using Databricks.
• ML model training and serving are done using MLflow.
2. The product extracts contextual information (what, when and why) from structured and unstructured text. Together, this contextual information generates insights.
3. The extracted insights enabled outlier identification, capacity planning, maintenance prioritization, etc. The data-driven guidance is projected to help save millions of dollars on an annual basis.
Abstract/Summary
Equipment maintenance logs of the global fleet are traditionally maintained using legacy infrastructure and data models, which limit the ability to extract insights at scale. However, to impact the bottom line, it is critical to ingest and enrich global fleet data to generate data-driven guidance for operations. The impact of such insights is projected to be millions of dollars per annum.
To this end, we leverage Databricks to perform machine learning at scale, including ingesting structured and unstructured data from legacy systems and then sifting through millions of nonlinearly growing records to extract insights using NLP. The insights enable outlier identification, capacity planning, prioritization of cost reduction opportunities, and the discovery process for cross-functional teams.
Logos
• Python and any related marks are trademarks of the Python Software Foundation.
• PyTorch and any related marks are trademarks of Facebook, Inc.
• TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.
• Docker and any related marks are trademarks of Docker, Inc.
• Parquet and any related marks are trademarks of the Apache Software Foundation.
• Snowflake and any related marks are trademarks of Snowflake Inc.
• Databricks and any related marks are trademarks of Databricks.
• Azure and any related marks are trademarks of Microsoft Corporation.
• Scikit-learn is a trademark of the Scikit-learn consortium.
• NumPy and any related marks are trademarks of the SciPy community.
• pandas is a trademark for the Python pandas package, released under the BSD 3 license.
• Dask and any related marks are trademarks of Anaconda, Inc. and contributors.

VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 

NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil

  • 1. NLP-focused applied ML at scale for global fleet analytics at ExxonMobil. Data-driven guidance for operations impact. Deliver insights by using text-heavy unstructured data to answer the questions “What happened, when did it happen, and why did it happen?”
  • 2. Technology team‡: Hans Brende†, Liz Curry-Logan*, Ricardo Ceslinski*, Jijo Jose*, Colby Lopez*, Chris Marchini*, Gaurav Nair*, Harsha Namburi*, Kevin Pauli†, Sandeep Sihag† and Sumeet Trehan*. ‡Team as of Dec. 2020; * ExxonMobil; † Contractor at ExxonMobil.
  • 3. Agenda: Build and ship a product (equipment lifecycle optimization, or ELO) that leverages data to make smart, data-driven decisions. 1. Business problem 2. Architecture, tech stack and impact 3. Results (one specific example) 4. Conclusion
  • 4. Business driver: Can we use the maintenance/service log of each piece of equipment to answer “What, when and why”? This contextual information can provide insights: outlier identification, capacity planning and prioritization of maintenance tasks.
  • 5. Challenges: Leveraging global data to enhance maintenance effectiveness and reliability is complicated by several factors. (1) Legacy systems limit the ability to do ML at scale: • The equipment maintenance log of our global fleet is maintained using legacy infrastructure and data models. • Legacy systems limit the ability to extract insights at scale.
  • 6. Challenges: Leveraging global data to enhance maintenance effectiveness and reliability is complicated by several factors. (1) Legacy systems limit the ability to do ML at scale: • The equipment maintenance log of our global fleet is maintained using legacy infrastructure and data models. • Legacy systems limit the ability to extract insights at scale. (2) Ingest and enrich global data: • Analysis at a local level may produce inaccurate results. • It is critical to ingest and enrich global fleet data. • “Big data” is needed for honest insights.
  • 7. Challenges: Leveraging global data to enhance maintenance effectiveness and reliability is complicated by several factors. (1) Legacy systems limit the ability to do ML at scale: • The equipment maintenance log of our global fleet is maintained using legacy infrastructure and data models. • Legacy systems limit the ability to extract insights at scale. (2) Ingest and enrich global data: • Analysis at a local level may produce inaccurate results. • It is critical to ingest and enrich global fleet data. • “Big data” is needed for honest insights. (3) Data quality: • Data quality is inconsistent and data input is not comparable. Example: there is large variability in how we enter information in the maintenance/service logs (“Replace the TX – it is corrorde”). • Data is disconnected.
  • 8. Solution: an NLP-focused applied ML product that: • Ingests batch and streaming data (operational ML pipeline) from legacy systems. • Sifts through 60 MM+ records (growing nonlinearly) to extract insights using NLP. • Example: given a maintenance log entry such as “Replace the TX – it is corrorde”, answers questions such as what happened, why it happened and when it happened.
  • 9. Architecture (diagram; labels only): Ingest; Store; Prep and train; Serve; Orchestration; Batch data; Streaming data; Batch pipeline; Azure Data Factory; Azure Event Hubs; Azure Data Explorer; Real-Time Analysis; Data Engineering (Azure Databricks); Data Science & Machine Learning (Azure Databricks) + Model Repository & Deployment; Azure ML; Model Serving; Frontend (Qlik).
  • 10. ELO architecture: model development, ML pipeline setup and pipeline runtime. • Model development: Applied ML scientists use notebooks and common utilities to train and publish models to the MLflow model registry. • ML pipeline development: ML engineers create building blocks (discrete steps) that transform source data to target data, utilizing common utilities as well as the models published by the data scientists. ML engineers also develop common utilities to perform data and model I/O, to reduce boilerplate and promote standardization and reusability. • Pipeline runtime: The entire ELO pipeline is represented in Azure Data Factory (ADF) as a DAG of pipeline steps. The ADF pipeline is triggered on a daily schedule.
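The "DAG of pipeline steps" idea from slide 10 can be illustrated with a minimal Python sketch. This is not how Azure Data Factory is configured; it only shows dependency-ordered execution of discrete steps. The step names (`ingest`, `clean`, `classify`), their bodies, and the `run_pipeline` helper are hypothetical, since the slides do not list the actual ELO steps.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical steps: each one reads from and writes to a shared context dict.
def ingest(ctx):
    ctx["raw"] = ["Replace the TX", "The xyz pump has failed"]

def clean(ctx):
    ctx["clean"] = [text.lower() for text in ctx["raw"]]

def classify(ctx):
    # Stand-in for a model that would be loaded from a model registry.
    ctx["labels"] = [
        "transmitter" if "tx" in text.split() else "unknown"
        for text in ctx["clean"]
    ]

STEPS = {"ingest": ingest, "clean": clean, "classify": classify}

# Each step maps to the set of steps that must finish before it starts.
DEPENDENCIES = {"ingest": set(), "clean": {"ingest"}, "classify": {"clean"}}

def run_pipeline(steps, dependencies):
    """Run every step once, in an order that respects the dependencies."""
    ctx = {}
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        steps[name](ctx)
    return order, ctx

order, ctx = run_pipeline(STEPS, DEPENDENCIES)
```

Keeping each step a small function over shared data mirrors the "building blocks" described on the slide: a step can be tested alone and then wired into the graph.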
  • 14. Agenda: Build and ship a product (equipment lifecycle optimization, or ELO) that leverages data to make smart, data-driven decisions. 1. Business problem 2. Architecture, tech stack and impact 3. Results (one specific example) 4. Conclusion
  • 15. NLP: a hybrid of unsupervised and supervised learning. The pipeline involves data cleaning, tokenization and feature vector generation (using FastText), followed by a deep learning classifier. Diagram stages: Ingestion; REGEX Cleanup & Tokenization; FastText.
      Input data → output of regex cleanup and tokenization:
      1. The xyz pump has failed → [the, xyz, pump, has, failed]
      2. P-1234 to the seal is down → [p, to, the, seal, is, down]
      3. Replace the TX – it is corrorde → [replace, the, tx, it, is, corroded]
      4. t/s/r old rod → [tsr, old, rod]
      5. Look broke – maybe fix → [look, broke, maybe, fix]
      6. c/o old seal on v/v → [co, old, seal, on, vv]
      7. 2 seal on psv-123 fail → [2, seal, on, psv, fail]
      …
      Feature vector generation using FastText for a sentence with N n-gram features (x1, x2, x3, …, xN-1, xN): the features are embedded and averaged to form the hidden variable. (Diagram: inputs x1, x2, …, xN; hidden layers; output.)
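The cleanup, tokenization and averaging steps on slide 15 can be sketched as below. The regular expressions are inferred from the slide's input/output pairs and are an assumption, not the team's actual rules; spelling correction (“corrorde” to “corroded”) is not covered. `average_embedding` uses a hypothetical toy lookup table in place of a trained FastText model and simply skips unknown tokens.

```python
import re

def clean_and_tokenize(text):
    """Lowercase, strip tag numbers and shorthand slashes, then split."""
    text = text.lower()
    text = re.sub(r"-\d+", "", text)         # drop tag-number suffixes: p-1234 -> p
    text = text.replace("/", "")             # join slash shorthand: t/s/r -> tsr
    text = re.sub(r"[^a-z0-9]+", " ", text)  # any other symbol becomes a separator
    return text.split()

def average_embedding(tokens, table, dim):
    """Average the vectors of known tokens; all zeros if none are known."""
    vectors = [table[token] for token in tokens if token in table]
    if not vectors:
        return [0.0] * dim
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]

# Hypothetical 2-dimensional embedding table, for illustration only.
TOY_TABLE = {"pump": [1.0, 0.0], "failed": [0.0, 1.0]}

tokens = clean_and_tokenize("The xyz pump has failed")
sentence_vector = average_embedding(tokens, TOY_TABLE, dim=2)
```

A trained FastText model would also embed character n-grams, so out-of-vocabulary shorthand such as "tsr" would still receive a vector; the toy table here cannot do that.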
  • 16. NLP workflow, step overview: 1. Generate word embeddings for the input text by appending the feature vectors for each token. Zero padding is used to handle input text of different lengths. 2. Perform multiclass classification using a deep neural network. 3. Switch to the linguistic (unsupervised) model if the predictions do not have enough confidence. 4. If step 7 is initiated, the predictions are used for reinforcement learning to update training on the deep neural net. Diagram (Steps 1 to 7), labels only: Work Order Input; FastText Training; FastText Word Embeddings; Deep Neural Net training; Deep Neural Net for Predictions; decision “Confidence > 95% or Unidentified prediction?”; Display Output from Deep Neural Net; Display Output from Linguistic Model; Update Training.
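The zero padding and the confidence gate from slide 16 can be sketched as follows. The 95% threshold comes from the slide's decision box; everything else is an assumption made for illustration: the `"unidentified"` label name, truncation of inputs longer than `max_len`, and the reading that a confident, identified prediction is shown from the deep neural net while all other cases fall back to the linguistic model.

```python
import math

CONFIDENCE_THRESHOLD = 0.95  # from the slide's "Confidence > 95%" decision box

def pad_vectors(vectors, max_len, dim):
    """Zero-pad (or truncate) a list of token vectors to exactly max_len rows."""
    padded = [list(vec) for vec in vectors[:max_len]]
    padded += [[0.0] * dim for _ in range(max_len - len(padded))]
    return padded

def softmax(logits):
    """Convert raw classifier scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_prediction(logits, labels, text, linguistic_model):
    """Return (prediction, source), falling back when confidence is too low."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    confident = probs[best] > CONFIDENCE_THRESHOLD
    if confident and labels[best] != "unidentified":
        return labels[best], "deep_neural_net"
    return linguistic_model(text), "linguistic_model"

# Hypothetical labels and a stub fallback model.
LABELS = ["pump", "motor", "unidentified"]
stub_linguistic_model = lambda text: "transmitter"

high = route_prediction([10.0, 0.0, 0.0], LABELS, "pump failed", stub_linguistic_model)
low = route_prediction([1.0, 0.9, 0.8], LABELS, "replace the tx", stub_linguistic_model)
```

Returning the source alongside the prediction makes it easy to log which cases were routed to the fallback, which is the set the slide describes as feeding back into training.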
  • 17. Linguistic (unsupervised) model: the linguistic model attempts to understand failure items like a human. • It learns what words actually mean from seeing them used in the past (such as TX and P-1234). • It understands the subject of a sentence based on parts of speech (verbs, adjectives, etc.). • It understands dependencies (how positions of words in a sentence relate to each other). • It understands which verbs indicate a failure item; it also understands misspellings and shorthand notation. Simple example: input text “The TX on the P-1234 has failed and so has the motor”; prediction “Pump Transmitter, Motor”. 1. Semantics: it knows that TX means transmitter because it has seen both words used in similar contexts. It knows P-1234 means pump because it has seen both words used in similar contexts. 2. Context: the linguistic model identifies nouns, prepositions (which link two parts of speech), verbs (action taken on a noun) and conjunctions, which identify two nouns that are talked about in the same manner.
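The "semantics" point on slide 17, that TX maps to transmitter because both appear in similar contexts, is commonly realized as nearest-neighbor search over word embeddings. The sketch below assumes that approach; the slides do not say how the linguistic model is implemented. The 2-dimensional vectors are hand-made toy values, not learned embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest_term(word, vectors, candidates):
    """Return the candidate whose vector is most similar to the word's vector."""
    return max(candidates, key=lambda c: cosine(vectors[word], vectors[c]))

# Hypothetical toy vectors: shorthand sits close to the term it stands for.
TOY_VECTORS = {
    "tx": [0.9, 0.1],
    "transmitter": [0.85, 0.15],
    "p": [0.15, 0.85],
    "pump": [0.1, 0.9],
    "motor": [0.5, 0.5],
}
EQUIPMENT_TERMS = ["transmitter", "pump", "motor"]

tx_meaning = nearest_term("tx", TOY_VECTORS, EQUIPMENT_TERMS)
p_meaning = nearest_term("p", TOY_VECTORS, EQUIPMENT_TERMS)
```

Part-of-speech tagging and dependency parsing, the "context" half of the slide, would need an NLP library and are left out of this sketch.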
  • 18. Conclusion: 1. Leveraged Databricks to build and ship an operational ML pipeline and overcome the limitations of legacy infrastructure and data models. • Scaled the application horizontally using Databricks. • ML model training and serving are done using MLflow. 2. The product extracts contextual information (what, when and why) from structured and unstructured text. Together, this contextual information generates insights. 3. The extracted insights enabled outlier identification, capacity planning, maintenance prioritization, etc. The data-driven guidance is projected to help save millions of dollars annually.
  • 19. Abstract/Summary: The equipment maintenance log of the global fleet is traditionally maintained using legacy infrastructure and data models, which limit the ability to extract insights at scale. However, to impact the bottom line, it is critical to ingest and enrich global fleet data to generate data-driven guidance for operations. The impact of such insights is projected to be millions of dollars per annum. To this end, we leverage Databricks to perform machine learning at scale, including ingesting structured and unstructured data from legacy systems and then sifting through millions of nonlinearly growing records to extract insights using NLP. The insights enable outlier identification, capacity planning, prioritization of cost reduction opportunities, and the discovery process for cross-functional teams.
  • 20. Logos: • Python and any related marks are trademarks of the Python Software Foundation. • PyTorch and any related marks are trademarks of Facebook, Inc. • TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc. • Docker and any related marks are trademarks of Docker, Inc. • Parquet and any related marks are trademarks of the Apache Software Foundation. • Snowflake and any related marks are trademarks of Snowflake Inc. • Databricks and any related marks are trademarks of Databricks. • Azure and any related marks are trademarks of Microsoft Corporation. • Scikit-learn is a trademark of the Scikit-learn consortium. • NumPy and any related marks are trademarks of the SciPy community. • pandas is a trademark of the Python pandas package, released under the BSD 3 license. • Dask and any related marks are trademarks of Anaconda, Inc. and contributors. Revision 399c843d.