Shivnath Babu
Cofounder/CTO, Unravel
Adjunct Professor, Duke University
An AI-powered Chatbot to Simplify Spark Performance Management
#UnifiedAnalytics #SparkAISummit
Meet the speaker
• Cofounder/CTO at Unravel
• Adjunct Professor of Computer Science at Duke University
• Focusing on ease-of-use and manageability of data-intensive systems
• Recipient of US National Science Foundation CAREER Award, three IBM Faculty Awards, HP Labs Innovation Research Award
What is a Chatbot?
A program which conducts a conversation via text or voice
Chatbots are making a real difference
Source: https://chatbottle.co/awards/2018
TOBi generates 2x more ecommerce conversions in ½ the time for Vodafone
Zara provides fast services to 20% of Zurich Insurance customers
Woebot, the therapist chatbot, talks to more people in a day than a human therapist does in a lifetime
Chatbots ⇔ Spark Performance
What is the connection?
The happy Spark user
• Spark is fast
• Spark has easy-to-use and comprehensive APIs
• Wow, I can do SQL, Streaming, AI/ML, and Graphs in one system!
• Spark has a rich ecosystem
“I have no clue which cloud instance type to pick for my workload”
“My cloud costs are getting out of control. Help!”
“I have no idea why my app is slow”
“My app failed and I don’t know why!”
The frustrated Spark user
• Many levels of correlated stack traces
• Identifying the root cause is hard and time consuming
Typical app failure in Spark
Spark User: “My app failed and I don’t know why!”
Spark Chatbot: “I know that sucks! Let me take a look here …”
Spark Chatbot: “I see the problem. Executors are running out of memory”
Spark Chatbot: “Setting spark.executor.memory to 12g fixes the problem. I have verified it. See this run here”
Spark User: “Wow. Thanks. You are awesome!”
I will show you a Chatbot that
• Makes you more productive
• Saves you time and money
• Becomes your AI-driven Spark Expert in a Bot!
DATA ENGINEER: My app is too slow…
DATA ENGINEER: I need to make it faster…
Current approach
1. Review Spark/YARN UI to find the app
2. Review metrics in the UI
3. Review jobs and stages associated with the app
4. Identify all containers associated with the app
5. Review and debug container logs
6. Identify “problematic” jobs, stages, or containers
7. Guess which parameters to tune for performance
8. Do trial-and-error by changing a parameter setting
9. Rinse & repeat
There has to be a better way
What is going on here?
Chatbot Architecture from 30000 ft
[Diagram: Messaging Platform; Bot’s NLP Layer; Bot’s Backend Layer; Monitoring Data; Historic Data & Probe Data; Recommendation Algorithm; Cluster Services On-premises and Cloud]
Algorithm running in bot’s backend
[Diagram: App,Goal; Orchestrator; Probe Algorithm; Xnext]
Spark tuning parameters

spark.driver.cores                      2
spark.executor.cores                    10
…
spark.sql.shuffle.partitions            300
spark.sql.autoBroadcastJoinThreshold    20MB
…
SKEW('orders', 'o_custId')              true
spark.catalog.cacheTable("orders")      true
…

We represent this setting as vector X
[Figure: PERFORMANCE plotted against X]
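As a rough illustration of what "setting as vector X" could look like, the sketch below encodes a few of the parameters listed above as numbers. The parameter ordering, the choice of MB as the size unit, the 0/1 encoding of booleans, and the `cache_orders` key are assumptions made for this example, not details given in the talk.

```python
# Hypothetical ordering of the dimensions of X (an assumption for this sketch).
PARAMS = [
    "spark.driver.cores",
    "spark.executor.cores",
    "spark.sql.shuffle.partitions",
    "spark.sql.autoBroadcastJoinThreshold",  # encoded in MB
    "cache_orders",  # hypothetical flag standing in for cacheTable("orders")
]

def parse_value(raw):
    """Turn a config string into a float: sizes become MB, booleans become 0/1."""
    text = str(raw).strip().lower()
    if text in ("true", "false"):
        return 1.0 if text == "true" else 0.0
    if text.endswith("mb"):
        return float(text[:-2])
    if text.endswith("g"):
        return float(text[:-1]) * 1024.0
    return float(text)

def to_vector(config):
    """Map a {name: value} configuration to the numeric vector X."""
    return [parse_value(config[name]) for name in PARAMS]

config = {
    "spark.driver.cores": "2",
    "spark.executor.cores": "10",
    "spark.sql.shuffle.partitions": "300",
    "spark.sql.autoBroadcastJoinThreshold": "20MB",
    "cache_orders": "true",
}
X = to_vector(config)
print(X)
```

A numeric encoding like this is what lets a model treat performance as a function y = ƒ(X) over settings.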
Given: App + Goal
• Find the setting of X that best meets the goal
• Challenge: Response surface y = ƒ(X) is unknown
[Figure: PERFORMANCE plotted against X]
Challenge: Response surface y = ƒ(X) is unknown

Model the response surface as

$$\hat{y}(X) = \vec{f}^{\,t}(X)\,\vec{\beta} + Z(X)$$

Here:
• $\vec{f}^{\,t}(X)\,\vec{\beta}$ is a regression model
• $Z(X)$ is the residual captured as a Gaussian Process

The Gaussian Process model captures the uncertainty in our current knowledge of the response surface
[Figure: PERFORMANCE plotted against X]
Gaussian Process model helps estimate EIP(X)

$$\mathrm{EIP}(X) = \int_{p=-\infty}^{p=y(X^*)} \bigl(y(X^*) - p\bigr)\,\mathrm{pdf}_{\hat{y}(X)}(p)\,dp$$

• $y(X^*) - p$: Improvement at any setting X over the best performance seen so far
• $\mathrm{pdf}_{\hat{y}(X)}(p)$: Probability density function (uncertainty estimate)

We can now estimate the expected improvement EIP(X) from doing a probe at any setting X
[Figure: PERFORMANCE plotted against X, with the opportunity for improvement marked]
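When the model's prediction at a setting X is Gaussian with mean μ and standard deviation σ, the expected-improvement integral has a well-known closed form. The sketch below evaluates it, assuming that lower y is better (for example, run time); the function name and arguments are illustrative, not taken from the talk.

```python
import math

def expected_improvement(mu, sigma, best_y):
    """Expected improvement over best_y for a prediction N(mu, sigma^2).

    Assumes lower y is better, so improvement is best_y - p for p < best_y.
    """
    if sigma <= 0.0:
        # No uncertainty: the improvement is known exactly.
        return max(best_y - mu, 0.0)
    u = (best_y - mu) / sigma
    pdf = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))
    # Clamp at zero to absorb floating-point round-off far in the tail.
    return max((best_y - mu) * cdf + sigma * pdf, 0.0)

# Same predicted mean, different uncertainty: the less certain setting scores higher.
print(expected_improvement(6.0, 0.5, 5.0))
print(expected_improvement(6.0, 2.0, 5.0))
```

The second call scores higher even though both settings are predicted to be worse than the best seen so far, which is how uncertainty alone can make a setting worth probing.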
Probe Algorithm
1. Bootstrap: Get initial set of monitoring data from history or via probes: <X1,y1>, <X2,y2>, …, <Xn,yn>
2. Select next probe Xnext based on all history and probe data available so far to calculate the setting with maximum expected improvement EIP(X)
Repeat until the stopping condition is reached
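The loop above can be sketched end to end for a single tuning knob. This is a toy under stated assumptions rather than the production algorithm: it uses a constant-mean Gaussian Process with a squared-exponential kernel in place of the regression-plus-residual model, treats lower y as better, and stops when the best EIP drops below a threshold. All function names, the kernel, and the toy `runtime` function are hypothetical.

```python
import math

def rbf(a, b, length=2.0):
    """Squared-exponential kernel on scalar settings (an assumed kernel choice)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def gp_predict(xs, ys, x_new, noise=1e-6):
    """Posterior mean and standard deviation at x_new for a constant-mean GP."""
    mean_y = sum(ys) / len(ys)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    alpha = solve(K, [y - mean_y for y in ys])
    v = solve(K, k_star)
    mu = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    var = max(rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, v)), 0.0)
    return mu, math.sqrt(var)

def expected_improvement(mu, sigma, best_y):
    """Closed-form EIP for a Gaussian prediction, with lower y being better."""
    if sigma <= 0.0:
        return max(best_y - mu, 0.0)
    u = (best_y - mu) / sigma
    pdf = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))
    return max((best_y - mu) * cdf + sigma * pdf, 0.0)

def tune(run_probe, candidates, initial, max_probes=8, min_eip=1e-3):
    xs = list(initial)
    ys = [run_probe(x) for x in xs]               # step 1: bootstrap
    for _ in range(max_probes):
        best_y = min(ys)
        scored = [(expected_improvement(*gp_predict(xs, ys, x), best_y), x)
                  for x in candidates if x not in xs]
        if not scored:
            break
        eip, x_next = max(scored)                 # step 2: setting with maximum EIP
        if eip < min_eip:                         # stopping condition
            break
        xs.append(x_next)
        ys.append(run_probe(x_next))
    best = min(range(len(ys)), key=ys.__getitem__)
    return xs[best], ys[best]

def runtime(x):
    """Toy response surface standing in for a real Spark run."""
    return (x - 9) ** 2 / 4 + 1

print(tune(runtime, candidates=range(4, 13), initial=[4, 12]))
```

In a real deployment `run_probe` would launch the application with the candidate setting and measure the goal metric, which is why the number of probes is kept small.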
[Figure: four panels plotting y against x1 (x1 from 4 to 12, y from 0 to 8), showing Performance and EIP(X) over X, with Exploration and Exploitation regions marked]
Xnext: Do next probe here
This approach balances Exploration vs. Exploitation
Credit: https://discovery.rsm.nl/articles/detail/130-how-to-balance-exploration-and-exploitation-in-multinational-enterprises
Data Starved & High Uncertainty
Data Rich & Low Uncertainty
[Diagram: App,Goal; Probe Algorithm; Xnext]
Chatbot architecture
[Diagram: Messaging Platform; Bot’s NLP Layer; Bot’s Backend Layer]
• Many levels of correlated stack traces
• Identifying the root cause is hard and time consuming
Typical app failure in Spark
Let us see a better way
What is going on here?
App failure → App’s Container Logs → Error Template Extraction → Feature vector → Predictive Model → Root cause of the failure
Prediction: App failure → App’s Container Logs → Error Template Extraction → Feature vector → Predictive Model → Root cause of the failure
Training: Logs from millions of app failures → Container Logs → Error Template Extraction → Feature vectors → Model Learning → Predictive Model
Training: Logs from millions of app failures → Label Generation → Root cause labels → Model Learning
Two ways to get root-cause labels
• Manual diagnosis by a domain expert
• Automatic injection of the root cause
Unravel’s large-scale lab framework for automatic root cause analysis
Spark and multi-tenant Workloads:
- Variety of workloads: Batch, ML, SQL, Streaming, etc.
Failures:
- Large set of root causes learned from customers & partners. Constantly updated
- Continuously inject these root causes to train & test models for root-cause prediction
Environment:
- Lab created on demand on cloud or on-premises
- Workloads are run and failures are injected
Injecting “labeled” failures
[Diagram: Application Execution, Injected Failure, FAILED, Application Monitor, Label, Labeled Failures, Input Feature Extraction, Training data]
Injected failure examples:
• Invalid input
• Invalid memory configuration
• OOME: Java heap space
• OOME: GC overhead limit
• Container killed by YARN
• Runtime incompatibility
• No space left on device
• Transformations inside other transformations
• Runtime error
• Arithmetic error
• Invalid configuration settings
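The idea of injecting a known root cause so that the resulting log arrives already labeled can be sketched in a few lines. Everything here is hypothetical: plain Python exceptions stand in for real Spark failures, and the injector functions and their names are made up for illustration.

```python
import traceback

# Hypothetical injectors: each one deliberately triggers a known root cause.
# Plain Python exceptions stand in for real Spark failures here.
def inject_arithmetic_error():
    return 1 / 0

def inject_invalid_input():
    return int("12a")

def inject_missing_input_path():
    with open("/nonexistent/input/path") as handle:
        return handle.read()

INJECTORS = {
    "Arithmetic error": inject_arithmetic_error,
    "Invalid input": inject_invalid_input,
    "Input Path Not Available": inject_missing_input_path,
}

def collect_labeled_failures(injectors):
    """Run each injector and pair the captured log text with its known label."""
    samples = []
    for label, inject in injectors.items():
        try:
            inject()
        except Exception:
            samples.append((traceback.format_exc(), label))
    return samples

training_data = collect_labeled_failures(INJECTORS)
for log_text, label in training_data:
    # Show the last line of each captured trace next to its label.
    print(label, "<-", log_text.strip().splitlines()[-1])
```

Because the label is known before the failure happens, no manual diagnosis is needed to build the training set.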
We created a Failure Taxonomy for Labels
Root Node
Category of failure: Configuration Errors, Data Errors, Resource Errors, Deployment Errors
Root cause labels: Input Path Not Available, Number Format Exception, SparkSQL JsonProcessing Exception, …
Extracting input features from logs
java.lang.OutOfMemoryError: Java heap space
    at scala.reflect.ManifestFactory$$anon$9.newArray(Manifest.scala:114)
    at scala.reflect.ManifestFactory$$anon$9.newArray(Manifest.scala:112)
    at …
• Extract stack traces and error messages
• Tokenize by class names and words
Tokens example:
java.lang.OutOfMemoryError Java heap space at scala.reflect.ManifestFactory$$anon$9.newArray(Manifest.scala:114)
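A minimal sketch of the tokenization step, assuming tokens are whitespace-separated and that a trailing colon on the exception class should be dropped (which reproduces the tokens example above). The `tokenize` helper is illustrative; the talk does not show the actual tokenizer.

```python
LOG = """java.lang.OutOfMemoryError: Java heap space
    at scala.reflect.ManifestFactory$$anon$9.newArray(Manifest.scala:114)
    at scala.reflect.ManifestFactory$$anon$9.newArray(Manifest.scala:112)"""

def tokenize(log_text):
    """Split on whitespace and strip a trailing colon from each token."""
    return [token.rstrip(":") for token in log_text.split()]

tokens = tokenize(LOG)
print(tokens[:6])
```

Keeping fully qualified class names and stack frames as single tokens preserves the parts of a trace that tend to distinguish one root cause from another.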
Input feature extraction
• Bag of Words with TF-IDF
– Computes a vocabulary of words
– Uses TF-IDF to reflect importance of words in a document
• Doc2Vec
– Maps words, paragraphs, or documents to multi-dimensional vectors
– Evaluates the placement of words with respect to neighboring words
– Uses a 3-layer neural network
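To make the bag-of-words option concrete, here is a small TF-IDF sketch in plain Python. It uses one common smoothed IDF variant; the exact weighting used in the talk is not stated, so treat the formula and the `tfidf` helper as assumptions.

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists. Returns (vocabulary, list of TF-IDF vectors)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed inverse document frequency (one common variant; others exist).
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid dividing by zero for an empty document
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors

docs = [
    ["OutOfMemoryError", "Java", "heap", "space"],
    ["Container", "killed", "by", "YARN"],
    ["OutOfMemoryError", "GC", "overhead", "limit"],
]
vocab, vectors = tfidf(docs)
print(vocab)
print(vectors[0])
```

Tokens that appear in fewer failure logs, such as `heap` here, receive a higher weight than tokens shared across logs, such as `OutOfMemoryError`.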
Learning the predictive model
• Shallow Learning
– Logistic Regression
– Random forests
• Deep Learning
– Neural networks
• Training and testing with injected failures
• Test to train data set ratio 75% to 25%
• Models: logistic regression, random forests
[Chart: Accuracy Score [%] (axis from 80 to 100) of Logistic Regression and Random Forests with TF-IDF and Doc2Vec features]
The NLP element in the Chatbot
[Diagram: Messaging Platform; Bot’s NLP Layer; Bot’s Backend Layer; Algorithm; Compute; Storage]
The NLP element in the Chatbot
User: “How can I make CEO report query faster”
1. Extract the intent (Tune an app, Fetch a metric, Generate a report, Set an alert, Diagnose a failure, …): Intent = AppAutoTune
2. Extract entities for the intent: Entities: { AppName = ‘CEO report’, TuningGoal = Speedup }
3. Take action: Invoke app autotuning algorithm
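The intent and entity steps can be illustrated with a deliberately simple keyword matcher. This is only a sketch: a real bot would use a trained NLP model, and the keyword lists, the app catalog, and every intent name other than AppAutoTune are hypothetical.

```python
# Hypothetical keyword rules, checked in order; not the bot's actual NLP layer.
INTENT_KEYWORDS = {
    "AppAutoTune": ("faster", "speed up", "tune", "slow"),
    "SetAlert": ("alert me",),
    "DiagnoseFailure": ("failed", "fails", "stuck"),
}
KNOWN_APPS = ("CEO report", "sales report")  # hypothetical app catalog

def parse(utterance):
    """Return (intent, entities) for a user message."""
    text = utterance.lower()
    intent = next(
        (name for name, words in INTENT_KEYWORDS.items()
         if any(word in text for word in words)),
        "Unknown",
    )
    entities = {}
    for app in KNOWN_APPS:
        if app.lower() in text:
            entities["AppName"] = app
    if intent == "AppAutoTune":
        entities["TuningGoal"] = "Speedup"
    return intent, entities

intent, entities = parse("How can I make CEO report query faster")
print(intent, entities)
```

The backend can then dispatch on the intent, for example by passing the app name and tuning goal to the autotuning algorithm.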
Many use cases can be addressed
• Who are the top resource-wasting users on the cluster?
• Which app is causing contention on the cluster?
• Why is my app stuck?
• Alert me if my query fails
• Which part of my query failed?
• Kill the sales report BI app if it uses more than $25
• And many more …
In summary
• AI-driven Spark Expert in a Bot!
– Makes you more productive
– Saves you time and money
Sign up for a free trial, we value your feedback!
http://unraveldata.com/free-trial
And yes, we are hiring @ Unravel
shivnath@unraveldata.com
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 

An AI-Powered Chatbot to Simplify Apache Spark Performance Management

  • 1. WIFI SSID:SparkAISummit | Password: UnifiedAnalytics
  • 2. Shivnath Babu Cofounder/CTO, Unravel Adjunct Professor, Duke University An AI-powered Chatbot to Simplify Spark Performance Management #UnifiedAnalytics #SparkAISummit
  • 3. Meet the speaker • Cofounder/CTO at Unravel • Adjunct Professor of Computer Science at Duke University • Focusing on ease-of-use and manageability of data-intensive systems • Recipient of US National Science Foundation CAREER Award, three IBM Faculty Awards, HP Labs Innovation Research Award
  • 4. What is a Chatbot?
  • 5. A program which conducts a conversation via text or voice
  • 6. Chatbots are making a real difference
  • 10. Woebot, the therapist chatbot, talks to more people in a day than a human therapist does in a lifetime
  • 11. Chatbots ⇔ Spark Performance: What is the connection?
  • 12. The happy Spark user • Spark is fast • Spark has easy-to-use and comprehensive APIs • Wow, I can do SQL, Streaming, AI/ML, and Graphs in one system! • Spark has a rich ecosystem
  • 13. The frustrated Spark user • “I have no clue which cloud instance type to pick for my workload” • “My cloud costs are getting out of control. Help!” • “I have no idea why my app is slow” • “My app failed and I don’t know why!”
  • 14. Typical app failure in Spark • Many levels of correlated stack traces • Identifying the root cause is hard and time-consuming
  • 15. Spark User: “My app failed and I don’t know why!” Spark Chatbot: “I know that sucks! Let me take a look here …” “I see the problem. Executors are running out of memory” “Setting spark.executor.memory to 12g fixes the problem. I have verified it. See this run here” Spark User: “Wow. Thanks. You are awesome!”
  • 16. I will show you a Chatbot that • Makes you more productive • Saves you time and money • Becomes your AI-driven Spark Expert in a Bot!
  • 17. Data engineer: My app is too slow…
  • 18. Data engineer: I need to make it faster…
  • 19. Current approach: 1. Review Spark/YARN UI to find the app 2. Review metrics in the UI 3. Review jobs and stages associated with the app 4. Identify all containers associated with the app 5. Review and debug container logs 6. Identify “problematic” jobs, stages, or containers 7. Guess which parameters to tune for performance 8. Do trial-and-error by changing a parameter setting 9. Rinse & repeat
  • 20. There has to be a better way
  • 21. What is going on here?
  • 23. Algorithm running in bot’s backend (diagram labels): App, Goal • Orchestrator • Probe Algorithm • Xnext • Recommendation Algorithm • Monitoring Data • Historic Data & Probe Data • Cluster Services (On-premises and Cloud)
  • 24. Spark tuning parameters • spark.driver.cores 2 • spark.executor.cores … 10 • spark.sql.shuffle.partitions 300 • spark.sql.autoBroadcastJoinThreshold 20MB • … • SKEW('orders', 'o_custId') true • spark.catalog.cacheTable("orders") true • … • We represent this setting as vector X (plot axes: X vs. performance)
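One way to picture "this setting as vector X" is a fixed ordering of parameters with each value mapped to a number. The sketch below is only an illustration under assumptions: the parameter names come from the slide, but the ordering, the megabyte unit, the helper names `to_number` and `encode`, and the example values are hypothetical and not taken from the talk.

```python
# Hypothetical encoding of a Spark configuration as a numeric vector X.
# Parameter names are from the slide; ordering, units, and values are assumed.

PARAMETERS = [
    "spark.driver.cores",
    "spark.executor.cores",
    "spark.sql.shuffle.partitions",
    "spark.sql.autoBroadcastJoinThreshold",
]

# Size suffixes converted to megabytes (an arbitrary choice for this sketch).
_UNITS = {"kb": 1 / 1024, "mb": 1.0, "gb": 1024.0}


def to_number(value):
    """Map a setting to a float: booleans to 0/1, sizes to megabytes."""
    if isinstance(value, bool):
        return 1.0 if value else 0.0
    text = str(value).strip().lower()
    for suffix, factor in _UNITS.items():
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * factor
    return float(text)


def encode(settings):
    """Return the vector X for a dict of settings, in PARAMETERS order."""
    return [to_number(settings[name]) for name in PARAMETERS]


# Illustrative values only.
x = encode({
    "spark.driver.cores": 2,
    "spark.executor.cores": 10,
    "spark.sql.shuffle.partitions": 300,
    "spark.sql.autoBroadcastJoinThreshold": "20MB",
})
print(x)
```

A fixed ordering matters because every probe result has to be compared in the same coordinate system.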
  • 25. Given: App + Goal • Find the setting of X that best meets the goal • Challenge: Response surface y = ƒ(X) is unknown (plot axes: X vs. performance)
  • 26. Challenge: Response surface y = ƒ(X) is unknown • Model the response surface as $\hat{y}(X) = \vec{f}^{\,t}(X)\,\vec{\beta} + Z(X)$ • Here: $\vec{f}^{\,t}(X)\,\vec{\beta}$ is a regression model, and $Z(X)$ is the residual captured as a Gaussian Process • The Gaussian Process model captures the uncertainty in our current knowledge of the response surface
  • 27. We can now estimate the expected improvement EIP(X) from doing a probe at any setting X: $EIP(X) = \int_{p=-\infty}^{p=y(X^*)} \left(y(X^*) - p\right)\, pdf_{\hat{y}(X)}(p)\, dp$ • $y(X^*) - p$: improvement at any setting X over the best performance seen so far • $pdf_{\hat{y}(X)}(p)$: probability density function (uncertainty estimate) • Gaussian Process model helps estimate EIP(X) (plot labels: X, Performance, Opportunity)
  • 28. Probe Algorithm • 1. Bootstrap: Get initial set of monitoring data from history or via probes: <X1,y1>, <X2,y2>, …, <Xn,yn> • 2. Select next probe Xnext based on all history and probe data available so far to calculate the setting with maximum expected improvement EIP(X) • Repeat until the stopping condition is reached
  • 29. [Figure: four plots of y against x1, showing performance and EIP(X) as functions of X, with regions labeled Exploration and Exploitation] • Xnext: Do next probe here • This approach balances exploration vs. exploitation
  • 30. Data Starved & High Uncertainty • Data Rich & Low Uncertainty • App, Goal → Probe Algorithm → Xnext • Credit: https://discovery.rsm.nl/articles/detail/130-how-to-balance-exploration-and-exploitation-in-multinational-enterprises
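The probe loop on slides 26 to 29 can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the algorithm running in Unravel's backend: X is a single number, the Gaussian Process uses an RBF kernel with unit prior variance, a constant mean stands in for the regression term, lower y (for example runtime) is better, and `fake_runtime`, `tune`, and `gp_predict` are hypothetical names. For a normal predictive distribution, the EIP integral has the closed form used in `expected_improvement`.

```python
import math


def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two scalar settings."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_predict(xs, ys, x_new, noise=1e-6):
    """Posterior mean and std at x_new (constant mean, assumed RBF kernel)."""
    mean_y = sum(ys) / len(ys)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(a, x_new) for a in xs]
    alpha = solve(K, [y - mean_y for y in ys])
    v = solve(K, k)
    mu = mean_y + sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(rbf(x_new, x_new) - sum(ki * vi for ki, vi in zip(k, v)), 0.0)
    return mu, math.sqrt(var)


def expected_improvement(mu, sigma, best):
    """Closed-form EIP for minimization with a normal prediction N(mu, sigma^2)."""
    if sigma == 0.0:
        return max(best - mu, 0.0)
    u = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf


def tune(run_app, candidates, initial, budget):
    """Bootstrap with initial probes, then probe where EIP is largest."""
    xs = list(initial)
    ys = [run_app(x) for x in xs]
    for _ in range(budget):  # stopping condition: a fixed probe budget
        remaining = [c for c in candidates if c not in xs]
        if not remaining:
            break
        best = min(ys)
        x_next = max(
            remaining,
            key=lambda c: expected_improvement(*gp_predict(xs, ys, c), best),
        )
        xs.append(x_next)
        ys.append(run_app(x_next))
    i = ys.index(min(ys))
    return xs[i], ys[i]


def fake_runtime(x):
    """Stand-in for actually running the app at setting x."""
    return (x - 7.0) ** 2 + 3.0


best_x, best_y = tune(
    fake_runtime,
    candidates=[float(c) for c in range(2, 13)],
    initial=[2.0, 12.0],
    budget=6,
)
print(best_x, best_y)
```

The first term of the closed form rewards settings predicted to beat the best run so far (exploitation), and the second rewards settings where the model is uncertain (exploration).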
  • 32. Typical app failure in Spark • Many levels of correlated stack traces • Identifying the root cause is hard and time-consuming
  • 33. Let us see a better way
  • 34. What is going on here?
  • 35. App failure → App’s Container Logs → Error Template Extraction → Feature vector → Predictive Model → Root cause of the failure
  • 36. Prediction: App failure → App’s Container Logs → Error Template Extraction → Feature vector → Predictive Model → Root cause of the failure • Training: Logs from millions of app failures → Container Logs → Error Template Extraction → Feature vectors, and Label Generation → Root cause labels; both feed Model Learning, which produces the Predictive Model
  • 37. Two ways to get root-cause labels • Manual diagnosis by a domain expert • Automatic injection of the root cause
  • 38. Unravel’s large-scale lab framework for automatic root cause analysis • Workloads: Spark and multi-tenant; variety of workloads: Batch, ML, SQL, Streaming, etc. • Failures: Large set of root causes learned from customers & partners, constantly updated; continuously inject these root causes to train & test models for root-cause prediction • Environment: Lab created on demand on cloud or on-premises; workloads are run and failures are injected
  • 39. Injecting “labeled” failures • Diagram labels: Application Execution, Application Monitor, FAILED, Injected Failure, Label, Feature Extraction, Input, Labeled Failures, Training data • Injected failure examples: Invalid input; Invalid memory configuration; OOME: Java heap space; OOME: GC overhead limit; Container killed by YARN; Runtime incompatibility; No space left on device; Transformations inside other transformations; Runtime error; Arithmetic error; Invalid configuration settings
  • 40. We created a Failure Taxonomy for Labels • Root Node • Category of failure: Configuration Errors, Data Errors, Resource Errors, Deployment Errors • Root cause labels: Input Path Not Available, Number Format Exception, SparkSQL JsonProcessing Exception, …
  • 41. Extracting input features from logs • Example log: java.lang.OutOfMemoryError: Java heap space at scala.reflect.ManifestFactory$$anon$9.newArray(Manifest.scala:114) at scala.reflect.ManifestFactory$$anon$9.newArray(Manifest.scala:112) at … • Extracting stack traces and error messages • Tokenize by class names and words • Tokens example: java.lang.OutOfMemoryError Java heap space at scala.reflect.ManifestFactory$$anon$9.newArray(Manifest.scala:114)
  • 42. Input feature extraction • Bag of Words with TF-IDF – Computes a vocabulary of words – Uses TF-IDF to reflect importance of words in a document • Doc2Vec – Maps words, paragraphs, or documents to multi-dimensional vectors – Evaluates the placement of words with respect to neighboring words – Uses a 3-layer neural network
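The tokenization and TF-IDF steps from slides 41 and 42 might look roughly like the sketch below. It is an assumption-laden illustration rather than Unravel's extractor: the regular expression, the smoothed IDF formula (one common variant), the helper names, and the second log line (including its exception class) are all hypothetical; only the first log line and the "No space left on device" message come from the slides.

```python
import math
import re
from collections import Counter

# A token is a dotted run of identifiers (class or method names) or a plain word.
TOKEN = re.compile(r"[A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)*")


def tokenize(log_text):
    """Split a log excerpt into class-name and word tokens."""
    return TOKEN.findall(log_text)


def tf_idf(documents):
    """Return one {token: weight} dict per document (smoothed IDF variant)."""
    tokenized = [tokenize(d) for d in documents]
    n = len(documents)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        vectors.append({
            tok: (count / total) * (math.log((1 + n) / (1 + df[tok])) + 1.0)
            for tok, count in counts.items()
        })
    return vectors


docs = [
    # From slide 41.
    "java.lang.OutOfMemoryError: Java heap space at "
    "scala.reflect.ManifestFactory$$anon$9.newArray(Manifest.scala:114)",
    # Hypothetical second failure, made up for this sketch.
    "java.io.IOException: No space left on device at "
    "java.io.FileOutputStream.writeBytes(FileOutputStream.java)",
]

tokens = tokenize(docs[0])
vectors = tf_idf(docs)
print(tokens)
```

Tokens that appear in every log, such as "at", receive a lower weight than tokens specific to one failure, which is the property a downstream classifier relies on.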
  • 43. Prediction: App failure → App’s Container Logs → Error Template Extraction → Feature vector → Predictive Model → Root cause of the failure • Training: Logs from millions of app failures → Container Logs → Error Template Extraction → Feature vectors, and Label Generation → Root cause labels; both feed Model Learning, which produces the Predictive Model
  • 44. Learning the predictive model • Shallow Learning – Logistic Regression – Random forests • Deep Learning – Neural networks • Training and testing with injected failures • Test to train data set ratio 75% to 25% • Models: logistic regression, random forests • [Chart: accuracy score (%), axis from 80 to 100, for logistic regression and random forests with TF-IDF and Doc2Vec features]
  • 45. The NLP element in the Chatbot • Messaging Platform • Bot’s NLP Layer • Bot’s Backend Layer: Algorithm, Compute, Storage
  • 46. The NLP element in the Chatbot • User: “How can I make CEO report query faster” • Extract the intent (Tune an app, Fetch a metric, Generate a report, Set an alert, Diagnose a failure, …): Intent = AppAutoTune • Extract entities for the intent: Entities: { AppName = ‘CEO report’, TuningGoal = Speedup } • Take action: Invoke app autotuning algorithm
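The intent and entity step on slide 46 can be illustrated with a deliberately simple keyword matcher. This is a stand-in, not the bot's NLP layer, which the talk does not describe in detail and which would more plausibly use a trained model. The intent name `AppAutoTune` and the entity names come from the slide; `SetAlert`, `DiagnoseFailure`, the keyword lists, and the regular expression are assumptions.

```python
import re

# Hypothetical keyword lists; checked in order, first match wins.
INTENT_KEYWORDS = {
    "SetAlert": ("alert me", "notify me"),
    "AppAutoTune": ("faster", "speed up", "slow"),
    "DiagnoseFailure": ("fail", "stuck"),
}

# Assumed phrasing pattern: "make <app name> [query|app|job] faster".
APP_NAME = re.compile(
    r"make (?:the )?(.+?) (?:query |app |job )?(?:run )?faster",
    re.IGNORECASE,
)


def parse(utterance):
    """Return the detected intent and entities for a user message."""
    text = utterance.lower()
    intent = next(
        (name for name, words in INTENT_KEYWORDS.items()
         if any(word in text for word in words)),
        None,
    )
    entities = {}
    if intent == "AppAutoTune":
        match = APP_NAME.search(utterance)
        if match:
            entities["AppName"] = match.group(1)
        entities["TuningGoal"] = "Speedup"
    return {"intent": intent, "entities": entities}


result = parse("How can I make CEO report query faster")
print(result)
```

The backend would then dispatch on `result["intent"]`, for example handing the app name and goal to the autotuning algorithm.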
  • 47. Many use cases can be addressed • Who are the top resource-wasting users on the cluster? • Which app is causing contention on the cluster? • Why is my app stuck? • Alert me if my query fails • Which part of my query failed? • Kill the sales report BI app if it uses more than $25 • And many more …
  • 48. In summary • AI-driven Spark Expert in a Bot! – Makes you more productive – Saves you time and money • Sign up for a free trial, we value your feedback! http://unraveldata.com/free-trial • And yes, we are hiring @ Unravel: shivnath@unraveldata.com
  • 49. DON’T FORGET TO RATE AND REVIEW THE SESSIONS SEARCH SPARK + AI SUMMIT