SlideShare une entreprise Scribd logo
1  sur  50
Télécharger pour lire hors ligne
WIFI SSID:Spark+AISummit | Password: UnifiedDataAnalytics
Andreu Mora, Adyen
Time series forecasting
and monitoring with
Apache Spark and
ElasticSearch
#UnifiedDataAnalytics #SparkAISummit
Adyen
Payments Processor
Tech company
International customers (aka merchants)
Omnichannel
Back in the day…
The legacy monitor was based on a SQL query that
would compute an average for the hour of the week
and compare to a threshold.
Doesn’t quite work:
• Generates loads of False Positives
• It was fairly trimmed down: top merchants.
Reduce False
Positives
Catch anomalies
Do that at scale
Harness the
detection
performance
Connect to a live
platform
OK, but
What is an anomaly?
No luxury of a labelled dataset, divergence 

of opinions.
Connecting to a live platform without 

ML deployment hooks ready.
We were working on MLflow but not there yet.
No standard for timeseries forecasting at scale
With spark, several choices.
Considerations when dealing with Big Data
Big Technology
Leverage on mature Tech to
solve the problem (hello Spark).
Big diversity
Many different topologies for
our merchants and yet one
algorithm to track them all.
Big consequences
1000 merchants * 10 min * 95%
accuracy = 50400 emails/week
Big Data Platform
Volumes Predictions
Big Data Platform
Volumes Predictions
TimeSeries Ecosystem
Flint
Spark-ts
FB Prophet
Stats models
TimeSeries Ecosystem
Flint
Spark-ts
FB Prophet
Stats models
Data size
consideration
1 year @ 1 min @ double64 = 4.2 mb
Scoring in Java
While working on a fully functional engine to
deploy ML models based on MLflow.
Launch fast and iterate!
Transporting the model
The model transported for tens of thousands of
accounts needs to be lightweight.
Harness the maths
No using blackboxed models, equations need to
be understood and replicated in Java.
Needs to perform fast
Score and decide whether our seen traffic form
ElasticSearch is actually anomalous on the ms
scale.
Big Data Platform
Volumes Predictions
Big Data Platform
Volumes
Model
Coefficients
Fourier
components
Would not
optimise the
business cycles
ARIMA
Not perfect for
picking up
seasonality
Isolation Forests
Great for
multidimensional
data, not so much
for time series.
Autoencoders
Good luck
transporting the
model for each
merchant.
XGBM
Noice, but score
that in Java.
Research stage
Understand a problem and build a solution, decide what’s best.
Ridge Regression
Makes scoring in Java nice and
kinda easy.
Residuals
Confidence intervals modelled
through quantile regression of
observed values.
Events
Recurrent or one-off events are
shown to the model.
Piece-wise linear trends
Breaks down the signal into pieces
and learn the last trends.
Gaussian Basis Functions
Allow us to teach the model to
understand business cycles
The model
Discover anomalous behaviour
based on a probability p.
Pre-sampling
Allow us to sample and bucketize
the merchants to adequate
intervals.
Ridge Regression
Makes scoring in Java nice and
kinda easy.
Residuals
Confidence intervals modelled
through quantile regression of
observed values.
Events
Recurrent or one-off events are
shown to the model.
Piece-wise linear trends
Breaks down the signal into pieces
and learn the last trends.
Gaussian Basis Functions
Allow us to teach the model to
understand business cycles
The model
Discover anomalous behaviour
based on a probability p.
Pre-sampling
Allow us to sample and bucketize
the merchants to adequate
intervals.
Ridge Regression
Makes scoring in Java nice and
kinda easy.
Residuals
Confidence intervals modelled
through quantile regression of
observed values.
Events
Recurrent or one-off events are
shown to the model.
Piece-wise linear trends
Breaks down the signal into pieces
and learn the last trends.
Gaussian Basis Functions
Allow us to teach the model to
understand business cycles
The model
Discover anomalous behaviour
based on a probability p.
Pre-sampling
Allow us to sample and bucketize
the merchants to adequate
intervals.
Ridge Regression
Makes scoring in Java nice and
kinda easy.
Residuals
Confidence intervals modelled
through quantile regression of
observed values.
Events
Recurrent or one-off events are
shown to the model.
Piece-wise linear trends
Breaks down the signal into pieces
and learn the last trends.
Gaussian Basis Functions
Allow us to teach the model to
understand business cycles
The model
Discover anomalous behaviour
based on a probability p.
Pre-sampling
Allow us to sample and bucketize
the merchants to adequate
intervals.
Ridge Regression
Makes scoring in Java nice and
kinda easy.
Residuals
Confidence intervals modelled
through quantile regression of
observed values.
Events
Recurrent or one-off events are
shown to the model.
Piece-wise linear trends
Breaks down the signal into pieces
and learn the last trends.
Gaussian Basis Functions
Allow us to teach the model to
understand business cycles
The model
Discover anomalous behaviour
based on a probability p.
Pre-sampling
Allow us to sample and bucketize
the merchants to adequate
intervals.
Ridge Regression
Makes scoring in Java nice and
kinda easy.
Residuals
Confidence intervals modelled
through quantile regression of
observed values.
Events
Recurrent or one-off events are
shown to the model.
Piece-wise linear trends
Breaks down the signal into pieces
and learn the last trends.
Gaussian Basis Functions
Allow us to teach the model to
understand business cycles
The model
Discover anomalous behaviour
based on a probability p.
Pre-sampling
Allow us to sample and bucketize
the merchants to adequate
intervals.
Ridge Regression
Makes scoring in Java nice and
kinda easy.
Residuals
Confidence intervals modelled
through quantile regression of
observed values.
Events
Recurrent or one-off events are
shown to the model.
Piece-wise linear trends
Breaks down the signal into pieces
and learn the last trends.
Gaussian Basis Functions
Allow us to teach the model to
understand business cycles
The model
Discover anomalous behaviour
based on a probability p.
Pre-sampling
Allow us to sample and bucketize
the merchants to adequate
intervals.
Ridge Regression
Makes scoring in Java nice and
kinda easy. easy
Residuals
Confidence intervals modelled
through quantile regression of
observed values.
Events
Recurrent or one-off events are
shown to the model.
Piece-wise linear trends
Breaks down the signal into pieces
and learn the last trends.
Gaussian Basis Functions
Allow us to teach the model to
understand business cycles
The model
Discover anomalous behaviour
based on a probability p.
Pre-sampling
Allow us to sample and bucketize
the merchants to adequate
intervals.
Trendspotting
Estimating hinges and trends and offering it as
subproduct to Account Managers for evaluating
the low variations of volume.
Train set: 90 days
Test set: 7 days
Real volume
Predicted volume
95% confidence
How do the predictions look like?
Missed event
The implementation
on Spark
How did we get there, on the Spark side.
Reusability
Overloads of scikit-learns and pandas allow us to
ensure reusability
Cross-validation
Ensure the best tuning through tuning of
hyperparameters.
Scalability
Using Spark’s map-reduce paradigm we totally
control the computational performances.
SeasonalEstimator(BaseEstimator,RegressorMixin)
Input daily time series —> {t:[…], v:[…]}
Collect to list —> [{t:[…], v:[…]}]
Hinges and Hyperparameters
Distribute UDF
Making it happen at scale
Cross-validation
F4-sampling score: favours higher sampling
considering classical precision and recall.
Custom cv folds split TimeSeriesWeekSplit get the
sense of the business cycle
The output
Harnessing 

the prediction
performance
Enabling canary
roll-out based on
scores
Overcoming
unsupervised
learning
Alarm rate and synthetic recall allow us to
know for each case how many alarms would
have been captured and raised, even without
having a labelled dataset.
Trade-off alarm
rates and recall
We provide a number of choices (95%, 97%,
99% probability and completely profile what to
expect in terms of anomalies.
The model payload
Go Live
Houston? Houston? …
Grafana dashboard
So we saw this on the data
’You don’t call us, we call you’
Post on Medium
https://medium.com/adyen
DON’T FORGET TO RATE
AND REVIEW THE SESSIONS
SEARCH SPARK + AI SUMMIT

Contenu connexe

Tendances

4 C's of Investment Process - Cloning, Checklist, Capital Allocation, Checkout
4 C's of Investment Process - Cloning, Checklist, Capital Allocation, Checkout4 C's of Investment Process - Cloning, Checklist, Capital Allocation, Checkout
4 C's of Investment Process - Cloning, Checklist, Capital Allocation, Checkoutperfectresearch
 
Random Number Generation
Random Number GenerationRandom Number Generation
Random Number GenerationRaj Bhatt
 
Evaluating classification algorithms
Evaluating classification algorithmsEvaluating classification algorithms
Evaluating classification algorithmsUtkarsh Sharma
 
Uncertain knowledge and reasoning
Uncertain knowledge and reasoningUncertain knowledge and reasoning
Uncertain knowledge and reasoningShiwani Gupta
 
"Trading Strategies That Are Designed Not Fitted" by Robert Carver, Independe...
"Trading Strategies That Are Designed Not Fitted" by Robert Carver, Independe..."Trading Strategies That Are Designed Not Fitted" by Robert Carver, Independe...
"Trading Strategies That Are Designed Not Fitted" by Robert Carver, Independe...Quantopian
 
Wavelet Multi-resolution Analysis of High Frequency FX Rates
Wavelet Multi-resolution Analysis of High Frequency FX RatesWavelet Multi-resolution Analysis of High Frequency FX Rates
Wavelet Multi-resolution Analysis of High Frequency FX RatesaiQUANT
 
Inter Process Communication Presentation[1]
Inter Process Communication Presentation[1]Inter Process Communication Presentation[1]
Inter Process Communication Presentation[1]Ravindra Raju Kolahalam
 
Project time series ppt
Project time series pptProject time series ppt
Project time series pptamar patil
 

Tendances (16)

Q-learning
Q-learningQ-learning
Q-learning
 
4 C's of Investment Process - Cloning, Checklist, Capital Allocation, Checkout
4 C's of Investment Process - Cloning, Checklist, Capital Allocation, Checkout4 C's of Investment Process - Cloning, Checklist, Capital Allocation, Checkout
4 C's of Investment Process - Cloning, Checklist, Capital Allocation, Checkout
 
Deadlock Slides
Deadlock SlidesDeadlock Slides
Deadlock Slides
 
Discrete event-simulation
Discrete event-simulationDiscrete event-simulation
Discrete event-simulation
 
Process
ProcessProcess
Process
 
Database System Architectures
Database System ArchitecturesDatabase System Architectures
Database System Architectures
 
Random Number Generation
Random Number GenerationRandom Number Generation
Random Number Generation
 
Semaphores
SemaphoresSemaphores
Semaphores
 
Evaluating classification algorithms
Evaluating classification algorithmsEvaluating classification algorithms
Evaluating classification algorithms
 
Uncertain knowledge and reasoning
Uncertain knowledge and reasoningUncertain knowledge and reasoning
Uncertain knowledge and reasoning
 
Distributed Systems Naming
Distributed Systems NamingDistributed Systems Naming
Distributed Systems Naming
 
"Trading Strategies That Are Designed Not Fitted" by Robert Carver, Independe...
"Trading Strategies That Are Designed Not Fitted" by Robert Carver, Independe..."Trading Strategies That Are Designed Not Fitted" by Robert Carver, Independe...
"Trading Strategies That Are Designed Not Fitted" by Robert Carver, Independe...
 
Wavelet Multi-resolution Analysis of High Frequency FX Rates
Wavelet Multi-resolution Analysis of High Frequency FX RatesWavelet Multi-resolution Analysis of High Frequency FX Rates
Wavelet Multi-resolution Analysis of High Frequency FX Rates
 
Inter Process Communication Presentation[1]
Inter Process Communication Presentation[1]Inter Process Communication Presentation[1]
Inter Process Communication Presentation[1]
 
Project time series ppt
Project time series pptProject time series ppt
Project time series ppt
 
Time series forecasting
Time series forecastingTime series forecasting
Time series forecasting
 

Similaire à Scalable Time Series Forecasting and Monitoring using Apache Spark and ElasticSearch at Adye

Risk Analysis in the Financial Services Industry
Risk Analysis in the Financial Services IndustryRisk Analysis in the Financial Services Industry
Risk Analysis in the Financial Services IndustryRevolution Analytics
 
Modeling at scale in systematic trading
Modeling at scale in systematic tradingModeling at scale in systematic trading
Modeling at scale in systematic tradingSigOpt
 
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...Benjamin Bengfort
 
BIG MART SALES PRIDICTION PROJECT.pptx
BIG MART SALES PRIDICTION PROJECT.pptxBIG MART SALES PRIDICTION PROJECT.pptx
BIG MART SALES PRIDICTION PROJECT.pptxLSURYAPRAKASHREDDY
 
Auto-Train a Time-Series Forecast Model With AML + ADB
Auto-Train a Time-Series Forecast Model With AML + ADBAuto-Train a Time-Series Forecast Model With AML + ADB
Auto-Train a Time-Series Forecast Model With AML + ADBDatabricks
 
Being Reactive with Spring
Being Reactive with SpringBeing Reactive with Spring
Being Reactive with SpringKris Galea
 
Taking Machine Learning from Batch to Real-Time (big data eXposed 2015)
Taking Machine Learning from Batch to Real-Time (big data eXposed 2015)Taking Machine Learning from Batch to Real-Time (big data eXposed 2015)
Taking Machine Learning from Batch to Real-Time (big data eXposed 2015)Elad Rosenheim
 
Stop Flying Blind! Quantifying Risk with Monte Carlo Simulation
Stop Flying Blind! Quantifying Risk with Monte Carlo SimulationStop Flying Blind! Quantifying Risk with Monte Carlo Simulation
Stop Flying Blind! Quantifying Risk with Monte Carlo SimulationSam McAfee
 
Customer choice probabilities
Customer choice probabilitiesCustomer choice probabilities
Customer choice probabilitiesAllan D. Butler
 
Neotys PAC - Stijn Schepers
Neotys PAC - Stijn SchepersNeotys PAC - Stijn Schepers
Neotys PAC - Stijn SchepersNeotys_Partner
 
Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...
Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...
Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...gdgsurrey
 
Application_of_Deep_Learning_Techniques.pptx
Application_of_Deep_Learning_Techniques.pptxApplication_of_Deep_Learning_Techniques.pptx
Application_of_Deep_Learning_Techniques.pptxKiranKumar918931
 
Stock Market Trends Prediction after Earning Release.pptx
Stock Market Trends Prediction after Earning Release.pptxStock Market Trends Prediction after Earning Release.pptx
Stock Market Trends Prediction after Earning Release.pptxChen Qian
 
Tuning for Systematic Trading: Talk 1
Tuning for Systematic Trading: Talk 1Tuning for Systematic Trading: Talk 1
Tuning for Systematic Trading: Talk 1SigOpt
 
Modeling at Scale: SigOpt at TWIMLcon 2019
Modeling at Scale: SigOpt at TWIMLcon 2019Modeling at Scale: SigOpt at TWIMLcon 2019
Modeling at Scale: SigOpt at TWIMLcon 2019SigOpt
 
Softwareudvikling og vaerdiskabelse
Softwareudvikling og vaerdiskabelseSoftwareudvikling og vaerdiskabelse
Softwareudvikling og vaerdiskabelseSusanne Brøndberg
 

Similaire à Scalable Time Series Forecasting and Monitoring using Apache Spark and ElasticSearch at Adye (20)

MLProjectReport
MLProjectReportMLProjectReport
MLProjectReport
 
Risk Analysis in the Financial Services Industry
Risk Analysis in the Financial Services IndustryRisk Analysis in the Financial Services Industry
Risk Analysis in the Financial Services Industry
 
DevOps for DataScience
DevOps for DataScienceDevOps for DataScience
DevOps for DataScience
 
Modeling at scale in systematic trading
Modeling at scale in systematic tradingModeling at scale in systematic trading
Modeling at scale in systematic trading
 
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
 
BIG MART SALES.pptx
BIG MART SALES.pptxBIG MART SALES.pptx
BIG MART SALES.pptx
 
BIG MART SALES PRIDICTION PROJECT.pptx
BIG MART SALES PRIDICTION PROJECT.pptxBIG MART SALES PRIDICTION PROJECT.pptx
BIG MART SALES PRIDICTION PROJECT.pptx
 
Auto-Train a Time-Series Forecast Model With AML + ADB
Auto-Train a Time-Series Forecast Model With AML + ADBAuto-Train a Time-Series Forecast Model With AML + ADB
Auto-Train a Time-Series Forecast Model With AML + ADB
 
Being Reactive with Spring
Being Reactive with SpringBeing Reactive with Spring
Being Reactive with Spring
 
Taking Machine Learning from Batch to Real-Time (big data eXposed 2015)
Taking Machine Learning from Batch to Real-Time (big data eXposed 2015)Taking Machine Learning from Batch to Real-Time (big data eXposed 2015)
Taking Machine Learning from Batch to Real-Time (big data eXposed 2015)
 
Stop Flying Blind! Quantifying Risk with Monte Carlo Simulation
Stop Flying Blind! Quantifying Risk with Monte Carlo SimulationStop Flying Blind! Quantifying Risk with Monte Carlo Simulation
Stop Flying Blind! Quantifying Risk with Monte Carlo Simulation
 
Customer choice probabilities
Customer choice probabilitiesCustomer choice probabilities
Customer choice probabilities
 
Neotys PAC - Stijn Schepers
Neotys PAC - Stijn SchepersNeotys PAC - Stijn Schepers
Neotys PAC - Stijn Schepers
 
Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...
Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...
Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...
 
Application_of_Deep_Learning_Techniques.pptx
Application_of_Deep_Learning_Techniques.pptxApplication_of_Deep_Learning_Techniques.pptx
Application_of_Deep_Learning_Techniques.pptx
 
Stock Market Trends Prediction after Earning Release.pptx
Stock Market Trends Prediction after Earning Release.pptxStock Market Trends Prediction after Earning Release.pptx
Stock Market Trends Prediction after Earning Release.pptx
 
Tuning for Systematic Trading: Talk 1
Tuning for Systematic Trading: Talk 1Tuning for Systematic Trading: Talk 1
Tuning for Systematic Trading: Talk 1
 
Modeling at Scale: SigOpt at TWIMLcon 2019
Modeling at Scale: SigOpt at TWIMLcon 2019Modeling at Scale: SigOpt at TWIMLcon 2019
Modeling at Scale: SigOpt at TWIMLcon 2019
 
Softwareudvikling og vaerdiskabelse
Softwareudvikling og vaerdiskabelseSoftwareudvikling og vaerdiskabelse
Softwareudvikling og vaerdiskabelse
 
Softwareudvikling og vaerdiskabelse
Softwareudvikling og vaerdiskabelseSoftwareudvikling og vaerdiskabelse
Softwareudvikling og vaerdiskabelse
 

Plus de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Plus de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Dernier

Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 

Dernier (20)

Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 

Scalable Time Series Forecasting and Monitoring using Apache Spark and ElasticSearch at Adye

  • 1. WIFI SSID:Spark+AISummit | Password: UnifiedDataAnalytics
  • 2. Andreu Mora, Adyen Time series forecasting and monitoring with Apache Spark and ElasticSearch #UnifiedDataAnalytics #SparkAISummit
  • 3. Adyen Payments Processor Tech company International customers (aka merchants) Omnichannel
  • 4. Back in the day… The legacy monitor was based on a SQL query that would compute an average for the hour of the week and compare to a threshold.
  • 5. Doesn’t quite work: • Generates loads of False Positives • It was fairly trimmed down: top merchants.
  • 8. Do that at scale
  • 10. Connect to a live platform
  • 11. OK, but What is an anomaly? No luxury of a labelled dataset, divergence 
 of opinions. Connecting to a live platform without 
 ML deployment hooks ready. We were working on MLflow but not there yet. No standard for timeseries forecasting at scale With spark, several choices.
  • 12. Considerations when dealing with Big Data Big Technology Leverage on mature Tech to solve the problem (hello Spark). Big diversity Many different topologies for our merchants and yet one algorithm to track them all. Big consequences 1000 merchants * 10 min * 95% accuracy = 50400 emails/week
  • 16. TimeSeries Ecosystem Flint Spark-ts FB Prophet Stats models Data size consideration 1 year @ 1 min @ double64 = 4.2 mb
  • 17. Scoring in Java While working on a fully functional engine to deploy ML models based on MLflow. Launch fast and iterate! Transporting the model The model transported for tens of thousands of accounts needs to be lightweight. Harness the maths No using blackboxed models, equations need to be understood and replicated in Java. Needs to perform fast Score and decide whether our seen traffic form ElasticSearch is actually anomalous on the ms scale.
  • 20. Fourier components Would not optimise the business cycles ARIMA Not perfect for picking up seasonality Isolation Forests Great for multidimensional data, not so much for time series. Autoencoders Good luck transporting the model for each merchant. XGBM Noice, but score that in Java. Research stage Understand a problem and build a solution, decide what’s best.
  • 21. Ridge Regression Makes scoring in Java nice and kinda easy. Residuals Confidence intervals modelled through quantile regression of observed values. Events Recurrent or one-off events are shown to the model. Piece-wise linear trends Breaks down the signal into pieces and learn the last trends. Gaussian Basis Functions Allow us to teach the model to understand business cycles The model Discover anomalous behaviour based on a probability p. Pre-sampling Allow us to sample and bucketize the merchants to adequate intervals.
  • 22. Ridge Regression Makes scoring in Java nice and kinda easy. Residuals Confidence intervals modelled through quantile regression of observed values. Events Recurrent or one-off events are shown to the model. Piece-wise linear trends Breaks down the signal into pieces and learn the last trends. Gaussian Basis Functions Allow us to teach the model to understand business cycles The model Discover anomalous behaviour based on a probability p. Pre-sampling Allow us to sample and bucketize the merchants to adequate intervals.
  • 23. Ridge Regression Makes scoring in Java nice and kinda easy. Residuals Confidence intervals modelled through quantile regression of observed values. Events Recurrent or one-off events are shown to the model. Piece-wise linear trends Breaks down the signal into pieces and learn the last trends. Gaussian Basis Functions Allow us to teach the model to understand business cycles The model Discover anomalous behaviour based on a probability p. Pre-sampling Allow us to sample and bucketize the merchants to adequate intervals.
  • 24. Ridge Regression Makes scoring in Java nice and kinda easy. Residuals Confidence intervals modelled through quantile regression of observed values. Events Recurrent or one-off events are shown to the model. Piece-wise linear trends Breaks down the signal into pieces and learn the last trends. Gaussian Basis Functions Allow us to teach the model to understand business cycles The model Discover anomalous behaviour based on a probability p. Pre-sampling Allow us to sample and bucketize the merchants to adequate intervals.
  • 25. Ridge Regression Makes scoring in Java nice and kinda easy. Residuals Confidence intervals modelled through quantile regression of observed values. Events Recurrent or one-off events are shown to the model. Piece-wise linear trends Breaks down the signal into pieces and learn the last trends. Gaussian Basis Functions Allow us to teach the model to understand business cycles The model Discover anomalous behaviour based on a probability p. Pre-sampling Allow us to sample and bucketize the merchants to adequate intervals.
  • 26. Ridge Regression Makes scoring in Java nice and kinda easy. Residuals Confidence intervals modelled through quantile regression of observed values. Events Recurrent or one-off events are shown to the model. Piece-wise linear trends Breaks down the signal into pieces and learn the last trends. Gaussian Basis Functions Allow us to teach the model to understand business cycles The model Discover anomalous behaviour based on a probability p. Pre-sampling Allow us to sample and bucketize the merchants to adequate intervals.
  • 27. Ridge Regression Makes scoring in Java nice and kinda easy. Residuals Confidence intervals modelled through quantile regression of observed values. Events Recurrent or one-off events are shown to the model. Piece-wise linear trends Breaks down the signal into pieces and learn the last trends. Gaussian Basis Functions Allow us to teach the model to understand business cycles The model Discover anomalous behaviour based on a probability p. Pre-sampling Allow us to sample and bucketize the merchants to adequate intervals.
  • 28. Ridge Regression Makes scoring in Java nice and kinda easy. easy Residuals Confidence intervals modelled through quantile regression of observed values. Events Recurrent or one-off events are shown to the model. Piece-wise linear trends Breaks down the signal into pieces and learn the last trends. Gaussian Basis Functions Allow us to teach the model to understand business cycles The model Discover anomalous behaviour based on a probability p. Pre-sampling Allow us to sample and bucketize the merchants to adequate intervals.
  • 29. Trendspotting Estimating hinges and trends and offering it as subproduct to Account Managers for evaluating the low variations of volume.
  • 30. Train set: 90 days Test set: 7 days Real volume Predicted volume 95% confidence How do the predictions look like?
  • 32. The implementation on Spark How did we get there, on the Spark side. Reusability Overloads of scikit-learns and pandas allow us to ensure reusability Cross-validation Ensure the best tuning through tuning of hyperparameters. Scalability Using Spark’s map-reduce paradigm we totally control the computational performances.
  • 34. Input daily time series —> {t:[…], v:[…]} Collect to list —> [{t:[…], v:[…]}] Hinges and Hyperparameters Distribute UDF Making it happen at scale
  • 35. Cross-validation F4-sampling score: favours higher sampling considering classical precision and recall. Custom cv folds split TimeSeriesWeekSplit get the sense of the business cycle
  • 39. Overcoming unsupervised learning Alarm rate and synthetic recall allow us to know for each case how many alarms would have been captured and raised, even without having a labelled dataset.
  • 40. Trade-off alarm rates and recall We provide a number of choices (95%, 97%, 99% probability and completely profile what to expect in terms of anomalies.
  • 44. So we saw this on the data
  • 45.
  • 46.
  • 47.
  • 48. ’You don’t call us, we call you’
  • 50. DON’T FORGET TO RATE AND REVIEW THE SESSIONS SEARCH SPARK + AI SUMMIT