SlideShare a Scribd company logo
1 of 56
Download to read offline
Robust Automated Forecasting
In Python & R
Pranav Bahl, Data Scientist
Jonathan Stacks, DevOps Engineer
Time Series Forecasting
Time Series: A series of data points indexed in time order, spaced at equal time
intervals. It consists of two variables, time and values.
Overview
Goal: Demonstrate how to make millions of [robust] forecasts in an automated fashion
● Define the problem
● Retrieve and preprocess data
○ Profile
○ Transform
○ Detect outliers/anomalies
● Create forecast models employing multiple strategies and parameter combinations
○ Using Python
○ Using R
● Evaluate models with contextual evaluation metrics (and meta-metrics)
● Discuss how to choose the most appropriate hardware
Defining the problem
● Forecasts are used throughout
our risk management system
○ Bias towards risk aversion
● > 250,000 unique time series
○ Growing at 2x each year
● Elastic compute capacity
● Runtime <= 5 hours
● Cost <= $200/day
● Reduce error by 10%
○ Based on cumulative forecast % error
Runtime and cost calculation
Because the number of unique time series to be processed change over time so the idea is to adjust
number of servers to get the job done in desired time.
Time series profiling
● Ensure index is a timestamp
● Check for completeness of panel
● Remove leap days
● Make sure data has at least one complete season
● Truncate data to in favor of complete seasons
Handling missing values
● Impute using…
○ Descriptive statistics (e.g. mean or median)
○ Interpolation (e.g. linear or polynomial)
○ Extrapolation (e.g. training a linear model)
Outlier/anomaly detection
Outlier: An observation that lies an abnormal distance from other values in a random
sample from a population.
- Boxplot
- Time-series decomposition routine
Anomaly: Illegitimate data point that’s generated by a different process than whatever
generated the rest of the data.
- Forecasting
- Robust Principal Component Analysis (RPCA)
Outlier/anomaly detection
● Regression
○ [Python] StatsModels (OLS, polynomial)
○ [Python/R] XGBoost (gradient boosted regression trees)
● ARMA / ARIMA / SARMIA (Autoregressive Moving-Average)
○ [R] Forecast
○ [Python] StatsModels
○ [Python] Pyramid
● Exponential smoothing
○ [R] Forecast (Holt-Winters)
● Structural Models / State space models
○ [R] BSTS (Bayesian Structural Time Series)
○ [Python] Pyflux
○ [Python/R] Prophet
Forecasting library landscape
Evaluation Metrics
Evaluation metrics are used to measure the quality of forecast. It is also used to
compare different strategies.There are two types of evaluation metrics:
Scale dependent: The metrics which requires all the compared time series be on the same scale.
Eg:
Scale independent: The metrics used to compare forecast performance amongst different data sets
Eg:
Bringing Down Runtime, Scale and Cost
Runtime optimization: Run different experiments to narrow down poor
performing strategies or transformations
Cost optimization: Optimize the cost function by experimenting different server
options
Runtime optimization
Initial Approach
Software
Hardware
C4 R4 X1
Amazon EC2
Python 2 Python 3
Transformations
Strategies
Languages
MFE
RMSE
MAE
MSE
NMSE
SMSE
SSE
Scale dependent
Scale independent
MPE
MAPE
SMAPE
MASE
Metrics
Default Log
TBATS ProphetPyARIMAAuto ARIMA Pyflux
First experiment: Remove unnecessary metrics
Software
Hardware
C4 R4 X1
Amazon EC2
Python 2 Python 3
Transformations
Strategies
Languages
MFE
RMSE
MAE
MSE
NMSE
SMSE
SSE
Scale dependent
Scale independent
MPE
MAPE
SMAPE
MASE
Metrics
Default Log
TBATS ProphetPyARIMAAuto ARIMA Pyflux
First experiment: Remove unnecessary metrics
First experiment: Remove unnecessary metrics
Software
Hardware
C4 R4 X1
Amazon EC2
Python 2 Python 3
Transformations
Strategies
Languages
MFE
RMSE
MAE
MSE
NMSE
SMSE
SSE
Scale dependent
Scale independent
MPE
MAPE
SMAPE
MASE
Metrics
Default Log
TBATS ProphetPyARIMAAuto ARIMA Pyflux
First experiment: Result
Software
Hardware
C4 R4 X1
Amazon EC2
Python 2 Python 3
Transformations
Strategies
Languages
MFE
RMSE
Scale dependent
Scale independent
SMAPE
MASE
Metrics
Default Log
TBATS ProphetPyARIMAAuto ARIMA Pyflux
Second experiment: Forecast model selection
Software
Hardware
C4 R4 X1
Amazon EC2
Python 2 Python 3
Transformations
Strategies
Languages
MFE
RMSE
Scale dependent
Scale independent
SMAPE
MASE
Metrics
Default Log
TBATS ProphetPyARIMAAuto ARIMA Pyflux
Second experiment: Forecast model selection
● Choose distinct but meaningful parameters for all strategies
● Initial experiment with default settings
○ Exception: reduced number of simulations for faster runtime
● Compare strategies across filtered evaluation metrics
Second experiment: Forecast model selection
Second experiment: Forecast model selection
Software
Hardware
C4 R4 X1
Amazon EC2
Python 2 Python 3
Transformations
Strategies
Languages
MFE
RMSE
Scale dependent
Scale independent
SMAPE
MASE
Metrics
Default Log
TBATS ProphetPyARIMAAuto ARIMA Pyflux
Second experiment: Result
Software
Hardware
C4 R4 X1
Amazon EC2
Python 2 Python 3
Transformations
Strategies
Languages
MFE
RMSE
Scale dependent
Scale independent
SMAPE
MASE
Metrics
Default Log
TBATS ProphetPyARIMAAuto ARIMA
Third experiment: Auto ARIMA vs PyARIMA
Software
Hardware
TBATS ProphetPyARIMA
C4 R4 X1
Amazon EC2
Python 2 Python 3
Transformations
Strategies
Languages
MFE
RMSE
Scale dependent
Scale independent
SMAPE
MASE
Metrics
Default Log
Auto ARIMA
Third experiment: Auto ARIMA vs PyARIMA
● Choose best Arima Implementation
○ Since Python’s Auto Arima reflect R’s implementation
● Compare error differences
○ Hold implementation with better performance
Error differences, where Python Auto ARIMA wins
50% py_auto_arima_log
50% py_auto_arima
Third experiment: Auto ARIMA vs PyARIMA
Software
Hardware
TBATS ProphetPyARIMA
C4 R4 X1
Amazon EC2
Python 2 Python 3
Transformations
Strategies
Languages
MFE
RMSE
Scale dependent
Scale independent
SMAPE
MASE
Metrics
Default Log
Auto ARIMA
Third experiment: Result
Software
Hardware
TBATS Prophet
C4 R4 X1
Amazon EC2
Python 2 Python 3
Transformations
Strategies
Languages
MFE
RMSE
Scale dependent
Scale independent
SMAPE
MASE
Metrics
Default Log
Auto ARIMA
Added another transformation
Software
Hardware
TBATS Prophet
C4 R4 X1
Amazon EC2
Python 2 Python 3
Transformations
Strategies
Languages
MFE
RMSE
Scale dependent
Scale independent
SMAPE
MASE
Metrics
Auto ARIMA
Default Log BoxCox
Fourth Experiment: Python Version Testing
Software
Hardware
TBATS Prophet
C4 R4 X1
Amazon EC2
Python 2 Python 3
Transformations
Strategies
Languages
MFE
RMSE
Scale dependent
Scale independent
SMAPE
MASE
Metrics
Auto ARIMA
Default Log BoxCox
Fourth Experiment: Python Version Testing
● Execute all strategies on both Python versions
○ Run over different server types
● Compare different processing times
Fourth Experiment: Python Version Testing
Fourth Experiment: Python Version Testing
Software
Hardware
TBATS Prophet
C4 R4 X1
Amazon EC2
Python 2 Python 3
Transformations
Strategies
Languages
MFE
RMSE
Scale dependent
Scale independent
SMAPE
MASE
Metrics
Auto ARIMA
Default Log BoxCox
Fourth Experiment: Result
Software
Hardware
TBATS Prophet
C4 R4 X1
Amazon EC2
Python 3
Transformations
Strategies
Languages
MFE
RMSE
Scale dependent
Scale independent
SMAPE
MASE
Metrics
Auto ARIMA
Default Log BoxCox
Fifth Experiment: Hardware Optimization
Software
Hardware
TBATS Prophet
C4 R4 X1
Amazon EC2
Python 3
Transformations
Strategies
Languages
MFE
RMSE
Scale dependent
Scale independent
SMAPE
MASE
Metrics
Auto ARIMA
Default Log BoxCox
Fifth Experiment: Hardware Optimization
Goal: Run all strategies with different
transformation across different
servers. Pick best server based on
time(cumulative) consumption.
Fifth Experiment: Hardware Optimization
Software
Hardware
TBATS Prophet
C4 R4 X1
Amazon EC2
Python 3
Transformations
Strategies
Languages
MFE
RMSE
Scale dependent
Scale independent
SMAPE
MASE
Metrics
Auto ARIMA
Default Log BoxCox
Fifth Experiment: Result
Software
Hardware
TBATS Prophet
C4
Amazon EC2
Python 3
Transformations
Strategies
Languages
MFE
RMSE
Scale dependent
Scale independent
SMAPE
MASE
Metrics
Auto ARIMA
Default Log BoxCox
Determine bottlenecks:
● Determine if any strategy is consuming considerably more time compared to
others
● Try optimizing overhead created by rpy2 while spinning up R kernel
Sixth Experiment: How to improve processing time?
Sixth Experiment: How to improve processing time?
Let’s look at the time distribution of different strategies
Auto_ARIMA TBATS Prophet
Under 2nd
standard deviation
Across all samples
Result: Apply timeout duration of 300 secs and analyse the loss of forecasting
strength
Sixth Experiment: How to improve processing time?
● Win distribution where Auto_ARIMA exceeds timeout threshold
○ time consuming accounts: 26/1000
○ time consuming accounts with Auto Arima as winning strategy: 14/26
Sixth Experiment: How to improve processing time?
● How much forecasting strength did we lose
○ Is ARIMA winning with huge margins
Sixth Experiment: How to improve processing time?
How does the time threshold help us:
● To gain 10% speedup
● Maintain acceptable loss of forecasting strength
Sixth Experiment: How to improve processing time?
Rpy2 ineffectiveness: It is not feasible to timeout the process when using rpy2
Solution: Create individual R scripts for each R strategy and run them as a
sub-process in Python
Sixth Experiment: How to improve processing time?
Software
Hardware
TBATS Prophet
C4
Amazon EC2
Python 3
Transformations
Strategies
Languages
MFE
RMSE
Scale dependent
Scale independent
SMAPE
MASE
Metrics
Auto ARIMA
Default Log BoxCox
BSTS
Add BSTS(Bayesian Structural Time Series) strategy
Add BSTS(Bayesian Structural Time Series) strategy
Concluding Pipeline
Software
Hardware
TBATS Prophet
C4
Amazon EC2
Python 3
Transformations
Strategies
Languages
MFE
RMSE
Scale dependent
Scale independent
SMAPE
MASE
Metrics
Auto ARIMA
Default Log BoxCox
BSTS
Scale + Cost Optimization
● Portable environments
● Immutable image
● Fast & Easy to deploy
across a cluster
● Same code in production
as in local development
Architecture
The Basics:
● AWS elastic infrastructure
● Redshift data warehouse
● Luigi task orchestration
● Celery queue to distribute work
● Docker on c4.8xlarge(s)
● Amazon Linux OS
● Anaconda distro
○ Python 3.6
○ R 3.4
● Apache Parquet on S3
ELERY
S3
ECS
Redshift
Parquet
EC2
Concurrency Implementation
Cluster Stats
● 28 Instances (c4.8xlarge)
● 1000 workers
● ~2.6 Million forecasts
● ~5.5 hours
On Demand Spot
Cost per hour $1.59 $0.60
Cost per run $267.00 $100.80
Conclusion
New forecasting pipeline resulted in reduction of forecast error
Future Improvements
● Run the experiment if any of the library experience upgrades
● Periodically look for new time series strategies
● Add regression models
● Investigate AWS Batch or Kubernetes.
References
https://c2fo.com/
https://www.otexts.org/fpp
https://arxiv.org/pdf/1302.6613.pdf
https://sites.google.com/site/stevethebayesian/googlepageforstevenlscott/course-a
nd-seminar-materials/bsts-bayesian-structural-time-series
Questions?
@pranav_bahl

More Related Content

What's hot

Mining big data streams with APACHE SAMOA by Albert Bifet
Mining big data streams with APACHE SAMOA by Albert BifetMining big data streams with APACHE SAMOA by Albert Bifet
Mining big data streams with APACHE SAMOA by Albert BifetJ On The Beach
 
Complexity Analysis
Complexity Analysis Complexity Analysis
Complexity Analysis Shaista Qadir
 
Lec 2 algorithms efficiency complexity
Lec 2 algorithms efficiency  complexityLec 2 algorithms efficiency  complexity
Lec 2 algorithms efficiency complexityAnaya Zafar
 
Big o notation
Big o notationBig o notation
Big o notationkeb97
 
Towards Verified Implementation of Event-B Models in Dafny
Towards Verified Implementation of Event-B Models in DafnyTowards Verified Implementation of Event-B Models in Dafny
Towards Verified Implementation of Event-B Models in DafnySadegh Dalvandi
 
핵심 딥러닝 입문 4장 RNN
핵심 딥러닝 입문 4장 RNN핵심 딥러닝 입문 4장 RNN
핵심 딥러닝 입문 4장 RNNJaey Jeong
 
Scilab: Computing Tool For Engineers
Scilab: Computing Tool For EngineersScilab: Computing Tool For Engineers
Scilab: Computing Tool For EngineersNaren P.R.
 
The maths behind microscaling
The maths behind microscalingThe maths behind microscaling
The maths behind microscalingLiz Rice
 
Update on Benchmark 7
Update on Benchmark 7Update on Benchmark 7
Update on Benchmark 7PFHub PFHub
 
Design & implementation of machine learning algorithm in (2)
Design & implementation of machine learning algorithm in (2)Design & implementation of machine learning algorithm in (2)
Design & implementation of machine learning algorithm in (2)saurabh Kumar Chaudhary
 
Cape2013 scilab-workshop-19Oct13
Cape2013 scilab-workshop-19Oct13Cape2013 scilab-workshop-19Oct13
Cape2013 scilab-workshop-19Oct13Naren P.R.
 
source code metrics and other maintenance tools and techniques
source code metrics and other maintenance tools and techniquessource code metrics and other maintenance tools and techniques
source code metrics and other maintenance tools and techniquesSiva Priya
 
Anders Nielsen AD Model-Builder
Anders Nielsen AD Model-BuilderAnders Nielsen AD Model-Builder
Anders Nielsen AD Model-BuilderDavid LeBauer
 
CNIT 127 Ch 5: Introduction to heap overflows
CNIT 127 Ch 5: Introduction to heap overflowsCNIT 127 Ch 5: Introduction to heap overflows
CNIT 127 Ch 5: Introduction to heap overflowsSam Bowne
 
Anders Nielsen template model-builder
Anders Nielsen template model-builderAnders Nielsen template model-builder
Anders Nielsen template model-builderDavid LeBauer
 

What's hot (20)

Time Series Made Easy
Time Series Made EasyTime Series Made Easy
Time Series Made Easy
 
Day 3 chapter 2 unit 1
Day 3 chapter 2 unit 1Day 3 chapter 2 unit 1
Day 3 chapter 2 unit 1
 
growth rates
growth ratesgrowth rates
growth rates
 
PREDIcT
PREDIcTPREDIcT
PREDIcT
 
Mining big data streams with APACHE SAMOA by Albert Bifet
Mining big data streams with APACHE SAMOA by Albert BifetMining big data streams with APACHE SAMOA by Albert Bifet
Mining big data streams with APACHE SAMOA by Albert Bifet
 
Complexity Analysis
Complexity Analysis Complexity Analysis
Complexity Analysis
 
Lec 2 algorithms efficiency complexity
Lec 2 algorithms efficiency  complexityLec 2 algorithms efficiency  complexity
Lec 2 algorithms efficiency complexity
 
Big o notation
Big o notationBig o notation
Big o notation
 
Towards Verified Implementation of Event-B Models in Dafny
Towards Verified Implementation of Event-B Models in DafnyTowards Verified Implementation of Event-B Models in Dafny
Towards Verified Implementation of Event-B Models in Dafny
 
핵심 딥러닝 입문 4장 RNN
핵심 딥러닝 입문 4장 RNN핵심 딥러닝 입문 4장 RNN
핵심 딥러닝 입문 4장 RNN
 
Scilab: Computing Tool For Engineers
Scilab: Computing Tool For EngineersScilab: Computing Tool For Engineers
Scilab: Computing Tool For Engineers
 
The maths behind microscaling
The maths behind microscalingThe maths behind microscaling
The maths behind microscaling
 
Update on Benchmark 7
Update on Benchmark 7Update on Benchmark 7
Update on Benchmark 7
 
Design & implementation of machine learning algorithm in (2)
Design & implementation of machine learning algorithm in (2)Design & implementation of machine learning algorithm in (2)
Design & implementation of machine learning algorithm in (2)
 
Cape2013 scilab-workshop-19Oct13
Cape2013 scilab-workshop-19Oct13Cape2013 scilab-workshop-19Oct13
Cape2013 scilab-workshop-19Oct13
 
source code metrics and other maintenance tools and techniques
source code metrics and other maintenance tools and techniquessource code metrics and other maintenance tools and techniques
source code metrics and other maintenance tools and techniques
 
Anders Nielsen AD Model-Builder
Anders Nielsen AD Model-BuilderAnders Nielsen AD Model-Builder
Anders Nielsen AD Model-Builder
 
Fpga 12-event-control
Fpga 12-event-controlFpga 12-event-control
Fpga 12-event-control
 
CNIT 127 Ch 5: Introduction to heap overflows
CNIT 127 Ch 5: Introduction to heap overflowsCNIT 127 Ch 5: Introduction to heap overflows
CNIT 127 Ch 5: Introduction to heap overflows
 
Anders Nielsen template model-builder
Anders Nielsen template model-builderAnders Nielsen template model-builder
Anders Nielsen template model-builder
 

Similar to Robust Automated Time Series Forecasting in Python & R

Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...HostedbyConfluent
 
Production ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeProduction ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeIdo Shilon
 
How to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkHow to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkDatabricks
 
Agile Testing Analytics
Agile Testing AnalyticsAgile Testing Analytics
Agile Testing AnalyticsQASymphony
 
Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3LibbySchulze
 
Statistical Arbitrage
Statistical ArbitrageStatistical Arbitrage
Statistical ArbitrageShubham Patil
 
Tuning for Systematic Trading: Talk 2: Deep Learning
Tuning for Systematic Trading: Talk 2: Deep LearningTuning for Systematic Trading: Talk 2: Deep Learning
Tuning for Systematic Trading: Talk 2: Deep LearningSigOpt
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0DanBrown980551
 
Cutting edge hyperparameter tuning made simple with ray tune
Cutting edge hyperparameter tuning made simple with ray tuneCutting edge hyperparameter tuning made simple with ray tune
Cutting edge hyperparameter tuning made simple with ray tuneXiaoweiJiang7
 
Getting Started with Calc Manager for HFM
Getting Started with Calc Manager for HFMGetting Started with Calc Manager for HFM
Getting Started with Calc Manager for HFMAlithya
 
Customer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R OpenCustomer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R OpenPoo Kuan Hoong
 
How To Transform the Manual Testing Process to Incorporate Test Automation
How To Transform the Manual Testing Process to Incorporate Test AutomationHow To Transform the Manual Testing Process to Incorporate Test Automation
How To Transform the Manual Testing Process to Incorporate Test AutomationRanorex
 
Production-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroProduction-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroDaniel Marcous
 
How to apply machine learning into your CI/CD pipeline
How to apply machine learning into your CI/CD pipelineHow to apply machine learning into your CI/CD pipeline
How to apply machine learning into your CI/CD pipelineAlon Weiss
 
Sumo Logic Cert Jam - Metrics Mastery
Sumo Logic Cert Jam - Metrics MasterySumo Logic Cert Jam - Metrics Mastery
Sumo Logic Cert Jam - Metrics MasterySumo Logic
 
A practical introduction to observability
A practical introduction to observabilityA practical introduction to observability
A practical introduction to observabilityNikolay Stoitsev
 
Declarative Experimentation in Information Retrieval using PyTerrier
Declarative Experimentation in Information Retrieval using PyTerrierDeclarative Experimentation in Information Retrieval using PyTerrier
Declarative Experimentation in Information Retrieval using PyTerrierCrai Macdonald
 
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC ...
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC                           ...Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC                           ...
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC ...PATHALAMRAJESH
 
Predicting Tweet Sentiment
Predicting Tweet SentimentPredicting Tweet Sentiment
Predicting Tweet SentimentLucinda Linde
 

Similar to Robust Automated Time Series Forecasting in Python & R (20)

Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
 
Production ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeProduction ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ waze
 
How to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkHow to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache Spark
 
Agile Testing Analytics
Agile Testing AnalyticsAgile Testing Analytics
Agile Testing Analytics
 
Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3
 
Statistical Arbitrage
Statistical ArbitrageStatistical Arbitrage
Statistical Arbitrage
 
Tuning for Systematic Trading: Talk 2: Deep Learning
Tuning for Systematic Trading: Talk 2: Deep LearningTuning for Systematic Trading: Talk 2: Deep Learning
Tuning for Systematic Trading: Talk 2: Deep Learning
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0
 
CNCF opa
CNCF opaCNCF opa
CNCF opa
 
Cutting edge hyperparameter tuning made simple with ray tune
Cutting edge hyperparameter tuning made simple with ray tuneCutting edge hyperparameter tuning made simple with ray tune
Cutting edge hyperparameter tuning made simple with ray tune
 
Getting Started with Calc Manager for HFM
Getting Started with Calc Manager for HFMGetting Started with Calc Manager for HFM
Getting Started with Calc Manager for HFM
 
Customer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R OpenCustomer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R Open
 
How To Transform the Manual Testing Process to Incorporate Test Automation
How To Transform the Manual Testing Process to Incorporate Test AutomationHow To Transform the Manual Testing Process to Incorporate Test Automation
How To Transform the Manual Testing Process to Incorporate Test Automation
 
Production-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroProduction-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to hero
 
How to apply machine learning into your CI/CD pipeline
How to apply machine learning into your CI/CD pipelineHow to apply machine learning into your CI/CD pipeline
How to apply machine learning into your CI/CD pipeline
 
Sumo Logic Cert Jam - Metrics Mastery
Sumo Logic Cert Jam - Metrics MasterySumo Logic Cert Jam - Metrics Mastery
Sumo Logic Cert Jam - Metrics Mastery
 
A practical introduction to observability
A practical introduction to observabilityA practical introduction to observability
A practical introduction to observability
 
Declarative Experimentation in Information Retrieval using PyTerrier
Declarative Experimentation in Information Retrieval using PyTerrierDeclarative Experimentation in Information Retrieval using PyTerrier
Declarative Experimentation in Information Retrieval using PyTerrier
 
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC ...
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC                           ...Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC                           ...
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC ...
 
Predicting Tweet Sentiment
Predicting Tweet SentimentPredicting Tweet Sentiment
Predicting Tweet Sentiment
 

More from PyData

Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...PyData
 
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshUnit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshPyData
 
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiThe TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiPyData
 
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...PyData
 
Deploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerDeploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerPyData
 
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaGraph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaPyData
 
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...PyData
 
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroRESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroPyData
 
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...PyData
 
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottAvoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottPyData
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroPyData
 
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...PyData
 
Pydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPyData
 
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...PyData
 
Extending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydExtending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydPyData
 
Measuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverMeasuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverPyData
 
What's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldWhat's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldPyData
 
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...PyData
 
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardSolving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardPyData
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...PyData
 

More from PyData (20)

Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
 
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshUnit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
 
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiThe TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
 
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
 
Deploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerDeploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne Bauer
 
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaGraph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
 
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
 
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroRESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
 
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
 
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottAvoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca Bilbro
 
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
 
Pydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica Puerto
 
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
 
Extending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydExtending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will Ayd
 
Measuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverMeasuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen Hoover
 
What's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldWhat's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper Seabold
 
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
 
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardSolving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
 

Recently uploaded

Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 

Recently uploaded (20)

Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 

Robust Automated Time Series Forecasting in Python & R

  • 1. Robust Automated Forecasting In Python & R Pranav Bahl, Data Scientist Jonathan Stacks, DevOps Engineer
  • 2. Time Series Forecasting Time Series: A series of data points indexed in time order, spaced at equal time intervals. It consists of two variables, time and values.
  • 3. Overview Goal: Demonstrate how to make millions of [robust] forecasts in an automated fashion ● Define the problem ● Retrieve and preprocess data ○ Profile ○ Transform ○ Detect outliers/anomalies ● Create forecast models employing multiple strategies and parameter combinations ○ Using Python ○ Using R ● Evaluate models with contextual evaluation metrics (and meta-metrics) ● Discuss how to choose the most appropriate hardware
  • 4. Defining the problem ● Forecasts are used throughout our risk management system ○ Bias towards risk aversion ● > 250,000 unique time series ○ Growing at 2x each year ● Elastic compute capacity ● Runtime <= 5 hours ● Cost <= $200/day ● Reduce error by 10% ○ Based on cumulative forecast % error
  • 5. Runtime and cost calculation Because the number of unique time series to be processed change over time so the idea is to adjust number of servers to get the job done in desired time.
  • 6. Time series profiling ● Ensure index is a timestamp ● Check for completeness of panel ● Remove leap days ● Make sure data has at least one complete season ● Truncate data to in favor of complete seasons
  • 7. Handling missing values ● Impute using… ○ Descriptive statistics (e.g. mean or median) ○ Interpolation (e.g. linear or polynomial) ○ Extrapolation (e.g. training a linear model)
  • 8. Outlier/anomaly detection Outlier: An observation that lies an abnormal distance from other values in a random sample from a population. - Boxplot - Time-series decomposition routine Anomaly: Illegitimate data point that’s generated by a different process than whatever generated the rest of the data. - Forecasting - Robust Principal Component Analysis (RPCA)
  • 10. ● Regression ○ [Python] StatsModels (OLS, polynomial) ○ [Python/R] XGBoost (gradient boosted regression trees) ● ARMA / ARIMA / SARMIA (Autoregressive Moving-Average) ○ [R] Forecast ○ [Python] StatsModels ○ [Python] Pyramid ● Exponential smoothing ○ [R] Forecast (Holt-Winters) ● Structural Models / State space models ○ [R] BSTS (Bayesian Structural Time Series) ○ [Python] Pyflux ○ [Python/R] Prophet Forecasting library landscape
  • 11. Evaluation Metrics Evaluation metrics are used to measure the quality of forecast. It is also used to compare different strategies.There are two types of evaluation metrics: Scale dependent: The metrics which requires all the compared time series be on the same scale. Eg: Scale independent: The metrics used to compare forecast performance amongst different data sets Eg:
  • 12. Bringing Down Runtime, Scale and Cost Runtime optimization: Run different experiments to narrow down poor performing strategies or transformations Cost optimization: Optimize the cost function by experimenting different server options
  • 14. Initial Approach Software Hardware C4 R4 X1 Amazon EC2 Python 2 Python 3 Transformations Strategies Languages MFE RMSE MAE MSE NMSE SMSE SSE Scale dependent Scale independent MPE MAPE SMAPE MASE Metrics Default Log TBATS ProphetPyARIMAAuto ARIMA Pyflux
  • 15. First experiment: Remove unnecessary metrics Software Hardware C4 R4 X1 Amazon EC2 Python 2 Python 3 Transformations Strategies Languages MFE RMSE MAE MSE NMSE SMSE SSE Scale dependent Scale independent MPE MAPE SMAPE MASE Metrics Default Log TBATS ProphetPyARIMAAuto ARIMA Pyflux
  • 16. First experiment: Remove unnecessary metrics
  • 17. First experiment: Remove unnecessary metrics Software Hardware C4 R4 X1 Amazon EC2 Python 2 Python 3 Transformations Strategies Languages MFE RMSE MAE MSE NMSE SMSE SSE Scale dependent Scale independent MPE MAPE SMAPE MASE Metrics Default Log TBATS ProphetPyARIMAAuto ARIMA Pyflux
  • 18. First experiment: Result Software Hardware C4 R4 X1 Amazon EC2 Python 2 Python 3 Transformations Strategies Languages MFE RMSE Scale dependent Scale independent SMAPE MASE Metrics Default Log TBATS ProphetPyARIMAAuto ARIMA Pyflux
  • 19. Second experiment: Forecast model selection Software Hardware C4 R4 X1 Amazon EC2 Python 2 Python 3 Transformations Strategies Languages MFE RMSE Scale dependent Scale independent SMAPE MASE Metrics Default Log TBATS ProphetPyARIMAAuto ARIMA Pyflux
  • 20. Second experiment: Forecast model selection ● Choose distinct but meaningful parameters for all strategies ● Initial experiment with default settings ○ Exception: reduced number of simulations for faster runtime ● Compare strategies across filtered evaluation metrics
  • 21. Second experiment: Forecast model selection
  • 22. Second experiment: Forecast model selection Software Hardware C4 R4 X1 Amazon EC2 Python 2 Python 3 Transformations Strategies Languages MFE RMSE Scale dependent Scale independent SMAPE MASE Metrics Default Log TBATS ProphetPyARIMAAuto ARIMA Pyflux
  • 23. Second experiment: Result Software Hardware C4 R4 X1 Amazon EC2 Python 2 Python 3 Transformations Strategies Languages MFE RMSE Scale dependent Scale independent SMAPE MASE Metrics Default Log TBATS ProphetPyARIMAAuto ARIMA
  • 24. Third experiment: Auto ARIMA vs PyARIMA Software Hardware TBATS ProphetPyARIMA C4 R4 X1 Amazon EC2 Python 2 Python 3 Transformations Strategies Languages MFE RMSE Scale dependent Scale independent SMAPE MASE Metrics Default Log Auto ARIMA
  • 25. Third experiment: Auto ARIMA vs PyARIMA ● Choose best Arima Implementation ○ Since Python’s Auto Arima reflect R’s implementation ● Compare error differences ○ Hold implementation with better performance Error differences, where Python Auto ARIMA wins 50% py_auto_arima_log 50% py_auto_arima
  • 26. Third experiment: Auto ARIMA vs PyARIMA Software Hardware TBATS ProphetPyARIMA C4 R4 X1 Amazon EC2 Python 2 Python 3 Transformations Strategies Languages MFE RMSE Scale dependent Scale independent SMAPE MASE Metrics Default Log Auto ARIMA
  • 27. Third experiment: Result Software Hardware TBATS Prophet C4 R4 X1 Amazon EC2 Python 2 Python 3 Transformations Strategies Languages MFE RMSE Scale dependent Scale independent SMAPE MASE Metrics Default Log Auto ARIMA
  • 28. Added another transformation Software Hardware TBATS Prophet C4 R4 X1 Amazon EC2 Python 2 Python 3 Transformations Strategies Languages MFE RMSE Scale dependent Scale independent SMAPE MASE Metrics Auto ARIMA Default Log BoxCox
  • 29. Fourth Experiment: Python Version Testing Software Hardware TBATS Prophet C4 R4 X1 Amazon EC2 Python 2 Python 3 Transformations Strategies Languages MFE RMSE Scale dependent Scale independent SMAPE MASE Metrics Auto ARIMA Default Log BoxCox
  • 30. Fourth Experiment: Python Version Testing ● Execute all strategies on both Python versions ○ Run over different server types ● Compare different processing times
  • 31. Fourth Experiment: Python Version Testing
  • 32. Fourth Experiment: Python Version Testing Software Hardware TBATS Prophet C4 R4 X1 Amazon EC2 Python 2 Python 3 Transformations Strategies Languages MFE RMSE Scale dependent Scale independent SMAPE MASE Metrics Auto ARIMA Default Log BoxCox
  • 33. Fourth Experiment: Result Software Hardware TBATS Prophet C4 R4 X1 Amazon EC2 Python 3 Transformations Strategies Languages MFE RMSE Scale dependent Scale independent SMAPE MASE Metrics Auto ARIMA Default Log BoxCox
  • 34. Fifth Experiment: Hardware Optimization Software Hardware TBATS Prophet C4 R4 X1 Amazon EC2 Python 3 Transformations Strategies Languages MFE RMSE Scale dependent Scale independent SMAPE MASE Metrics Auto ARIMA Default Log BoxCox
  • 35. Fifth Experiment: Hardware Optimization Goal: Run all strategies with different transformation across different servers. Pick best server based on time(cumulative) consumption.
  • 36. Fifth Experiment: Hardware Optimization Software Hardware TBATS Prophet C4 R4 X1 Amazon EC2 Python 3 Transformations Strategies Languages MFE RMSE Scale dependent Scale independent SMAPE MASE Metrics Auto ARIMA Default Log BoxCox
  • 37. Fifth Experiment: Result Software Hardware TBATS Prophet C4 Amazon EC2 Python 3 Transformations Strategies Languages MFE RMSE Scale dependent Scale independent SMAPE MASE Metrics Auto ARIMA Default Log BoxCox
  • 38. Determine bottlenecks: ● Determine if any strategy is consuming considerably more time compared to others ● Try optimizing overhead created by rpy2 while spinning up R kernel Sixth Experiment: How to improve processing time?
  • 39. Sixth Experiment: How to improve processing time?
  • 40. Let’s look at the time distribution of different strategies Auto_ARIMA TBATS Prophet Under 2nd standard deviation Across all samples Result: Apply timeout duration of 300 secs and analyse the loss of forecasting strength Sixth Experiment: How to improve processing time?
  • 41. ● Win distribution where Auto_ARIMA exceeds timeout threshold ○ time consuming accounts: 26/1000 ○ time consuming accounts with Auto Arima as winning strategy: 14/26 Sixth Experiment: How to improve processing time?
  • 42. ● How much forecasting strength did we lose ○ Is ARIMA winning with huge margins Sixth Experiment: How to improve processing time?
  • 43. How does the time threshold help us: ● To gain 10% speedup ● Maintain acceptable loss of forecasting strength Sixth Experiment: How to improve processing time?
  • 44. Rpy2 ineffectiveness: It is not feasible to timeout the process when using rpy2 Solution: Create individual R scripts for each R strategy and run them as a sub-process in Python Sixth Experiment: How to improve processing time?
  • 45. Software Hardware TBATS Prophet C4 Amazon EC2 Python 3 Transformations Strategies Languages MFE RMSE Scale dependent Scale independent SMAPE MASE Metrics Auto ARIMA Default Log BoxCox BSTS Add BSTS(Bayesian Structural Time Series) strategy
  • 46. Add BSTS(Bayesian Structural Time Series) strategy
  • 47. Concluding Pipeline Software Hardware TBATS Prophet C4 Amazon EC2 Python 3 Transformations Strategies Languages MFE RMSE Scale dependent Scale independent SMAPE MASE Metrics Auto ARIMA Default Log BoxCox BSTS
  • 48. Scale + Cost Optimization
  • 49. ● Portable environments ● Immutable image ● Fast & Easy to deploy across a cluster ● Same code in production as in local development
  • 50. Architecture The Basics: ● AWS elastic infrastructure ● Redshift data warehouse ● Luigi task orchestration ● Celery queue to distribute work ● Docker on c4.8xlarge(s) ● Amazon Linux OS ● Anaconda distro ○ Python 3.6 ○ R 3.4 ● Apache Parquet on S3 ELERY S3 ECS Redshift Parquet EC2
  • 52. Cluster Stats ● 28 Instances (c4.8xlarge) ● 1000 workers ● ~2.6 Million forecasts ● ~5.5 hours On Demand Spot Cost per hour $1.59 $0.60 Cost per run $267.00 $100.80
  • 53. Conclusion New forecasting pipeline resulted in reduction of forecast error
  • 54. Future Improvements ● Run the experiment if any of the library experience upgrades ● Periodically look for new time series strategies ● Add regression models ● Investigate AWS Batch or Kubernetes.