Machine Learning at Netflix
Amer Ather
Netflix Performance Engineering
Machine Learning (ML)
➢ ML gives machines (computers) the ability to learn without being explicitly
programmed
➢ ML is about teaching machines to perform tasks based on prior experience
(knowledge). Experience comes from data
➢ ML algorithms enable machines to identify patterns in observed data
➢ They predict things without explicit pre-programmed rules
Training: A learner is trained on a dataset and emits a learned model
Inference: A trained (learned) model takes real-world inputs and makes predictions
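The split between training and inference can be sketched in a few lines. This is a minimal illustration only, assuming a nearest-centroid learner and made-up toy data (the `hours` and `kinds` values are hypothetical, not Netflix data): `train` emits a model, and the returned model makes predictions on new inputs.

```python
from statistics import mean

def train(samples, labels):
    """Training: learn one centroid (mean value) per class from labeled data."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    centroids = {y: mean(xs) for y, xs in by_class.items()}

    def model(x):
        """Inference: predict the class whose centroid is closest to x."""
        return min(centroids, key=lambda y: abs(x - centroids[y]))

    return model

# Hypothetical toy data: daily viewing hours labeled by viewer type.
hours = [0.5, 1.0, 1.5, 5.0, 6.0, 7.0]
kinds = ["casual", "casual", "casual", "binge", "binge", "binge"]

model = train(hours, kinds)          # training emits a learned model
print(model(0.8), model(5.5))        # inference on unseen inputs
```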
ML Algorithms
➢ The main objective of an ML algorithm (algo) is to pick the most
sensible place to put a fence in the data.
➢ The goal of every ML algo is to best estimate a target function (f) that
maps input data (X) onto output variables (Y).
➢ There are many ML algos available. The choice depends on the
specific problem.
➢ Tree-based ensemble algos (gradient tree boosting, random forest)
are known to work best on a wide variety of datasets
➢ In addition, hyperparameter optimization of a given algo can
lead to a significant improvement in predictive accuracy
on most problems
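The "fence" idea can be made concrete with the simplest possible learner, a one-level decision tree (decision stump) on one-dimensional data. This is a sketch under assumptions: the data and the function name `fit_stump` are illustrative, and real tree ensembles combine many such splits over many features.

```python
def fit_stump(xs, ys):
    """Return (threshold, errors) of the best single 'fence' on 1-D data.

    Predicts class 1 for x > threshold and class 0 otherwise.
    """
    points = sorted(set(xs))
    # Candidate fences sit halfway between neighbouring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best = None
    for t in candidates:
        errors = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if best is None or errors < best[1]:
            best = (t, errors)
    return best

# Hypothetical inputs X and binary outputs Y.
xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 1, 1, 1]
threshold, errors = fit_stump(xs, ys)
print(threshold, errors)
```

Here the estimated target function f is simply "x > threshold".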
Popular ML algorithms:
○ Gaussian Naive Bayes (GNB)
○ Bernoulli Naive Bayes (BNB)
○ Multinomial Naive Bayes (MNB)
○ Logistic Regression (LR)
○ Stochastic Gradient Descent (SGD)
○ Passive Aggressive Classifier (PAC)
○ Support Vector Classifier (SVC)
○ K-Nearest Neighbor (KNN)
○ Decision Tree (DT)
○ Random Forest (RF)
○ Extra Trees Classifier (ERF)
○ AdaBoost (AB)
○ Gradient Tree Boosting (GTB)
Algorithms are ranked by their 10-fold CV balanced accuracy on a given dataset, with a lower ranking indicating higher accuracy. The rankings
show the strength of ensemble-based tree algorithms in generating accurate models: the first-, second-, and fourth-ranked
algorithms belong to this class.
Data-driven advice for applying machine learning
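For readers unfamiliar with the metric used in the ranking, here is a small sketch of balanced accuracy and of a k-fold split. The helper names and the toy labels are assumptions for illustration; in practice a library such as scikit-learn would normally be used.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        hits = sum(y_pred[i] == cls for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)

def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

# Imbalanced toy labels: always predicting the majority class looks
# good on plain accuracy (0.8) but not on balanced accuracy.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0] * 10
print(balanced_accuracy(y_true, y_pred))

folds = list(k_fold_indices(20, 10))   # 10-fold split of 20 samples
```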
Deep Learning (DL)
➢ Deep Learning (DL) uses deep neural networks (DNNs) that are built by deeply
layering connected artificial neurons, also called perceptrons
○ A neuron can be thought of as a function that takes in multiple inputs and yields a
single output. Commonly used functions are: sigmoid, softmax, ReLU
➢ A DNN defines the relationship between the input and output layers, which is
parameterized by weights.
○ Activation functions make the network non-linear, which allows it to perform various learning tasks.
Training reduces a cost function by adjusting the parameters: errors are minimized by adjusting
weight (w) and bias (b) via gradient descent
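The weight and bias update can be sketched for a single sigmoid neuron. This is a minimal example under assumptions: a cross-entropy cost, a hand-picked learning rate and epoch count, and four made-up data points. A real DNN stacks many such neurons and computes the gradients by backpropagation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(data, lr=0.5, epochs=2000):
    """Fit one sigmoid neuron y = sigmoid(w*x + b) with gradient descent.

    Uses the cross-entropy cost, whose gradient w.r.t. (w*x + b) is (p - y).
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            p = sigmoid(w * x + b)
            grad_w += (p - y) * x
            grad_b += (p - y)
        # Step against the average gradient to reduce the cost.
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return w, b

# Hypothetical 1-D inputs with binary labels.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
w, b = train_neuron(data)
print(round(sigmoid(w * -2.0 + b), 3), round(sigmoid(w * 2.0 + b), 3))
```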
Features:
➢ DL models can extract useful features from raw data, called Feature Learning
➢ DL models are trained to form non-linear relationships
➢ DL models can be tweaked easily to avoid overfitting
➢ DL handles nonlinear dimensionality reduction, which linear PCA cannot capture
○ An autoencoder can recreate an image from low-dimensional codes
➢ NN layers and weights can be tweaked to implement Transfer Learning
DL has proven to work best in the fields of image (computer vision) and speech recognition,
NLP, sentiment analysis, self-driving and recommendation systems
CNN in action
Machine Learning - Scalability
➢ The vast majority of ML use cases are data parallel
➢ Parallel processing across multiple GPUs/CPUs can reduce model training
time. Parallel computation can be applied to:
○ Model training via ensembles of decision trees (DT)
○ Model evaluation via resampling procedures like k-fold cross-validation
○ Hyperparameter tuning via grid/random search
➢ ML libraries that support multi-GPU model training:
○ XGBoost
○ LightGBM
○ Horovod - trains convolutional networks and LSTMs in hours instead of days or weeks
➢ Data Parallelism in DL
○ LSTM - One layer per GPU
○ Distributed SGD - SGD mini-batches are spread over a pool of parallel workers, with the learning rate
adjusted as a function of minibatch size
➢ Model Parallelism in DL
○ Stacked LSTM
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
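The learning-rate adjustment for large minibatches can be sketched as a linear scaling rule with a gradual warmup. The reference values below (base rate 0.1 for a minibatch of 256, five warmup epochs) are assumptions chosen for illustration; suitable values depend on the model and dataset.

```python
def scaled_learning_rate(batch_size, base_lr=0.1, base_batch=256):
    """Linear scaling rule: multiply the learning rate by k when the
    minibatch size is multiplied by k."""
    return base_lr * batch_size / base_batch

def warmup_learning_rate(epoch, batch_size, warmup_epochs=5,
                         base_lr=0.1, base_batch=256):
    """Ramp linearly from base_lr to the scaled rate over the warmup epochs."""
    target = scaled_learning_rate(batch_size, base_lr, base_batch)
    if epoch >= warmup_epochs:
        return target
    return base_lr + (target - base_lr) * epoch / warmup_epochs

# Example: 32 workers with 256 samples each give a global minibatch of 8192.
print(scaled_learning_rate(8192))
print([round(warmup_learning_rate(e, 8192), 2) for e in range(6)])
```

The warmup avoids starting training with a very large step size while the weights are still changing rapidly.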
GPU - General Purpose Computing
➢ Ability to program the GPU in high-level programming languages like C and C++
○ No knowledge of graphics programming (OpenGL or DirectX) is needed
○ Requires knowledge of CUDA, a modestly extended version of C
➢ A CUDA program uses GPUs in conjunction with CPUs to accelerate
compute-heavy tasks
○ Application code runs on CPUs but can offload compute-intensive tasks to
GPUs as CUDA kernel functions
➢ Data-parallel problems, common in ML/DL, are a good fit for GPU
computation: each data element can be processed in parallel and the same
kernel function is applied to each data element
➢ Neural networks are built from identical neurons that are highly
parallel by nature and rely heavily on matrix math operations, which are best
supported on GPUs. This gives a significant speedup over CPU-only model training
General Purpose GPU programming
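The "same kernel function applied to each data element" model can be mimicked on the host to show how each thread finds its element. This is only a Python emulation of the indexing scheme, not CUDA code: the names `saxpy_kernel` and `launch` are hypothetical, and a real GPU would run the threads in parallel rather than in nested loops.

```python
def saxpy_kernel(block_idx, block_dim, thread_idx, a, x, y, out):
    """Body executed by one simulated GPU thread: out[i] = a * x[i] + y[i]."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < len(x):                          # guard threads past the data end
        out[i] = a * x[i] + y[i]

def launch(kernel, n, block_dim, *args):
    """Run the kernel once per thread. A real GPU runs these in parallel;
    this host-side loop only mimics the indexing."""
    grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 20.0, 30.0, 40.0, 50.0]
out = [0.0] * len(x)
launch(saxpy_kernel, len(x), 4, 2.0, x, y, out)   # 2 blocks of 4 threads
print(out)
```

Because no thread reads another thread's output, the order in which the threads run does not matter, which is what makes the problem data parallel.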
GPU vs. CPU
GPU
● Thousands of smaller cores. Ideal for compute-intensive
parallel tasks or stream processing
● A GPU is connected to the system bus via PCIe
● A GPU offers much higher instruction throughput and
memory bandwidth than a CPU
● A GPU has more transistors dedicated to data processing
than to data caching
● A physical GPU has 20-80 streaming multiprocessors
(SMs). Each SM can have hundreds of cores, which adds up
to thousands of cores. Each core runs one thread
● Stream processing can get a 10x performance speedup
on a GPU due to efficient memory access and a higher level
of parallel processing
● A system can have multiple GPUs. GPU-to-GPU
communication is possible via Nvidia NVLink without
going over the PCIe bus.
● Each SM in a GPU has an on-chip 512 KB register file and 128
KB of shared memory, plus an off-chip 1.5 MB shared L2
● GPU cores run in lockstep groups, called warps. All
threads in a warp start at the same program address
Getting started with Nvidia CUDA GPUs
CPU
● Fewer cores. Optimized for sequential (serial) processing
● The CPU socket is directly attached to the system bus
● A physical CPU socket can have multiple cores, with
each core having two hyperthreads (HT) of execution
● A system can have multiple physical CPUs.
Inter-processor communication is via the system bus
● Each CPU core has dedicated L1/L2 caches and an off-core
shared L3 cache
● CPU cores run independently of each other
● Stream processing speedup is limited, ~1.5%
improvement
GPU - Performance Considerations
➢ Improve GPU utilization by reducing memory transfers between the CPU (host) and the GPU (device)
○ For example: all stages of decision tree construction can be performed efficiently on the GPU
➢ Gradient boosting works best on GPUs. ML libraries like XGBoost are optimized to run all phases of training on the GPU:
○ Data compression, gradient calculation, feature quantization, prediction, decision tree construction and
evaluation
➢ Scale computation across multiple GPUs on a system. Nvidia GPUs support NVLink for inter-GPU communication,
which offers 10x higher throughput than communicating over the PCIe bus
➢ Train models with mixed precision. Nvidia Tensor Cores (Volta/Turing GPUs) support mixed precision training
○ Precision lower than 32-bit floating point requires less memory and computation bandwidth
○ Math operations run faster in reduced precision
➢ GPU primitives can be used to compose more complicated algorithms while retaining high performance, readability
and reliability. Simple algos can be used to build massively parallel algos. Some examples of parallel primitives:
○ Radix sort, reduction (Harris), parallel prefix sum (scan), segmented scan and reduce, interleaved sequences
(multi-reduce), interleaved sequences (multi-scan)
➢ GPUs are optimized for 32-bit floating point operations, but not for 64-bit double precision
○ In 32-bit floating point, parallel summation shows dramatically better numerical stability than sequential summation
○ The error of parallel summation grows as O(log n), compared to O(n) for sequential summation
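The difference in rounding error between sequential and parallel (tree-shaped) summation can be observed on the host by emulating 32-bit floats. This is a sketch under assumptions: `f32` rounds Python's 64-bit floats to 32-bit via `struct`, and the input (100,000 copies of 0.1) is an arbitrary test case.

```python
import struct

def f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sequential_sum(values):
    """Left-to-right summation: one long chain of n rounded additions."""
    total = 0.0
    for v in values:
        total = f32(total + v)  # round after every addition
    return total

def pairwise_sum(values):
    """Tree-shaped summation, the order a parallel reduction uses.
    Each input passes through only about log2(n) rounded additions."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return f32(pairwise_sum(values[:mid]) + pairwise_sum(values[mid:]))

values = [f32(0.1)] * 100_000
exact = values[0] * len(values)      # reference computed in 64-bit
seq_err = abs(sequential_sum(values) - exact)
par_err = abs(pairwise_sum(values) - exact)
print(seq_err, par_err)
```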
Scan Primitives for GPU Computing
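As an illustration of one primitive named above, here is a host-side sketch of a work-efficient exclusive prefix sum (scan) in the up-sweep/down-sweep style. It runs sequentially in Python; the point is that the iterations of each inner loop touch independent elements, so on a GPU each could be handled by its own thread. The power-of-two length restriction is a simplification assumed for this sketch.

```python
def exclusive_scan(values):
    """Work-efficient exclusive prefix sum (up-sweep then down-sweep).

    Length must be a power of two in this simplified version.
    """
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    # Up-sweep (reduce): build partial sums in place.
    step = 1
    while step < n:
        for i in range(2 * step - 1, n, 2 * step):
            a[i] += a[i - step]
        step *= 2
    # Down-sweep: push prefixes back down the tree.
    a[n - 1] = 0
    step = n // 2
    while step >= 1:
        for i in range(2 * step - 1, n, 2 * step):
            left = a[i - step]
            a[i - step] = a[i]
            a[i] += left
        step //= 2
    return a

data = [3, 1, 7, 0, 4, 1, 6, 3]
print(exclusive_scan(data))
```

Element i of the result is the sum of all inputs before position i, which is the building block for primitives such as radix sort and stream compaction.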
Meson - Netflix ML Workflow and Orchestration
➢ Models should learn and adapt to new data as it arrives. The ML
workflow involves:
Labeling -> Feature Generation -> Training -> Metrics
➢ At Netflix, Meson performs workflow orchestration and job
scheduling, and Mesos is used for cluster management
○ Several ML pipelines are built to train and test
recommendation algos.
➢ Meson supports:
○ Convenient authoring of workflows via a Scala-based DSL
○ ML-specific constructs like parallel parameter
sweeps, cross-validation, bootstrapping, etc.
○ Custom extensions to perform various tasks:
submitting jobs to a Spark cluster, querying Hive tables, accessing
Netflix microservices and plugging in visualizations
Netflix - Metaflow
➢ Netflix Python library for creating and executing DAGs (directed acyclic graphs) as
workflows. Each node in the DAG is a processing step
○ Metaflow handles data flow and state transfers at each layer
➢ Gives users the freedom to design and implement their own code inside the DAG
➢ Makes it easy for ML workloads to interact with AWS cloud infrastructure such as
storage, compute, notebooks or other UIs
➢ Takes snapshots of the code, data and dependencies automatically. Collaboration is
easier thanks to built-in versioning and logging
➢ Supports resuming workflows, reproducing past results, and inspecting
workflows in a notebook
➢ Graphs can be large (fan-outs), with thousands of tasks in a single workflow
➢ The job scheduler layer (Meson, AWS Step Functions) is responsible for
orchestrating the workflow and assigning the DAG to the compute layer
○ Scheduling steps in topological order
○ Making sure each step in the graph is finished before executing the next
○ Supporting trigger-based (cron, external condition, ...) execution of workflows
At Netflix scale, the scheduler handles hundreds of thousands of active workflows
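The scheduler's core rule, running steps in topological order and only after their predecessors have finished, can be sketched with the Python standard library. This does not use the Metaflow or Meson APIs: the step names and the `run` helper are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
workflow = {
    "start": set(),
    "features": {"start"},
    "train_a": {"features"},
    "train_b": {"features"},
    "join": {"train_a", "train_b"},
    "end": {"join"},
}

def run(workflow, execute):
    """Run steps in topological order; a step becomes ready only after
    all of its predecessors are marked done."""
    sorter = TopologicalSorter(workflow)
    sorter.prepare()
    order = []
    while sorter.is_active():
        ready = sorter.get_ready()      # steps that could run in parallel
        for step in sorted(ready):
            execute(step)
            order.append(step)
            sorter.done(step)
    return order

order = run(workflow, execute=lambda step: None)
print(order)
```

The two training steps become ready at the same time, which is the fan-out a compute layer could execute in parallel.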
Netflix - Notebook (Polynote)
➢ A notebook is a web tool popular among the ML community for:
○ Sharing live code, visualizations, ...
○ Data cleaning, transformation, simulation, modeling, ...
➢ Polynote is a new notebook system built at Netflix that offers:
○ IDE-like features: autocomplete, parameter hints, in-line error highlighting, ...
○ Parameterized notebooks for building reusable templates
○ Polyglot language support: Python, SQL, Scala
○ Apache Spark integration
➢ ML engineers are required to work with multiple languages:
○ Scala and Spark to generate training data (cleaning, subsampling, ...)
○ Python ML libraries like TensorFlow and scikit-learn to train models
➢ Polynote improves notebook reproducibility and visibility:
○ Keeps notebook state intact when cells are executed in any order
○ Dependency and configuration setup (Spark) is saved within the notebook
○ Data visualization with matplotlib and Vega
Source: Polynote - an IDE-inspired polyglot notebook
Netflix - Notebook Infrastructure
Netflix users construct entire workflows in a notebook. To support varying use cases and automation,
Netflix built a notebook infrastructure from open source and home-grown projects:
➢ nteract: next-gen React-based UI for Jupyter notebooks
➢ Meson: Netflix workflow orchestration platform
➢ Papermill: library for parameterizing, executing and analyzing Jupyter notebooks
➢ Commuter: service for viewing and sharing notebooks stored on S3
➢ Titus: Netflix container management platform
➢ Storage: S3, EFS
➢ Compute: all jobs are scheduled on containers
Beyond Interactive: Notebook Innovation at Netflix
ML Use Cases
Recommendation
➢ The Netflix recommendation engine is responsible for:
○ Personalizing the member's home page
○ Recommending what shows to watch
○ Displaying artwork
➢ Various ML models are tested offline on historical viewing
data to see whether they would have improved recommendations. If a model
would have, a live A/B test is deployed to see whether it performs well in
production
➢ The goal is to better predict what you want to watch before you
watch it.
➢ All sorts of models are tested during exploration:
○ Logistic regression (2014)
○ Ensemble model of decision trees (DT)
○ Trees and very large GBDT (XGBoost)
○ Feedforward neural network (NN)
○ Recurrent NN
○ Convolutional NN
○ LSTM / Stacked LSTM
Streaming Quality
➢ Netflix optimizes content delivery through a combination of intelligent caching and encoding recipes that
take into account device capabilities, title complexity, geographical location and network bandwidth
➢ Viewing experience and streaming quality are enhanced by applying predictive models:
○ Device caching takes into account the user's immediate (20 seconds) viewing history to predict which
unwatched episode in a series will be watched next
○ Network quality characterization and prediction to adapt video quality during playback
○ Actively monitoring resource constraints such as device memory and available network bandwidth to
reduce video start time
➢ By better predicting regional demand, video assets can be cached closer to the subscriber's location,
which reduces rebuffer events even with higher-quality streaming
➢ Redundancy in encoded video is removed via spatial and temporal prediction and correlation, resulting
in lower bandwidth requirements for delivering the same quality video
➢ A content allocation algorithm improves Netflix CDN hardware utilization
Source: How Data Science Helps Power Worldwide Delivery of Netflix Content
Regional Failover
➢ Netflix load balances subscriber traffic across multiple AWS regions
➢ A drop in the SPS (Starts Per Second) metric triggers regional failover
➢ A linear regression model is used to predict the traffic that will be
routed to the savior regions
➢ The model is trained on the historical scaling behavior of each microservice
to predict the level of scale-up or system resources required to handle
the load for that time of day.
➢ Regional failover takes into account the geographical location of
subscribers and the capacity requirements of each microservice to achieve
graceful failover
➢ Failover efficiency (7 minutes to fail over a region) is achieved by
keeping enough dark capacity online in each region to meet
service scaling requirements for that day
➢ Dark capacity is whitelisted to take production traffic at failover
Resource Management
(Predictive container placement)
➢ Optimal container placement using combinatorial optimization and ML instead of relying solely
on the Linux CFS scheduler to make placement decisions
➢ Allocate containers closer to compute resources by detecting optimal collocation opportunities
➢ A gradient boosting ML model is trained with the LightGBM library on container CPU usage data. The model
predicts the 95th percentile of each container's CPU usage for the next 10 minutes via conditional quantile
regression
○ Container metadata (image, app name, memory, network, ...) along with the CPU usage time series for
the last hour are used for model training.
○ The model prediction is fed into a MIP (Mixed Integer Program) that produces the optimized
placement. Container isolation is applied by changing cgroup cpusets.
➢ Types of constraints applied to placement decisions:
○ Assign all tasks within a container to the same socket to avoid NUMA latencies
○ Assign each container a minimum of one core to avoid core and L1/L2 cache sharing
○ Spread different containers across sockets, if possible, to reduce shared L3 cache
contention
○ Do not modify the placement of running containers when adding/removing containers
Predictive CPU isolation of containers at Netflix
Figure: container task runtime distribution with and without improved isolation. Fewer outliers with container isolation applied.
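Quantile regression for the 95th percentile works by minimizing the quantile (pinball) loss instead of squared error. The sketch below is not the Netflix model or the LightGBM implementation: it only shows the loss itself on made-up CPU usage samples, and that the constant prediction with the lowest loss sits near the 95th percentile of the data.

```python
def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss: under-prediction costs q per unit,
    over-prediction costs (1 - q) per unit."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)

# Hypothetical CPU usage samples (in cores) for one container: 0.20 .. 1.20.
usage = [0.2 + 0.01 * i for i in range(101)]

# Among constant predictions, the one with the lowest loss at q = 0.95
# lies near the top of the observed range, not at the mean.
best = min(usage, key=lambda c: pinball_loss(usage, [c] * len(usage), 0.95))
print(round(best, 2))
```

With q = 0.95, predicting too little CPU is penalized 19 times more heavily than predicting too much, which pushes the prediction toward the upper tail of usage.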
Anomaly Detection
➢ Anomaly detection systems are optimized for higher precision (fewer false
detections) while maintaining recall (true anomalies detected)
➢ Outliers are data points that exhibit significantly different properties from the
majority of points. Outlier detection is commonly used to find anomalies in areas such as:
○ Suspicious financial or credit card transactions
○ Traffic violations and management
○ Network intrusions or hacking; surveillance
○ Health monitoring
○ Event detection in time series and sensor data
➢ The Netflix device reliability team applies statistical and predictive modeling to prioritize
device reliability issues by controlling for various covariates
➢ Models are trained on past incidents labeled False or True (known to be
a real, actionable issue).
○ Incident data is high dimensional, with a rich enough structure to reliably determine the root cause
○ The trained model predicts the likelihood that a given set of measured conditions constitutes
a real problem.
➢ The Netflix data team uses the Robust Anomaly Detection (RAD) algo to detect anomalies
in high-cardinality big data.
○ RAD is used at Netflix to detect anomalies such as failures in receiving bank payments
and to identify subscriber sign-up problems across devices and browsers
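The precision and recall trade-off mentioned at the top of this slide can be sketched with a score threshold. This is not the RAD algorithm: the anomaly scores, labels and thresholds are made up for illustration, and `precision_recall` is a hypothetical helper.

```python
def precision_recall(scores, labels, threshold):
    """Flag score >= threshold as an anomaly and compare with true labels."""
    flagged = [s >= threshold for s in scores]
    tp = sum(f and l for f, l in zip(flagged, labels))
    fp = sum(f and not l for f, l in zip(flagged, labels))
    fn = sum((not f) and l for f, l in zip(flagged, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

# Hypothetical anomaly scores and ground truth (True = real issue).
scores = [0.1, 0.2, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
labels = [False, False, False, True, False, True, True, True]

for t in (0.3, 0.65):
    print(t, precision_recall(scores, labels, t))
```

Raising the threshold removes the false detections in this toy data (higher precision) at the cost of missing one real issue (lower recall).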
Capacity Forecasting
➢ A regression model predicts Netflix microservice RPS (Requests per Second) by
identifying its relationship with system resources (CPU, memory, network, IO)
○ The model is trained on system resource usage (features) and RPS (label) metrics of
popular Netflix services. Additional dimensions like AWS region and time of day can be
added to make predictions more precise
○ Service metrics (last 2 weeks) are fetched from the Netflix telemetry system (Atlas) to
retrain the model
➢ Helps with capacity planning by forecasting the system resources needed to
scale with a service's RPS growth.
➢ The trained model is deployed as a web app or microservice to help Netflix service
teams estimate cloud cost increases in relation to service RPS changes
Feature (cpu, mem, io..) correlation with RPS
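The relationship between a resource metric and RPS can be sketched with a one-feature least-squares fit. This is an illustration under assumptions, not the Netflix model: the CPU and RPS samples are invented and perfectly linear, and a real model would use several features (memory, network, IO, region, time of day).

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical samples: CPU utilization (%) and requests per second.
cpu = [10.0, 20.0, 30.0, 40.0, 50.0]
rps = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]

slope, intercept = fit_line(cpu, rps)
predicted_rps = slope * 60.0 + intercept          # RPS expected at 60% CPU
cpu_needed = (7000.0 - intercept) / slope         # CPU needed for 7000 RPS
print(round(predicted_rps), round(cpu_needed))
```

Inverting the fitted line, as in `cpu_needed`, is what turns the prediction into a capacity estimate for a given RPS growth.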
Thanks for Listening
References
➢ Netflix Machine Learning - Techblog at Medium
➢ General Purpose GPU Computing. Getting Started with Nvidia CUDA
➢ Using Machine Learning to improve Streaming Quality at Netflix
➢ How Data Science Helps Power Worldwide Delivery of Netflix Content
➢ Telltale: Netflix Application Monitoring Simplified
➢ Predictive CPU isolation of Containers at Netflix
➢ Meson - ML Workflow Orchestration at Netflix
➢ MetaFlow, a Human-Centric Framework for Data Science, Metaflow and AWS Step Functions, Metaflow Docs
➢ Polynote - IDE inspired Polyglot Notebook
➢ Scheduling Notebooks at Netflix
➢ Justin Basilico Presentations on Netflix Personalization
➢ Introduction to Causality in Machine Learning. How Netflix Applies Computation Causal Inference
➢ Comparing Popular ML Algorithms on different Datasets. When to use a particular ML Algorithms
➢ Model Selection for Machine Learning
➢ Strength and Weaknesses of ML Algorithms
➢ Dimensionality Reduction, Feature Selection and Extraction
➢ How to use Gradient Boosting Libraries, XGBoost, LightGBM using Scikit-Learn ML framework
➢ Model Learning Rate tuning when training Deep Learning Neural Networks
➢ 4 Automatic Outlier Detection Algorithms in Python
➢ 17 Statistical Hypothesis Tests needed in ML
➢ Understand Intuitively Different ML Classification Algorithm Principles
➢ Understand Intuitively How Neural Networks Work
➢ Model Performance Validation and Metrics
➢ Convolutional Neural Network (CNN) in action
➢ Distributed Training via Large Minibatch SGD
➢ Scan Primitives for GPU Computing. GPU Primitives for implementing popular algorithms like sorting, prefix sum..
➢ GPU parallel programming using Cuda. Free online class at Udacity

More Related Content

What's hot

Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareJustin Basilico
 
Unlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdfUnlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdfPremNaraindas1
 
The State of Global AI Adoption in 2023
The State of Global AI Adoption in 2023The State of Global AI Adoption in 2023
The State of Global AI Adoption in 2023InData Labs
 
The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021Steve Omohundro
 
The current state of generative AI
The current state of generative AIThe current state of generative AI
The current state of generative AIBenjaminlapid1
 
Gen AI Cognizant & AWS event presentation_12 Oct.pdf
Gen AI Cognizant & AWS event presentation_12 Oct.pdfGen AI Cognizant & AWS event presentation_12 Oct.pdf
Gen AI Cognizant & AWS event presentation_12 Oct.pdfPhilipBasford
 
Introduction to Diffusion Models
Introduction to Diffusion ModelsIntroduction to Diffusion Models
Introduction to Diffusion ModelsSangwoo Mo
 
MLOps by Sasha Rosenbaum
MLOps by Sasha RosenbaumMLOps by Sasha Rosenbaum
MLOps by Sasha RosenbaumSasha Rosenbaum
 
The Evolution of AutoML
The Evolution of AutoMLThe Evolution of AutoML
The Evolution of AutoMLNing Jiang
 
Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)Krishnaram Kenthapadi
 
Generative AI For Everyone on AWS.pdf
Generative AI For Everyone on AWS.pdfGenerative AI For Everyone on AWS.pdf
Generative AI For Everyone on AWS.pdfManjunatha Sai
 
A Framework for Navigating Generative Artificial Intelligence for Enterprise
A Framework for Navigating Generative Artificial Intelligence for EnterpriseA Framework for Navigating Generative Artificial Intelligence for Enterprise
A Framework for Navigating Generative Artificial Intelligence for EnterpriseRocketSource
 
Generative AI Fundamentals - Databricks
Generative AI Fundamentals - DatabricksGenerative AI Fundamentals - Databricks
Generative AI Fundamentals - DatabricksVijayananda Mohire
 
Generative AI, WiDS 2023.pptx
Generative AI, WiDS 2023.pptxGenerative AI, WiDS 2023.pptx
Generative AI, WiDS 2023.pptxColleen Farrelly
 
UNLEASHING INNOVATION Exploring Generative AI in the Enterprise.pdf
UNLEASHING INNOVATION Exploring Generative AI in the Enterprise.pdfUNLEASHING INNOVATION Exploring Generative AI in the Enterprise.pdf
UNLEASHING INNOVATION Exploring Generative AI in the Enterprise.pdfHermes Romero
 
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...Amazon Web Services
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Appsilon Data Science
 

What's hot (20)

Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
 
Unlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdfUnlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdf
 
The State of Global AI Adoption in 2023
The State of Global AI Adoption in 2023The State of Global AI Adoption in 2023
The State of Global AI Adoption in 2023
 
The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021
 
The current state of generative AI
The current state of generative AIThe current state of generative AI
The current state of generative AI
 
Gen AI Cognizant & AWS event presentation_12 Oct.pdf
Gen AI Cognizant & AWS event presentation_12 Oct.pdfGen AI Cognizant & AWS event presentation_12 Oct.pdf
Gen AI Cognizant & AWS event presentation_12 Oct.pdf
 
Introduction to Diffusion Models
Introduction to Diffusion ModelsIntroduction to Diffusion Models
Introduction to Diffusion Models
 
MLOps by Sasha Rosenbaum
MLOps by Sasha RosenbaumMLOps by Sasha Rosenbaum
MLOps by Sasha Rosenbaum
 
The-CxO-Guide-to.pdf
The-CxO-Guide-to.pdfThe-CxO-Guide-to.pdf
The-CxO-Guide-to.pdf
 
The Evolution of AutoML
The Evolution of AutoMLThe Evolution of AutoML
The Evolution of AutoML
 
Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)
 
Generative AI For Everyone on AWS.pdf
Generative AI For Everyone on AWS.pdfGenerative AI For Everyone on AWS.pdf
Generative AI For Everyone on AWS.pdf
 
Machine Learning Pitch Deck
Machine Learning Pitch DeckMachine Learning Pitch Deck
Machine Learning Pitch Deck
 
A Framework for Navigating Generative Artificial Intelligence for Enterprise
A Framework for Navigating Generative Artificial Intelligence for EnterpriseA Framework for Navigating Generative Artificial Intelligence for Enterprise
A Framework for Navigating Generative Artificial Intelligence for Enterprise
 
Generative AI Fundamentals - Databricks
Generative AI Fundamentals - DatabricksGenerative AI Fundamentals - Databricks
Generative AI Fundamentals - Databricks
 
Generative AI, WiDS 2023.pptx
Generative AI, WiDS 2023.pptxGenerative AI, WiDS 2023.pptx
Generative AI, WiDS 2023.pptx
 
UNLEASHING INNOVATION Exploring Generative AI in the Enterprise.pdf
UNLEASHING INNOVATION Exploring Generative AI in the Enterprise.pdfUNLEASHING INNOVATION Exploring Generative AI in the Enterprise.pdf
UNLEASHING INNOVATION Exploring Generative AI in the Enterprise.pdf
 
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
 
Intro to LLMs
Intro to LLMsIntro to LLMs
Intro to LLMs
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)
 

Similar to Netflix machine learning

DigitRecognition.pptx
DigitRecognition.pptxDigitRecognition.pptx
DigitRecognition.pptxruvex
 
Software Design Practices for Large-Scale Automation
Software Design Practices for Large-Scale AutomationSoftware Design Practices for Large-Scale Automation
Software Design Practices for Large-Scale AutomationHao Xu
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2Junli Gu
 
Deep Learning with Spark and GPUs
Deep Learning with Spark and GPUsDeep Learning with Spark and GPUs
Deep Learning with Spark and GPUsDataWorks Summit
 
TensorFlow and Keras: An Overview
TensorFlow and Keras: An OverviewTensorFlow and Keras: An Overview
TensorFlow and Keras: An OverviewPoo Kuan Hoong
 
Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra Part 2
Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra Part 2Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra Part 2
Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra Part 2Anant Corporation
 
Client side machine learning
Client side machine learningClient side machine learning
Client side machine learningKumar Abhinav
 
DeepLearningAlgorithmAccelerationOnHardwarePlatforms_V2.0
DeepLearningAlgorithmAccelerationOnHardwarePlatforms_V2.0DeepLearningAlgorithmAccelerationOnHardwarePlatforms_V2.0
DeepLearningAlgorithmAccelerationOnHardwarePlatforms_V2.0Sahil Kaw
 
MapReduce:Simplified Data Processing on Large Cluster Presented by Areej Qas...
MapReduce:Simplified Data Processing on Large Cluster  Presented by Areej Qas...MapReduce:Simplified Data Processing on Large Cluster  Presented by Areej Qas...
MapReduce:Simplified Data Processing on Large Cluster Presented by Areej Qas...areej qasrawi
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetupGanesan Narayanasamy
 
ECML PKDD 2021 ML meets IoT Tutorial Part III: Deep Optimizations of CNNs and...
ECML PKDD 2021 ML meets IoT Tutorial Part III: Deep Optimizations of CNNs and...ECML PKDD 2021 ML meets IoT Tutorial Part III: Deep Optimizations of CNNs and...
ECML PKDD 2021 ML meets IoT Tutorial Part III: Deep Optimizations of CNNs and...Bharath Sudharsan
 
Vertex Perspectives | AI Optimized Chipsets | Part II
Vertex Perspectives | AI Optimized Chipsets | Part IIVertex Perspectives | AI Optimized Chipsets | Part II
Vertex Perspectives | AI Optimized Chipsets | Part IIVertex Holdings
 
Distributed deep learning optimizations
Distributed deep learning optimizationsDistributed deep learning optimizations
Distributed deep learning optimizationsgeetachauhan
 
Large Model support and Distribute deep learning
Large Model support and Distribute deep learningLarge Model support and Distribute deep learning
Large Model support and Distribute deep learningGanesan Narayanasamy
 
If the data cannot come to the algorithm...
If the data cannot come to the algorithm...If the data cannot come to the algorithm...
If the data cannot come to the algorithm...Robert Burrell Donkin
 

Similar to Netflix machine learning (20)

DigitRecognition.pptx
DigitRecognition.pptxDigitRecognition.pptx
DigitRecognition.pptx
 
Software Design Practices for Large-Scale Automation
Software Design Practices for Large-Scale AutomationSoftware Design Practices for Large-Scale Automation
Software Design Practices for Large-Scale Automation
 
Deep Learning at Scale
Deep Learning at ScaleDeep Learning at Scale
Deep Learning at Scale
 
Tf paper ppt
Tf paper pptTf paper ppt
Tf paper ppt
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2
 
Deep Learning with Spark and GPUs



Netflix machine learning

  • 1. Machine Learning at Netflix. Amer Ather, Netflix Performance Engineering
  • 2. Machine Learning (ML)
    ➢ ML gives machines (computers) the ability to learn without being explicitly programmed
    ➢ ML is about teaching machines to perform tasks based on prior experience (knowledge). Experience comes from data
    ➢ ML algorithms enable machines to identify patterns in observed data
    ➢ Models predict outcomes without explicit pre-programmed rules
    Training: A learner is trained on a dataset and emits a learned model
    Inference: A trained (learned) model takes real-world inputs and makes predictions
  • 3. ML Algorithms
    ➢ The main objective of an ML algorithm (algo) is to pick the most sensible place to put a fence in the data.
    ➢ The goal of every ML algo is to best estimate a target function (f) that maps input data (X) onto output variables (Y).
    ➢ There are many ML algos available. The choice depends on the specific problem.
    ➢ Tree-based ensemble algos (gradient tree boosting, random forest) are known to work best on a wide variety of datasets
    ➢ In addition, hyperparameter optimization of a given algo can sometimes lead to a significant improvement in predictive accuracy
    Popular ML algorithms:
      ○ Gaussian Naive Bayes (GNB)
      ○ Bernoulli Naive Bayes (BNB)
      ○ Multinomial Naive Bayes (MNB)
      ○ Logistic Regression (LR)
      ○ Stochastic Gradient Descent (SGD)
      ○ Passive Aggressive Classifier (PAC)
      ○ Support Vector Classifier (SVC)
      ○ K-Nearest Neighbor (KNN)
      ○ Decision Tree (DT)
      ○ Random Forest (RF)
      ○ Extra Trees Classifier (ERF)
      ○ AdaBoost (AB)
      ○ Gradient Tree Boosting (GTB)
    Figure: 10-fold CV balanced accuracy of each algorithm on a given dataset, with a lower ranking indicating higher accuracy. The rankings show the strength of ensemble-based tree algorithms in generating accurate models: the first, second, and fourth-ranked algorithms belong to this class of algorithms.
    Source: Data driven advice to applying machine learning
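The ranking figure on this slide relies on 10-fold cross-validated balanced accuracy. The following is a minimal sketch of that metric in plain Python, not the code behind the figure. The toy `stump_learner` and the synthetic one-feature dataset are assumptions made only for illustration; in practice a library such as scikit-learn would normally be used.

```python
import random


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        members = [i for i, y in enumerate(y_true) if y == cls]
        correct = sum(1 for i in members if y_pred[i] == cls)
        recalls.append(correct / len(members))
    return sum(recalls) / len(recalls)


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k near-equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def stump_learner(xs, ys):
    """Hypothetical toy learner: threshold halfway between the class means.

    Assumes a single numeric feature and that class 1 has the larger mean.
    """
    zeros = [x for x, y in zip(xs, ys) if y == 0]
    ones = [x for x, y in zip(xs, ys) if y == 1]
    threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: 1 if x > threshold else 0


def cross_validate(xs, ys, learner, k=10):
    """Train on k-1 folds, score on the held-out fold, average the scores."""
    scores = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = learner([xs[i] for i in train], [ys[i] for i in train])
        predictions = [model(xs[i]) for i in fold]
        scores.append(balanced_accuracy([ys[i] for i in fold], predictions))
    return sum(scores) / k


# Synthetic, perfectly separable data (an assumption for the demo).
xs = list(range(50)) + list(range(100, 150))
ys = [0] * 50 + [1] * 50
score = cross_validate(xs, ys, stump_learner, k=10)
```

Balanced accuracy is used instead of plain accuracy because it does not reward a model for simply predicting the majority class on an imbalanced dataset.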
  • 4. Deep Learning (DL)
    ➢ Deep Learning (DL) uses deep neural networks (DNNs) that are built via deep layering of connected artificial neurons, also called perceptrons
      ○ A neuron can be thought of as a function that takes in multiple inputs and yields a single output. Commonly used activation functions are sigmoid, softmax and ReLU
    ➢ DNN functions define the relationship between the input and output layers, which is parameterized by weights.
      ○ Activation functions allow the network to perform various learning tasks. Training reduces a cost function by adjusting the parameter weights: errors are minimized by adjusting weight (w) and bias (b) via gradient descent
    Features:
    ➢ DL models can extract useful features from raw data, called Feature Learning
    ➢ DL models are trained to form non-linear relationships
    ➢ DL models can be tweaked easily to avoid overfitting
    ➢ DL handles nonlinear dimensionality reduction at least as well as PCA
      ○ An autoencoder can recreate the image from low-dimensional codes
    ➢ NN layers and weights can be tweaked to implement Transfer Learning
    DL has proven to work best in the fields of image (computer vision) and speech recognition, NLP, sentiment analysis, self-driving and recommendation systems
    Source: CNN in action
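The statement that errors are minimized by adjusting weight (w) and bias (b) via gradient descent can be sketched with a single sigmoid neuron. This is a minimal illustration, not code from the deck: the AND-gate training data, the learning rate and the epoch count are assumptions chosen so the example converges quickly.

```python
import math


def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, lr=0.5, epochs=2000):
    """Fit one neuron with two inputs by stochastic gradient descent.

    samples: list of ((x1, x2), target) pairs with target in {0, 1}.
    Returns the learned weights w and bias b.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            y = sigmoid(w[0] * x1 + w[1] * x2 + b)
            # For the cross-entropy cost with a sigmoid output, the gradient
            # with respect to the pre-activation is simply (y - target).
            grad = y - target
            w[0] -= lr * grad * x1
            w[1] -= lr * grad * x2
            b -= lr * grad
    return w, b


# Hypothetical training set: the logical AND of two binary inputs.
AND_GATE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_neuron(AND_GATE)


def predict(x):
    """Round the neuron output to a hard 0/1 decision."""
    return round(sigmoid(w[0] * x[0] + w[1] * x[1] + b))


predictions = [predict(x) for x, _ in AND_GATE]
```

A deep network stacks many such units in layers and propagates the same kind of gradient backwards through them; the single-neuron case only shows the update rule itself.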
  • 5. Machine Learning - Scalability
    ➢ The vast majority of ML use cases are data parallel
    ➢ Parallel processing across multiple GPUs/CPUs can reduce model training time. Parallel computation can be applied to:
      ○ Model training via ensembles of decision trees (DT)
      ○ Model evaluation via resampling procedures like k-fold cross-validation
      ○ Tuning hyperparameters via grid/random search
    ➢ ML libraries that support multi-GPU model training:
      ○ XGBoost
      ○ LightGBM
      ○ Horovod - trained convolutional networks and LSTMs in hours instead of days or weeks
    ➢ Data Parallelism in DL
      ○ LSTM - one layer per GPU
      ○ Distributed SGD - SGD mini-batches are spread over a pool of parallel workers, with the learning rate adjusted as a function of minibatch size
    ➢ Model Parallelism in DL
      ○ Stacked LSTM
    Source: Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
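The "learning rate adjustment as a function of minibatch size" mentioned for distributed SGD is commonly implemented as a linear scaling rule with a gradual warmup, as described in the cited large-minibatch SGD paper. The sketch below is one reading of that rule; the function name, the per-epoch granularity and the default of five warmup epochs are assumptions for illustration.

```python
def scaled_learning_rate(base_lr, base_batch, batch, epoch, warmup_epochs=5):
    """Linear scaling rule with gradual warmup (a sketch, per-epoch granularity).

    base_lr is the rate tuned for a minibatch of base_batch samples. When the
    global minibatch grows to `batch`, the target rate grows by the same factor.
    During warmup the rate ramps linearly from base_lr up to that target.
    """
    target = base_lr * batch / base_batch
    if epoch < warmup_epochs:
        return base_lr + (target - base_lr) * epoch / warmup_epochs
    return target


# Hypothetical setup: a rate of 0.1 tuned for 256 samples, scaled to 8192
# samples spread over a pool of workers.
schedule = [scaled_learning_rate(0.1, 256, 8192, epoch) for epoch in range(8)]
```

The warmup exists because a large learning rate applied from the first step can destabilize training while the weights are still changing rapidly.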
  • 6. GPU - General Purpose Computing
    ➢ GPUs can be programmed in high-level programming languages like C and C++
      ○ No knowledge of graphics programming (OpenGL or DirectX) is required
      ○ Requires knowledge of CUDA, a modestly extended version of C
    ➢ A CUDA program uses GPUs in conjunction with CPUs to accelerate compute-heavy tasks
      ○ Application code runs on CPUs but can offload a compute-intensive task, called a CUDA kernel function, to GPUs
    ➢ Data-parallel problems, common in ML/DL, fit GPU computation well: each data element can be processed in parallel, and the same kernel function is applied to each data element
    ➢ Neural networks are built from identical neurons that are highly parallel by nature and rely heavily on matrix math operations, which are best supported on GPU. This gives a significant speedup over CPU-only model training
    Source: General Purpose GPU programming
  • 7. GPU vs. CPU
    GPU
    ● Thousands of smaller cores. Ideal for compute-intensive parallel tasks or stream processing
    ● GPU is connected to the system bus via PCIe
    ● GPU offers much higher instruction throughput and memory bandwidth than CPU
    ● GPU has more transistors dedicated to data processing than to data caching
    ● A physical GPU has 20-80 streaming multiprocessors (SMs). Each SM can have hundreds of cores, which adds up to thousands of cores. Each core runs one thread
    ● Stream processing can get a 10x performance speedup on GPU due to efficient memory access and a higher level of parallel processing
    ● A system can have multiple GPUs. GPU-to-GPU communication is possible via Nvidia NVLink without going over the PCIe bus
    ● Each SM has an on-chip 512 KB register file and 128 KB shared memory, and an off-chip 1.5 MB shared L2
    ● GPU cores run threads in lockstep groups called warps. All threads in a warp start at the same program address
    Source: Getting started with Nvidia CUDA GPUs
    CPU
    ● Fewer cores. Optimized for sequential serial processing
    ● CPU socket is directly attached to the system bus
    ● A physical CPU socket can have multiple cores, each with two hyperthreads (HT) of execution
    ● A system can have multiple physical CPUs. Inter-processor communication is via the system bus
    ● Each CPU core has dedicated L1/L2 caches and an off-core shared L3 cache
    ● Each CPU core runs independently of the others
    ● Stream processing speedup is limited, ~1.5% improvement
  • 8. GPU - Performance Considerations
    ➢ Improve GPU utilization by reducing memory transfers between CPU (host) and GPU (device)
      ○ For example: all stages of decision tree construction can be performed efficiently on GPU
    ➢ Gradient boosting works best on GPU. ML libraries like XGBoost are optimized to run all phases of training on GPU
      ○ Data compression, gradient calculation, feature quantization, prediction, decision tree construction and evaluation
    ➢ Scale computation across multiple GPUs on a system. Nvidia GPUs support NVLink for inter-GPU communication, which offers 10x higher throughput than communicating over the PCIe bus
    ➢ Train models with mixed precision. Nvidia Tensor Cores (Volta/Turing GPUs) support mixed precision training
      ○ Precision lower than 32-bit floating point requires less memory and computation bandwidth
      ○ Math operations run faster in reduced precision
    ➢ GPU primitives can be used to compose more complicated algorithms while retaining high performance, readability and reliability. Simple algos can be used to build massively parallel algos. Some examples of parallel primitives:
      ○ Radix sort, reduction (Harris), parallel prefix sum (scan), segmented scan and reduce, interleaved sequences (multi-reduce), interleaved sequences (multi-scan)
    ➢ GPUs are optimized for 32-bit floating point operations, but not for 64-bit double precision
      ○ In 32-bit arithmetic, parallel summation shows dramatically better numerical stability than sequential summation
      ○ The error of parallel summation grows as O(log n), compared to O(n) for sequential summation
    Source: Scan Primitives for GPU Computing
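The last point on this slide, that a parallel (tree-shaped) summation accumulates less rounding error than a sequential one in 32-bit floating point, can be checked on the CPU. The sketch below emulates single precision in plain Python with the `struct` module; it runs serially and only mimics the order of additions a GPU reduction would use. The input of 65536 copies of 0.1 is an assumption picked to make the effect visible.

```python
import struct


def f32(x):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sequential_sum(values):
    """Left-to-right accumulation, rounding to float32 after every add."""
    total = f32(0.0)
    for v in values:
        total = f32(total + v)
    return total


def pairwise_sum(values):
    """Tree reduction: add neighbours pairwise, level by level.

    This is the addition order of a parallel reduction; each level could run
    concurrently on a GPU, but here it is executed serially.
    """
    level = list(values)
    while len(level) > 1:
        reduced = [f32(level[i] + level[i + 1])
                   for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            reduced.append(level[-1])  # odd element carries to the next level
        level = reduced
    return level[0]


values = [f32(0.1)] * 65536
# Scaling by a power of two is exact, so this is the true sum in double precision.
exact = values[0] * len(values)
error_sequential = abs(sequential_sum(values) - exact)
error_pairwise = abs(pairwise_sum(values) - exact)
```

In the sequential version the running total quickly dwarfs each new term, so every addition loses low-order bits. The tree version adds operands of similar magnitude, which is why its error grows with the depth of the tree rather than with the number of elements.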
  • 9. Meson - Netflix ML Workflow and Orchestration
    ➢ Models should learn and adapt to new data as it arrives. The ML workflow involves: Labeling -> Feature Generation -> Training -> Metrics
    ➢ At Netflix, Meson performs workflow orchestration and job scheduling, and Mesos is used for cluster management
      ○ Several ML pipelines are built to train and test recommendation algos.
    ➢ Meson supports:
      ○ Convenient authoring of workflows via a Scala-based DSL
      ○ ML-specific constructs like parallel parameter sweeps, cross-validation and bootstrapping
      ○ Custom extensions to perform various tasks: submitting jobs to a Spark cluster, querying Hive tables, accessing Netflix microservices and plugging in visualizations
  • 10. Netflix - Metaflow
    ➢ Netflix Python library for creating and executing DAGs (directed acyclic graphs) as workflows. Each node in the DAG is a processing step
      ○ Metaflow handles data flow and state transfers at each layer
    ➢ Gives users the freedom to design and implement their own code inside the DAG
    ➢ Makes it easy for ML workloads to interact with AWS cloud infrastructure such as storage, compute, notebooks or other UIs
    ➢ Takes a snapshot of the code, data and dependencies automatically. Built-in versioning and logging ease collaboration
    ➢ Supports resuming workflows, reproducing past results, and inspecting workflows in a notebook
    ➢ Graphs can be large (fan-outs), with thousands of tasks in a single workflow
    ➢ The job scheduler layer (Meson, AWS Step Functions) is responsible for orchestrating the workflow and assigning the DAG to the compute layer
      ○ Schedules steps in topological order
      ○ Makes sure each step in the graph is finished before executing the next
      ○ Supports trigger-based (cron, external condition, etc.) execution of workflows
    At Netflix scale, the scheduler handles hundreds of thousands of active workflows
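The scheduler duties listed above (run steps in topological order, finish each step before its successors start) can be sketched with Python's standard `graphlib` module (Python 3.9 or later). This is not Metaflow's or Meson's implementation; the workflow, the step names and the `run_workflow` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
workflow = {
    "start": set(),
    "train_a": {"start"},
    "train_b": {"start"},
    "join": {"train_a", "train_b"},
    "end": {"join"},
}


def run_workflow(dag, run_step):
    """Execute every step only after all of its dependencies have finished."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    order = []
    while sorter.is_active():
        # Steps returned together have no unmet dependencies, so a real
        # scheduler could dispatch them to the compute layer in parallel.
        for step in sorted(sorter.get_ready()):
            run_step(step)
            order.append(step)
            sorter.done(step)  # unblocks the steps that depend on this one
    return order


executed = run_workflow(workflow, run_step=lambda name: None)
```

Here `train_a` and `train_b` become ready at the same time, which is the fan-out case the slide describes; `join` is released only after both are marked done.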
  • 11. Netflix - Notebook (Polynote)
    ➢ A notebook is a web tool popular among the ML community for:
      ○ Sharing live code and visualizations
      ○ Data cleaning, transformation, simulation and modeling
    ➢ Polynote is a new notebook system built at Netflix that offers:
      ○ IDE-like features: autocomplete, parameter hints, in-line error highlighting
      ○ Parameterized notebooks for building reusable templates
      ○ Polyglot language support: Python, SQL, Scala
      ○ Apache Spark integration
    ➢ ML engineers are required to work with multiple languages:
      ○ Scala and Spark to generate training data (cleaning, subsampling, etc.)
      ○ Python ML libraries like TensorFlow and scikit-learn to train models
    ➢ Polynote improves the notebook's reproducibility and visibility features:
      ○ Keeps notebook hidden state intact when cells are executed in any order
      ○ Dependency and configuration setup (Spark) are saved within the notebook
      ○ Data visualization with matplotlib and Vega
    Source: Polynote - an IDE-inspired polyglot notebook
  • 12. Netflix - Notebook Infrastructure
    Netflix users construct entire workflows in a notebook. To support varying use cases and automation, Netflix built a notebook infrastructure from open source and home-grown projects:
    ➢ nteract: next-gen React-based UI for Jupyter notebooks
    ➢ Meson: Netflix workflow orchestration platform
    ➢ Papermill: library for parameterizing, executing and analyzing Jupyter notebooks
    ➢ Commuter: service for viewing and sharing notebooks stored on S3
    ➢ Titus: Netflix container management platform
    ➢ Storage: S3, EFS
    ➢ Compute: all jobs are scheduled on containers
    Source: Beyond Interactive, Notebook innovation at Netflix
  • 14. Recommendation
    ➢ The Netflix recommendation engine is responsible for:
      ○ Personalizing the member's home page
      ○ Recommending what shows to watch
      ○ Displaying artwork
    ➢ Various ML models are tested offline on historical viewing data to see whether they would have improved recommendations. If so, a live A/B test is deployed to see whether the model performs well in production
    ➢ The goal is to better predict what you want to watch before you watch it.
    ➢ All sorts of models are tested during exploration:
      ○ Logistic regression (2014)
      ○ Ensemble model of decision trees (DT)
      ○ Trees and very large GBDT (XGBoost)
      ○ Feedforward neural network (NN)
      ○ Recurrent NN
      ○ Convolutional NN
      ○ LSTM / Stacked LSTM
  • 15. Streaming Quality
    ➢ Netflix optimizes content delivery through a combination of intelligent caching and encoding recipes that incorporate device capabilities, title complexity, geographical location and network bandwidth
    ➢ Viewing experience and streaming quality are enhanced by applying predictive models:
      ○ Device caching takes into account the user's immediate (20 seconds) viewing history to predict which unwatched episode in a series will be watched next
      ○ Network quality characterization and prediction to adapt video quality during playback
      ○ Actively monitoring resource constraints such as device memory and available network bandwidth to reduce video start time
    ➢ By predicting regional demand well, video assets can be cached closer to the subscriber's location, which reduces rebuffer events even with higher quality streaming
    ➢ Redundancy in encoded video is removed via spatial and temporal prediction and correlation, resulting in lower bandwidth requirements for delivering the same quality video
    ➢ A content allocation algorithm improves Netflix CDN hardware utilization
    Source: How Data Science Helps Power Worldwide Delivery of Netflix Content
  • 16. Regional Failover
    ➢ Netflix load balances subscriber load across multiple AWS regions
    ➢ A drop in the SPS (Stream starts Per Second) metric triggers regional failover
    ➢ A linear regression model is used to predict the traffic that will be routed to savior regions
    ➢ The model is trained on the historical scaling behavior of each microservice to predict the level of scale-up, or system resources, required to handle the load for that time of day.
    ➢ Regional failover takes into account the geographical location of subscribers and the capacity requirements of each microservice to achieve a graceful failover
    ➢ Failover efficiency (7 minutes to fail over a region) is achieved by keeping enough dark capacity online in each region to meet service scaling requirements for that day
    ➢ Dark capacity is whitelisted to take production traffic at failover
  • 17. Resource Management (Predictive container placement)
    ➢ Optimal container placement using combinatorial optimization and ML, instead of relying solely on the Linux CFS scheduler to make placement decisions
    ➢ Allocate containers closer to compute resources by detecting optimal collocation opportunities
    ➢ A gradient boosting ML model is trained via the LightGBM library on container CPU usage data. The model predicts the 95th percentile CPU usage of each container for the next 10 minutes via conditional quantile regression
      ○ Container metadata (image, app name, memory, net, etc.) along with time series CPU usage for the last hour are used for model training.
      ○ The model prediction is fed into a MIP (Mixed Integer Programming) solver that produces the optimized placement. Container isolation is applied through cgroup cpuset changes.
    ➢ Types of constraints applied to placement decisions:
      ○ Assign all tasks within a container to the same socket to avoid NUMA latencies
      ○ Assign each container a minimum of one core to avoid core and L1/L2 cache sharing
      ○ Spread different containers across sockets, if possible, to reduce shared L3 cache contention
      ○ Do not modify the placement of running containers when adding/removing containers
    Source: Predictive CPU isolation of containers at Netflix
    Figure: Container task runtime distribution with and without improved isolation. Fewer outliers with container isolation applied
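Conditional quantile regression, used above to predict 95th percentile CPU usage, trains a model against the pinball (quantile) loss rather than squared error. The sketch below shows only that loss and why minimizing it yields a quantile; it does not use LightGBM or reproduce the Netflix model. The `cpu_usage` samples and the `best_constant` helper are assumptions for illustration.

```python
def pinball_loss(y_true, y_pred, q):
    """Average quantile loss for target quantile q in (0, 1).

    Under-predictions are weighted by q and over-predictions by (1 - q), so
    for q = 0.95 predicting too little costs 19 times more than too much.
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


def best_constant(y, q):
    """Constant prediction with the lowest pinball loss (brute-force search)."""
    candidates = sorted(set(y))
    return min(candidates, key=lambda c: pinball_loss(y, [c] * len(y), q))


# Hypothetical CPU usage samples: the integers 1..99.
cpu_usage = list(range(1, 100))
p95 = best_constant(cpu_usage, 0.95)
median = best_constant(cpu_usage, 0.5)
```

A gradient boosting library replaces the constant with a function of the input features, but the objective is the same: the fitted value settles where about 95% of observed usage falls below it, which is the conservative estimate a placement solver needs.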
  • 18. Anomaly Detection
    ➢ Anomaly detection systems are optimized for higher precision (fewer false detections) while maintaining recall (true anomalies caught)
    ➢ Outliers are data points that exhibit significantly different properties from the majority of points. Outlier detection is commonly used to find anomalies in areas such as:
      ○ Suspicious financial or credit card transactions
      ○ Traffic violation and management
      ○ Network intrusions or hacking, surveillance
      ○ Health monitoring
      ○ Event detection in time series and sensor data
    ➢ The Netflix device reliability team applies statistical and predictive modeling to prioritize device reliability issues by controlling for various covariates
    ➢ Models are trained on past incidents labeled False or True (known to be a real, actionable issue).
      ○ Incident data is high dimensional, with a structure rich enough to reliably determine the root cause
      ○ The trained model predicts the likelihood that a given set of measured conditions constitutes a real problem.
    ➢ The Netflix data team uses the Robust Anomaly Detection (RAD) algo to detect anomalies in high-cardinality Big Data.
      ○ RAD is used at Netflix to detect failures in receiving bank payments and to identify subscriber sign-up problems across devices and browsers
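The precision/recall trade-off in the first bullet can be made concrete by sweeping the alert threshold of a detector. The sketch below is generic and is not the RAD algorithm; the anomaly scores, labels and thresholds are made-up values for illustration.

```python
def precision_recall(actual, flagged):
    """Precision and recall for boolean ground truth and boolean alerts."""
    tp = sum(1 for a, f in zip(actual, flagged) if a and f)
    fp = sum(1 for a, f in zip(actual, flagged) if not a and f)
    fn = sum(1 for a, f in zip(actual, flagged) if a and not f)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def flag_anomalies(scores, threshold):
    """Raise an alert for every score at or above the threshold."""
    return [s >= threshold for s in scores]


# Hypothetical anomaly scores and whether each incident was a real issue.
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.95]
actual = [False, False, True, True, False, True]

loose = precision_recall(actual, flag_anomalies(scores, 0.5))
strict = precision_recall(actual, flag_anomalies(scores, 0.9))
```

With the stricter threshold every alert is a real issue, but two of the three real issues go unreported. Choosing the threshold is therefore a decision about how many false pages an on-call team can tolerate versus how many incidents may be missed.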
  • 19. Capacity Forecasting
    ➢ A regression model predicts Netflix microservice RPS (Requests per Second) by identifying its relationship with system resources (CPU, memory, network, IO)
      ○ The model is trained on system resource usage (features) and RPS (label) metrics of popular Netflix services. Additional dimensions such as AWS region and time of day can be added to make predictions more precise
      ○ Service metrics (last 2 weeks) are fetched from the Netflix telemetry system (Atlas) to retrain the model
    ➢ Helps with capacity planning by forecasting the system resources needed to scale with a service's RPS growth.
    ➢ The trained model is deployed as a web app or microservice to help Netflix service teams estimate the cloud cost increase in relation to service RPS changes
    Figure: Feature (CPU, memory, IO, etc.) correlation with RPS
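A minimal sketch of the regression idea on this slide: fit a line relating one resource metric to RPS, then invert it to estimate the resource needed for a target RPS. The single-feature ordinary least squares fit, the metric values and the helper names are assumptions for illustration; the slide does not specify the model form, and a production model would use several features.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cpu_needed(target_rps, slope, intercept):
    """Invert the fitted line to estimate CPU usage at a target RPS."""
    return (target_rps - intercept) / slope


# Hypothetical telemetry: CPU utilization (%) and the RPS observed with it.
cpu = [10, 20, 30, 40, 50]
rps = [1050, 2050, 3050, 4050, 5050]

slope, intercept = fit_line(cpu, rps)
forecast = cpu_needed(8050, slope, intercept)
```

Extrapolating far beyond the observed range, as the last line does, assumes the relationship stays linear; real services often saturate, which is one reason to retrain on recent metrics.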
  • 21. References
    ➢ Netflix Machine Learning - Techblog at Medium
    ➢ General Purpose GPU Computing. Getting Started with Nvidia CUDA
    ➢ Using Machine Learning to improve Streaming Quality at Netflix
    ➢ How Data Science Helps Power Worldwide Delivery of Netflix Content
    ➢ Telltale: Netflix Application Monitoring Simplified
    ➢ Predictive CPU isolation of Containers at Netflix
    ➢ Meson - ML Workflow Orchestration at Netflix
    ➢ MetaFlow, a Human-Centric Framework for Data Science, Metaflow and AWS Step Functions, Metaflow Docs
    ➢ Polynote - IDE inspired Polyglot Notebook
    ➢ Scheduling Notebooks at Netflix
    ➢ Justin Basilico Presentations on Netflix Personalization
    ➢ Introduction to Causality in Machine Learning. How Netflix Applies Computation Causal Inference
    ➢ Comparing Popular ML Algorithms on different Datasets. When to use a particular ML Algorithm
    ➢ Model Selection for Machine Learning
    ➢ Strengths and Weaknesses of ML Algorithms
    ➢ Dimensionality Reduction, Feature Selection and Extraction
    ➢ How to use Gradient Boosting Libraries, XGBoost, LightGBM using Scikit-Learn ML framework
    ➢ Model Learning Rate tuning when training Deep Learning Neural Networks
    ➢ 4 Automatic Outlier Detection Algorithms in Python
    ➢ 17 Statistical Hypothesis Tests needed in ML
    ➢ Understand Intuitively Different ML Classification Algorithm Principles
    ➢ Understand Intuitively How Neural Networks Work
    ➢ Model Performance Validation and Metrics
    ➢ Convolutional Neural Network (CNN) in action
    ➢ Distributed Training via Large Minibatch SGD
    ➢ Scan Primitives for GPU Computing. GPU Primitives for implementing popular algorithms like sorting and prefix sum
    ➢ GPU parallel programming using CUDA. Free online class at Udacity