Machine Learning at Netflix Scale

Netflix is the world’s leading Internet television network with over 48 million members in more than 40 countries enjoying more than one billion hours of TV shows and movies per month, including original series. Netflix uses machine learning to deliver a personalized experience to each one of our 48 million users.

In this talk you will hear about the machine learning algorithms that power almost every part of the Netflix experience, including some of our recent work on distributed neural networks on AWS GPUs. You will also gain insight into our innovation approach, which combines offline experimentation with online A/B testing. Finally, you will learn about the system architectures that enable all of this at Netflix scale.

Published in: Engineering, Technology, Education

  1. Machine Learning At Netflix Scale. Aish Fenton, Manager - Research Engineering, @aishfenton
  2. Everything is a recommendation
  3. 4
  4. Top Picks for Aish
  5. Movies based on books
  6. Because you watched Bob’s Burgers
  7. Rank based on your taste
  8. 75% of plays come from homepage
  9. Back Story…
  10. Proxy question: ▪ Accuracy in predicted rating ▪ Improve by 10% = $1 million! What we were interested in: ▪ High quality recommendations (figure labels: predicted, actual)
  11. SVD, RBMs: top two results still used in production!
  12. >
  13. 2006 2013
  14. • > 44M members • > 40 countries • > 5B hours in Q3 2013 • Log 100B events/day • 31.62% of peak US downstream traffic
  15. Data and Models
  16. ▪ > 40M subscribers ▪ Ratings: ~5M/day ▪ Searches: > 3M/day ▪ Plays: > 50M/day ▪ Streamed hours: 5B hours in Q3 2013 (diagram labels: Geo Info, Time, Impressions, Device Info, Metadata, Social, Ratings, Demographics, Member Behavior, Plays)
  17. Aish: Latent User Vector. House of Cards: Latent Item Vector
  18. Diagram: matrices R, U (u1, u2, u3) and M (m1, m2, m3); value shown for Aish and House of Cards: 3.53
  19. Mean Rating + My Bias + Movie Bias + Interaction
  20. My rating for House of Cards = Mean Rating + My Bias + Movie Bias + Interaction: 3.55 = 2.50 + (-1.5) + 1.2 + pq
  21. Diagram: R, U (u1, u2, u3), M (m1, m2, m3) and Time T (t1, t2, t3); values shown for Aish and House of Cards: 3.53, 2.35, 1.34
  22. ▪ Matrix/Tensor Factorization ▪ Regression models (Logistic, Linear, Elastic nets) ▪ Factorization Machines ▪ Restricted Boltzmann Machines ▪ Markov Chains & other graph models ▪ Clustering / Topic Models ▪ Neural Networks ▪ Association Rules ▪ GBDT/RF ▪ …
  23. Chart: Improvement Over Baseline (axis 0% to 300%) for Popularity, + Ratings, + More Features & Optimized Models
  24. Anatomy of a Machine Learning Platform
  25. Problem Data Experiment Offline Produce Model Test / Metrics
  26. Near-line Online UI Clients Event Distribution Online Algs Model Trainer Pre-compute AB Test Metrics API Layer Monitoring Offline Hadoop / Data Warehouse Experimentation Platform S3 / HDFS Offline Metrics Query Tools Models Models
  27. Near-line Online UI Clients Event Distribution Online Algs Model Trainer Pre-compute AB Test Metrics API Layer Monitoring Offline Hadoop / Data Warehouse Experimentation Platform S3 / HDFS Offline Metrics Query Tools Models Models
  28. Many different types of data… ▪ App Logs ▪ User Actions ▪ Ratings ▪ Plays ▪ Queue Adds ▪ Algo Actions ▪ Impressions (Presentation Bias) ▪ Context ▪ Device Info ▪ User Demographics ▪ Social ▪ Time ▪ …
  29. Near-line Online UI Clients Event Distribution Online Algs Model Trainer Pre-compute AB Test Metrics API Layer Monitoring Offline Hadoop / Data Warehouse Experimentation Platform S3 / HDFS Offline Metrics Query Tools Models Models Embedded Embedded
  30. Weights: Real-time popularity of movie
  31. Example: Neural Network Training
  32. θ: Input, Hidden Layer, Output
  33. Input, Hidden Layers, Output
  34. Neural Network Training: 1,536 cores, G2 Instances, $0.60 per hour
  35. But… things can go astray
  36. Near-line Online UI Clients Event Distribution Online Algs Model Trainer Pre-compute AB Test Metrics API Layer Monitoring Offline Hadoop / Data Warehouse Experimentation Platform S3 / HDFS Offline Metrics Query Tools Models Models
  37. R U M Pre-compute u1 u2 u3 Online
  38. Near-line Online UI Clients Event Distribution Online Algs Model Trainer Pre-compute AB Test Metrics API Layer Monitoring Offline Hadoop / Data Warehouse Experimentation Platform S3 / HDFS Offline Metrics Query Tools Models Models. Aish played HoC; publish new model for Aish
  39. Aish Fenton, @aishfenton, https://www.linkedin.com/profile/view?id=47917219
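
Slides 19 and 20 break a predicted rating into a mean rating, a user bias, a movie bias, and an interaction term pq between the latent user and item vectors. The Python sketch below simply evaluates that sum. The function names and the two-dimensional latent vectors are hypothetical: the slides give no vector values, so the ones here were picked only so that the interaction term makes the total match the 3.55 shown on slide 20. It is an illustration of the formula, not Netflix's implementation.

```python
from typing import Sequence


def dot(p: Sequence[float], q: Sequence[float]) -> float:
    """Inner product of two latent vectors of equal length."""
    if len(p) != len(q):
        raise ValueError("latent vectors must have the same length")
    return sum(a * b for a, b in zip(p, q))


def predict_rating(
    mean_rating: float,
    user_bias: float,
    movie_bias: float,
    user_vec: Sequence[float],
    movie_vec: Sequence[float],
) -> float:
    """Mean rating + user bias + movie bias + interaction (p . q)."""
    return mean_rating + user_bias + movie_bias + dot(user_vec, movie_vec)


# Figures from slide 20: mean 2.50, user bias -1.5, movie bias 1.2.
# The latent vectors are invented for illustration (assumption);
# their dot product is 0.9 * 1.0 + 0.5 * 0.9 = 1.35.
user_vec = [0.9, 0.5]    # hypothetical latent user vector p
movie_vec = [1.0, 0.9]   # hypothetical latent item vector q

rating = predict_rating(2.50, -1.5, 1.2, user_vec, movie_vec)
print(round(rating, 2))
```

In a trained model the biases and latent vectors would be learned from observed ratings rather than set by hand; the sketch covers only the prediction step.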
