
Paris ML meetup

72,205 views

Slides for ML @ Netflix (Paris ML meetup talk)

Published in: Engineering, Technology


  1. Machine Learning @ Netflix (and some lessons learned) Yves Raimond (@moustaki) Research/Engineering Manager Search & Recommendations Algorithm Engineering
  2. Netflix evolution
  3. Netflix scale ● > 69M members ● > 50 countries ● > 1000 device types ● > 3B hours/month ● 36% of peak US downstream traffic
  4. Recommendations @ Netflix ● Goal: Help members find content to watch and enjoy to maximize satisfaction and retention ● Over 80% of what people watch comes from our recommendations ● Top Picks, Because you Watched, Trending Now, Row Ordering, Evidence, Search, Search Recommendations, Personalized Genre Rows, ...
  5. ▪ Regression (Linear, logistic, elastic net) ▪ SVD and other Matrix Factorizations ▪ Factorization Machines ▪ Restricted Boltzmann Machines ▪ Deep Neural Networks ▪ Markov Models and Graph Algorithms ▪ Clustering ▪ Latent Dirichlet Allocation ▪ Gradient Boosted Decision Trees/Random Forests ▪ Gaussian Processes ▪ … Models & Algorithms
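The slides only name these model families, so as an illustration, here is a minimal sketch of one of them: an SVD-based matrix factorization over a toy member-by-title ratings matrix (the data and the choice of rank are invented for the example, not from the talk).

```python
import numpy as np

# Hypothetical toy ratings matrix: 4 members x 5 titles. Zeros stand in
# for unobserved entries here for simplicity; real recommenders treat
# missing data explicitly.
R = np.array([
    [5.0, 3.0, 0.0, 1.0, 4.0],
    [4.0, 0.0, 0.0, 1.0, 5.0],
    [1.0, 1.0, 0.0, 5.0, 2.0],
    [0.0, 1.0, 5.0, 4.0, 1.0],
])

# Truncated SVD: keep k latent factors and reconstruct a low-rank
# approximation that can score unseen member/title pairs.
k = 2
U, s, Vt = np.linalg.svd(R, full_matrices=False)
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Predicted score for member 0 on title 2 (unobserved in R).
score = R_hat[0, 2]
```

The low-rank reconstruction fills in unobserved cells from the latent structure shared across members and titles.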
  6. Some lessons learned
  7. Build the offline experimentation framework first
  8. When tackling a new problem ● What offline metrics can we compute that capture the online improvements we’re actually trying to achieve? ● How should the input data for that evaluation be constructed (train, validation, test)? ● How fast and easy is it to run a full cycle of offline experimentation? ○ Minimize time to first metric ● How replicable is the evaluation? How shareable are the results? ○ Provenance (see Dagobah) ○ Notebooks (see Jupyter, Zeppelin, Spark Notebook)
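The checklist above can be sketched as a tiny offline evaluation cycle (everything here is a made-up stand-in: the play log, the popularity baseline, and recall@k as the offline proxy metric are assumptions for illustration, not the talk's actual setup). Note the time-based split: training on the past and evaluating on the future mimics the online setting better than a random split.

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical play log: (member_id, title_id, day).
log = [(m, random.randrange(20), day) for day in range(30) for m in range(5)]

# Time-based split: train on the past, evaluate on the future.
train = [(m, t) for (m, t, d) in log if d < 25]
test = [(m, t) for (m, t, d) in log if d >= 25]

def popularity_recommender(train, k=5):
    """Baseline model: recommend the k most-played titles overall."""
    counts = Counter(t for (_, t) in train)
    return [t for (t, _) in counts.most_common(k)]

def recall_at_k(recs, test):
    """Offline proxy metric: fraction of held-out plays covered by recs."""
    hits = sum(1 for (_, t) in test if t in set(recs))
    return hits / len(test)

recs = popularity_recommender(train)
metric = recall_at_k(recs, test)
```

Because the whole cycle is a few seconds of code, "time to first metric" stays small, and the harness can be rerun unchanged for each new model idea.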
  9. When tackling an old problem ● Same… ○ Are the metrics that were designed when experimentation first ran in that space still appropriate now?
  10. Think about distribution from the outermost layers
  11. 1. For each combination of hyper-parameters (e.g. grid search, random search, Gaussian processes…) 2. For each subset of the training data a. Multi-core learning (e.g. HogWild) b. Distributed learning (e.g. ADMM, distributed L-BFGS, …)
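The nesting on this slide can be sketched as follows: hyper-parameter combinations on the outermost loop, data subsets inside, with the actual fit/score call as a placeholder for whatever single-machine or distributed learner runs at the innermost level (the grid, the folds, and the toy objective are all invented for the example).

```python
import itertools

# Hypothetical search space and data folds.
param_grid = {"learning_rate": [0.01, 0.1], "l2": [0.0, 1.0]}
folds = [list(range(0, 50)), list(range(50, 100))]

def fit_and_score(params, fold):
    # Placeholder objective: a real implementation would train a model
    # on the fold and return a validation metric.
    return -abs(params["learning_rate"] - 0.1) - 0.1 * params["l2"]

results = []
# Outermost loop: every hyper-parameter combination (grid search here;
# random search or Gaussian processes would replace this enumeration).
for values in itertools.product(*param_grid.values()):
    params = dict(zip(param_grid.keys(), values))
    # Inner loop: every subset of the training data.
    scores = [fit_and_score(params, fold) for fold in folds]
    results.append((params, sum(scores) / len(scores)))

best_params, best_score = max(results, key=lambda r: r[1])
```

Putting distribution at this outermost layer first is usually the cheapest win: the hyper-parameter combinations are embarrassingly parallel, with no communication between them.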
  12. When to use distributed learning? ● The communication overhead of distributed ML algorithms is non-trivial ● Is your data big enough that distribution offsets the communication overhead?
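The question on this slide can be made concrete with a back-of-envelope cost model (all constants below are illustrative assumptions, not Netflix figures): distribution pays off only when the per-iteration compute saved exceeds the synchronization cost added by extra workers.

```python
def iteration_time(n_examples, workers, cost_per_example=1e-6,
                   sync_cost_per_worker=0.05):
    """Toy model: compute shrinks with workers, communication grows."""
    compute = n_examples * cost_per_example / workers
    communication = sync_cost_per_worker * workers
    return compute + communication

# A small job is fastest on one machine; a large one benefits from many.
small_job = [iteration_time(100_000, w) for w in (1, 4, 16)]
large_job = [iteration_time(1_000_000_000, w) for w in (1, 4, 16)]
```

Under these made-up constants, the 100k-example job is fastest with a single worker, while the 1B-example job is fastest with 16; the crossover point is exactly the "is your data big enough?" question.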
  13. Example: Uncollapsed Gibbs sampler for LDA (more details here)
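As an illustration of the uncollapsed variant (this is a generic toy sampler, not the implementation from the talk): θ (doc-topic) and φ (topic-word) are sampled explicitly rather than integrated out, and given θ and φ the topic assignments of different documents are conditionally independent, which is what makes this sampler easy to distribute.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus of word ids; hyper-parameters are illustrative.
docs = [[0, 1, 2, 1], [3, 4, 3, 4], [0, 1, 4, 3]]
K, V = 2, 5                # topics, vocabulary size
alpha, beta = 0.5, 0.5     # Dirichlet priors

theta = rng.dirichlet([alpha] * K, size=len(docs))  # doc-topic
phi = rng.dirichlet([beta] * V, size=K)             # topic-word

for _ in range(20):
    # Step 1: sample topic assignments z given theta and phi.
    # This loop is parallelizable across documents.
    n_dk = np.zeros((len(docs), K))
    n_kv = np.zeros((K, V))
    for d, doc in enumerate(docs):
        for w in doc:
            p = theta[d] * phi[:, w]
            z = rng.choice(K, p=p / p.sum())
            n_dk[d, z] += 1
            n_kv[z, w] += 1
    # Step 2: resample theta and phi from their Dirichlet posteriors.
    theta = np.array([rng.dirichlet(alpha + n_dk[d])
                      for d in range(len(docs))])
    phi = np.array([rng.dirichlet(beta + n_kv[k]) for k in range(K)])
```

In a distributed setting, step 1 runs per-partition with only the small count matrices shipped back for step 2, which keeps the communication overhead bounded.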
  14. Design production code to be experimentation-friendly
  15. Example development process: Idea → Data → Offline modeling (R, Python, MATLAB, …) with iteration → Final model → Implement in production system (Java, C++, …) → Production environment (A/B test) → Actual output. Gaps appear between the final offline model and production: missing post-processing logic, performance issues, code discrepancies, data discrepancies.
  16. Avoid dual implementations: put a shared engine underneath both the experiment code and the production code, so the experiment path and the production path exercise the same core.
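The shared-engine idea on this slide can be sketched as follows (all names are hypothetical): the scoring logic lives in one class, and both the offline experiment harness and the production serving path are thin wrappers around it, so there is no second implementation to drift out of sync.

```python
class RankingEngine:
    """Single source of truth for feature handling and scoring."""

    def __init__(self, weights):
        self.weights = weights

    def score(self, features):
        # Simple linear scorer as a stand-in for the real model logic.
        return sum(self.weights.get(name, 0.0) * value
                   for name, value in features.items())

def experiment_run(engine, offline_examples):
    """Offline path: batch-score logged examples for metric computation."""
    return [engine.score(f) for f in offline_examples]

def production_serve(engine, request_features):
    """Online path: score one live request with the same engine."""
    return engine.score(request_features)

engine = RankingEngine({"popularity": 0.3, "personalization": 0.7})
offline_scores = experiment_run(engine, [{"popularity": 1.0},
                                         {"personalization": 1.0}])
live_score = production_serve(engine, {"popularity": 1.0})
```

Because both paths call the same `score`, an offline metric computed in the experiment harness describes exactly what production would serve, removing the code discrepancies from the development-process slide.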
  17. To be continued...
  18. We’re hiring! Yves Raimond (@moustaki)
