Copyright © 2015 Criteo
New challenges for scalable machine learning in online advertising

Presentation of the new challenges for scalable machine learning in online advertising, with a focus on counterfactual analysis and optimal bidding strategies.

Published in: Technology

  1. New challenges for scalable machine learning in online advertising
     Olivier Koch, Engineering Program Manager, Criteo
     ICML Online Advertising Systems Workshop, June 24, 2016
  2. What we do
     (Diagram: Advertiser, Publisher)
  3. Machine learning applications at Criteo
     • Bidding (2nd price auctions)
     • Product recommendation
     • Banner look and feel selection
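To make the "2nd price auctions" bullet concrete, here is a minimal sketch of how a sealed-bid second-price auction resolves: the highest bidder wins and pays the second-highest bid. The function name, the bidder names, and the absence of a reserve price are illustrative assumptions, not details taken from the slides.

```python
def second_price_auction(bids):
    """Return (winner, price) for a sealed-bid second-price auction.

    bids: dict mapping a bidder name to its bid amount (hypothetical inputs).
    The highest bidder wins and pays the second-highest bid.
    """
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    # Rank bidders from highest to lowest bid.
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]  # the winner pays the runner-up's bid
    return winner, price


winner, price = second_price_auction({"a": 1.20, "b": 0.90, "c": 0.50})
```

Because the price paid does not depend on the winner's own bid, a bidder in this simplified setting has no incentive to bid below its estimated value of the display.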
  4. Machine learning at Criteo
     • Supervised learning using standard regression methods / optimization algorithms (SGD, L-BFGS)
     • Distribution on Hadoop (MapReduce, Spark)
     • 3B displays / day
     • 40 PB of data, 15,000 servers
     • 7 data centers worldwide
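As an illustration of "standard regression methods" trained with SGD, the sketch below fits a logistic regression on toy data in plain Python. The choice of logistic regression, the learning rate, the number of epochs, and the toy data are all assumptions made for the example; the slides do not specify the model or its settings.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd_logistic(data, dim, lr=0.1, epochs=50, seed=0):
    """Fit weights by SGD on the log loss. data: list of (x, y) with y in {0, 1}."""
    rng = random.Random(seed)
    w = [0.0] * dim
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            # The gradient of the log loss for one example is (p - y) * x.
            for i in range(dim):
                w[i] -= lr * (p - y) * x[i]
    return w


# Toy data (hypothetical): features are [bias, signal]; label is 1 when signal > 0.
toy = [([1.0, s], 1 if s > 0 else 0) for s in (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)]
w = sgd_logistic(toy, dim=2)
```

At production scale the same per-example update would be distributed across a cluster, which is where the Hadoop and Spark bullet comes in.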
  5. The good news
     • New generations of algorithms
     • NLP (word embeddings), reinforcement learning, policy learning, deep networks
     • Releases of ML infrastructures
     • Caffe on Spark, TensorFlow, Torch, PhotonML, GPUs inside clusters → strong traction in the academic/industrial community
  6. The good news (cont'd)
     • A lot of data is available
     • Interactions with banners: clicks
     • Interactions with products/advertisers: sales, baskets, home views, listings, visit history
     • New data is coming
     • Mobile, cross-device, (offline)
  7. Now what?
  8. Challenges in online advertising 1/3
     • The technical debt of large-scale machine learning systems
     • AB tests = snapshots. Are we missing long-term effects?
     • Some models become hard to improve. Are we overfitting or using the wrong metrics?
     • We need to deal with a growing number of models, e.g. automate feature engineering
  9. Challenges in online advertising 2/3
     • We want to provide a better online advertising experience
     • Personalized
     • Cross-device
     • Long tail (new users, new products)
  10. Challenges in online advertising 3/3
      • Credit assignment and incrementality
      • Several clicks might be needed to generate a sale
      • We should probably optimize a series of bids as opposed to single bids
      • What is the optimal credit assignment scheme?
      • We optimize what clients give us
      • Attributed sales may not be the right target
      • Global sales increases are noisy
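The credit assignment question can be made concrete by comparing two simple baseline schemes on a path of several clicks that leads to one sale. Last-click and uniform attribution are common industry baselines chosen here for illustration; the slides do not say which scheme is used or preferred, and the click names and sale value are hypothetical.

```python
def last_click(clicks, sale_value):
    """Give all credit for the sale to the final click on the path."""
    credit = {c: 0.0 for c in clicks}
    credit[clicks[-1]] += sale_value
    return credit


def uniform(clicks, sale_value):
    """Split credit for the sale equally across every click on the path."""
    share = sale_value / len(clicks)
    credit = {c: 0.0 for c in clicks}
    for c in clicks:
        credit[c] += share
    return credit


path = ["click_1", "click_2", "click_3"]  # hypothetical click path
lc = last_click(path, 90.0)
un = uniform(path, 90.0)
```

The two schemes reward the same sequence of bids very differently, which is why the choice of scheme changes what a bidder trained on attributed sales learns to optimize.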
  11. Machine learning to the rescue
      • Offline metrics – counterfactual analysis
      • Optimal bidding strategies under uncertainty – reinforcement learning
      • Classification/prediction of time series
      • Long tail (users, products) – transfer learning, factorization
      • Probabilistic match of devices
  12. Machine learning to the rescue
      • Offline metrics – counterfactual analysis
      • Optimal bidding strategies under uncertainty – reinforcement learning
      • Classification/prediction of time series
      • Long tail (users, products) – transfer learning, factorization
      • Probabilistic match of devices
  13. Offline metrics – counterfactual analysis
      • Option 1: run a controlled experiment (AB test)
      • How would the system behave if I replaced model M by model M*?
      • Takes time to conclude
      • Costs money if M* is worse than M (often)
      • Does not measure long-term effects
      • Option 2: use counterfactual analysis
      • How would the system have performed if, when the data was collected, we had replaced model M by model M*?
      • Requires real-time randomization (cost/exploration trade-off)
      • Works best when M* is close to M
      • Trades time for computation and storage
      • Ignores future users' and advertisers' reactions
      Reference: Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising, Bottou et al.
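Option 2 can be sketched with an importance-weighted (inverse propensity scoring) estimator: each logged reward is reweighted by the ratio between the probability that the new policy would have taken the logged action and the probability with which the randomized logging policy actually took it. The code below is a simplified illustration of that idea, not the estimator from the cited paper; the two-action setup, the click rates, and all names are assumptions.

```python
import random


def ips_estimate(logs, new_policy):
    """Estimate the mean reward under new_policy from randomized logs.

    logs: list of (context, action, reward, logging_prob), where logging_prob
    is the probability with which the logging policy chose `action`.
    new_policy(context, action) returns the probability that the new policy
    would choose `action` in `context`.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        weight = new_policy(context, action) / logging_prob
        total += weight * reward
    return total / len(logs)


rng = random.Random(0)
click_rate = {0: 0.1, 1: 0.3}  # hypothetical true click rates per action
logs = []
for _ in range(20000):
    action = rng.choice([0, 1])  # logging policy: uniform over two actions
    reward = 1.0 if rng.random() < click_rate[action] else 0.0
    logs.append((None, action, reward, 0.5))


def always_one(context, action):
    # Candidate policy M*: always play action 1.
    return 1.0 if action == 1 else 0.0


estimate = ips_estimate(logs, always_one)  # should land near 0.3
```

The weights grow as the new policy moves away from the logging policy, which inflates the variance of the estimate; this is one way to read the "works best when M* is close to M" bullet.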
  14. Optimal bidding strategies
      • A user is seen more than 20 times a day on average
      • Each action we take has an impact on the user, the advertiser and the competition
      • Option 1: model the environment and bid accordingly
      • Cannot go beyond the proxy being optimized
      • Option 2: no model, randomized experiments
      • Hard problem: very high-dimensional state space and very sparse rewards
  15. Conclusions
      • Machine learning applies well to online advertising at scale
      • New algorithms, new infrastructures and more data are coming
      • A number of challenges remain unresolved…
      • … come help us solve them!
  16. Thanks! Questions?
      o.koch@criteo.com
      Dataset released: http://bit.ly/criteodata
