
Making advertising personal, 4th NL Recommenders Meetup


We present the Criteo recommendation stack

Published in: Technology


  1. Copyright © 2014 Criteo
     Making Advertising Personal
     Large Scale Real-Time Product Recommendation at Criteo
     Olivier Koch & Romain Lerallut
     May 19th, 2015
  2. Performance Advertising
     We buy
     • Inventory! (ad spaces)
     • Billions of times a day
     • All over the Internet
     • For 95% of the population
     Where is the need for tech?
     We sell
     • Clicks!
     • (that convert)
     • (that convert a lot)
     We take the risk
     You pay only for what you get
  3. International infrastructure
     Key figures (figures of May 2013)
     Traffic
     • 550 k HTTP requests / sec (peak activity)
     • 23,000 impressions / sec (peak activity)
     • 180 k requests / sec on RTB (average)
     • Less than 10 ms to process an RTB request
     Physical infrastructure
     • 6 data centers on 3 continents, designed and operated in-house
     • ~12,000 servers, largest Hadoop cluster in Europe
     Availability / Uptime
     • > 99.95%
     Big Data
     • More than 20 PB of storage
  4. Global Engine Architecture
     Arbitrage
     • Should we bid?
     • At which price?
     Recommendation
     • Which products should we display?
     Graphical optimization
     • Big image vs small image
     • Background color, ...
     Prediction
     • Generic prediction engine
     • Specific models trained on TBs of data
  5. Recommendation for Advertising
     Building ads with personalized content in real time
     • 2.5 billion products
     • ~8 ms response time
  6. Data Sources
     • Catalog data
       • Feed provided by the merchants
     • User behavior data
       • Large scale intent data
       • All visits to merchant websites
       • Page views, basket, sales events
     • Ad display data
       • Displayed and clicked ads
  7. Recommendation execution flow
     Candidates Generation
     • Get candidates from all sources using user historical products and bestofs
     Candidates Aggregation
     • Remove duplicates
     • Aggregate features
     Preselection
     • Call degraded prediction to decrease number of candidates
     Scoring
     • Call full prediction model to score each candidate
     Winner Selector
     • Select N products on score, randomization
     Glup Logging
     • Log recommended products, prediction variables
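
The flow on this slide can be read as a funnel: many cheap candidates in, a few carefully scored products out. The sketch below is a minimal Python illustration of that shape, not Criteo's implementation. The `Candidate` type, the `cheap_score` and `full_score` callables, and the `keep` and `n` parameters are hypothetical names introduced for the example; randomization and logging are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Candidate:
    product_id: str
    # Features contributed by the sources that proposed this product (hypothetical layout).
    features: Dict[str, float] = field(default_factory=dict)


def aggregate(sources: Iterable[Iterable[Candidate]]) -> List[Candidate]:
    """Candidates aggregation: remove duplicates and merge their features."""
    merged: Dict[str, Candidate] = {}
    for source in sources:
        for cand in source:
            kept = merged.setdefault(cand.product_id, Candidate(cand.product_id))
            kept.features.update(cand.features)
    return list(merged.values())


def recommend(
    sources: Iterable[Iterable[Candidate]],
    cheap_score: Callable[[Candidate], float],
    full_score: Callable[[Candidate], float],
    keep: int,
    n: int,
) -> List[Candidate]:
    candidates = aggregate(sources)
    # Preselection: a cheap ("degraded") model trims the candidate set.
    shortlist = sorted(candidates, key=cheap_score, reverse=True)[:keep]
    # Scoring: the full model ranks only the survivors.
    ranked = sorted(shortlist, key=full_score, reverse=True)
    # Winner selection: top n by score.
    return ranked[:n]
```

The point of the two sort steps is that `full_score` is only ever called on `keep` candidates, however many the sources return.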
  8. Issue #1: Retrieving user-specific products
     • The need: storing [user → interesting products] vectors
       • Difficult to store and retrieve at scale (900+ million users)
       • Hard to keep up to date
     • Using seen products as a proxy
       • Store the [user → viewed products] vectors
       • Easier to maintain
       • Store [viewed product → interesting products] vectors
       • Based on aggregated user behavior data
       • Computable offline
       • Final ranking by an ML model
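
The proxy described on this slide amounts to a two-hop lookup: user to viewed products, then viewed product to related products. A minimal sketch, assuming two in-memory dictionaries as stand-ins for the real stores; the product IDs, the weights, and the choice to sum weights across viewed products are all illustrative assumptions, and the slide leaves the final ranking to an ML model.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical stand-in for the [user -> viewed products] store.
user_views: Dict[str, List[str]] = {
    "user-1": ["shoes-42", "bag-7"],
}

# Hypothetical stand-in for the [viewed product -> interesting products] store,
# which the slide describes as computed offline from aggregated behavior data.
similar_products: Dict[str, List[Tuple[str, float]]] = {
    "shoes-42": [("shoes-43", 0.8), ("socks-1", 0.3)],
    "bag-7": [("shoes-43", 0.4), ("wallet-9", 0.6)],
}


def candidates_for(user_id: str) -> List[Tuple[str, float]]:
    """Expand a user's viewed products into weighted candidate products."""
    totals: Dict[str, float] = defaultdict(float)
    for product in user_views.get(user_id, []):
        for neighbour, weight in similar_products.get(product, []):
            # Summing weights is an illustrative way to merge the two hops.
            totals[neighbour] += weight
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Only the small per-user view list changes in real time here; the product-to-product table can be rebuilt in batch, which is the maintenance advantage the slide points to.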
  9. Issue #2: Scoring products
     • Need to fuse data from several sources
       • Product-specific
       • User-specific
       • User-product interactions
       • Display-specific
     • 1st solution: regression model
       • Predict P(product click then sale)
       • Easy to evaluate
     • 2nd solution: ranking model
       • More appropriate for our needs
       • Still maximizing post-click sales
       • Can be evaluated only on multi-product banners
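
To make the "1st solution" concrete, here is a minimal sketch of fusing the four feature sources and scoring them with a logistic model. The slide only says "regression model"; the logistic form, the feature names, and the `fuse` and `predict` helpers are assumptions made for illustration.

```python
import math
from typing import Dict


def fuse(
    product: Dict[str, float],
    user: Dict[str, float],
    user_product: Dict[str, float],
    display: Dict[str, float],
) -> Dict[str, float]:
    """Merge the four sources, prefixing names so they cannot collide."""
    fused: Dict[str, float] = {}
    sources = (
        ("product", product),
        ("user", user),
        ("user_product", user_product),
        ("display", display),
    )
    for prefix, features in sources:
        for name, value in features.items():
            fused[f"{prefix}.{name}"] = value
    return fused


def predict(weights: Dict[str, float], bias: float, features: Dict[str, float]) -> float:
    """Estimate P(click then sale) as a logistic function of the fused features."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    # Numerically stable sigmoid: never exponentiates a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

A pointwise model like this scores each product independently, which is why it is easy to evaluate offline; the ranking model of the "2nd solution" would instead compare products shown together.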
  10. Issue #3: Picking the right products
      • Several questions:
        • User-product fatigue
        • Independent product choice assumption
        • Explore / exploit
      • Solution 1: Randomization in the banners
        • Keep independent products assumption
        • Separate optimization process to shuffle displayed items
      • Solution 2: A better scoring model
        • Score a full banner, not independent products
        • Store all product display counts
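
One simple way to randomize a banner while still favoring high scores is score-proportional sampling without replacement. The sketch below is an assumption about how that could look, not the separate optimization process the slide mentions; the `1 / (1 + count)` fatigue discount and the `pick_banner` name are hypothetical.

```python
import random
from typing import Dict, List, Optional


def pick_banner(
    scores: Dict[str, float],
    display_counts: Dict[str, int],
    n: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Sample up to n distinct products, with probability proportional to a discounted score."""
    rng = rng or random.Random()
    # Hypothetical fatigue discount: products already shown often get a lower weight.
    pool = {
        product: score / (1 + display_counts.get(product, 0))
        for product, score in scores.items()
        if score > 0
    }
    chosen: List[str] = []
    while pool and len(chosen) < n:
        products = list(pool)
        pick = rng.choices(products, weights=[pool[p] for p in products], k=1)[0]
        chosen.append(pick)
        del pool[pick]  # without replacement: a product appears at most once
    return chosen
```

Because lower-scored products keep a nonzero chance of being shown, this also gives a crude form of exploration.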
  11. Issue #4: 8 ms response time! Tweaking for performance
      • CTR / Sales prediction takes ~40 µs per candidate using in-house library
      • 2-step prediction:
        • A fast pass to remove most of the candidates
        • A slow pass to accurately score the remaining candidates
      And the technical fine print:
      • All real-time code in C#
      • Async I/O for better efficiency
      • HAProxy to scale the front-end
      • Memcached to store all required data in memory
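
The two figures on this slide explain why a fast pass is needed. A back-of-the-envelope sketch, assuming scoring runs sequentially and ignoring I/O and every other cost in the request; the fast-pass cost used in the usage note is a made-up number, since the slides do not give one.

```python
FULL_COST_US = 40    # from the slide: ~40 µs per candidate for full prediction
BUDGET_US = 8_000    # from the slide: ~8 ms response time


def max_candidates_full_only(budget_us: int = BUDGET_US, cost_us: int = FULL_COST_US) -> int:
    """Upper bound on candidates if the whole budget went to full prediction."""
    return budget_us // cost_us


def two_pass_cost_us(
    n_candidates: int,
    fast_cost_us: float,  # hypothetical: cost of the fast pass per candidate
    keep: int,            # hypothetical: how many candidates survive the fast pass
    full_cost_us: float = FULL_COST_US,
) -> float:
    """Total scoring time when a fast pass filters before the full model runs."""
    return n_candidates * fast_cost_us + min(keep, n_candidates) * full_cost_us
```

With these numbers, `max_candidates_full_only()` gives 200, so full prediction alone cannot cover thousands of candidates. If the fast pass cost 1 µs per candidate (an assumption), `two_pass_cost_us(2000, 1.0, 100)` would be 6,000 µs, which fits within the budget.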
  12. Upcoming challenges
      • Long(er)-term user profiles
      • More and better product information (images, semantic, NLP)
      • Vertical-optimized engine
        • Classifieds (catalog-free recommendation?)
        • Travel
      • Instant update of similarities
        • (because batch computation is soooo last year)
  13. Fancy a try?
      On your own:
      • Our 1st public dataset is online: http://bit.ly/1vgw2XC
      • 4 GB display and click data, Kaggle challenge in 2014
      • NEW: 1 TB dataset released a few weeks ago
      • Hosted on Microsoft Azure, just waiting for you
      With us!
      http://www.criteo.com/careers/
  14. Questions?
  15. Thank you!
      o.koch@criteo.com
      r.lerallut@criteo.com
