Explains the role of a Software Engineer at a tech company like Criteo to final-year students (M2 graduate degree) at Grenoble INP - Ensimag, a top French computer engineering school, to help them choose their career path.
See https://ensimag.grenoble-inp.fr/
2. 2 •
What is Criteo?
The leading advertising platform for the open internet
Open Internet AI* Engine E-commerce
Dataset
Criteo was founded in 2005
+2700 employees with +650 in R&D
See more details
AMERICAS EMEA
APAC
Publisher
Access
Advertiser
Platform
*: Artificial Intelligence
Source Criteo in 2019
30 locations worldwide,
with Paris (FR), Grenoble (FR), and Ann Arbor (USA)
as R&D offices
See more details
3. 3 •
" I am a Software Engineer @Criteo R&D*
in the Criteo AI Lab, more precisely in the UC** Team "
https://ailab.criteo.com
*: Research & Development
**: Universal Catalog
4. 4 •
2016
2014
2017
2020
Information Systems
Engineering specialization
Startup experience
as Web Software Engineer
Post-master's degree in data
science and big data
Software Reliability Engineer
Software Engineer
at Criteo AI Lab
6. 6 •
Criteo demo
Ad choices
" Here is an online ad "
Publisher website
Advertiser website
7. 7 •
Who are the top players in the online ad world?
go, Go, GO!
Source SimilarTech* online data
*: Bing belongs to Microsoft, DoubleClick belongs to Google, Taboola bought Outbrain; newer players such as Amazon should also appear
for 10K sites (data viewed in 2020-10, but probably dating from ~2018)
Ads market share
8. 8 •
How do online ads work?
source ad-exchange.fr
DSP: Demand-Side Platforms
SSP: Supply-Side Platforms
Cash flow
go, Go, GO!
CTR* 1%
2x the competitors'
average
*: Click Through Rate
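The click-through rate quoted above is the share of displayed ads that get clicked. A minimal sketch of that ratio follows; the function name and the counts are hypothetical, chosen only to reproduce the 1% figure.

```python
def click_through_rate(clicks: int, displays: int) -> float:
    """Return clicks divided by displays, or 0.0 when nothing was displayed."""
    if clicks < 0 or displays < 0:
        raise ValueError("counts must be non-negative")
    if displays == 0:
        return 0.0
    return clicks / displays


# Hypothetical counts: 50 clicks for 5,000 displayed ads.
ctr = click_through_rate(clicks=50, displays=5_000)
print(f"CTR = {ctr:.1%}")
```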
Cash flow
9. 9 •
What products does Criteo provide?
https://marketing.criteo.com
Advertiser
Platform
Criteo is a full DSP; our main business
partners are advertisers:
• Import products
• Manage campaigns
with budget & audience rules
• Analyze results
• Create ads
https://pmc.criteo.com
Publisher
Access
Our business partners are
publishers, but Criteo can also use other
SSPs to provide ads.
11. 11 •
How do users interact with online ads?
RTB: Real-Time Bidding
CAS: Criteo Ads Server
CAT: Criteo Ads Targeting
CRITEO
INTERNET
Billing
Views
Displays Clicks
Events
View, List, Basket, Sale
Auctions &
Biddings
Loading script
Browsing
Open Internet
Won auction
static.criteo.net
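The "Auctions & Biddings" and "Won auction" steps can be pictured with a small sketch. It assumes a plain highest-bid-wins rule, which the slides do not confirm, and it reuses the 80 ms response limit quoted later in this deck as a deadline. The `Bid` type, the bidder names, and the prices are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bid:
    bidder: str        # hypothetical bidder name
    price: float       # bid price in an arbitrary currency unit
    latency_ms: float  # time the bidder took to answer the bid request


def run_auction(bids: list[Bid], deadline_ms: float = 80.0) -> Optional[Bid]:
    """Return the highest bid that arrived before the deadline, if any.

    Assumption: highest bid wins. Real ad exchanges may apply other rules.
    """
    on_time = [bid for bid in bids if bid.latency_ms <= deadline_ms]
    if not on_time:
        return None
    return max(on_time, key=lambda bid: bid.price)


bids = [
    Bid("bidder-a", price=1.20, latency_ms=35.0),
    Bid("bidder-b", price=1.50, latency_ms=120.0),  # too slow, ignored
    Bid("bidder-c", price=0.90, latency_ms=10.0),
]
winner = run_auction(bids)
print(winner.bidder if winner else "no winner")
```

The late bid is dropped even though it is the highest, which is why answering within the time limit matters as much as the price.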
12. 12 •
How is the universal catalog used?
Publisher Direct Access
& SSPs
RTB
Render
Ads Creator Reco*
Universal Catalog
+ 12B products
Audience Budget
*: Recommendation
User Web Client
Arbitrage
CAS CAT
Campaign
Internal Criteo Network
The Internet
Advertiser
14. 14 •
What is the Criteo AI Lab, and who works there?
• R&D department
• Machine Learning (ML)
• Researchers & Software Engineers
Infrastructure
Product Engineering
Site Reliability Engineering
Product
Engineering
Engineering
Program
Management
Research & Development
Product
Engineering
15. 15 •
How is the Criteo AI Lab organized, and why?
• 4 groups of teams
• Provide state-of-the-art ML for Criteo
• Academic contributions & visibility
Criteo AI Lab Structure
Product
Engineering
Research CAML**
ML Platform
Recommendation
**: Criteo Applied Machine Learning
*: Universal Catalog
UC* Team
16. 16 •
A yearly kick-off sets the Criteo strategy. We have a 9-month
plan, several Objectives and Key Results (OKRs) per
quarter, two-week Scrum sprints, and daily tasks.
Organization of a team
" Every team owns its daily organization, within a common culture "
Team members
EPM*
Manager
Team lead
*: Engineering Program Manager
Software Engineer
Product
Owner
17. 17 •
Workday
8h-10h Start
• Development: Single/Pair/Mob programming for maintenance,
tech debt, features, hot fixes
• Meeting: Demo, Sharing Knowledge, Brainstorming, Project,
1:1 team lead or manager
• Communication: Email/Slack Questions, News
• Documentation: User/Developer/Design/Code/Organization
• Event: Social, Conference, CAIL/R&D All Hands, CTF*, Hackathon
• Learning: Online courses, reading blog articles, competitions
• Break: coffee, lunch
17h-21h End
*: Capture The Flag
18. 18 •
Tools Used
Instant messaging Code versioning
Presentation, Email,
Calendar management
Programming language
Online meetings
platform
Ticketing
management
Documentation
management
Feedback platform
Award platform
Integrated
Development
Environment
19. 19 •
Software Engineer Skills
Feedback process twice a year (mid-year and
end of year) from your peers, against a
matrix of levels (junior, senior, staff, senior staff, principal, …) based
on these 10 skills.
Hard skills
Soft skills
20. 20 •
Interactions
Research
*: Engineering Program Manager
Software Engineer
Data scientist
Software Engineer
Site Reliability Engineer
Product Analyst
Manager
EPM* Product Owner
Users
22. 22 •
Our mission?
Outcome
Universal Catalog
+ 12B enriched products
Advertiser catalogs
+30K catalogs
Merge and unify all advertiser catalogs into a universal catalog.
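As a rough illustration of the merge step, the sketch below unifies several advertiser catalogs into one dictionary. Keying each product by the pair (vendor, id) is an assumption made for this example, as are the function name and the sample data; the slides do not describe the actual merge logic.

```python
from typing import Iterable


def merge_catalogs(catalogs: dict[str, Iterable[dict]]) -> dict[tuple[str, str], dict]:
    """Merge per-vendor product lists into a single catalog.

    Assumption: a product is uniquely identified by (vendor, id), so two
    vendors may reuse the same id without colliding.
    """
    universal: dict[tuple[str, str], dict] = {}
    for vendor, products in catalogs.items():
        for product in products:
            key = (vendor, str(product["id"]))
            # A later entry with the same key overwrites the earlier one.
            universal[key] = {**product, "vendor": vendor}
    return universal


# Hypothetical advertiser catalogs.
catalogs = {
    "shop-a": [{"id": 1, "title": "Running shoes"}, {"id": 2, "title": "Boots"}],
    "shop-b": [{"id": 1, "title": "Smartphone"}],
}
universal_catalog = merge_catalogs(catalogs)
print(len(universal_catalog), "products in the universal catalog")
```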
23. 23 •
How to build this universal catalog?
Product Model Prediction
Enriched
Product
Simple processing
Build the universal catalog for Criteo business
with machine learning and data processing algorithms.
24. 24 •
What are the features of an enriched product?
Outcome
Provided features
vendor
id
title
description
category
brand
price
universal brand
universal category
gender
price in euros
price range
Product Enriched Product
vendor
id
title
description
category
brand
price
Enrichments
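The two feature lists above map naturally onto two record types. The sketch below models them and fills in only the enrichments that need simple processing (price in euros and price range); the model-predicted fields are left empty. The `currency` argument, the conversion rates, and the price-range thresholds are placeholders invented for this example.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Product:
    vendor: str
    id: str
    title: str
    description: str
    category: str
    brand: str
    price: float


@dataclass
class EnrichedProduct(Product):
    # Fields filled by model predictions (left empty in this sketch).
    universal_brand: Optional[str] = None
    universal_category: Optional[str] = None
    gender: Optional[str] = None
    # Fields filled by simple processing.
    price_in_euros: Optional[float] = None
    price_range: Optional[str] = None


# Placeholder rates, NOT real exchange rates.
RATES_TO_EUR = {"EUR": 1.0, "USD": 0.9}


def price_range(price_eur: float) -> str:
    """Bucket a price using hypothetical thresholds."""
    if price_eur < 20:
        return "low"
    if price_eur < 100:
        return "medium"
    return "high"


def enrich(product: Product, currency: str) -> EnrichedProduct:
    """Copy the provided features and add the simple-processing enrichments."""
    price_eur = round(product.price * RATES_TO_EUR[currency], 2)
    return EnrichedProduct(
        **asdict(product),
        price_in_euros=price_eur,
        price_range=price_range(price_eur),
    )


product = Product("shop-a", "42", "Running shoes", "Light mesh sneakers",
                  "Shoes > Sport", "Acme", 60.0)
enriched = enrich(product, currency="USD")
print(enriched.price_in_euros, enriched.price_range)
```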
25. 25 •
What is our data?
Universal Catalog
+ 30K product catalogs
+ 12B products
12 languages
Outcome
Product Universal
Categories
+5K
Product Universal
Brands
+60K
E-commerce
Dataset
26. 26 •
What is the universal category model?
AI Engine
Deep Learning model
title
description
Product
Predicted universal
leaf category
Supervised model for classification with K classes
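To make "supervised classification with K classes" concrete, here is a toy stand-in. It is not the deep learning model from the slide: it swaps in a multinomial naive Bayes classifier with add-one smoothing, written in plain Python, that learns from a few labeled (title, description) pairs and predicts one of K = 3 categories. The class name, the categories, and the training examples are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return text.lower().split()


class NaiveBayesCategoryClassifier:
    """Multinomial naive Bayes with add-one smoothing (toy stand-in)."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        self.word_counts: defaultdict = defaultdict(Counter)
        self.vocabulary: set = set()

    def fit(self, products: list[tuple[str, str]], labels: list[str]):
        for (title, description), label in zip(products, labels):
            tokens = tokenize(title + " " + description)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, title: str, description: str) -> str:
        tokens = tokenize(title + " " + description)
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocabulary)
            for token in tokens:
                score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled products: (title, description) -> category.
train = [
    (("Running shoes", "light mesh sneakers for jogging"), "Shoes"),
    (("Leather boots", "waterproof winter boots"), "Shoes"),
    (("Smartphone 128GB", "android phone with dual camera"), "Phones"),
    (("Phone case", "protective cover for smartphone"), "Phones"),
    (("Python cookbook", "recipes for python programming book"), "Books"),
    (("Mystery novel", "paperback crime book"), "Books"),
]
model = NaiveBayesCategoryClassifier().fit(
    [features for features, _ in train], [label for _, label in train]
)
print(model.predict("Trail sneakers", "jogging shoes with mesh"))
```

A production model would replace the word counts with learned text representations, but the contract is the same: text features in, one leaf category out.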
27. 27 •
What is the technical environment?
annotate
products
Import catalogs
meta store
ML labs & experiments
models
metadata
deploy model
sample
products
enrich the products
with predictions
or simple processes
feed
data sets
get data sets
Annotation API & UI
Jobs scheduler
Advertiser catalogs
data sets
AI Engine
data warehouse
28. 28 •
What are our components?
• Scheduler with a Spark job
• Web Application
• Machine learning lab
29. 29 •
Build
Tools Server
CI/CD*
Server
Review
Server
Gerrit server
Artifact
stores
Deployment
Server
Container
platform
*: Continuous Integration/Continuous Delivery
Workstation
What is the development cycle and the pipeline for “go to (pre-)production”?
Preprod or prod?
Datacenter(s)?
.pex
.jar
30. 30 •
What is the production environment?
Container
platform
meta store
Container
platform
models
data warehouse
metadata
universal catalog
databases
Spark job
enricher
Jobs
scheduler
31. 31 •
What is the technical stack?
Jobs
Web applications
ML labs & experiments
Analytics Monitoring
Container platforms Storages
33. 33 •
" We are recruiting! "
Already +20 graduates here
" Join us ✌️ "
Criteo Tech blog
Criteo Open Positions
criteo.com
34. Q & A
go, Go, GO!
g.legoux@criteo.com
@gilleslegoux
35. 35 •
Criteoers* regularly create and contribute to
open source projects, but we keep some projects
internal to stay ahead of our competitors!
*: name for the employees of Criteo
Criteo GitHub
Open source projects
See more details
Criteo Gitlab
Experiment internal projects
Criteo Gerrit
Production internal projects
What about open source?
" We love Open Source projects "
36. 36 •
The situation differs by location, but remote work is "strongly
advised" until June 2021 for Paris and Grenoble. Covid-19 has had
only a small impact on our business.
What is happening with Covid-19?
" Everyone is safe, business is good "
Covid-19 vs Criteo
See more details
37. 37 •
Each team uses part of this common tech stack,
and can use any technology for experiments.
What about your technical stack?
" It depends on your team and mission, but
we have a common tech stack! "
Criteo Tech Stack
See more details
38. 38 •
We have 1 kickoff, 1 hackathon (3 days) and 2 conferences per
year, an onboarding with a datacenter visit, paid external
or internal training, tool licenses, matrix levels (SRE,
SDE, ML ENG, ...), 3 voyager programs, peer feedback every 6
months with a promotion process … See "Working in R&D" to join us
What about career development and life at Criteo?
" Become a complete happy engineer! "
Criteo Experience life
See more details
41. 41 •
" We are sensitive to these questions "
Criteo is also a societal project, not only a company
for the open internet! See our values and what we care about.
Protect the environment
See more details
Respect private data
See more details
42. 42 •
" Criteo in digits? "
See more details
The development team of the future at Criteo
Here are a few figures, because we like data, yes indeed we do:
• 15 datacenters (9 with computing capacity + 6 dedicated to network connectivity)
across US, EU, APAC
• More than 35K servers, running a mix of Linux and Windows
• One of the largest Hadoop clusters in Europe with close to 171 PB of storage and 42,000 cores
• 250B HTTP requests and close to 4B unique banners displayed per day
• 130Gbps of bandwidth, half of it through peering exchanges
• Respond to bids in 80ms or less, 24/7
• Close to 4M HTTP requests per second handled during peak times
• Less than 10ms on average to select optimal campaign
• 10ms to find best product in catalogue of hundreds of millions of products
• Tens of TB of new data stored daily
• Largest public Machine Learning Dataset in the world with over 4 billion lines and over 1TB in size
• Technologies: Hadoop, Couchbase, Redis, Mesos, Kafka, Storm, Cassandra, Spark, Vertica, Druid, …
Source Criteo in 2019
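Two of the figures above can be related with simple arithmetic: 250B HTTP requests per day is an average rate, while 4M requests per second is a peak rate. The sketch below converts the daily total to a per-second average and compares the two; the variable and function names are illustrative, and "B" is read as 10^9.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_per_second(per_day: float) -> float:
    """Convert a daily total into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


requests_per_day = 250e9     # "250B HTTP requests ... per day"
peak_requests_per_s = 4e6    # "close to 4M HTTP requests per second" at peak

avg = average_per_second(requests_per_day)
ratio = peak_requests_per_s / avg
print(f"average {avg:,.0f} req/s, peak/average ratio {ratio:.2f}")
# prints: average 2,893,519 req/s, peak/average ratio 1.38
```

Under this reading, peak traffic is roughly 1.4 times the daily average.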
43. 43 •
" What are the Criteo datacenters? "
Source Criteo in 2020
44. 44
" How is a data center installed at Criteo? "
You can visit it!
go, Go, GO!