29. Many applications
• Dimensionality reduction
• Sketching
• Approximate nearest neighbors
• Random projection trees
• Kernel approximations
• Newton sketches for optimization
• Linear programming
• …
Googling for "sketching" was the best idea :-)
31. Sparse random projections
…
Achlioptas. "Database-friendly random projections: Johnson-Lindenstrauss with binary coins." JCSS, 2003.
Li et al. "Very Sparse Random Projections." KDD, 2006.
Dasgupta et al. "A sparse Johnson-Lindenstrauss transform." STOC, 2010.
33. The Fast-JLT transform
Φ = P H D, with P a sparse projection, H the Walsh-Hadamard matrix, and D a diagonal matrix of random signs
Ailon and Chazelle. "Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform." STOC, 2006.
34. Orthogonal random features
Orthogonalize: G = QR
Rescale: G_ORF = (1/σ) S Q, with S ∼ diag(χ_d)
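A minimal numpy sketch of these two steps (the function name and the Gaussian-kernel bandwidth argument sigma are illustrative):

import numpy as np

def orthogonal_random_features(d, sigma, rng=np.random.default_rng(0)):
    # Orthogonalize a Gaussian matrix: G = QR
    G = rng.standard_normal((d, d))
    Q, _ = np.linalg.qr(G)
    # Rescale with S ~ diag(chi_d): chi-distributed factors restore the
    # row norms an i.i.d. Gaussian matrix would have had
    S = np.sqrt(rng.chisquare(df=d, size=d))
    return (S[:, None] * Q) / sigma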
35. Structured orthogonal random features
First used for practical spherical LSH
G_SORF = (√d / σ) H D₁ H D₂ H D₃, with H the Walsh-Hadamard matrix and Dᵢ diagonal matrices of random signs
Andoni et al. "Practical and optimal LSH for angular distance." NeurIPS, 2015.
Choromanski et al. "The unreasonable effectiveness of structured random orthogonal embeddings." NeurIPS, 2017.
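A sketch of the G_SORF formula above, assuming d is a power of two (an explicit scipy Hadamard matrix is used for clarity; fast implementations use the Walsh-Hadamard transform instead of a matrix product):

import numpy as np
from scipy.linalg import hadamard

def sorf_matrix(d, sigma, rng=np.random.default_rng(0)):
    H = hadamard(d) / np.sqrt(d)  # normalized Walsh-Hadamard matrix
    M = np.eye(d)
    # Three HD_i blocks; per the note later in this deck, two are not sufficient
    for _ in range(3):
        D = np.diag(rng.choice([-1.0, 1.0], size=d))
        M = H @ D @ M
    return (np.sqrt(d) / sigma) * M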
36. LSH: Locality Sensitive Hashing
Define a family H of hash functions such that near points collide with higher probability than far ones:
• Build a hash function g from h₁, …, h_k ∈ H^k, e.g. h(x) = sgn(⟨a, x⟩) with aᵢ ∼ N(0,1)
• Use L independent hash tables
Indyk and Motwani. "Approximate nearest neighbors: towards removing the curse of dimensionality." STOC, 1998.
Charikar. "Similarity estimation techniques from rounding algorithms." STOC, 2002.
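A toy index following the slide's construction; the class and method names are illustrative, and a real system would rerank the returned candidates exactly:

import numpy as np
from collections import defaultdict

class SignLSH:
    def __init__(self, d, k, L, rng=np.random.default_rng(0)):
        # L tables, each keyed by k sign bits h(x) = sgn(<a, x>)
        self.planes = [rng.standard_normal((k, d)) for _ in range(L)]
        self.tables = [defaultdict(list) for _ in range(L)]

    def _key(self, A, x):
        return tuple((A @ x > 0).astype(np.int8))

    def insert(self, item_id, x):
        for A, table in zip(self.planes, self.tables):
            table[self._key(A, x)].append(item_id)

    def candidates(self, x):
        # Union of the L buckets x falls into
        out = set()
        for A, table in zip(self.planes, self.tables):
            out.update(table[self._key(A, x)])
        return out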
Retargeting:
A user browses an e-commerce website
Then moves on to a publisher website
Criteo buys ad placements
Criteo is paid if the ad is clicked
For each user and each client, compute offline recommendations with different algorithms. Append all these "sources" to the last historical products, and you have a short list of products to score online, where the probability of a click can be estimated with logistic regression.
One technique is to compute a vector space for products, so that nearest neighbors between products can be computed.
The classical way to compute these vectors is to factorize the interaction matrix, usually through a singular value decomposition (SVD). This is called collaborative filtering; a minimal sketch follows.
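A minimal sketch of this pipeline, assuming a sparse user x product interaction matrix (shapes and the embedding size are illustrative):

import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Toy user x product interaction matrix (e.g. clicks or sales counts)
X = sparse_random(1000, 500, density=0.01, format="csr", random_state=0)

# Truncated SVD: product embeddings live in the right singular vectors
U, s, Vt = svds(X, k=32)
norms = np.linalg.norm(Vt.T, axis=1, keepdims=True) + 1e-12
V = Vt.T / norms  # one unit vector per product

# Nearest neighbors of a product by cosine similarity
# (the product itself comes first; 42 is a hypothetical product id)
neighbors = np.argsort(-(V @ V[42]))[:10]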
1. Intuition: there exists a transform with low distortion into O(log(n)) dimensions (independent of d!)
2. Get drawings and formulas from https://scikit-learn.org/stable/auto_examples/plot_johnson_lindenstrauss_bound.html
=> 3-4 slides
The ideal setting is when n is large, and of course d > log(n)
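A quick sanity check of this intuition with scikit-learn (the data here is random, just for illustration):

import numpy as np
from sklearn.random_projection import (GaussianRandomProjection,
                                       johnson_lindenstrauss_min_dim)

n, d = 1000, 10000
X = np.random.default_rng(0).standard_normal((n, d))

# The target dimension depends only on n and eps, not on d
k = johnson_lindenstrauss_min_dim(n_samples=n, eps=0.2)
Xp = GaussianRandomProjection(n_components=k, random_state=0).fit_transform(X)

# Pairwise distances are preserved up to a factor 1 +/- eps
i, j = 3, 7
print(np.linalg.norm(X[i] - X[j]), np.linalg.norm(Xp[i] - Xp[j]))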
First idea: use a sparse projection matrix with ±1 entries
Successive improvements followed
But there are issues with sparse inputs (see the sketch after the references below)
Achlioptas. "Database-friendly random projections: Johnson-Lindenstrauss with binary coins." JCSS, 2003.
Li et al. "Very Sparse Random Projections." KDD, 2006.
Dasgupta et al. "A sparse Johnson-Lindenstrauss transform." STOC, 2010.
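scikit-learn implements these constructions directly; a minimal sketch:

import numpy as np
from sklearn.random_projection import SparseRandomProjection

X = np.random.default_rng(0).standard_normal((1000, 10000))

# density=1/3 reproduces Achlioptas' binary-coin projections; the default
# density 1/sqrt(d) gives the very sparse variant of Li et al.
proj = SparseRandomProjection(n_components=256, density=1 / 3, random_state=0)
Xp = proj.fit_transform(X)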
Fourier transform idea: by the uncertainty principle, the data and its spectrum cannot both be sparse => work on the spectrum.
Randomize the selection of Hadamard rows. Re-randomize with random signs to ensure non-sparsity.
Now P can be sparse Gaussian as in the original P H D construction, but results have been improved since, and a simple coordinate-sampling matrix is enough (see the sketch below).
Ailon and Chazelle. "Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform." STOC, 2006.
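A compact sketch of the transform with the coordinate-sampling P described above, assuming d is a power of two (an explicit Hadamard matrix stands in for the fast transform):

import numpy as np
from scipy.linalg import hadamard

def fjlt(x, k, rng=np.random.default_rng(0)):
    d = x.shape[0]                              # must be a power of two
    signs = rng.choice([-1.0, 1.0], size=d)     # D: random sign flips
    H = hadamard(d) / np.sqrt(d)                # H: normalized Walsh-Hadamard
    z = H @ (signs * x)                         # H D x densifies the spectrum
    idx = rng.choice(d, size=k, replace=False)  # P: coordinate sampling
    return np.sqrt(d / k) * z[idx]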
A step further is taken by Clarkson & Woodruff, "Low rank approximation and regression in input sparsity time", STOC 2013, where the sampling matrix is essentially a CountSketch. However, it no longer has the JL properties.
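A sketch of that embedding: each row of A is hashed to one of k buckets and given a random sign, so computing S A costs O(nnz(A)):

import numpy as np

def countsketch(A, k, rng=np.random.default_rng(0)):
    n = A.shape[0]
    buckets = rng.integers(0, k, size=n)        # hash each row to a bucket
    signs = rng.choice([-1.0, 1.0], size=n)     # random signs
    SA = np.zeros((k, A.shape[1]))
    np.add.at(SA, buckets, signs[:, None] * A)  # unbuffered scatter-add
    return SA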
These are not faster, but have better properties:
Originally, the JL proof used orthogonal features; they were dropped by Indyk et al. for LSH
Recently shown to yield lower-variance kernel estimators with RFF (see the later section on RFF)
While we currently do not know how to prove rigorously that such pseudo-random rotations perform as well as fully random ones, empirical evaluations show that three applications of HDᵢ are equivalent to applying a true random rotation as d tends to infinity. We note that two applications of HDᵢ are not sufficient.
A well-known application is the approximate nearest neighbors problem, which you might find useful in real life.
This is a very nice application to kernel approximation.
https://www.youtube.com/watch?v=Qi1Yry33TQE
More modern applications
3 more examples coming
SGD everywhere.
Bandit algorithms & how children learn
The exploration-exploitation trade-off exists everywhere, e.g. in scientific research.