On Sampling Strategies for Neural Network-based Collaborative Filtering
Ting Chen, Yizhou Sun, Yue Shi, Liangjie Hong
Outline
• Neural Network-based Collaborative Filtering
• Computation Challenges and Limitations of Existing Methods
• Two Sampling Strategies and Their Combination
• Empirical Evaluations
Content-based Recommendation Problem
Neural Network-based Collaborative Filtering
Functional Embedding
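$r_{uv} = f(x_u)^\top g(x_v)$
• Embedding functions: $f(\cdot)$ for users and $g(\cdot)$ for items
• Interaction function: the dot product of the two embeddings
• Embeddings: $f_u, g_v \in \mathbb{R}^d$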
Embedding Functions
• If we have no additional features for users and items, the model reduces to conventional MF: $r_{uv} = u_u^\top v_v$
• Here the embedding vector is $u_u = f(x_u) = W^\top x_u$, where $x_u$ is an id-based one-hot vector
• If we have text features for items: $r_{uv} = u_u^\top g(x_v)$, where $g(\cdot)$ is computed by neural networks
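As a minimal sketch of the id-based case, the snippet below shows that multiplying W^T by a one-hot vector simply selects one row of W, so the score is a plain matrix-factorization dot product. The table sizes and values are made up for illustration and are not taken from the slides.

```python
def one_hot(index, size):
    """Return an id-based one-hot vector as a list of floats."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def embed(W, x):
    """Compute W^T x for W given as a list of rows (n x d)."""
    d = len(W[0])
    return [sum(W[i][j] * x[i] for i in range(len(W))) for j in range(d)]

def dot(a, b):
    """Interaction function: dot product of two embeddings."""
    return sum(p * q for p, q in zip(a, b))

# Hypothetical embedding tables: 3 users and 2 items, d = 2.
W_user = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
W_item = [[1.0, 0.0], [0.0, 2.0]]

u = embed(W_user, one_hot(1, 3))  # selects row 1 of W_user
v = embed(W_item, one_hot(1, 2))  # selects row 1 of W_item
r_uv = dot(u, v)                  # predicted score for (user 1, item 1)
```

With text features, only the item side changes: `v` would come from a neural network applied to the item's text instead of a table lookup.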
Text Embedding Function g(·)
• Convolutional Neural Networks [Y. Kim, AAAI’14]
• Recurrent Neural Networks (LSTM) [Christopher Olah]
Implicit Feedback and Loss Functions
• We define the loss based on implicit feedback [Hu’08, Rendle’09]
• Interactions are positive
• Non-interactions are treated as negative
• Pointwise loss: (user, item) as a data point
• Pairwise loss: (user, item+, item-) as a data point
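The two kinds of data point can be illustrated with one loss of each type. The slides do not state the exact loss functions, so the logistic (log) loss and the BPR-style pairwise loss below are assumptions chosen for illustration; the function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def pointwise_log_loss(r_uv, label):
    """Log loss on a single (user, item) data point; label is 1 or 0."""
    p = sigmoid(r_uv)
    return -math.log(p) if label == 1 else -math.log(1.0 - p)

def pairwise_log_loss(r_pos, r_neg):
    """BPR-style loss on a (user, item+, item-) data point."""
    return -math.log(sigmoid(r_pos - r_neg))

# A positive link scored above a negative one gives a smaller pairwise loss
# than a tie between the two.
well_ranked = pairwise_log_loss(2.0, 0.0)
tied = pairwise_log_loss(0.0, 0.0)
```

In both cases the loss only consumes scores r_uv, so the expensive part is computing the embeddings that feed those scores.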
Training Procedure
• Different sampling strategies can be used to draw each mini-batch.
Computation Cost Using Different Embedding Functions
Computation cost is dominated by the neural network
computation (forward / backward) for items/texts.
Major Computation Cost Breakdown
• User function computation: t_f = 10
• Item function computation: t_g = 100
• Interaction function (dot product) computation: t_i = 1
Costs are in time units and cover both the forward and backward passes. They are a very rough order-of-magnitude estimate that depends on the specific configuration.
Computation Cost in a Graph View
The loss functions are defined over interactions/links,
but the major computation burden is on nodes.
(Figure panels: Pointwise Loss, Pairwise Loss)
Mini-batch Sampling Matters
• Certain data points (links/interactions) share the same computations (on nodes).
• So different mini-batch sampling strategies can result in different computation costs.
Existing Mini-batch Sampling Approaches
• IID Sampling [Bottou’10]
• Draw positive links uniformly at random
• Draw negative links according to a negative distribution
• Negative Sampling [Rendle’09, Mikolov’13]
• Draw positive links uniformly at random
• Draw k negative links for each positive link by
replacing items
• Assuming we sample a batch of b positive links,
and k negative links for each positive link.
Cost Model Analysis for IID and Negative Sampling
t_f, t_g, t_i are unit computation costs for the user/item/interaction functions
Computation: almost the same
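A rough sketch of why the two costs are almost the same: negative sampling reuses the user embedding of each positive link for its k negatives, but every link still needs its own item embedding. The sampler, the toy data, and the counting rule below are illustrative assumptions, not the authors' implementation.

```python
import random

def negative_sampling_batch(pos_links, items, b, k, rng):
    """Draw b positive links, then k negatives per positive by replacing the item."""
    interacted = set(pos_links)
    batch = []
    for user, item in rng.sample(pos_links, b):
        batch.append((user, item, 1))
        drawn = 0
        while drawn < k:
            neg_item = rng.choice(items)
            if (user, neg_item) not in interacted:
                batch.append((user, neg_item, 0))
                drawn += 1
    return batch

def function_calls(batch, share_users):
    """Count (user, item, interaction) function evaluations for a batch of links."""
    n_links = len(batch)
    if share_users:
        # One user embedding per positive link, reused for its negatives.
        n_user = sum(1 for _, _, label in batch if label == 1)
    else:
        # IID view: every link pays for its own user embedding.
        n_user = n_links
    return n_user, n_links, n_links

# Toy data (hypothetical): 20 users, 50 items, two interactions per user.
items = list(range(50))
pos_links = [(u, (u + j) % 50) for u in range(20) for j in range(2)]

batch = negative_sampling_batch(pos_links, items, b=8, k=3, rng=random.Random(0))
negative_calls = function_calls(batch, share_users=True)   # (b, b(1+k), b(1+k))
iid_calls = function_calls(batch, share_users=False)       # b(1+k) for all three
```

Only the cheap user count drops from b(1+k) to b; the dominant item count is unchanged.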
Limitations of Existing Approaches
• IID sampling assumes computation costs are
independent among data points (links).
• So the computation cost cannot be amortized
and is thus very intensive.
• Negative sampling cannot do better, since item
function computation is the most expensive.
The Proposed Strategies
• Strategy one: Stratified Sampling.
• Group loss function terms by their shared “heavy-lifting” node, thereby amortizing the computation cost.
• Strategy two: Negative Sharing.
• Once a batch of (user, item) tuples is sampled, we
add more links at little additional cost.
• The two strategies can be further combined.
Proposed Strategy 1: Stratified Sampling
• Node computation cost can be amortized if we
have multiple links sharing the same node when we
sample a mini-batch.
• That is, we group links (i.e. loss function terms) according to certain “heavy-lifting” nodes.
• We first draw items, then draw associated positive
and negative links.
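The item-first procedure above can be sketched as follows. The function name, the toy data, and the choice to draw negatives uniformly from users with no link to the item are assumptions for illustration only.

```python
import random
from collections import defaultdict

def stratified_batch(pos_links, users, b, s, k, rng):
    """Draw b/s items, then s positive and s*k negative links per item."""
    users_of = defaultdict(list)
    for user, item in pos_links:
        users_of[item].append(user)
    # Only items with at least s positive links can fill a stratum.
    eligible = [item for item, us in users_of.items() if len(us) >= s]
    batch = []
    for item in rng.sample(eligible, b // s):
        linked = set(users_of[item])
        for user in rng.sample(users_of[item], s):
            batch.append((user, item, 1))
        non_linked = [u for u in users if u not in linked]
        for user in rng.sample(non_linked, s * k):
            batch.append((user, item, 0))
    return batch

# Toy data (hypothetical): 30 users, 10 items, three interactions per item.
users = list(range(30))
pos_links = [(i + j, i) for i in range(10) for j in range(3)]

batch = stratified_batch(pos_links, users, b=8, s=2, k=3, rng=random.Random(0))
distinct_items = {item for _, item, _ in batch}
```

The batch still holds b(1+k) links, but they touch only b/s distinct items, so the expensive item function runs b/s times instead of b(1+k) times.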
Proposed Strategy 1: Stratified Sampling
• Assuming we sample a batch of b positive links,
and k negative links for each positive link.
Cost Model Analysis for Stratified Sampling
t_f, t_g, t_i are unit computation costs for the user/item/interaction functions
Speedup: ~(1+k)s times
Proposed Strategy 2: Negative Sharing
• Interaction computation is much cheaper than
(item) node computation (according to our
assumption).
• Once user/item nodes are given in a batch, adding
more links among them may not increase
computation cost much.
• Only need to draw positive links!
Proposed Strategy 2: Negative Sharing
Implementation detail: use an efficient matrix multiplication to compute the complete set of interactions
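A minimal sketch of negative sharing, assuming the user and item embeddings of a batch are already computed: one all-pairs product yields every score, with positives on the diagonal and shared negatives elsewhere. The embedding values are made up, and treating every off-diagonal pair as a negative is a simplification.

```python
def score_matrix(U, V):
    """All-pairs dot products: entry [i][j] scores user i against item j."""
    return [[sum(p * q for p, q in zip(u, v)) for v in V] for u in U]

# Hypothetical embeddings for a batch of b = 3 positive (user, item) links;
# row i of U and row i of V belong to the i-th positive link.
U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]

R = score_matrix(U, V)
b = len(U)
positive_scores = [R[i][i] for i in range(b)]
negative_scores = [R[i][j] for i in range(b) for j in range(b) if i != j]
```

The batch needs only b user and b item embeddings, yet it provides b(b-1) negative scores at the price of extra dot products.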
• Assuming we sample a batch of b positive links,
and k negative links for each positive link.
Cost Model Analysis for Negative Sharing
t_f, t_g, t_i are unit computation costs for the user/item/interaction functions
Speedup: (1+k) times
Many more negative links
Limitations of Both Proposed Strategies
• Stratified sampling:
• Cannot work well with ranking-based loss functions
• Negative sharing:
• Too many negative interactions, with diminishing returns
• Have-your-cake-and-eat-it solution:
• Combine both strategies to overcome their shortcomings, while
keeping their advantages.
• Draw positive links using Stratified Sampling, generate negative
links using Negative Sharing.
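One way to sketch the combination: draw positive links item by item as in stratified sampling, then treat every non-interacting (user, item) pair inside the batch as a shared negative. The function name and toy data are hypothetical, and filtering out observed pairs is an assumption rather than something the slides specify.

```python
import random
from collections import defaultdict

def hybrid_batch(pos_links, b, s, rng):
    """Draw b/s items and s positive users per item; share all pairs as links."""
    users_of = defaultdict(list)
    for user, item in pos_links:
        users_of[item].append(user)
    eligible = [item for item, us in users_of.items() if len(us) >= s]
    items = rng.sample(eligible, b // s)
    positives = [(u, v) for v in items for u in rng.sample(users_of[v], s)]
    batch_users = sorted({u for u, _ in positives})
    observed = set(pos_links)
    # Negative sharing: every non-interacting (user, item) pair inside the batch.
    negatives = [(u, v) for u in batch_users for v in items
                 if (u, v) not in observed]
    return items, batch_users, positives, negatives

# Toy data (hypothetical): 10 items, three interactions per item.
pos_links = [(i + j, i) for i in range(10) for j in range(3)]

items, batch_users, positives, negatives = hybrid_batch(
    pos_links, b=8, s=2, rng=random.Random(0))
```

Only b/s item embeddings and at most b user embeddings are needed, while the negatives come for free from pairs already in the batch.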
Proposed Hybrid Strategy: Stratified Sampling with Negative Sharing
• Assuming we sample a batch of b positive links,
and k negative links for each positive link.
Cost Model Analysis for Stratified Sampling with Negative Sharing
t_f, t_g, t_i are unit computation costs for the user/item/interaction functions
Speedup: (1+k)s times
Many more negative links
Summary of Cost Model Analysis
• Computation cost estimation (using b=256, k=20, t_f=10, t_g=100, t_i=1, s=2)
• IID sampling: 597k
• Negative sampling: 546k
• Stratified sampling (by item): 72k
• Negative Sharing: 28k
• Stratified sampling with negative sharing: 16k
(all in time units)
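The estimates can be approximately reproduced with a few lines of arithmetic. The per-strategy call counts below are assumptions inferred from the strategy descriptions, since the cost tables themselves did not survive in this transcript. For the two sharing variants only node (user and item) costs are counted, because the transcript does not show how the interaction cost of the shared negatives is accounted for.

```python
def cost(n_f, n_g, n_i, t_f=10, t_g=100, t_i=1):
    """Total time units for n_f user, n_g item, and n_i interaction computations."""
    return n_f * t_f + n_g * t_g + n_i * t_i

b, k, s = 256, 20, 2
links = b * (1 + k)  # positive plus negative links in the batch

iid = cost(links, links, links)
negative = cost(b, links, links)
stratified = cost(links, b // s, links)
# Node costs only for the sharing variants (interaction cost not counted here).
sharing_nodes = cost(b, b, 0)
hybrid_nodes = cost(b, b // s, 0)

print(iid, negative, stratified, sharing_nodes, hybrid_nodes)
# prints: 596736 545536 71936 28160 15360
```

The first four values round to the 597k, 546k, 72k, and 28k listed above. The last is close to, but not exactly, the listed 16k, so that figure presumably includes terms this sketch does not model.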
Convergence Analysis
Datasets and Setup
• We use CiteULike and Yahoo News data sets.
• Test data consists of texts never seen before.
Speed-up Comparisons
Total speedup = speedup per iter * speedup of # iter
Recommendation Performance
Convergence Curves
Converges faster, and performs better!
Number of Negative Examples
More negative examples help, with diminishing returns.
Number of Positive Links per Stratum
Conclusions
• We propose a functional embedding framework with neural
networks for collaborative filtering, which generalizes
several state-of-the-art models.
• We establish the connection between the loss functions
and the user-item interaction graph, which introduces
computation cost dependency between links (i.e. loss
function terms).
• Based on this understanding, we propose three novel mini-batch
sampling strategies that speed up model training
significantly while also improving performance.
Thank You!
Code is also available at https://github.com/chentingpc/nncf.

Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 

On Sampling Strategies for Neural Network-based Collaborative Filtering

  • 1. On Sampling Strategies for Neural Network-based Collaborative Filtering • Ting Chen, Yizhou Sun, Yue Shi, Liangjie Hong
  • 2. Outlines • Neural Network-based Collaborative Filtering • Computation Challenges and Limitations of Existing Methods • Two Sampling Strategies and Their Combination • Empirical Evaluations
  • 5. Functional Embedding • Interaction function: $r_{uv} = f(x_u)^T g(x_v)$ • Embedding functions: $f$, $g$ • Embeddings: $f_u, g_v \in \mathbb{R}^d$
  • 6. Embedding Functions • If we have no additional features for users and items, the model reduces to conventional MF: $r_{uv} = u_u^T v_v$, with embedding vector $u_u = f(x_u) = W^T x_u$, where $x_u$ is an id-based one-hot vector • If we have text features for items: $r_{uv} = u_u^T g(x_v)$, where $g$ is a neural network
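The scoring rule on slides 5 and 6 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy sizes, the random parameters `W` and `E`, and the mean-of-word-vectors `g` (a stand-in for the CNN or LSTM item function) are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, vocab_size, d = 4, 50, 8        # toy sizes, chosen for illustration only

W = rng.normal(size=(n_users, d))        # user embedding table (hypothetical values)
E = rng.normal(size=(vocab_size, d))     # word vectors used by the toy item function

def f(user_id):
    """User embedding u_u = W^T x_u, with x_u an id-based one-hot vector."""
    x_u = np.zeros(n_users)
    x_u[user_id] = 1.0
    return W.T @ x_u                     # equivalent to the row lookup W[user_id]

def g(word_ids):
    """Toy item embedding: mean of word vectors (stand-in for a CNN/LSTM)."""
    return E[word_ids].mean(axis=0)

def score(user_id, word_ids):
    """Interaction function r_uv = f(x_u)^T g(x_v)."""
    return float(f(user_id) @ g(word_ids))

r = score(2, [3, 17, 42])
```

Because `x_u` is one-hot, `W.T @ x_u` simply selects one row of `W`, which is why the model reduces to conventional MF when no extra features are available.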
  • 7. Text Embedding Function g(·) • Convolutional Neural Networks [Y. Kim, AAAI’14] • Recurrent Neural Networks (LSTM) [Christopher Olah]
  • 8. Implicit Feedback and Loss Functions • We define the loss based on implicit feedback [Hu’08, Rendle’09] • Interactions are positive • Non-interactions are treated as negative • Pointwise loss: (user, item) as a data point • Pairwise loss: (user, item+, item-) as a data point
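The two kinds of data point can be made concrete with one loss of each type. The transcript does not say which losses the deck uses, so the log loss and the BPR-style loss below are assumptions chosen as common examples; the function names are hypothetical.

```python
import math

def pointwise_log_loss(score, label):
    """Log loss on one (user, item) data point.

    label is 1 for an observed interaction and 0 for a non-interaction.
    """
    p = 1.0 / (1.0 + math.exp(-score))              # sigmoid of the predicted score
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def pairwise_bpr_loss(score_pos, score_neg):
    """BPR-style loss on one (user, item+, item-) data point."""
    diff = score_pos - score_neg
    return math.log(1.0 + math.exp(-diff))          # equals -log(sigmoid(diff))
```

The pointwise loss judges each link on its own, while the pairwise loss only asks that the positive item score higher than the negative one for the same user.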
  • 10. Outlines • Neural Network-based Collaborative Filtering • Computation Challenges and Limitations of Existing Methods • Two Sampling Strategies and Their Combination • Empirical Evaluations
  • 11. Computation Cost Using Different Embedding Functions • Computation cost is dominated by the neural network computation (forward / backward) for items/texts.
  • 12. Major Computation Cost Breakdown • User function computation (t_f): 10 • Item function computation (t_g): 100 • Interaction function (dot product) computation (t_i): 1 • Each cost covers both forward and backward passes • These are very rough order-of-magnitude estimates in time units (depending on specific configurations)
  • 13. Computation Cost in a Graph View • The loss functions are defined over interactions/links, but the major computation burden is on nodes. • Pointwise Loss • Pairwise Loss
  • 14. Mini-batch Sampling Matters • Certain data points (links/interactions) share the same computations (on nodes). • Different mini-batch sampling can therefore result in different computation costs.
  • 15. Existing Mini-batch Sampling Approaches • IID Sampling [Bottou’10]: draw positive links uniformly at random; draw negative links according to a negative distribution • Negative Sampling [Rendle’09, Mikolov’13]: draw positive links uniformly at random; draw k negative links for each positive link by replacing items
  • 16. Cost Model Analysis for IID and Negative Sampling • Assume we sample a batch of b positive links, and k negative links for each positive link. • t_f, t_g, t_i are unit computation costs for user/item/interaction functions • Computation cost of the two approaches: almost the same
  • 17. Limitations of Existing Approaches • IID sampling assumes computation costs are independent among data points (links). • So the computation cost cannot be amortized, and is thus very intensive. • Negative sampling cannot do better, since item function computation is the most expensive.
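To see why negative sampling barely helps, it is enough to count function evaluations in one batch. The sketch below is an illustration under assumptions: the toy data, the helper names, and the choice to ignore accidental collisions with true positives are not from the slides.

```python
import random

def negative_sampling_batch(pos_links, items, b, k, rng):
    """Draw b positive links uniformly, then k negatives each by replacing the item."""
    batch = []
    for user, item in rng.sample(pos_links, b):
        # A sampled negative may collide with a true positive; ignored in this sketch.
        negatives = [rng.choice(items) for _ in range(k)]
        batch.append((user, item, negatives))
    return batch

def function_calls(batch):
    """Count evaluations when only the user embedding is reused within a group."""
    n_user = len(batch)                                   # one f(x_u) per positive link
    n_item = sum(1 + len(negs) for _, _, negs in batch)   # one g(x_v) per link
    n_inter = n_item                                      # one dot product per link
    return n_user, n_item, n_inter

rng = random.Random(0)
items = list(range(100))
pos_links = [(u, v) for u in range(20) for v in rng.sample(items, 5)]
batch = negative_sampling_batch(pos_links, items, b=8, k=3, rng=rng)
calls = function_calls(batch)   # (b, b*(1+k), b*(1+k)) user/item/interaction calls
```

Only the cheap user function drops from b(1+k) to b evaluations; the expensive item function still runs once per link, which matches the "almost the same" observation on slide 16.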
  • 18. Outlines • Neural Network-based Collaborative Filtering • Computation Challenges and Limitations of Existing Methods • Two Sampling Strategies and Their Combination • Empirical Evaluations
  • 19. The Proposed Strategies • Strategy one: Stratified Sampling. Group loss function terms by shared “heavy-lifting” node, i.e. amortize the computation cost. • Strategy two: Negative Sharing. Once a batch of (user, item) tuples is sampled, we add additional links at little additional cost. • The two strategies can be further combined.
  • 20. Proposed Strategy 1: Stratified Sampling • Node computation cost can be amortized if we have multiple links sharing the same node when we sample a mini-batch. • That is, we group links (i.e. loss function terms) according to certain “heavy-lifting” nodes. • We first draw items, then draw associated positive and negative links.
  • 21. Proposed Strategy 1: Stratified Sampling
  • 22. Cost Model Analysis for Stratified Sampling • Assume we sample a batch of b positive links, and k negative links for each positive link. • t_f, t_g, t_i are unit computation costs for user/item/interaction functions • Speedup: ~(1+k)s times
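A minimal sketch of stratified sampling by item, following the "first draw items, then draw associated links" description. Several details are assumptions rather than facts from the slides: `s` is taken to be the number of positive links per item stratum, negatives are drawn by picking random users for the same item, and all names and toy data are hypothetical.

```python
import random
from collections import defaultdict

def stratified_batch(pos_links, users, b, s, k, rng):
    """Draw b/s items, then s positive and s*k negative links for each item."""
    users_by_item = defaultdict(list)
    for user, item in pos_links:
        users_by_item[item].append(user)
    # Only items with at least s positive users can form a full stratum.
    eligible = [v for v, us in users_by_item.items() if len(us) >= s]
    batch = []
    for item in rng.sample(eligible, b // s):
        pos_users = rng.sample(users_by_item[item], s)
        # A sampled negative may collide with a true positive; ignored in this sketch.
        neg_users = [rng.choice(users) for _ in range(s * k)]
        batch.append((item, pos_users, neg_users))
    return batch

rng = random.Random(0)
users = list(range(50))
pos_links = [(u, v) for v in range(30) for u in rng.sample(users, 4)]
batch = stratified_batch(pos_links, users, b=8, s=2, k=3, rng=rng)

n_item_calls = len(batch)                              # b/s item function evaluations
n_links = sum(len(p) + len(n) for _, p, n in batch)    # still b*(1+k) links in the batch
```

The batch still contains b(1+k) links, but the item function is evaluated only b/s times instead of b(1+k) times, which is where the ~(1+k)s factor comes from.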
  • 23. Proposed Strategy 2: Negative Sharing • Interaction computation is much cheaper than (item) node computation (according to our assumption). • Once user/item nodes are given in a batch, adding more links among them may not increase computation cost much. • Only need to draw positive links!
  • 24. Proposed Strategy 2: Negative Sharing • Implementation detail: use an efficient matrix multiplication operation to compute the complete set of interactions
  • 25. Cost Model Analysis for Negative Sharing • Assume we sample a batch of b positive links, and k negative links for each positive link. • t_f, t_g, t_i are unit computation costs for user/item/interaction functions • Speedup: (1+k) times • Many more negative links
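Negative sharing can be sketched with a single matrix product. The example below is an illustration under assumptions: the embeddings are random placeholders for the outputs of f and g, and every off-diagonal pair is treated as a negative even though some could be true positives in real data.

```python
import numpy as np

rng = np.random.default_rng(0)
b, d = 5, 8
U = rng.normal(size=(b, d))   # row i: f(x_u) for the user of the i-th positive link
V = rng.normal(size=(b, d))   # row i: g(x_v) for the item of the i-th positive link

scores = U @ V.T                                   # (b, b): all user-item pairs in the batch
pos_scores = np.diag(scores)                       # the b sampled positive links
neg_mask = ~np.eye(b, dtype=bool)                  # off-diagonal entries act as negatives
neg_scores = scores[neg_mask].reshape(b, b - 1)    # b-1 shared negatives per positive link
```

Each user and item embedding is computed once, and the b(b-1) negative interactions cost only extra dot products, so only positive links need to be drawn.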
  • 26. Limitations of Both Proposed Strategies • Stratified sampling: cannot work well with ranking-based loss functions • Negative sharing: too many negative interactions, with diminishing returns • Have-your-cake-and-eat-it solution: combine both strategies to overcome their shortcomings while keeping their advantages. Draw positive links using Stratified Sampling, and generate negative links using Negative Sharing.
  • 27. Proposed Hybrid Strategy: Stratified Sampling with Negative Sharing
  • 28. Cost Model Analysis for Stratified Sampling with Negative Sharing • Assume we sample a batch of b positive links, and k negative links for each positive link. • t_f, t_g, t_i are unit computation costs for user/item/interaction functions • Speedup: (1+k)s times • Many more negative links
  • 29. Summary of Cost Model Analysis • Computation cost estimation (using b=256, k=20, t_f=10, t_g=100, t_i=1, s=2) • IID sampling: 597k • Negative sampling: 546k • Stratified sampling (by item): 72k • Negative Sharing: 28k • Stratified sampling with negative sharing: 16k (all in time units)
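The first three estimates can be reproduced by counting function evaluations per batch. The counts below are an assumption reconstructed from the sampling descriptions (the slides' own cost tables did not survive in this transcript), with `s` taken as the number of positive links per item stratum. The two negative-sharing figures are left out because their interaction counts cannot be recovered from the transcript.

```python
def cost(n_f, n_g, n_i, t_f=10, t_g=100, t_i=1):
    """Total time units for n_f user, n_g item and n_i interaction evaluations."""
    return n_f * t_f + n_g * t_g + n_i * t_i

b, k, s = 256, 20, 2
links = b * (1 + k)                       # positive plus negative links per batch

iid = cost(links, links, links)           # every link computed independently
negative = cost(b, links, links)          # user embedding shared with its k negatives
stratified = cost(links, b // s, links)   # item embedding shared within each stratum
```

Under these assumed counts the totals are 596,736, 545,536 and 71,936 time units, consistent with the 597k, 546k and 72k listed on the slide.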
  • 31. Outlines • Neural Network-based Collaborative Filtering • Computation Challenges and Limitations of Existing Methods • Two Sampling Strategies and Their Combination • Empirical Evaluations
  • 32. Datasets and Setup • We use CiteULike and Yahoo News data sets. • Test data consists of texts never seen before.
  • 33. Speed-up Comparisons • Total speedup = (speedup per iteration) × (speedup in number of iterations)
  • 35. Convergence Curves Converges faster, and performs better!
  • 36. Number of Negative Examples • More negative examples help, with diminishing returns.
  • 37. Number of Positive Links per Stratum
  • 38. Conclusions • We propose a functional embedding framework with neural networks for collaborative filtering, which generalizes several state-of-the-art (SOTA) models. • We establish the connection between the loss functions and the user-item interaction graph, which introduces computation cost dependency between links (i.e. loss function terms). • Based on this understanding, we propose three novel mini-batch sampling strategies that speed up model training significantly while also improving performance.
  • 39. Thank You! Code is also available at https://github.com/chentingpc/nncf