When Recommenders Met Big Data
An Architectural Proposal and Evaluation
Daniel Valcarce, Javier Parapar, Álvaro Barreiro
CERI 2014
3rd Spanish Conference on Information Retrieval
A Coruña, June 2014
Introduction Recommender System Architecture Experiments and results Conclusions and Future Work
Table of Contents
Introduction
Motivation
Recommender Systems
Recommender System Architecture
Overview
Front-end
Recommendation engine
Storage
Experiments and results
Rating Insertion
Recommendation Generation
Recommendation Serving
Conclusions and Future Work
Motivation
According to Shareaholic, in 2013...
web traffic generated by search engines dropped 6%
web traffic generated by social networks increased more than 100%
Users...
used to query for what they want
now want personalised recommendations
Recommender Systems
Objective
Predict user preferences over items
Approaches
Content-based: uses properties of the items
Collaborative filtering: based on similar users
Hybrid approaches: combination of both
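The collaborative filtering idea above can be illustrated with a minimal user-based sketch. It is not the algorithm evaluated later in the slides (that one is Mahout's item-based variant); the toy ratings, the cosine similarity choice and the function names are all assumptions made for illustration.

```python
from math import sqrt

# Toy ratings: user -> {item: rating}. All names and values are illustrative.
RATINGS = {
    "ana": {"film_a": 5, "film_b": 4, "film_c": 1},
    "ben": {"film_a": 4, "film_b": 5},
    "carl": {"film_a": 1, "film_c": 5},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)


def predict(ratings, user, item):
    """Similarity-weighted average of the ratings other users gave to `item`."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        num += sim * theirs[item]
        den += sim
    # None signals that no similar user has rated the item.
    return num / den if den else None


if __name__ == "__main__":
    # ben resembles ana more than carl, so the estimate leans towards ana's rating.
    print(predict(RATINGS, "ben", "film_c"))
```

In this toy data ben's tastes are closer to ana's than to carl's, so the predicted rating for film_c ends up nearer to ana's low rating than to carl's high one.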
Our work
Recommender architecture proposal for Big Data
Detail specific technologies for each component
Efficiency study of MySQL Cluster and Cassandra as alternatives for storing ratings and recommendations in the proposed architecture
Generic Recommender System Architecture
Front-end
Storage
Recommendation engine
Our goals
Scalability
More machines → more computational power
Big Data capable
High availability
Fault-tolerance
No single point of failure
Our proposal: Front-end
Use cases
Search items
Emit ratings
Get recommendations
Proposed architecture
Distributed web application (Django)
Redundant load balancers (Perlbal)
Two levels of cache
Reverse proxy cache (Varnish)
Distributed memory cache (Memcached)
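How the two cache levels might cooperate on a "get recommendations" request can be sketched as a read-through lookup. This is only an illustration: `DictCache` is an in-memory stand-in rather than a Varnish or Memcached client, `fetch_recommendations` and `load_from_storage` are hypothetical names, and in a real deployment the reverse proxy would answer before the request ever reaches the application code.

```python
class DictCache:
    """In-memory stand-in for one cache tier (hypothetical, not a real client)."""

    def __init__(self):
        self._data = {}
        self.hits = 0

    def get(self, key):
        value = self._data.get(key)
        if value is not None:
            self.hits += 1
        return value

    def set(self, key, value):
        self._data[key] = value


def fetch_recommendations(user_id, page_cache, object_cache, load_from_storage):
    """Try the page-level cache, then the object cache, then storage."""
    key = f"recs:{user_id}"

    # Level 1: a fully rendered response (the role of the reverse proxy cache).
    page = page_cache.get(key)
    if page is not None:
        return page

    # Level 2: the raw recommendation list (the role of the memory cache).
    recs = object_cache.get(key)
    if recs is None:
        recs = load_from_storage(user_id)
        object_cache.set(key, recs)

    page = "\n".join(recs)  # stands in for template rendering
    page_cache.set(key, page)
    return page


if __name__ == "__main__":
    storage_calls = []

    def load_from_storage(user_id):
        storage_calls.append(user_id)
        return ["item_1", "item_2"]

    pages, objects = DictCache(), DictCache()
    fetch_recommendations(42, pages, objects, load_from_storage)
    fetch_recommendations(42, pages, objects, load_from_storage)
    print(len(storage_calls))  # storage is only reached on the first request
```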
Our proposal: Recommendation Engine
Recommendations are precalculated and stored
A batch process refreshes the suggestions regularly
Use of MapReduce distributed model
State-of-the-art paradigm for large-scale data processing
Hadoop: MapReduce open source implementation
Mahout: scalable machine learning library
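The MapReduce model can be sketched in a single process with an item co-occurrence count, a building block commonly used by item-based recommenders. This is a toy illustration of the map, shuffle and reduce phases, not Mahout's actual job pipeline; the function names and the sample data are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(user_items):
    """Emit ((item_a, item_b), 1) for every pair of items rated by the same user."""
    for items in user_items.values():
        for pair in combinations(sorted(items), 2):
            yield pair, 1


def shuffle(pairs):
    """Group the emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the counts collected for each item pair."""
    return {key: sum(values) for key, values in groups.items()}


def cooccurrence(user_items):
    return reduce_phase(shuffle(map_phase(user_items)))


if __name__ == "__main__":
    sample = {"u1": ["a", "b", "c"], "u2": ["a", "b"], "u3": ["b", "c"]}
    print(cooccurrence(sample))
```

On a cluster, the map and reduce functions would run in parallel on different machines and the shuffle would move data over the network; here everything runs sequentially to keep the data flow visible.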
Our proposal: Storage Component I
Information to be stored
Common web application data (e.g., user profiles)
Large amounts of ratings and recommendations
Data about items
Requirements
Read scalability and fault tolerance (replication)
Write-scalable (sharding)
Linear scalability with the number of nodes
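The combination of sharding and replication can be sketched as a placement function: the key is hashed to pick a primary node, and copies go to the following nodes. This is a simplified illustration and not how Cassandra or MySQL Cluster actually place data; `replicas_for`, the node names and the MD5-based hashing are assumptions.

```python
import hashlib


def replicas_for(key, nodes, replication_factor=2):
    """Return the nodes that would hold `key`: a hashed primary plus its successors."""
    if replication_factor > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    # A stable hash keeps the placement identical across processes and restarts.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    primary = int(digest, 16) % len(nodes)
    return [nodes[(primary + i) % len(nodes)] for i in range(replication_factor)]


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]
    for user in ("user:1", "user:2", "user:3"):
        print(user, replicas_for(user, nodes))
```

Writes for different keys land on different primaries (write scalability), while each key lives on two nodes, so reads can be spread across copies and the loss of any single node leaves every key reachable.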
Our proposal: Storage Component II
Proposed technologies
Relational database (MySQL Cluster)
NoSQL column store (Cassandra)
Inverted indexes (Solr)
Experiment: storing ratings and recommendations
Candidates
MySQL Cluster
Cassandra
Netflix Prize Dataset
100M ratings
480k users
17.7k films
Cluster configuration
Number of machines: 4
Replication factor: 2
Rating Insertion
Figure: Average insertion rate (in milliseconds per insertion) obtained by inserting from 10 to 100 million ratings using 8 concurrent requests
[Plot not reproduced. X axis: # ratings (1e+07 to 1e+08). Y axis: milliseconds/insertion (0.00 to 0.30). Series: MySQL Cluster 8, Cassandra 8.]
Rating Insertion
Figure: Average insertion rate (in milliseconds per insertion) obtained by inserting from 10 to 100 million ratings using 8, 16, 32 and 64 concurrent requests
[Plot not reproduced. X axis: # ratings (1e+07 to 1e+08). Y axis: milliseconds/insertion (0.00 to 0.30). Series: MySQL Cluster and Cassandra, each with 8, 16, 32 and 64 concurrent requests.]
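A measurement of this kind could be taken with a small harness like the one below. It is a sketch under assumptions: the slides do not describe the benchmark code, `average_ms_per_insert` is a hypothetical helper, and the `insert` callable stands in for whatever database client performs the write.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def average_ms_per_insert(insert, ratings, concurrency):
    """Wall-clock milliseconds per insertion using `concurrency` worker threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # list() waits for every insertion and re-raises any worker exception.
        list(pool.map(insert, ratings))
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms / len(ratings)


if __name__ == "__main__":
    store = []  # stand-in for the database; list.append is thread-safe in CPython
    ratings = [(user, item, 3) for user in range(100) for item in range(10)]
    for concurrency in (8, 16, 32, 64):
        store.clear()
        ms = average_ms_per_insert(store.append, ratings, concurrency)
        print(concurrency, round(ms, 4))
```

Dividing total wall-clock time by the number of insertions means that a lower value reflects higher throughput, which is one way to read the "milliseconds/insertion" axis of the figure.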
Recommendation Generation
Table: Times for Mahout’s Item-based Collaborative Filtering algorithm reading and writing directly to/from the database

Storage system    Time (min)    Time per recommendation (ms)
Cassandra         68.85         8.6
MySQL Cluster     crash!        crash!
Recommendation Generation
Table: Times for Mahout’s Item-based Collaborative Filtering algorithm

Storage system    Time (min)    Time per recommendation (ms)
Cassandra         68.85         8.6
MySQL Cluster *   274.73        34.3

* Using Sqoop, a tool for transferring bulk data between Hadoop Distributed File System and relational databases.
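The two columns of the table appear to be related by the number of users in the dataset: dividing the total time by roughly 480k users reproduces the per-recommendation figures. The slides do not state this explicitly, so the relationship below is an inference, checked here with the rounded user count from the dataset slide.

```python
USERS = 480_000  # "480k users" from the dataset slide (rounded figure)


def ms_per_user(total_minutes, users=USERS):
    """Convert a total batch time in minutes into milliseconds per user."""
    return total_minutes * 60_000 / users


cassandra_ms = ms_per_user(68.85)
mysql_ms = ms_per_user(274.73)

if __name__ == "__main__":
    print(round(cassandra_ms, 1))  # 8.6
    print(round(mysql_ms, 1))      # 34.3
    print(round(mysql_ms / cassandra_ms, 1))  # 4.0
```

Under this reading, the MySQL Cluster run with Sqoop took about four times as long as the Cassandra run for the same set of users.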
Recommendation Serving
Figure: Average serving rate (in milliseconds per serving) obtained by querying the top 10 recommended items for 25 million users using 8, 16, 32 and 64 concurrent requests
[Plot not reproduced. X axis: # threads (8, 16, 32, 64). Y axis: milliseconds/serving (0.00 to 0.30). Series: MySQL Cluster, Cassandra.]
Conclusions and Future Work
We have proposed a highly scalable and fault-tolerant platform for recommender systems.
We have benchmarked Cassandra and MySQL Cluster in the context of recommender systems.
Future: study and benchmark more parts of the proposed platform.
Future: develop more effective recommender algorithms on the platform.
19 of 19
When Recommenders Met Big Data
An Architectural Proposal and Evaluation
Daniel Valcarce Javier Parapar ´Alvaro Barreiro
CERI 2014
3rd Spanish Conference on Information Retrieval
A Coru˜na, June 2014

Contenu connexe

Similaire à When Recommenders Met Big Data: an Architectural Proposal and Evaluation [CERI '14 Slides]

B2 2006 sizing_benchmarking
B2 2006 sizing_benchmarkingB2 2006 sizing_benchmarking
B2 2006 sizing_benchmarkingSteve Feldman
 
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Spark Summit
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareJustin Basilico
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixJustin Basilico
 
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...MLconf
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareJustin Basilico
 
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...Databricks
 
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...Robert Grossman
 
Orca: A Modular Query Optimizer Architecture for Big Data
Orca: A Modular Query Optimizer Architecture for Big DataOrca: A Modular Query Optimizer Architecture for Big Data
Orca: A Modular Query Optimizer Architecture for Big DataEMC
 
Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101QuantUniversity
 
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...MLconf
 
Top 3 design patterns in Map Reduce
Top 3 design patterns in Map ReduceTop 3 design patterns in Map Reduce
Top 3 design patterns in Map ReduceEdureka!
 
Dentsply Sirona Sinks their Teeth into Oracle Hyperion Performance Management
Dentsply Sirona Sinks their Teeth into Oracle Hyperion Performance ManagementDentsply Sirona Sinks their Teeth into Oracle Hyperion Performance Management
Dentsply Sirona Sinks their Teeth into Oracle Hyperion Performance ManagementDatavail
 
An IDE-Based Context-Aware Meta Search Engine
An IDE-Based Context-Aware Meta Search EngineAn IDE-Based Context-Aware Meta Search Engine
An IDE-Based Context-Aware Meta Search EngineMasud Rahman
 

Similaire à When Recommenders Met Big Data: an Architectural Proposal and Evaluation [CERI '14 Slides] (20)

B2 2006 sizing_benchmarking
B2 2006 sizing_benchmarkingB2 2006 sizing_benchmarking
B2 2006 sizing_benchmarking
 
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
 
Automated Analytics at Scale
Automated Analytics at ScaleAutomated Analytics at Scale
Automated Analytics at Scale
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at Netflix
 
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
 
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
 
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
 
Managing a Multi-Tenant Data Lake
Managing a Multi-Tenant Data LakeManaging a Multi-Tenant Data Lake
Managing a Multi-Tenant Data Lake
 
Proposal with sdlc
Proposal with sdlcProposal with sdlc
Proposal with sdlc
 
Orca: A Modular Query Optimizer Architecture for Big Data
Orca: A Modular Query Optimizer Architecture for Big DataOrca: A Modular Query Optimizer Architecture for Big Data
Orca: A Modular Query Optimizer Architecture for Big Data
 
Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101
 
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
 
cametrics-report-final
cametrics-report-finalcametrics-report-final
cametrics-report-final
 
An Analytics Platform for Connected Vehicles
An Analytics Platform for Connected VehiclesAn Analytics Platform for Connected Vehicles
An Analytics Platform for Connected Vehicles
 
Top 3 design patterns in Map Reduce
Top 3 design patterns in Map ReduceTop 3 design patterns in Map Reduce
Top 3 design patterns in Map Reduce
 
Dentsply Sirona Sinks their Teeth into Oracle Hyperion Performance Management
Dentsply Sirona Sinks their Teeth into Oracle Hyperion Performance ManagementDentsply Sirona Sinks their Teeth into Oracle Hyperion Performance Management
Dentsply Sirona Sinks their Teeth into Oracle Hyperion Performance Management
 
An IDE-Based Context-Aware Meta Search Engine
An IDE-Based Context-Aware Meta Search EngineAn IDE-Based Context-Aware Meta Search Engine
An IDE-Based Context-Aware Meta Search Engine
 
rerngvit_phd_seminar
rerngvit_phd_seminarrerngvit_phd_seminar
rerngvit_phd_seminar
 

Plus de Daniel Valcarce

Information Retrieval Models for Recommender Systems - PhD slides
Information Retrieval Models for Recommender Systems - PhD slidesInformation Retrieval Models for Recommender Systems - PhD slides
Information Retrieval Models for Recommender Systems - PhD slidesDaniel Valcarce
 
On the Robustness and Discriminative Power of IR Metrics for Top-N Recommenda...
On the Robustness and Discriminative Power of IR Metrics for Top-N Recommenda...On the Robustness and Discriminative Power of IR Metrics for Top-N Recommenda...
On the Robustness and Discriminative Power of IR Metrics for Top-N Recommenda...Daniel Valcarce
 
LiMe: Linear Methods for Pseudo-Relevance Feedback [SAC '18 Slides]
LiMe: Linear Methods for Pseudo-Relevance Feedback [SAC '18 Slides]LiMe: Linear Methods for Pseudo-Relevance Feedback [SAC '18 Slides]
LiMe: Linear Methods for Pseudo-Relevance Feedback [SAC '18 Slides]Daniel Valcarce
 
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...Daniel Valcarce
 
Computing Neighbourhoods with Language Models in a Collaborative Filtering Sc...
Computing Neighbourhoods with Language Models in a Collaborative Filtering Sc...Computing Neighbourhoods with Language Models in a Collaborative Filtering Sc...
Computing Neighbourhoods with Language Models in a Collaborative Filtering Sc...Daniel Valcarce
 
A Study of Smoothing Methods for Relevance-Based Language Modelling of Recomm...
A Study of Smoothing Methods for Relevance-Based Language Modelling of Recomm...A Study of Smoothing Methods for Relevance-Based Language Modelling of Recomm...
A Study of Smoothing Methods for Relevance-Based Language Modelling of Recomm...Daniel Valcarce
 
A Study of Priors for Relevance-Based Language Modelling of Recommender Syste...
A Study of Priors for Relevance-Based Language Modelling of Recommender Syste...A Study of Priors for Relevance-Based Language Modelling of Recommender Syste...
A Study of Priors for Relevance-Based Language Modelling of Recommender Syste...Daniel Valcarce
 
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...Daniel Valcarce
 
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...Daniel Valcarce
 
Language Models for Collaborative Filtering Neighbourhoods [ECIR '16 Slides]
Language Models for Collaborative Filtering Neighbourhoods [ECIR '16 Slides]Language Models for Collaborative Filtering Neighbourhoods [ECIR '16 Slides]
Language Models for Collaborative Filtering Neighbourhoods [ECIR '16 Slides]Daniel Valcarce
 

Plus de Daniel Valcarce (10)

Information Retrieval Models for Recommender Systems - PhD slides
Information Retrieval Models for Recommender Systems - PhD slidesInformation Retrieval Models for Recommender Systems - PhD slides
Information Retrieval Models for Recommender Systems - PhD slides
 
On the Robustness and Discriminative Power of IR Metrics for Top-N Recommenda...
On the Robustness and Discriminative Power of IR Metrics for Top-N Recommenda...On the Robustness and Discriminative Power of IR Metrics for Top-N Recommenda...
On the Robustness and Discriminative Power of IR Metrics for Top-N Recommenda...
 
LiMe: Linear Methods for Pseudo-Relevance Feedback [SAC '18 Slides]
LiMe: Linear Methods for Pseudo-Relevance Feedback [SAC '18 Slides]LiMe: Linear Methods for Pseudo-Relevance Feedback [SAC '18 Slides]
LiMe: Linear Methods for Pseudo-Relevance Feedback [SAC '18 Slides]
 
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
 
Computing Neighbourhoods with Language Models in a Collaborative Filtering Sc...
Computing Neighbourhoods with Language Models in a Collaborative Filtering Sc...Computing Neighbourhoods with Language Models in a Collaborative Filtering Sc...
Computing Neighbourhoods with Language Models in a Collaborative Filtering Sc...
 
A Study of Smoothing Methods for Relevance-Based Language Modelling of Recomm...
A Study of Smoothing Methods for Relevance-Based Language Modelling of Recomm...A Study of Smoothing Methods for Relevance-Based Language Modelling of Recomm...
A Study of Smoothing Methods for Relevance-Based Language Modelling of Recomm...
 
A Study of Priors for Relevance-Based Language Modelling of Recommender Syste...
A Study of Priors for Relevance-Based Language Modelling of Recommender Syste...A Study of Priors for Relevance-Based Language Modelling of Recommender Syste...
A Study of Priors for Relevance-Based Language Modelling of Recommender Syste...
 
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
 
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
 
Language Models for Collaborative Filtering Neighbourhoods [ECIR '16 Slides]
Language Models for Collaborative Filtering Neighbourhoods [ECIR '16 Slides]Language Models for Collaborative Filtering Neighbourhoods [ECIR '16 Slides]
Language Models for Collaborative Filtering Neighbourhoods [ECIR '16 Slides]
 

Dernier

Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...nirzagarg
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...nirzagarg
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangeThinkInnovation
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制vexqp
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxParas Gupta
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制vexqp
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...Health
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schscnajjemba
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabiaahmedjiabur940
 
Data Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdfData Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdftheeltifs
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制vexqp
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...gajnagarg
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 

Dernier (20)

Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...

When Recommenders Met Big Data: an Architectural Proposal and Evaluation [CERI '14 Slides]

  • 1. When Recommenders Met Big Data: An Architectural Proposal and Evaluation. Daniel Valcarce, Javier Parapar, Álvaro Barreiro. CERI 2014, 3rd Spanish Conference on Information Retrieval, A Coruña, June 2014
  • 2. Table of Contents: Introduction (Motivation; Recommender Systems). Recommender System Architecture (Overview; Front-end; Recommendation engine; Storage). Experiments and results (Rating Insertion; Recommendation Generation; Recommendation Serving). Conclusions and Future Work.
  • 3. Motivation: According to Shareaholic, in 2013 web traffic generated by search engines dropped 6%, while traffic generated by social networks increased more than 100%. Users used to query for what they want; now they want personalised recommendations.
  • 4. Recommender Systems. Objective: predict user preferences over items. Approaches: content-based (uses properties of the items); collaborative filtering (based on similar users); hybrid approaches (combination of both).
  • 7. Our work: a recommender architecture proposal for Big Data; specific technologies detailed for each component; an efficiency study of MySQL Cluster and Cassandra as alternatives for storing ratings and recommendations in the proposed architecture.
  • 8. Generic Recommender System Architecture: Front-end; Storage; Recommendation engine.
  • 9. Our goals. Scalability: more machines → more computational power; Big Data capable. High availability: fault tolerance; no single point of failure.
  • 11. Our proposal: Front-end. Use cases: search items; emit ratings; get recommendations. Proposed architecture: distributed web application (Django); redundant load balancers (Perlbal); two levels of cache, namely a reverse proxy cache (Varnish) and a distributed memory cache (Memcached).
  • 13. Our proposal: Recommendation Engine. Recommendations are precalculated and stored. A batch process refreshes the suggestions regularly. Use of the MapReduce distributed model, a state-of-the-art paradigm for large-scale data processing. Hadoop: open source MapReduce implementation. Mahout: scalable machine learning library.
  • 15. Our proposal: Storage Component I. Information to be stored: common web application data (e.g., user profiles); large amounts of ratings and recommendations; data about items. Requirements: read scalability and fault tolerance (replication); write scalability (sharding); linear scalability with the number of nodes.
  • 16. Our proposal: Storage Component II. Proposed technologies: relational database (MySQL Cluster); NoSQL column store (Cassandra); inverted indexes (Solr).
  • 19. Experiment: storing ratings and recommendations. Candidates: MySQL Cluster; Cassandra. Netflix Prize Dataset: 100M ratings, 480k users, 17.7k films. Cluster configuration: number of machines: 4; replication factor: 2.
  • 20. Rating Insertion. Figure: Average insertion rate obtained by inserting from 10 to 100 million ratings using 8 concurrent requests. [Plot: x-axis, number of ratings (1e+07 to 1e+08); y-axis, milliseconds per insertion (0.00 to 0.30); series: MySQL Cluster 8, Cassandra 8.]
  • 21. Rating Insertion. Figure: Average insertion rate obtained by inserting from 10 to 100 million ratings using 8, 16, 32 and 64 concurrent requests. [Plot: x-axis, number of ratings (1e+07 to 1e+08); y-axis, milliseconds per insertion (0.00 to 0.30); series: MySQL Cluster and Cassandra, each with 8, 16, 32 and 64 concurrent requests.]
  • 23. Recommendation Generation. Table: Times for Mahout's Item-based Collaborative Filtering algorithm reading and writing directly to/from the database.
      Storage system | Time (min) | Time per recommendation (ms)
      Cassandra | 68.85 | 8.6
      MySQL Cluster | crash! | crash!
  • 24. Recommendation Generation. Table: Times for Mahout's Item-based Collaborative Filtering algorithm.
      Storage system | Time (min) | Time per recommendation (ms)
      Cassandra | 68.85 | 8.6
      MySQL Cluster * | 274.73 | 34.3
      * Using Sqoop, a tool for transferring bulk data between Hadoop Distributed File System and relational databases.
  • 25. Recommendation Serving. Figure: Average serving rate obtained by querying the top 10 recommended items for 25 million users using 8, 16, 32 and 64 concurrent requests. [Plot: x-axis, number of threads (8, 16, 32, 64); y-axis, milliseconds per serving (0.00 to 0.30); series: MySQL Cluster, Cassandra.]
  • 27. Conclusions and Future Work. We have proposed a highly scalable and fault-tolerant platform for recommender systems. We have benchmarked Cassandra and MySQL Cluster in the context of recommender systems. Future: study and benchmark more parts of the proposed platform. Future: develop more effective recommender algorithms on the platform.
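
As a rough consistency check on the Recommendation Generation table (slides 23 and 24), the per-recommendation times appear to match the total run time divided by the number of users in the Netflix Prize dataset. This is an inference from the numbers on the slides, not something the slides state. The sketch below assumes one recommendation list per user and uses the rounded figure of 480k users from slide 19; the function name `ms_per_user` is hypothetical.

```python
def ms_per_user(total_minutes: float, n_users: int) -> float:
    """Convert a total batch time in minutes to milliseconds per user."""
    return total_minutes * 60_000 / n_users


# Assumption: rounded user count from slide 19, one recommendation list per user.
N_USERS = 480_000

cassandra_ms = ms_per_user(68.85, N_USERS)   # total time from the slide 23/24 table
mysql_ms = ms_per_user(274.73, N_USERS)      # MySQL Cluster run that used Sqoop

print(f"Cassandra: {cassandra_ms:.1f} ms per user")
print(f"MySQL Cluster (via Sqoop): {mysql_ms:.1f} ms per user")
print(f"MySQL Cluster / Cassandra: {mysql_ms / cassandra_ms:.2f}x")
```

Under these assumptions the two values round to 8.6 ms and 34.3 ms, the figures reported in the table, and the MySQL Cluster run with Sqoop takes roughly four times as long as the Cassandra run.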
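
The storage requirements on slide 15 (replication for read scalability and fault tolerance, sharding for write scalability) and the test cluster on slide 19 (4 machines, replication factor 2) can be illustrated with a toy placement scheme. This is a minimal sketch under stated assumptions, not how Cassandra or MySQL Cluster actually place data: the node names, the SHA-256 based hash, and the "primary plus next node" replica rule are all hypothetical choices made for illustration.

```python
import hashlib

NODES = ["node0", "node1", "node2", "node3"]  # 4 machines, as on slide 19 (names are made up)
REPLICATION_FACTOR = 2                         # as on slide 19


def replicas_for(user_id: int, nodes=NODES, rf=REPLICATION_FACTOR) -> list:
    """Return the nodes holding a user's ratings: a primary shard plus rf-1 successors."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    primary = int.from_bytes(digest[:8], "big") % len(nodes)  # sharding: spread users over nodes
    return [nodes[(primary + i) % len(nodes)] for i in range(rf)]  # replication: copy to next node


def readable_after_failure(user_id: int, failed: str) -> bool:
    """A user's data stays readable if at least one replica lives on a node that did not fail."""
    return any(node != failed for node in replicas_for(user_id))


# With replication factor 2, losing any single node leaves every user readable.
survives = all(readable_after_failure(u, f) for u in range(1000) for f in NODES)
print("All users readable after any single-node failure:", survives)
```

Writes for different users land on different primaries, which is the sense in which sharding spreads the write load, while the second copy lets reads be served from either replica and removes the single point of failure for that user's data.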