Slides of the CERI 2014 paper:
Daniel Valcarce, Javier Parapar, Álvaro Barreiro. When Recommenders Met Big Data: an Architectural Proposal and Evaluation. Proceedings of the 3rd Spanish Conference on Information Retrieval, CERI 2014, pp. 73-84, A Coruña, Spain, 19 - 20 June, 2014. ISBN 978-84-9749-591-2.
http://www.dc.fi.udc.es/~dvalcarce/pubs/valcarce-etal-ceri2014.pdf
When Recommenders Met Big Data: an Architectural Proposal and Evaluation [CERI '14 Slides]
1. When Recommenders Met Big Data
An Architectural Proposal and Evaluation
Daniel Valcarce, Javier Parapar, Álvaro Barreiro
CERI 2014
3rd Spanish Conference on Information Retrieval
A Coruña, June 2014
2. Introduction Recommender System Architecture Experiments and results Conclusions and Future Work
Table of Contents
Introduction
Motivation
Recommender Systems
Recommender System Architecture
Overview
Front-end
Recommendation engine
Storage
Experiments and results
Rating Insertion
Recommendation Generation
Recommendation Serving
Conclusions and Future Work
3. Motivation
According to Shareaholic, in 2013...
web traffic generated by search engines dropped 6%
social networks increased more than 100%
Users...
used to query what they want
want personalised recommendations
4. Recommender Systems
Objective
Predict user preferences over items
Approaches
Content-based: uses properties of the items
Collaborative filtering: based on similar users
Hybrid approaches: combination of both
7. Our work
Recommender architecture proposal for Big Data
Detail specific technologies for each component
Efficiency study of MySQL Cluster and Cassandra as alternatives for storing ratings and recommendations in the proposed architecture
8. Generic Recommender System Architecture
Front-end
Storage
Recommendation engine
9. Our goals
Scalability
More machines → more computational power
Big Data capable
High availability
Fault-tolerance
No single point of failure
11. Our proposal: Front-end
Use cases
Search items
Emit ratings
Get recommendations
Proposed architecture
Distributed web application (Django)
Redundant load balancers (Perlbal)
Two levels of cache
Reverse proxy cache (Varnish)
Distributed memory cache (Memcached)
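The two cache levels can be pictured as a lookup order: a whole-response cache in front of the application, and an object cache inside it. The sketch below is only an illustration under stated assumptions: plain dictionaries stand in for Varnish and Memcached, `fetch_from_storage` is a hypothetical back-end call, and cache expiry is left out.

```python
# Stand-ins (assumptions): plain dicts play the role of the two caches,
# and fetch_from_storage is a hypothetical slow call to the storage layer.
page_cache = {}     # level 1: whole rendered responses (reverse proxy cache role)
object_cache = {}   # level 2: application objects (memory cache role)
storage_hits = []   # records every trip to the storage layer


def fetch_from_storage(user_id):
    """Pretend to read precomputed recommendations from the database."""
    storage_hits.append(user_id)
    return [f"item-{user_id}-{rank}" for rank in range(1, 4)]


def get_recommendations(user_id):
    """Cache-aside lookup: try the object cache, fall back to storage."""
    if user_id not in object_cache:
        object_cache[user_id] = fetch_from_storage(user_id)
    return object_cache[user_id]


def handle_request(url, user_id):
    """Serve from the page cache when possible, else render and remember."""
    if url in page_cache:
        return page_cache[url]
    body = ", ".join(get_recommendations(user_id))
    page_cache[url] = body
    return body


first = handle_request("/recs/42", 42)           # goes down to storage
second = handle_request("/recs/42", 42)          # answered by level 1
third = handle_request("/recs/42?fmt=txt", 42)   # new URL, answered via level 2
```

In this toy run the storage layer is reached once, although three requests are served.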
13. Our proposal: Recommendation Engine
Recommendations are precalculated and stored
A batch process refreshes the suggestions regularly
Use of MapReduce distributed model
State-of-the-art paradigm for large-scale data processing
Hadoop: MapReduce open source implementation
Mahout: scalable machine learning library
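To give a feel for the MapReduce model, the sketch below counts how often two items are rated by the same user, a building block that item-based approaches can use. It is a single-process imitation of the map, shuffle and reduce phases in plain Python, not Hadoop or Mahout code; the input data and all function names are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Toy input (made up): one record per user with the items that user rated.
user_items = {
    "u1": ["A", "B", "C"],
    "u2": ["A", "B"],
    "u3": ["B", "C"],
}


def map_phase(user, items):
    """Emit ((item_i, item_j), 1) for every pair of items rated by one user."""
    for pair in combinations(sorted(items), 2):
        yield pair, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts emitted for one item pair."""
    return key, sum(values)


emitted = [kv for user, items in user_items.items() for kv in map_phase(user, items)]
cooccurrence = dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())
```

Because each map call only sees one user and each reduce call only sees one key, both phases can in principle be spread over many machines.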
15. Our proposal: Storage Component I
Information to be stored
Common web application data (e.g., user profiles)
Large amounts of ratings and recommendations
Data about items
Requirements
Read scalability and fault tolerance (replication)
Write scalability (sharding)
Linear scalability with the number of nodes
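A minimal sketch of how sharding and replication combine: each key is hashed to a primary node (sharding) and copied to the following nodes (replication). The 4 nodes and replication factor 2 mirror the cluster configuration used in the experiments; the placement rule itself is a generic assumption, not the actual algorithm of MySQL Cluster or Cassandra, and the node and key names are hypothetical.

```python
import hashlib

NODES = ["node0", "node1", "node2", "node3"]  # 4 machines, as in the experiments
REPLICATION_FACTOR = 2


def replicas_for(key, nodes=NODES, rf=REPLICATION_FACTOR):
    """Return the nodes holding a key: a hashed primary plus the next rf-1 nodes."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    primary = int(digest, 16) % len(nodes)
    return [nodes[(primary + i) % len(nodes)] for i in range(rf)]


# Place 1000 hypothetical user keys on the cluster.
placement = {f"user:{uid}": replicas_for(f"user:{uid}") for uid in range(1000)}


def survives_failure(failed_node):
    """True if every key still has at least one replica on a live node."""
    return all(any(n != failed_node for n in owners) for owners in placement.values())
```

Writes spread across nodes because different keys hash to different primaries, and `survives_failure` returns True for any single node because every key lives on two distinct machines.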
16. Our proposal: Storage Component II
Proposed technologies
Relational database (MySQL Cluster)
NoSQL column store (Cassandra)
Inverted indexes (Solr)
19. Experiment: storing ratings and recommendations
Candidates
MySQL Cluster
Cassandra
Netflix Prize Dataset
100M ratings
480k users
17.7k films
Cluster configuration
Number of machines: 4
Replication factor: 2
20. Rating Insertion
Figure: Average insertion rate obtained by inserting from 10 to 100 million ratings using 8 concurrent requests. X axis: # ratings (1e+07 to 1e+08); Y axis: milliseconds/insertion (0.00 to 0.30); series: MySQL Cluster 8, Cassandra 8.
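One plausible way to obtain a milliseconds-per-insertion figure under concurrent load is to time a batch of inserts issued by a fixed number of workers and divide by the batch size. The sketch below assumes this definition; the slides do not describe the actual harness. `InMemoryStore` is a hypothetical stand-in for a database client, and the ratings are synthetic.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class InMemoryStore:
    """Hypothetical stand-in for a database client; not a real driver."""

    def __init__(self):
        self._lock = threading.Lock()
        self.rows = []

    def insert(self, user, item, rating):
        with self._lock:
            self.rows.append((user, item, rating))


def insert_chunk(store, chunk):
    """Insert one worker's share of the ratings."""
    for user, item, rating in chunk:
        store.insert(user, item, rating)


def ms_per_insertion(store, ratings, workers):
    """Insert all ratings with `workers` threads; return average ms per insertion."""
    chunks = [ratings[i::workers] for i in range(workers)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda chunk: insert_chunk(store, chunk), chunks))
    elapsed = time.perf_counter() - start
    return elapsed * 1000 / len(ratings)


# Synthetic (user, item, rating) triples, far smaller than the real dataset.
ratings = [(u, u % 50, 1 + u % 5) for u in range(10_000)]
store = InMemoryStore()
avg_ms = ms_per_insertion(store, ratings, workers=8)
```

Swapping `InMemoryStore` for a real client and repeating the run with 8, 16, 32 and 64 workers would give one point per configuration.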
21. Rating Insertion
Figure: Average insertion rate obtained by inserting from 10 to 100 million ratings using 8, 16, 32 and 64 concurrent requests. X axis: # ratings (1e+07 to 1e+08); Y axis: milliseconds/insertion (0.00 to 0.30); series: MySQL Cluster 8, 16, 32, 64 and Cassandra 8, 16, 32, 64.
23. Recommendation Generation
Table: Times for Mahout's Item-based Collaborative Filtering algorithm reading and writing directly to/from the database

Storage system   Time (min)   Time per recommendation (ms)
Cassandra        68.85        8.6
MySQL Cluster    crash!       crash!
24. Recommendation Generation
Table: Times for Mahout's Item-based Collaborative Filtering algorithm

Storage system    Time (min)   Time per recommendation (ms)
Cassandra         68.85        8.6
MySQL Cluster *   274.73       34.3

* Using Sqoop, a tool for transferring bulk data between Hadoop Distributed File System and relational databases.
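The per-recommendation column appears consistent with dividing the total running time by the number of users in the dataset (about 480k). This is an inference from the reported numbers, not something the slides state, so the check below is only a worked example of that assumption.

```python
USERS = 480_000  # approximate user count from the dataset slide (assumption: one unit per user)


def ms_per_user(total_minutes, users=USERS):
    """Convert a total running time in minutes to milliseconds per user."""
    return total_minutes * 60 * 1000 / users


cassandra_ms = ms_per_user(68.85)    # about 8.6 ms
mysql_ms = ms_per_user(274.73)       # about 34.3 ms
slowdown = 274.73 / 68.85            # MySQL Cluster via Sqoop vs. Cassandra, about 4x
```

Under this reading, the MySQL Cluster run through Sqoop takes roughly four times as long as the Cassandra run.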
25. Recommendation Serving
Figure: Average serving rate obtained by querying the top 10 recommended items for 25 million users using 8, 16, 32 and 64 concurrent requests. X axis: # threads (8, 16, 32, 64); Y axis: milliseconds/serving (0.00 to 0.30); series: MySQL Cluster, Cassandra.
27. Conclusions and Future Work
We have proposed a highly scalable and fault-tolerant platform for recommender systems.
We have benchmarked Cassandra and MySQL Cluster in the context of recommender systems.
Future: study and benchmark more parts of the proposed platform.
Future: develop more effective recommender algorithms on the platform.