Summary of a Recommender Systems Survey paper

Recommender Systems Survey
(Summary)
Changsung Moon
North Carolina State University

CONTENTS
1. Foundations
  1-1. Fundamentals
  1-2. Cold-start
  1-3. Similarity Measures
2. Hybrid CBF/CF
  2-1. Challenges of CBF and CF
  2-2. Hybrid Approaches
  2-3. Social Filtering
3. Trends
  3-1. Introduction
  3-2. Location-aware RS
  3-3. Bio-inspired approaches
  3-4. Conclusions
4. References

1. RS Foundations: 1-1. Fundamentals
Process is based on the following considerations
Considerations
• Type of data: ratings, features, content, social relationship, location-aware info
• Filtering algorithm: demographic, content-based, collaborative, social-based, context-aware, hybrid
• Model: memory-based, model-based
• Employed tech: probabilistic approaches, Bayesian networks, nearest neighbors algorithm, neural networks, genetic algorithms, fuzzy models, SVD
• The rest
  - sparsity level
  - performance of the system
  - objective sought (predictions, top N recommendations)
  - desired quality of results

Filtering algorithms
Content-based filtering
• Based on info about the item itself, usually keywords or phrases occurring in the item
• Similarity between two content items is measured by the similarity of their term vectors
• A user’s profile can be developed by analyzing the set of content the user interacted with
• This makes it possible to compute the similarity between a user and an item
Collaborative filtering
• Based on interactions of users
• Users rate items, and CF finds patterns in the way items have been rated by the user and other users to find additional items of interest for a user
• Matches a user’s metadata to that of other similar users and recommends items liked by them
• Two main approaches
  - Memory-based
  - Model-based
Demographic filtering
• Assumes that users with common personal attributes (sex, age, country, etc.) also have common preferences
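
The content-based idea above (term vectors for items, a user profile built from the content the user interacted with, and a user-to-item similarity) can be sketched roughly as follows. This is a minimal illustration, not the method of any cited work: the whitespace tokenizer, the raw term-frequency weighting, and the sample texts are all assumptions made for the example.

```python
import math
from collections import Counter

def term_vector(text):
    """Bag-of-words term-frequency vector (assumed: lowercase, whitespace tokens)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def user_profile(item_texts):
    """Profile = sum of the term vectors of the items the user interacted with."""
    profile = Counter()
    for text in item_texts:
        profile.update(term_vector(text))
    return profile

# Hypothetical catalogue and interaction history.
items = {
    "jazz_review": "jazz saxophone live album review",
    "rock_review": "rock guitar live album review",
    "pasta_recipe": "pasta tomato basil recipe",
}
profile = user_profile(["jazz piano album", "jazz saxophone concert"])

# Score every item against the profile and pick the closest one.
scores = {name: cosine(profile, term_vector(text)) for name, text in items.items()}
best = max(scores, key=scores.get)
```

Real systems usually replace raw term counts with a weighting such as TF-IDF, but the user-to-item comparison keeps the same shape.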

Two main approaches in Collaborative Filtering (CF)
Memory-based
• Use the matrix of user ratings for items of the entire database to find users that are similar to the active user, and use their preferences to predict ratings for the active user
• Advantages
  - Quality of predictions is rather good
  - Relatively simple algorithm to implement for any situation
  - New data can be added easily and incrementally
  - Need not consider content of items
• Disadvantages
  - Depends on human ratings
  - Performance decreases when data gets sparse
  - Scales poorly and has problems with large datasets
Model-based
• Find patterns based on training data, and these are used to make predictions for real data
• Extract some info from the dataset, and use that as a “model” to make recommendations without having to use the complete dataset every time
• Advantages
  - Handle sparsity better than memory-based ones
  - Scalable with large datasets
  - Improve prediction speed
• Disadvantages
  - Expensive model building
  - Can lose useful info due to reduction models
• Approaches
  - Linear algebra, probabilistic methods, neural networks, clustering, latent classes, and so on
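
The memory-based approach described above (find users similar to the active user in the rating matrix, then use their preferences to predict a rating) might look like the sketch below. The toy ratings, the choice of cosine similarity over co-rated items, and the neighborhood size `k` are assumptions for illustration only.

```python
import math

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
RATINGS = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 2, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
    "dave":  {"A": 5, "B": 3, "D": 5},
}

def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users rated."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict(ratings, active_user, item, k=2):
    """Similarity-weighted average over the k most similar users who rated `item`."""
    candidates = [
        (cosine_similarity(ratings[active_user], other_ratings), other_ratings[item])
        for other, other_ratings in ratings.items()
        if other != active_user and item in other_ratings
    ]
    neighbors = sorted(candidates, reverse=True)[:k]
    total_sim = sum(sim for sim, _ in neighbors)
    if total_sim == 0:
        return None  # no usable neighbor: the sparsity / cold-start case
    return sum(sim * rating for sim, rating in neighbors) / total_sim

prediction = predict(RATINGS, "alice", "D")
```

Every prediction scans all users, which is one way to see why the slide lists scalability as a disadvantage of memory-based methods.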

1. RS Foundations: 1-2. Cold-start
Cold-start problem
• New items and new users can cause the cold-start problem, as there will be insufficient data on these new entries for CF to work accurately
• Research on hybrid filtering
  - Leung et al. [135]: cross-level association rules to integrate content info about domain items
  - Kim et al. [118]: use collaborative tagging by crawling the delicious site
  - Weng et al. [228]: combine implicit relations between users’ item preferences and additional taxonomic preferences
  - Loh et al. [140]: represent users’ profiles with info extracted from the users’ scientific publications
  - Martinez et al. [148]: hybrid RS which combines CF with a knowledge-based one
  - Chen and He [56]: number of common terms / term frequency (NCT/TF) CF based on a demographic vector
  - Saranya and Atsuhiro [199]: utilize latent features extracted from items
  - Park et al. [173]: use filterbots, i.e. surrogate users that rate items based only on user or item attributes

1. RS Foundations: 1-3. Similarity Measures
Similarity Measures (SM)
Memory-based
• Traditional
  - Pearson correlation, Cosine, Euclidean, Adjusted cosine, Constrained correlation, Mean Squared Differences
• Research
  - Bobadilla et al. [31]: Jaccard Mean Squared Differences; uses non-numerical info in addition to the numerical info from ratings
  - Ortega et al. [169]: use Pareto dominance to eliminate less representative users from the k-neighbor selection process
  - SING (singularities): uses info contained in the votes of all users, instead of restricting it to the ratings of the two users or the two items being compared
Model-based
• Advantage
  - Increase in accuracy, in performance (time consumed), or in both
• Disadvantage
  - Model must be regularly updated in order to consider the most recently entered set of ratings
• Research
  - GEN: uses genetic algorithms
Deal with cold-start
• Research
  - Ahn [6]: PIP, a heuristic SM
  - Heung-Nam et al. [98]: UERROR, which first predicts actual ratings and subsequently identifies prediction errors for each user
  - NCS: based on neural learning (model-based CF) and adapted for new-user cold-start situations
• (user to user) similarity between pairs of users: compare ratings of all the items rated by both users
• (item to item) similarity between pairs of items: compare ratings of all users who have rated both items
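
Two of the traditional measures named above, Pearson correlation and Mean Squared Differences, can be sketched for both the user-to-user and item-to-item cases. The version below is one common variant and is only illustrative: it computes means over co-rated entries only, returns 0.0 when the correlation is undefined, and uses made-up ratings. The helper `transpose` is a hypothetical name introduced here to reuse the same functions for items.

```python
import math

def pearson(x, y):
    """Pearson correlation over the keys rated in both x and y."""
    common = x.keys() & y.keys()
    n = len(common)
    if n < 2:
        return 0.0
    mean_x = sum(x[k] for k in common) / n
    mean_y = sum(y[k] for k in common) / n
    cov = sum((x[k] - mean_x) * (y[k] - mean_y) for k in common)
    dev_x = math.sqrt(sum((x[k] - mean_x) ** 2 for k in common))
    dev_y = math.sqrt(sum((y[k] - mean_y) ** 2 for k in common))
    if dev_x == 0 or dev_y == 0:
        return 0.0  # a constant rater carries no correlation signal
    return cov / (dev_x * dev_y)

def mean_squared_difference(x, y):
    """MSD over co-rated keys; lower means more similar. None if nothing in common."""
    common = x.keys() & y.keys()
    if not common:
        return None
    return sum((x[k] - y[k]) ** 2 for k in common) / len(common)

def transpose(user_ratings):
    """Turn user -> {item: rating} into item -> {user: rating}."""
    by_item = {}
    for user, rated in user_ratings.items():
        for item, rating in rated.items():
            by_item.setdefault(item, {})[user] = rating
    return by_item

# Hypothetical ratings.
users = {
    "u1": {"A": 1, "B": 2, "C": 3},
    "u2": {"A": 2, "B": 4, "C": 5, "D": 1},
    "u3": {"A": 3, "B": 2, "C": 1},
}
user_sim = pearson(users["u1"], users["u2"])   # user to user
items = transpose(users)
item_sim = pearson(items["A"], items["C"])     # item to item
```

The same two functions serve both cases because the only difference is whether the rating matrix is read by rows (users) or by columns (items).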

2. Hybrid CBF / CF: 2-1. Challenges
Challenges of CBF and CF
CBF
• Cannot predict quality of item
  - How popular is the item?
  - How much will a user like the item?
  - Difficult to acquire feedback from users because, with CBF, users do not typically rate items
• Limited content analysis
  - In certain domains (e.g., music, blogs, and videos), it is a complicated task to generate the attributes for items
• Overspecialization
  - Users only receive recommendations for items that are very similar to items they liked or preferred
CF
• Data sparsity
  - Many commercial RSs are based on large datasets. As a result, the user-item matrix used for CF could be extremely large and sparse
  - Research
    - Dimensionality reduction techniques [202]; the reduction methods are based on Matrix Factorization
    - Combine the model-based technique Latent Semantic Indexing (LSI) with the reduction method Singular Value Decomposition (SVD)
• Cold-start problem
  - See section 1-2, “Cold-start”
• Synonyms
  - Same or very similar items having different names or entries
  - Topic modeling (e.g., Latent Dirichlet Allocation) could solve this by grouping different words belonging to the same topic
• Shilling attacks
  - People may give positive ratings to their own items and negative ratings to their competitors’ items
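
To make the dimensionality-reduction idea for sparse user-item matrices concrete, here is a small matrix factorization sketch. Note that it is not SVD or LSI as named above: it swaps in a plain stochastic-gradient factorization of the observed ratings, chosen because it needs no external library. The data, the number of latent factors `k`, and the learning-rate and regularization values are all assumptions.

```python
import math
import random

def factorize(ratings, n_users, n_items, k=2, epochs=2000, lr=0.01, reg=0.02, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] approximates r."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def rmse(ratings, P, Q):
    return math.sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))

# Hypothetical sparse matrix: only 10 of the 16 (user, item) cells are observed.
observed = [
    (0, 0, 5), (0, 1, 3), (0, 3, 1),
    (1, 0, 4), (1, 3, 1),
    (2, 0, 1), (2, 1, 1), (2, 3, 5),
    (3, 1, 1), (3, 2, 5),
]
P0, Q0 = factorize(observed, 4, 4, epochs=0)   # untrained baseline
P, Q = factorize(observed, 4, 4)
missing_cell = predict(P, Q, 1, 1)             # estimate for an unobserved cell
```

Each user and item is described by only `k` numbers, so predictions for unobserved cells come from the compact factors rather than from the full sparse matrix.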

2. Hybrid CBF / CF: 2-2. Hybrid Approaches
Methods, Advantages and Trends
Methods
• Calculate CBF and CF separately and subsequently combine them
• Incorporate CBF characteristics into CF
• Construct a unified model with both CBF and CF characteristics
• Incorporate CF characteristics into CBF
Advantages
• CF solves CBF's problems
  - It can function in any domain
  - It is less affected by overspecialization
  - It acquires feedback from users
• CBF adds qualities to CF
  - Improved prediction quality, because predictions are calculated with more information, and reduced impact from cold-start and sparsity problems
Trend in CBF
• Add social info to item attributes, such as tags, comments, opinions and social network sharing
• Tag RS
  - Tag RSs attempt to provide personalized item recommendations to users through the most representative tags
  - combine clustering-based CBF with CF to suggest new tags to users [130]
• Use of tags in the recommendation process
  - allow tags to be incorporated into standard CF [219]
  - incorporate tags and other metadata into hybrid CBF/CF [39]
  - combine graph-based tag recommendations with user-based CF and item-based CF [83]
  - use tags to express which features of an item users like or dislike [81]
  - predict user preferences by only using tagging history [82]
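
The first hybrid method listed above (calculate CBF and CF separately and subsequently combine them) can be illustrated with a simple weighted blend. This is only one possible combination rule: the min-max normalization, the mixing weight `alpha`, and the example scores are assumptions, not something prescribed by the survey.

```python
def min_max_normalize(scores):
    """Rescale scores to [0, 1] so CBF and CF outputs are comparable."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {item: 0.0 for item in scores}  # no spread: contributes nothing
    return {item: (s - low) / (high - low) for item, s in scores.items()}

def weighted_hybrid(cbf_scores, cf_scores, alpha=0.5):
    """Blend the two score sets; an item missing from one source scores 0 there."""
    cbf = min_max_normalize(cbf_scores)
    cf = min_max_normalize(cf_scores)
    return {
        item: alpha * cbf.get(item, 0.0) + (1 - alpha) * cf.get(item, 0.0)
        for item in cbf.keys() | cf.keys()
    }

def top_n(scores, n):
    """Items with the highest blended score, ties broken by item name."""
    return [item for item, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]

# Hypothetical outputs of two independently run recommenders.
cbf_scores = {"A": 0.9, "B": 0.2, "C": 0.5}   # e.g. profile-to-item similarities
cf_scores = {"A": 4.0, "B": 2.0, "D": 5.0}    # e.g. predicted ratings
blended = weighted_hybrid(cbf_scores, cf_scores, alpha=0.5)
recommended = top_n(blended, 2)
```

Item "D" has no content score and item "C" has no CF score, yet both still receive a blended score, which hints at how a hybrid can soften cold-start and sparsity effects.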

2. Hybrid CBF / CF: 2-3. Social Filtering
Current Researches
Improvement in RS
• Most research aims to improve recommendations by using the extra info provided by social information
• Research
  - Woerndl and Groh [231]: use social networks to enhance CF
  - Arazy et al. [13]: use data from online social networks and electronic communication tools
  - Xin et al. [233]: exploit learners’ note-taking activity to enrich and extend the user profile
  - Bonhard and Sasse [41]: similarity and familiarity between the user and the persons who have rated the items can aid decision making
  - Fengkun and Hong [75]: incorporate users’ preference ratings and their social relationships into CF
  - Carmagnola et al. [52]: recommend content in social RS based on social network structure and influence relationships among users
  - Ramaswamy et al. [189]: analyze info such as address books to estimate the level of social affinity
Create or enable RS
• Use social info to create or enable RS
• Research
  - Siersdorfer and Sergei [210]: predict the utility of items, users or groups based on the multi-dimensional social environment of a given user; mine the rich set of structures and social relationships that folksonomies provide
  - Li and Chen [137]: blog recommendation that combines trust model, social relation and semantic analysis
  - Jason [111]: discover social networks between mobile users
  - Jyun and Chui [115]: use trading relationships to calculate the level of recommendation for trusted online auction sellers
  - Dell’amico and Capra [69]: users’ trustworthiness is measured according to two criteria: taste similarity and social ties
Trust and Reputation
• User trust
  - calculate the credibility of users through info from the rest of the users or the social network
• Item reputation
  - calculate the reputation of items through feedback of users or by studying how users work with these items
• Research
  - Yuan et al. [239]: use a trust-aware RS to demonstrate the advantages of exploiting the small-world nature of the trust network
  - Li and Kao [138]: RS based on trust of social networks to enhance the quality of peer production services
  - Ma et al. [145]: probabilistic factor analysis framework combining ratings and trusted friends; this framework can be applied to a pure user-item rating matrix

3. Trends: 3-1. Introduction
Recommender systems trends
Trends
• Shilling attack: generate many positive ratings for a product
• Privacy and security: tradeoffs between accuracy and privacy
• Knowledge-based filtering: use knowledge about users and products to generate recommendations, reasoning about what products meet the user’s requirements
• Hybrid approach: use current databases to simultaneously incorporate memory-based, social and content-based info
• Workflow: user model is based on “users-roles-tasks reference information”
• Collection of implicit info: access to web sites, food purchased, use of public transport systems, etc.
• Peer-to-peer (P2P) networks: user info is based on distributed info
• Incorporation of different types of info: e.g., explicit ratings, social relations, user contents, locations, use trends, knowledge-based info

3. Trends: 3-2. Location-aware RS
Location-aware recommender systems
Geographic CF RSs
• RS
  - Traditional RS without using geographical info
• RS + G
  - Traditional RS which also provides the item’s geographical position
  - Geographic info does not play a part in the recommendation process
• GRS
  - Geographic RS
  - Ratings are made in a traditional way, whilst recommendations are made by considering the geographical position of the user
• GRS+
  - Users establish ratings on items by weighting the distance between them and the items rated
Researches
• Martinez et al. [149]: examples of the RS + G group
• Schlieder [205]: modeling collaborative semantics of geographic folksonomies based on analysis of tags that users assign to composite objects
• Wan-Shiou et al. [225]: hybrid content-based/geographic RS that analyzes a customer’s history and position so that vendor info can be ranked according to the match with the customer’s preferences
• Matyas and Schlieder [152]: users’ ratings are derived from the photos they have downloaded from and uploaded to the same website (the photos have a GPS position associated with them); a search of k-neighborhoods based on this data is then carried out
• Travel GPS traces can be reinforced with social information based on friends (GRS+)
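
As a rough illustration of the GRS idea above (ratings are made traditionally, but recommendations take the user's geographical position into account), the sketch below discounts a predicted rating by the great-circle distance between user and item. The exponential decay, the `scale_km` parameter, and the coordinates are assumptions made for this example; the survey does not prescribe a particular weighting function.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (latitude, longitude) points in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def distance_weighted_score(predicted_rating, user_pos, item_pos, scale_km=5.0):
    """Discount a predicted rating as the item gets farther from the user (assumed decay)."""
    distance = haversine_km(user_pos[0], user_pos[1], item_pos[0], item_pos[1])
    return predicted_rating * math.exp(-distance / scale_km)

# Hypothetical user position and two candidate venues with predicted ratings.
user = (35.78, -78.64)
nearby_venue = {"pos": (35.79, -78.64), "predicted": 4.0}
distant_venue = {"pos": (35.99, -78.90), "predicted": 5.0}

near_score = distance_weighted_score(nearby_venue["predicted"], user, nearby_venue["pos"])
far_score = distance_weighted_score(distant_venue["predicted"], user, distant_venue["pos"])
```

With these numbers the closer venue ranks first even though its predicted rating is lower, which is the behavior that separates a geographic RS from the RS + G group.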

3. Trends: 3-3. Bio-inspired approaches
Bio-inspired approaches (Model-based RS)
Genetic Algorithms (GA)
• GA have mainly been used in two aspects
  - Clustering: use common genetic clustering algorithms such as GA-based K-means
  - Hybrid user models: the chromosome structure can contain demographic characteristics and/or those related to content-based filtering
• Research
  - Dao et al. [68]: model-based CF using GA for location-based advertisement
  - Bobadilla et al. [33]: use GA to create a similarity metric, weighting a set of very simple similarity measures
  - Hwang et al. [106]: GA to learn personal preferences of customers
Neural Networks (NN)
• Research focuses on hybrid RSs in which NNs are used to learn users’ profiles; NNs have also been used in the clustering processes of some RSs
• Research
  - Ren et al. [192]: use the Widrow-Hoff [229] algorithm to learn each user’s profile from the contents of rated items
  - Christakou and Stafylopatis [62]: use a combination of CBF / CF RS
  - Lee and Woo [133]: all users are segmented by demographic characteristics, and users in each segment are clustered according to item preferences using a Self-Organizing Map (SOM) NN; Kohonen’s SOMs are a type of unsupervised learning
  - Huang et al. [103]: train a back-propagation NN to generate association rules that are mined from a transactional DB
  - Roh et al. [193]: combine CF with SOM and Case-Based Reasoning (CBR) by changing the unsupervised clustering problem into a supervised user preference reasoning problem
  - Sevarac et al. [207]: use neuro-fuzzy inference to create pedagogical rules in e-learning; a new cold-start similarity measure has been perfected using optimization based on neural learning
  - Acilar and Arslan [2]: CF based on the Artificial immune network algorithm (aiNet)
3. Trends: 3-4. Conclusions
Generations of RS
1st Generation
• Use traditional websites to collect info from
  - Content-based data from purchased or used products
  - Demographic data collected in users’ records
  - Memory-based data collected from users’ item preferences
• Focus on improving accuracy through filtering
2nd Generation
• Extensively use web 2.0 by gathering social info
3rd Generation
• Will use web 3.0 through info provided by integrated devices on the Internet
• Incorporate location info into existing recommendation algorithms
Future Research
• Advancing existing methods and algorithms to improve quality of RS
• New lines of research
  - Proper combination of existing recommendation methods that use different types of available information
  - Making maximum use of the individual potential of various sensors and devices on the Internet of Things
  - Acquisition and integration of trends related to habits, consumption and tastes of individual users
  - Data mining from RS databases for non-recommendation uses (e.g., market research, general trends, visualization of differential characteristics of demographic groups)
  - Enabling security and privacy for the RS process
  - New evaluation measures and developing a standard for non-standardized measures
  - Designing flexible frameworks for automated analysis of heterogeneous data

4. References
References
[1] J. Bobadilla, F. Ortega, A. Hernando and A. Gutierrez, “Recommender Systems Survey,”
Knowledge Based Systems, Vol. 26, 2013, pp. 109-132.
[2] Book: Collective Intelligence in Action
[3] en.wikipedia.org/wiki/Collaborative_filtering
[4] www.cs.carleton.edu/cs_comps/0607/recommend/recommender/memorybased.html
[5] www.cs.carleton.edu/cs_comps/0607/recommend/recommender/modelbased.html
