Recommender.system.presentation.pjug.01.21.2014

Introduction to Recommender Systems
Bob Brehm

What is a recommender?
● Wikipedia [3]:
● A subclass of [an] information filtering system that seeks to
predict the 'rating' or 'preference' that a user would give to
an item
● My addition: A subclass of machine-learning.
● Recommender model [2]:
● Users
● Items
● Ratings
● Community

What is a recommender [2]
● Simple use case
● TripAdvisor hotel recommender. Hotel
ratings are averaged, and then the user
examines the ratings.
● More complex use case
● News article recommender. A user model
is built from user preferences. The items'
attributes are ranked. A matrix operation
is performed to predict recommendations
for a given user.

History of recommenders
● Konstan – Of Ants and Cavemen [1]
● Ants will follow a chemical trail – similar to
recommenders
● Paleolithic ancestors would follow
recommendations on edible items vs.
poisonous items
● Suggests psychological and/or biological
need to follow recommendations

● Manual filtering
● Usenet – recommenders based on TF-IDF
and user profile. [2]
● PARC Tapestry (1992) – database of
comments and contents. [1]
● Active CF (1995) – forwarding content to
relevant readers. [1]

● Automatic filtering [2]
● GroupLens project (1994) – User ratings for
Usenet, Nearest Neighbor algorithm
● Commercial era
– MovieLens.org
– PHOAKS system – helped users locate
information on the web using collaborative
filtering.
– CDNow – dot.bomb purchased by Amazon

● Automatic filtering [3]
● Netflix Prize (2006-2009) – sought to
award a prize for improving the predictive
ability to match a user's preferences to
movie selections by 10%.
● Netflix awarded a $1M prize to “BellKor’s
Pragmatic Chaos”, a merged team that
included researchers from AT&T Labs.
● BellKor’s Pragmatic Chaos blended earlier
work with better predictive models to win.

Recommender types
● Non-personalized recommender [2]
● Simple average of item ratings
● Can be misleading without context. What if
your favorite sauce is ketchup and you order
ice cream?
● "If X then Y" recommenders can be
improved by also considering "if not X then Y"
(how often Y occurs without X)
● Example: Zagat restaurant ratings,
TripAdvisor hotel ratings
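
The two ideas on this slide, a plain average per item and the "X then Y" comparison against "not X then Y", can be sketched in a few lines. This is only an illustration: the function names, the sample ratings, and the sample baskets are made up, and the ratio P(Y | X) / P(Y | not X) is one common way to score the association, not necessarily the one any particular site uses.

```python
from collections import defaultdict

def mean_item_ratings(ratings):
    """ratings: iterable of (user, item, rating). Returns {item: mean rating}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _user, item, rating in ratings:
        totals[item] += rating
        counts[item] += 1
    return {item: totals[item] / counts[item] for item in totals}

def association_score(baskets, x, y):
    """Ratio P(y | x) / P(y | not x); values above 1 suggest x and y go together."""
    with_x = [b for b in baskets if x in b]
    without_x = [b for b in baskets if x not in b]
    if not with_x or not without_x:
        return None  # not enough data to compare the two groups
    p_y_given_x = sum(y in b for b in with_x) / len(with_x)
    p_y_given_not_x = sum(y in b for b in without_x) / len(without_x)
    if p_y_given_not_x == 0:
        return float("inf") if p_y_given_x > 0 else None
    return p_y_given_x / p_y_given_not_x

# Hypothetical sample data
hotel_ratings = [("ann", "hotel_a", 4), ("bob", "hotel_a", 2), ("ann", "hotel_b", 5)]
baskets = [
    {"fries", "ketchup"}, {"fries", "ketchup"}, {"fries", "ketchup"}, {"fries"},
    {"ice cream"}, {"ice cream"}, {"ice cream", "ketchup"}, {"ice cream"},
]

print(mean_item_ratings(hotel_ratings))
print(association_score(baskets, "fries", "ketchup"))
print(association_score(baskets, "ice cream", "ketchup"))
```

With this sample data ketchup scores well above 1 for fries and below 1 for ice cream, which is the context a global "most popular sauce" average would miss.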

Recommender types
● Content-based filtering (user-item) [2]
● Model items by attribute keywords. Each item then has
a position in the keyword vector space.
● Model the user's taste profile by attribute keywords. The user
profile also has a position in the keyword vector
space.
● The relevance ranking is the cosine of the angle between these
vectors.
● Factor in item ratings by threshold, +/- weight, etc.
● Term Frequency – Inverse Document Frequency (TF-IDF)
to represent items as vectors.
● Multiple linear regression for analysis.
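
A minimal sketch of the pipeline on this slide: weight each item's keywords with TF-IDF, build the user profile by summing the vectors of liked items, and rank the remaining items by cosine. The article ids, keywords, and function names are hypothetical, and the unsmoothed log(N / df) form of IDF is just one of several common variants.

```python
import math
from collections import Counter, defaultdict

def tf_idf_vectors(docs):
    """docs: {item_id: list of keywords}. Returns {item_id: {term: weight}}."""
    n_docs = len(docs)
    doc_freq = Counter()
    for terms in docs.values():
        doc_freq.update(set(terms))  # count each term once per item
    vectors = {}
    for item, terms in docs.items():
        tf = Counter(terms)
        vectors[item] = {
            term: (count / len(terms)) * math.log(n_docs / doc_freq[term])
            for term, count in tf.items()
        }
    return vectors

def cosine(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def user_profile(item_vectors, liked_items):
    """Sum the vectors of the items the user liked."""
    profile = defaultdict(float)
    for item in liked_items:
        for term, weight in item_vectors[item].items():
            profile[term] += weight
    return dict(profile)

def rank_items(item_vectors, profile, exclude=()):
    """Item ids sorted by descending cosine similarity to the profile."""
    scored = [(cosine(profile, vec), item)
              for item, vec in item_vectors.items() if item not in exclude]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _score, item in scored]

# Hypothetical articles described by keywords
articles = {
    "a1": ["java", "jvm", "performance"],
    "a2": ["java", "spring", "web"],
    "a3": ["gardening", "tomato", "soil"],
    "a4": ["jvm", "gc", "performance"],
}
vectors = tf_idf_vectors(articles)
profile = user_profile(vectors, ["a1"])
ranking = rank_items(vectors, profile, exclude=["a1"])
print(ranking)
```

Explicit ratings could be factored in by scaling each liked item's vector by its rating before summing, which is one way to read the "+/- weight" bullet above.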

Recommender types
● Collaborative filtering (user-user, item-item) [2]
● User-user CF is used extensively for friend linking on
social media – Facebook, LinkedIn, etc. [1][4]

Recommender types
● Hybrid Recommender [3]
● Combination of collaborative and
content-based filtering [3]
● Netflix uses a hybrid system – they use
collaborative filtering to find similar user
habits and content filtering to find similar
items. [2]
● Hybrids exist to overcome inherent
difficulties such as the cold-start problem:
how to deal with a new user or
new item. [4]
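
One simple way to picture a hybrid is a weighted blend that leans on the content-based score while an item has few ratings and shifts toward the collaborative score as ratings accumulate. The sketch below is an assumption for illustration only: the function name, the `full_trust_at` threshold, and the linear weighting are invented here and do not describe how Netflix or any other system actually combines its models. It also assumes both scores are on the same scale.

```python
def hybrid_score(cf_score, content_score, n_ratings, full_trust_at=20):
    """Blend a collaborative score with a content-based score.

    cf_score may be None when collaborative filtering has nothing to say
    (cold start). full_trust_at is a hypothetical tuning knob: the number of
    ratings at which the collaborative score is trusted completely.
    """
    if cf_score is None or n_ratings == 0:
        return content_score  # cold start: fall back to content only
    weight = min(n_ratings / full_trust_at, 1.0)
    return weight * cf_score + (1.0 - weight) * content_score

print(hybrid_score(None, 0.8, 0))    # new item, content score only
print(hybrid_score(4.0, 2.0, 10))    # halfway between the two scores
print(hybrid_score(4.0, 2.0, 200))   # plenty of ratings, collaborative only
```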

Algorithms
● Simple averages (Non-personalized)
● Cosine similarity (Content-based)
● TF-IDF (Content-based)
● Multiple linear regression (Content-based)
● Pearson Correlation (Collaborative)
● K-nearest neighbor (Collaborative)
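
The two collaborative entries in this list work together in user-user CF: Pearson correlation measures how similarly two users rate the items they have in common, and the k nearest neighbors by that measure vote on the prediction. The sketch below uses a mean-centered weighted average, which is one common formulation; the users, movie ids, ratings, and function names are hypothetical, and the result is not clamped to the rating scale.

```python
import math

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = set(ratings_a) & set(ratings_b)
    if len(common) < 2:
        return 0.0
    mean_a = mean(ratings_a[i] for i in common)
    mean_b = mean(ratings_b[i] for i in common)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den_a = math.sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common))
    den_b = math.sqrt(sum((ratings_b[i] - mean_b) ** 2 for i in common))
    if den_a == 0 or den_b == 0:
        return 0.0
    return num / (den_a * den_b)

def predict(ratings, user, item, k=2):
    """Predict user's rating of item from the k most similar users who rated it."""
    user_mean = mean(ratings[user].values())
    neighbors = []
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = pearson(ratings[user], other_ratings)
        if sim > 0:  # keep only positively correlated users
            neighbors.append((sim, other))
    neighbors.sort(reverse=True)
    top = neighbors[:k]
    if not top:
        return user_mean  # no usable neighbors: fall back to the user's mean
    num = sum(sim * (ratings[o][item] - mean(ratings[o].values())) for sim, o in top)
    den = sum(sim for sim, _other in top)
    return user_mean + num / den

# Hypothetical ratings on a 1-5 scale
ratings = {
    "ann": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 5, "m2": 3, "m3": 4, "m4": 5},
    "cat": {"m1": 1, "m2": 5, "m3": 2, "m4": 1},
    "dan": {"m1": 4, "m2": 2, "m3": 3, "m4": 4},
}
print(predict(ratings, "ann", "m4"))
```

Here "cat" rates in the opposite direction from "ann" and is ignored, while "bob" and "dan" both rated m4 above their own averages, so the prediction lands above ann's average as well.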

LensKit
● LensKit is an open-source recommender-system toolkit
that can be used in production but is primarily useful for
research and prototyping IMHO.
● Features of LensKit:
● Mavenized project including goals and archetypes.
● Data Access Objects (DAOs) and cursors.
● ItemScorer – implement this however you want.
● RatingPredictor – output is in the desired scale.
● ItemRecommender – provides Top-N
recommendations.

LensKit
● Features of LensKit (cont.):
● Handy annotation classes.
● Support for Groovy.
● Post processing in R.
● MovieLens data sets (through GroupLens
Research)
● Support for sparse matrices.
● Speed optimizations and profiling.

Mahout
● Started as a subproject of Lucene in 2008.
● The idea behind Mahout is that it provides a
framework for the development and
deployment of machine-learning
algorithms.
● Currently it has three distinct branches:
● Classification
● Clustering
● Recommenders

Mahout
● Support for recommenders includes:
● DataModel – provides connections to data
● UserSimilarity – computes similarity between users
● ItemSimilarity – computes similarity between items
● UserNeighborhood – finds a neighborhood (mini cluster) of
like-minded users.
● Recommender – the producer of recommendations.
● Algorithms!

Implementing Recommenders
● Decide whether you want to make or buy.
There are commercial companies out there
that already do this.
● If you decide to make, some hints:
● Get your user to log in.
● Build the user's profile explicitly through preferences
and implicitly through logging.
● Choose the simplest algorithms that get the job done.
● Test, test, test.
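
One way to act on "test, test, test" is to hold out some known ratings, predict them, and compare error across candidate algorithms, starting with the simplest baselines. The sketch below compares a global-mean predictor with an item-mean predictor using root-mean-square error (RMSE). RMSE is a common choice of metric rather than something this talk prescribes, and the data and function names are made up for illustration.

```python
import math
from collections import defaultdict

def rmse(predict, held_out):
    """Root-mean-square error of predict(user, item) over (user, item, rating) rows."""
    squared = [(predict(user, item) - rating) ** 2 for user, item, rating in held_out]
    return math.sqrt(sum(squared) / len(squared))

def global_mean_predictor(train):
    """Baseline: always predict the mean of all training ratings."""
    overall = sum(r for _u, _i, r in train) / len(train)
    return lambda user, item: overall

def item_mean_predictor(train):
    """Predict the item's mean rating, falling back to the global mean."""
    overall = sum(r for _u, _i, r in train) / len(train)
    by_item = defaultdict(list)
    for _user, item, rating in train:
        by_item[item].append(rating)
    means = {item: sum(rs) / len(rs) for item, rs in by_item.items()}
    return lambda user, item: means.get(item, overall)

# Hypothetical train / held-out split
train = [("ann", "m1", 5), ("bob", "m1", 4), ("ann", "m2", 2),
         ("cat", "m2", 1), ("dan", "m1", 5)]
held_out = [("cat", "m1", 5), ("bob", "m2", 2), ("dan", "m3", 3)]

global_rmse = rmse(global_mean_predictor(train), held_out)
item_rmse = rmse(item_mean_predictor(train), held_out)
print(global_rmse, item_rmse)
```

A more elaborate algorithm earns its place only if it beats baselines like these on held-out data.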

Thanks
● A special thanks to Professor Joseph
Konstan of the University of Minnesota who
has put together an excellent MOOC called
“Intro to Recommenders” through
Coursera. Some of the material in this
presentation is based on that class.
● Thanks for your time!

References
● [1] Introduction to recommender systems. Joseph Konstan. SIGMOD
2008.
● [2] Intro to recommendations. Coursera. Retrieved from
https://class.coursera.org.
● [3] Recommender system. Wikipedia.
● [4] An Algorithmic Framework for Performing Collaborative Filtering.
● [5] Hybrid Web Recommender Systems.
