2. Group Members
• Nerin George
▫ Goal Models + Presentation
• Deepan Murugan
▫ Domain Models + Presentation
• Thach Tran
▫ Strategies + Presentation
3. Outline
• Introduction
• Item Representation
• User Profiles
▫ Manual Recommendation Methods
• Learning A User Model
▫ Classification Learning Algorithms
Decision Trees and Rule Induction
Nearest Neighbour Methods
• Conclusions
• Q & A
4. Introduction
• The WWW is growing exponentially. Many
websites become enormous in term of size and
complexity
• Users need help in finding items that are in
accordance with their interests
• Recommendation
– Content-based recommendation: recommend an
item to a user based upon a description of the
item and a profile of the user’s interests
6. Related Research
• Recommender systems
▫ present items (e.g., movies, books, music, images,
web pages, news, etc.) that are likely of interest to the
user
▫ compare the user’s profile to some reference
characteristics to predict whether the user would be
interested in an unseen item
▫ Reference characteristics
Information about the unseen item content-based
approach
User’s social environment collaborative filtering
approach
7. Item Representation
• Items stored in a database table
• Structured data
▫ Small number of attributes
▫ Each item is described by the same set of attributes
▫ Known set of values that the attributes may have
• Straightforward to work with
▫ User’s profile contains positive rating for 1001, 1002, 1003
▫ Would the user be interested in say Oscars (French
cuisine, table service)?
ID Name Cuisine Service Cost
1001 Mike’s Pizza Italian Counter Low
1002 Chris’s Café French Table Medium
1003 Jacques Bistro French Table High
8. Item Representation
• Information about item could also be free text;
e.g., text description or review of the restaurant,
or news articles
• Unstructured data
▫ No attribute names with well-defined values
▫ Natural language complexity
Same word with different meanings
Different words with same meaning
• Need to impose structure on free text before it
can be used in recommendation algorithm
9. TF*IDF Weighting
• First, stemming is applied to get the root forms
of words
▫ “compute”, “computation”, “computer”,
“computes”, etc., are represented by one term
• Compute a weight for each term that represents
the importance or relevance of that term
10. TF*IDF Weighting
• Term frequency tft,d of a term t in a document d
• Inverse document frequency idft of a term t
• TF*IDF weighting
∑
=
k
dk
dt
dt
n
n
tf
,
,
,
=
t
t
df
N
idf log
( ) tdt idftfdtw ×= ,,
11. TF*IDF Weighting
• The term with highest weight occur more often in
that document than in other documents more
central to the topic of the document
• Limitations
▫ This method does not capture the context in which
a word is used
▫ “This restaurant does not serve vegetarian dishes”
12. User Profiles
• A profile of the user’s interests is used by most
recommendation systems
• This profile consists of two main types of
information
▫ A model of the user’s preferences. E.g., a function
that for any item predicts the likelihood that the
user is interested in that item
▫ User’s interaction history. E.g., items viewed by a
user, items purchased by a user, search queries,
etc.
13. User Profiles
• User’s history will be used as training data for a
machine learning algorithm that creates a user
model
• “Manual” recommending approaches
▫ User customisation
Provide “check box” interface that let the users
construct their own profiles of interests
A simple database matching process is used to find
items that meet the specified criteria and
recommend these to users.
14. User Profiles
• Limitations
▫ Require efforts from users
▫ Cannot cope with changes in
user’s interests
▫ Do not provide a way to
determine order among
recommending items
15. User Profiles
• “Manual” recommending approaches
▫ Rule-based Recommendation
The system has rules to recommend other products
based on user history
Rule to recommend sequel to a book or movie to
customers who purchased the previous item in the
series
Can capture common reasons for making
recommendations
16. Learning a User Model
• Creating a model of the user’s preference from the
user history is a form of classification learning
• The training data (i.e., user’s history) could be
captured through explicit feedback (e.g., user rates
items) or implicit observing of user’s interactions
(e.g., user bought an item and later returned it is a
sign of user doesn’t like the item)
• Implicit method can collect large amount of data but
could contains noise while data collected through
explicit method is perfect but the amount collected
could be limited
17. Learning a User Model
• Next, a number of classification learning
algorithms are reviewed
• The main goal of these classification learning
algorithms is to learn a function that model the
user’s interests
▫ Applying the function on a new item can give the
probability that a user will like this item or a
numeric value indicating the degree of interest in
this item
18. Decision Trees and Rule Induction
• Given the history of user’s interests as training data,
build a decision tree which represents the user’s
profile of interest
• Will the user like an inexpensive Mexican
restaurant?
Cuisine Service Cost Rating
Italian Counter Low Negative
French Table Med Positive
French Counter Low Positive
… … … …
19. Decision Trees and Rule Induction
• Well-suited for structured data
• In unstructured data, the number of attributes
becomes too enormous and consequently, the
tree becomes too large to provide sufficient
performance
• RIPPER: a rule induction algorithm based on the
same principles but provide better performance
in classifying text
20. Nearest Neighbour Methods
• Simply store all the training data in memory
• To classify a new item, compare it to all stored
items using a similarity function and determine
the “nearest neighbour” or the k nearest
neighbours.
• The class or numeric score of the previously
unseen item can then be derived from the class
of the nearest neighbour.
21. Nearest Neighbour Methods
• unseen item needed to be
classified
• positive rated items
• negative rated items
• k = 3: negative
• k = 5: positive
22. Nearest Neighbour Methods
• The similarity function depends on the type of
data
• Structured data: Euclidean distance metric
• Unstructured data (i.e., free text): cosine
similarity function
23. Euclidean Distance Metric
• Distance between A and B
• Attributes which are not measured quantitatively
need to be labeled by numbers representing
their categories
▫ Cuisine attribute: 1=Frech, 2=Italian, 3=Mexican.
Item Attr. X Attr. Y Attr. Z
A XA YA ZA
B XB YB ZB
( ) ( ) ( ) ( )222
, BABABA zzyyxxBAd −+−+−=
24. Cosine Similarity Function
• Vector space model
▫ An item or a document d is represented as a
vector
▫ wt,d is the tf*idf weight of a term t in a document d
• The similarity between two items can then be
computed by the cosine of the angle between
two vectors
[ ]T
dNddd www ,,2,1 ,,, =v
21
21
vv
vv ⋅
=θcos
25. Nearest Neighbour Methods
• Despite the simplicity of the algorithm, its
performance has been shown to be competitive
with more complex algorithms
27. Conclusions
• Can only be effective in limited circumstances. It is
not straightforward to recognise the subtleties in
content
• Depend entirely on previous selected items and
therefore cannot make predictions about future
interests of users
• These shortcomings can be addressed by
collaborative filtering (CF) techniques
• CF is the dominant technique nowadays thanks to
the popularity of Web 2.0/Social Web concept
• Many recommendation system utilise a hybrid of
content-based and collaborative filtering approaches
28. Summary
• Content-based Recommendation
• Item Representation
• User Profiles
▫ Manual Recommendation Methods
• Learning A User Model
Decision Trees and Rule Induction
Nearest Neighbour Methods
nt,d is term count of t in d
N is number of documents in the collection
dft is number of documents that contains term t
The term “vegetarian” might still have significant weight according to the method and the restaurant might get classified into a group of restaurants which serve vegetarian food.
RIPPER is a rule induction algorithm closely related to decision trees that operates in a similar fashion to the recursive data partitioning approach described above. Despite the problematic inductive bias, however, RIPPER performs competitively with other state-of-the-art text classification algorithms. In part, the performance can be attributed to a sophisticated post-pruning algorithm that optimizes the fit of the induced rule set with respect to the training data as a whole. Furthermore, RIPPER supports multi-valued attributes, which leads to a natural representation for text classification tasks, i.e., the individual words of a text document can be represented as multiple feature values for a single feature. While this is essentially a representational convenience if rules are to be learned from unstructured text documents, the approach can lead to more powerful classifiers for semi-structured text documents. For example, the text contained in separate fields of an email message, such as sender, subject, and body text, can be represented as separate multi-valued features, which allows the algorithm to take advantage of the document’s structure in a natural fashion.
These are more complex methods which have been described in the paper but we don’t have time to cover them in this presentation