Content-based Recommendation
Systems
Group: Tippy
Group Members
• Nerin George
▫ Goal Models + Presentation
• Deepan Murugan
▫ Domain Models + Presentation
• Thach Tran
▫ Strategies + Presentation
Outline
• Introduction
• Item Representation
• User Profiles
▫ Manual Recommendation Methods
• Learning A User Model
▫ Classification Learning Algorithms
– Decision Trees and Rule Induction
– Nearest Neighbour Methods
• Conclusions
• Q & A
Introduction
• The WWW is growing exponentially. Many
websites have become enormous in terms of size and
complexity
• Users need help in finding items that are in
accordance with their interests
• Recommendation
– Content-based recommendation: recommend an
item to a user based upon a description of the
item and a profile of the user’s interests
Introduction
• Pazzani, M. J., & Billsus, D. (2007). Content-Based R
Related Research
• Recommender systems
▫ present items (e.g., movies, books, music, images,
web pages, news, etc.) that are likely to be of interest to the
user
▫ compare the user’s profile to some reference
characteristics to predict whether the user would be
interested in an unseen item
▫ Reference characteristics
 Information about the unseen item  content-based
approach
 User’s social environment  collaborative filtering
approach
Item Representation
• Items stored in a database table
• Structured data
▫ Small number of attributes
▫ Each item is described by the same set of attributes
▫ Known set of values that the attributes may have
• Straightforward to work with
▫ User’s profile contains positive ratings for 1001, 1002, 1003
▫ Would the user be interested in, say, Oscars (French
cuisine, table service)?
ID    Name            Cuisine   Service   Cost
1001  Mike’s Pizza    Italian   Counter   Low
1002  Chris’s Café    French    Table     Medium
1003  Jacques Bistro  French    Table     High
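To make the Oscars question concrete, the sketch below scores a candidate restaurant with a deliberately naive rule. It is not a method from the source. For each attribute the candidate specifies, it takes the fraction of positively rated items that share that value, then averages the fractions. The function name `overlap_score` and the scoring rule are hypothetical. Oscars has no cost on the slide, so that attribute is left out.

```python
from collections import Counter

# Items the user rated positively (rows 1001-1003 of the table above).
liked = [
    {"cuisine": "Italian", "service": "Counter", "cost": "Low"},
    {"cuisine": "French", "service": "Table", "cost": "Medium"},
    {"cuisine": "French", "service": "Table", "cost": "High"},
]

def overlap_score(candidate: dict, liked_items: list) -> float:
    """Hypothetical score: for each attribute of the candidate, the fraction of
    liked items sharing that value, averaged over the candidate's attributes."""
    if not candidate or not liked_items:
        return 0.0
    total = 0.0
    for attr, value in candidate.items():
        counts = Counter(item.get(attr) for item in liked_items)
        total += counts[value] / len(liked_items)
    return total / len(candidate)

# Oscars: only cuisine and service are given on the slide.
oscars = {"cuisine": "French", "service": "Table"}
print(f"{overlap_score(oscars, liked):.2f}")  # 0.67
```

Two of the three liked items are French and two are table-service, so the score is 2/3. A French, table-service restaurant therefore looks like a reasonable suggestion under this toy rule.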
Item Representation
• Information about item could also be free text;
e.g., text description or review of the restaurant,
or news articles
• Unstructured data
▫ No attribute names with well-defined values
▫ Natural language complexity
– Same word with different meanings
– Different words with same meaning
• Need to impose structure on free text before it
can be used in a recommendation algorithm
TF*IDF Weighting
• First, stemming is applied to get the root forms
of words
▫ “compute”, “computation”, “computer”,
“computes”, etc., are represented by one term
• Compute a weight for each term that represents
the importance or relevance of that term
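The stemming step can be illustrated with a toy suffix stripper. This is an assumption made purely for illustration: the suffix list and `toy_stem` are hypothetical, and real systems typically use an established algorithm such as the Porter stemmer. The sketch only shows how the four word forms on the slide can collapse to a single term.

```python
# A toy suffix-stripping stemmer (hypothetical); not the Porter stemmer.
SUFFIXES = ("ation", "er", "es", "e", "s")  # tried in this order

def toy_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

words = ["compute", "computation", "computer", "computes"]
print({w: toy_stem(w) for w in words})
# {'compute': 'comput', 'computation': 'comput', 'computer': 'comput', 'computes': 'comput'}
```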
TF*IDF Weighting
• Term frequency tf_{t,d} of a term t in a document d
$$ tf_{t,d} = \frac{n_{t,d}}{\sum_k n_{k,d}} $$
• Inverse document frequency idf_t of a term t
$$ idf_t = \log\left(\frac{N}{df_t}\right) $$
• TF*IDF weighting
$$ w_{t,d} = tf_{t,d} \times idf_t $$
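The three formulas translate almost directly into code. The sketch below makes three assumptions: documents arrive as lists of already-stemmed terms, the logarithm is natural (the slide does not give a base, and the base only rescales the weights), and the three tiny documents are hypothetical.

```python
import math
from collections import Counter

def tfidf(docs: list) -> list:
    """Return one {term: w_{t,d}} dict per document, with w = tf * idf."""
    N = len(docs)                                       # number of documents
    df = Counter(t for doc in docs for t in set(doc))   # df_t: documents containing t
    weights = []
    for doc in docs:
        counts = Counter(doc)                           # n_{t,d}
        total = sum(counts.values())                    # sum_k n_{k,d}
        weights.append({
            t: (n / total) * math.log(N / df[t])        # tf_{t,d} * idf_t
            for t, n in counts.items()
        })
    return weights

# Hypothetical, already-stemmed restaurant descriptions.
docs = [
    ["pizza", "cheap", "pizza"],
    ["french", "wine", "cheap"],
    ["french", "bistro", "wine"],
]
w = tfidf(docs)
print({t: round(v, 3) for t, v in w[0].items()})  # {'pizza': 0.732, 'cheap': 0.135}
```

"pizza" is frequent in the first document and absent from the others, so it gets the larger weight. A term that appears in every document gets idf = log(1) = 0.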
TF*IDF Weighting
• Terms with the highest weights occur more often in
that document than in other documents → they are more
central to the topic of the document
• Limitations
▫ This method does not capture the context in which
a word is used
▫ E.g., “This restaurant does not serve vegetarian dishes”: the term “vegetarian” may still receive a significant weight
User Profiles
• A profile of the user’s interests is used by most
recommendation systems
• This profile consists of two main types of
information
▫ A model of the user’s preferences. E.g., a function
that for any item predicts the likelihood that the
user is interested in that item
▫ User’s interaction history. E.g., items viewed by a
user, items purchased by a user, search queries,
etc.
User Profiles
• User’s history will be used as training data for a
machine learning algorithm that creates a user
model
• “Manual” recommendation approaches
▫ User customisation
– Provide a “check box” interface that lets users
construct their own profiles of interests
– A simple database matching process is used to find
items that meet the specified criteria and
recommend them to users
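A minimal sketch of the “simple database matching process” is shown below. It uses Python's built-in `sqlite3` with an in-memory table of the restaurants from the Item Representation slide. The profile format (column name → set of ticked values) and the function name `match_profile` are hypothetical.

```python
import sqlite3

# In-memory table holding the restaurants from the Item Representation slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE restaurants (id INTEGER, name TEXT, cuisine TEXT, service TEXT, cost TEXT)")
conn.executemany(
    "INSERT INTO restaurants VALUES (?, ?, ?, ?, ?)",
    [
        (1001, "Mike's Pizza", "Italian", "Counter", "Low"),
        (1002, "Chris's Café", "French", "Table", "Medium"),
        (1003, "Jacques Bistro", "French", "Table", "High"),
    ],
)

def match_profile(conn, profile: dict) -> list:
    """Return names of items satisfying every ticked checkbox group.

    `profile` maps a column name to the set of values the user ticked; empty or
    missing groups are unconstrained. Column names are checked against a fixed
    whitelist before being spliced into the SQL; values are bound as parameters.
    """
    allowed = {"cuisine", "service", "cost"}
    clauses, params = [], []
    for column, values in profile.items():
        if column not in allowed or not values:
            continue
        clauses.append(f"{column} IN ({', '.join('?' for _ in values)})")
        params.extend(sorted(values))
    where = " AND ".join(clauses) or "1"
    sql = f"SELECT name FROM restaurants WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]

# Hypothetical user who ticked "French" and "Table service".
print(match_profile(conn, {"cuisine": {"French"}, "service": {"Table"}}))
# ["Chris's Café", 'Jacques Bistro']
```

The results come back in arbitrary (here, ID) order. This reflects one of the limitations listed next: plain matching gives no principled way to rank the matches.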
User Profiles
• Limitations
▫ Require effort from users
▫ Cannot cope with changes in
users’ interests
▫ Do not provide a way to
rank the recommended
items
User Profiles
• “Manual” recommendation approaches
▫ Rule-based Recommendation
– The system has rules to recommend other products
based on the user’s history
– E.g., a rule to recommend the sequel to a book or movie to
customers who purchased the previous item in the
series
– Can capture common reasons for making
recommendations
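The sequel rule can be written as a small function over a purchase history. The sketch assumes each item records an optional series name and its position in the series. The `Item` type, `sequel_rule`, and the catalogue entries are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Item:
    item_id: str
    series: Optional[str] = None    # e.g. a book or film series
    position: Optional[int] = None  # 1 = first instalment, 2 = sequel, ...

def sequel_rule(history: list, catalogue: list) -> list:
    """Rule: if the user bought instalment n of a series, recommend n + 1
    (unless the user already owns it)."""
    owned = {item.item_id for item in history}
    recs = []
    for bought in history:
        if bought.series is None or bought.position is None:
            continue  # the rule only applies to items that belong to a series
        for candidate in catalogue:
            if (candidate.series == bought.series
                    and candidate.position == bought.position + 1
                    and candidate.item_id not in owned
                    and candidate not in recs):
                recs.append(candidate)
    return recs

catalogue = [
    Item("book-1", series="saga", position=1),
    Item("book-2", series="saga", position=2),
    Item("book-3", series="saga", position=3),
    Item("cookbook"),
]
print([i.item_id for i in sequel_rule([catalogue[0]], catalogue)])  # ['book-2']
```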
Learning a User Model
• Creating a model of the user’s preferences from the
user history is a form of classification learning
• The training data (i.e., the user’s history) can be
captured through explicit feedback (e.g., the user rates
items) or through implicit observation of the user’s interactions
(e.g., a user who bought an item and later returned it
probably does not like the item)
• Implicit methods can collect large amounts of data, but
the data may contain noise; data collected through
explicit feedback is less noisy, but the amount collected
may be limited
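The sketch below shows one way both kinds of feedback could be turned into training labels. Several parts are assumptions for illustration: the event record format, the rating threshold of 4 on a 1 to 5 scale, and the rule that a kept purchase counts as a “like”. The function name `label_from_feedback` is hypothetical. The implicit rule is exactly the kind of guess that makes implicit data noisy.

```python
from typing import Optional

def label_from_feedback(event: dict) -> Optional[bool]:
    """Map one interaction record to a training label (True = likes the item).

    Hypothetical record format: {"rating": 1..5} for explicit feedback, or
    {"purchased": bool, "returned": bool} for implicit feedback.
    """
    if "rating" in event:                          # explicit: use the user's rating
        return event["rating"] >= 4                # assumed threshold
    if event.get("purchased"):                     # implicit: infer from behaviour
        return not event.get("returned", False)    # bought then returned -> dislike
    return None                                    # no usable signal: skip this event

events = [
    {"item": "a", "rating": 5},
    {"item": "b", "rating": 2},
    {"item": "c", "purchased": True, "returned": True},
    {"item": "d", "purchased": True},
    {"item": "e", "viewed": True},
]
print({e["item"]: label_from_feedback(e) for e in events})
```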
Learning a User Model
• Next, a number of classification learning
algorithms are reviewed
• The main goal of these classification learning
algorithms is to learn a function that models the
user’s interests
▫ Applying the function to a new item can give the
probability that the user will like this item, or a
numeric value indicating the degree of interest in
this item
Decision Trees and Rule Induction
• Given the history of the user’s interests as training data,
build a decision tree that represents the user’s
profile of interests
• Will the user like an inexpensive Mexican
restaurant?
Cuisine   Service   Cost     Rating
Italian   Counter   Low      Negative
French    Table     Medium   Positive
French    Counter   Low      Positive
…         …         …        …
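The slide does not name a tree-induction algorithm. The sketch below therefore assumes an ID3-style learner that splits on the attribute with the highest information gain. It is trained only on the three rows shown; the “…” rows are unknown and are left out. For an attribute value never seen in training, such as Mexican, the sketch falls back to the majority label at that node. That fallback is an assumption and does not count as evidence about the user's taste.

```python
import math
from collections import Counter

ATTRS = ["cuisine", "service", "cost"]

# The three fully shown rows of the training table ("Med" written as "Medium").
train = [
    ({"cuisine": "Italian", "service": "Counter", "cost": "Low"}, "Negative"),
    ({"cuisine": "French", "service": "Table", "cost": "Medium"}, "Positive"),
    ({"cuisine": "French", "service": "Counter", "cost": "Low"}, "Positive"),
]

def entropy(rows):
    counts = Counter(label for _, label in rows)
    total = len(rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def majority(rows):
    return Counter(label for _, label in rows).most_common(1)[0][0]

def build_tree(rows, attrs):
    """ID3-style: split on the attribute with the highest information gain."""
    labels = {label for _, label in rows}
    if len(labels) == 1:
        return labels.pop()                  # pure node becomes a leaf
    if not attrs:
        return majority(rows)

    def gain(attr):
        remainder = 0.0
        for value in {x[attr] for x, _ in rows}:
            subset = [(x, y) for x, y in rows if x[attr] == value]
            remainder += len(subset) / len(rows) * entropy(subset)
        return entropy(rows) - remainder

    best = max(attrs, key=gain)
    branches = {}
    for value in {x[best] for x, _ in rows}:
        subset = [(x, y) for x, y in rows if x[best] == value]
        branches[value] = build_tree(subset, [a for a in attrs if a != best])
    return {"attr": best, "branches": branches, "default": majority(rows)}

def classify(tree, item):
    """Walk the tree; unseen or missing values fall back to the node's majority."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(item.get(tree["attr"]), tree["default"])
    return tree

tree = build_tree(train, ATTRS)
print(tree["attr"])                                         # cuisine
print(classify(tree, {"cuisine": "Mexican", "cost": "Low"}))  # Positive (majority fallback)
```

With these three rows, cuisine separates the labels perfectly, so the tree is a single cuisine test. More training data would be needed to answer the Mexican question in a meaningful way.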
Decision Trees and Rule Induction
• Well-suited for structured data
• With unstructured data, the number of attributes
(one per term) becomes very large; consequently, the
tree becomes too large to perform well
• RIPPER: a rule induction algorithm based on the
same principles that provides better performance
in classifying text
Nearest Neighbour Methods
• Simply store all the training data in memory
• To classify a new item, compare it to all stored
items using a similarity function and determine
the “nearest neighbour” or the k nearest
neighbours.
• The class or numeric score of the previously
unseen item can then be derived from the class(es)
of the nearest neighbour(s).
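A minimal k-nearest-neighbour classifier with a majority vote is sketched below. It uses Euclidean distance via `math.dist` (Python 3.8+). The 2-D points are hypothetical and were arranged to reproduce the effect in the figure: the same unseen item is classified as negative with k = 3 and positive with k = 5. An odd k avoids tied votes when there are two classes.

```python
import math
from collections import Counter

def knn_classify(query, labelled, k, distance=math.dist):
    """Majority vote among the k stored items closest to `query`."""
    nearest = sorted(labelled, key=lambda item: distance(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical stored items: (features, label). The query sits at the origin.
stored = [
    ((1.0, 0.0), "negative"),   # distance 1.0
    ((0.0, 1.5), "negative"),   # distance 1.5
    ((-2.0, 0.0), "positive"),  # distance 2.0
    ((0.0, -2.5), "positive"),  # distance 2.5
    ((3.0, 0.0), "positive"),   # distance 3.0
    ((5.0, 5.0), "negative"),   # distance ~7.07
]
query = (0.0, 0.0)
print(knn_classify(query, stored, k=3))  # negative
print(knn_classify(query, stored, k=5))  # positive
```

All the “training” is just storing `stored`. The cost of the method is paid at query time, when every stored item is compared with the new one.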
Nearest Neighbour Methods
• [Figure: an unseen item that needs to be classified, surrounded by positively rated and negatively rated items]
• k = 3: classified as negative
• k = 5: classified as positive
Nearest Neighbour Methods
• The similarity function depends on the type of
data
• Structured data: Euclidean distance metric
• Unstructured data (i.e., free text): cosine
similarity function
Euclidean Distance Metric
• Distance between A and B
• Attributes which are not measured quantitatively
need to be labeled by numbers representing
their categories
▫ Cuisine attribute: 1 = French, 2 = Italian, 3 = Mexican
Item  Attr. X  Attr. Y  Attr. Z
A     x_A      y_A      z_A
B     x_B      y_B      z_B
$$ d(A, B) = \sqrt{(x_A - x_B)^2 + (y_A - y_B)^2 + (z_A - z_B)^2} $$
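Below is a short sketch of the distance with categorical attributes encoded as numbers. The cuisine codes follow the slide. The ordinal coding for cost (Low = 1, Medium = 2, High = 3) is an assumption, and `encode` is a hypothetical helper. Numeric codes for nominal categories impose an artificial order. Here, Mexican ends up “closer” to Italian than to French purely because of the coding.

```python
import math

CUISINE = {"French": 1, "Italian": 2, "Mexican": 3}  # codes from the slide
COST = {"Low": 1, "Medium": 2, "High": 3}            # assumed ordinal coding

def encode(cuisine: str, cost: str) -> tuple:
    return (CUISINE[cuisine], COST[cost])

def euclidean(a, b) -> float:
    """d(A, B) = sqrt(sum over attributes of (a_i - b_i)^2)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

mike = encode("Italian", "Low")        # Mike's Pizza
chris = encode("French", "Medium")     # Chris's Café
new = encode("Mexican", "Low")         # an inexpensive Mexican restaurant
print(round(euclidean(mike, chris), 3))  # 1.414
print(euclidean(new, mike))              # 1.0
```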
Cosine Similarity Function
• Vector space model
▫ An item or a document d is represented as a
vector
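▫ w_{t,d} is the tf*idf weight of a term t in a document d
$$ \mathbf{v}_d = [w_{1,d}, w_{2,d}, \ldots, w_{N,d}]^T $$
• The similarity between two items can then be
computed by the cosine of the angle between
two vectors
$$ \cos\theta = \frac{\mathbf{v}_1 \cdot \mathbf{v}_2}{\lVert \mathbf{v}_1 \rVert \, \lVert \mathbf{v}_2 \rVert} $$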
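The cosine formula can be sketched on sparse vectors stored as `{term: weight}` dicts. A term missing from a dict has weight 0, which avoids materialising one component per vocabulary term. The toy weight vectors are hypothetical; in practice they would be the tf*idf weights. Treating an all-zero vector as similarity 0 is an assumption, since the formula is undefined in that case.

```python
import math

def cosine(w1: dict, w2: dict) -> float:
    """cos(theta) = (v1 . v2) / (|v1| |v2|) for sparse {term: weight} vectors."""
    dot = sum(w * w2.get(t, 0.0) for t, w in w1.items())
    norm1 = math.sqrt(sum(w * w for w in w1.values()))
    norm2 = math.sqrt(sum(w * w for w in w2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # assumed convention for empty documents
    return dot / (norm1 * norm2)

a = {"french": 1.0, "wine": 1.0}
b = {"french": 1.0, "bistro": 1.0}
c = {"pizza": 2.0}
print(round(cosine(a, b), 3))  # 0.5 (one shared term out of two each)
print(cosine(a, c))            # 0.0 (no shared terms)
```

Because cosine ignores vector length, a long review and a short one about the same topic can still score as similar.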
Nearest Neighbour Methods
• Despite the simplicity of the algorithm, its
performance has been shown to be competitive
with more complex algorithms
Other Classification Learning
Algorithms
• Relevance Feedback and Rocchio’s Algorithm
• Linear Classifiers
• Probabilistic Methods and Naïve Bayes
Conclusions
• Content-based methods can be effective only in limited circumstances: it is
not straightforward to recognise the subtleties in
content
• They depend entirely on previously selected items and
therefore cannot make predictions about future
interests of users
• These shortcomings can be addressed by
collaborative filtering (CF) techniques
• CF is the dominant technique nowadays, thanks to
the popularity of the Web 2.0/Social Web concept
• Many recommendation systems use a hybrid of
content-based and collaborative filtering approaches
Summary
• Content-based Recommendation
• Item Representation
• User Profiles
▫ Manual Recommendation Methods
• Learning A User Model
– Decision Trees and Rule Induction
– Nearest Neighbour Methods
Q & A

Speaker Notes

  1. n_{t,d} is the number of occurrences of term t in document d; N is the number of documents in the collection; df_t is the number of documents that contain term t.
  2. The term “vegetarian” might still have significant weight according to the method and the restaurant might get classified into a group of restaurants which serve vegetarian food.
  3. RIPPER is a rule induction algorithm closely related to decision trees that operates in a similar fashion to the recursive data partitioning approach described above. Despite the problematic inductive bias, however, RIPPER performs competitively with other state-of-the-art text classification algorithms. In part, the performance can be attributed to a sophisticated post-pruning algorithm that optimizes the fit of the induced rule set with respect to the training data as a whole. Furthermore, RIPPER supports multi-valued attributes, which leads to a natural representation for text classification tasks, i.e., the individual words of a text document can be represented as multiple feature values for a single feature. While this is essentially a representational convenience if rules are to be learned from unstructured text documents, the approach can lead to more powerful classifiers for semi-structured text documents. For example, the text contained in separate fields of an email message, such as sender, subject, and body text, can be represented as separate multi-valued features, which allows the algorithm to take advantage of the document’s structure in a natural fashion.
  4. These are more complex methods which have been described in the paper but we don’t have time to cover them in this presentation