3. Clustering
• Algorithm - K-means
• Similarity Measurement - Euclidean
Recommendation
• Collaborative-based filtering (Item based, User based)
• Content-based filtering
• Similarity Measurement - Pearson, Tanimoto
Classification
• Regression
• Decision Tree
• SVM
• NN
NLP
• Sentiment Analysis
• Voice Recognition
• Video Analytics
4. Content Based, Collaborative Filtering [CF] and Hybrid Recommendation System
• Content-based systems focus on the properties of items. The similarity of items is determined by measuring the similarity of their properties.
Needs historical data.
• Collaborative-filtering systems focus on the relationship between users and items. The similarity of items is determined by the similarity of the ratings given to those items by the users who have rated both.
Source: http://infolab.stanford.edu/~ullman/mmds/ch9.pdf
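The two notions of item similarity on this slide can be contrasted in a minimal sketch. The property vectors, the ratings, and the choice of cosine similarity are all assumptions made for illustration; the slide does not name a measure.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical item properties (content-based view): [action, comedy, romance]
ITEM_PROPERTIES = {
    "movie_a": [1, 0, 0],
    "movie_b": [1, 1, 0],
    "movie_c": [0, 0, 1],
}

# Hypothetical ratings (collaborative view): user -> {item: rating}
RATINGS = {
    "u1": {"movie_a": 5, "movie_b": 4, "movie_c": 1},
    "u2": {"movie_a": 4, "movie_b": 5},
    "u3": {"movie_a": 1, "movie_c": 5},
}

def content_similarity(item_x, item_y):
    """Compare items by their own properties; no ratings involved."""
    return cosine(ITEM_PROPERTIES[item_x], ITEM_PROPERTIES[item_y])

def collaborative_similarity(item_x, item_y):
    """Compare items by the ratings of users who rated both items."""
    common = [u for u, r in RATINGS.items() if item_x in r and item_y in r]
    if not common:
        return 0.0
    return cosine([RATINGS[u][item_x] for u in common],
                  [RATINGS[u][item_y] for u in common])

if __name__ == "__main__":
    print(content_similarity("movie_a", "movie_b"))
    print(collaborative_similarity("movie_a", "movie_b"))
```

The content-based function never looks at ratings, while the collaborative one never looks at item properties; a hybrid system would combine both signals.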
5. CF - User Similarity
How are users similar?
• Data Model: #1 User Id, #2 Item Id, #3 Rating
• Similarity Notion
• User Neighborhood
• User Based Recommender
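The four building blocks of this slide (data model, similarity notion, user neighborhood, user-based recommender) can be wired together as a rough sketch. The triples are toy values, the similarity is assumed to be 1 / (1 + Euclidean distance) over co-rated items, and the function names are hypothetical; this is not how any particular library implements it.

```python
import math
from collections import defaultdict

# Data model: (user_id, item_id, rating) triples. Toy values, not real data.
TRIPLES = [
    (1, "A", 5.0), (1, "B", 3.0), (1, "C", 4.0),
    (2, "A", 5.0), (2, "B", 3.0), (2, "D", 5.0),
    (3, "A", 1.0), (3, "B", 5.0), (3, "D", 2.0),
]

def build_model(triples):
    """Turn triples into user -> {item: rating}."""
    model = defaultdict(dict)
    for user, item, rating in triples:
        model[user][item] = rating
    return dict(model)

def user_similarity(model, u, v):
    """Similarity notion (assumed): 1 / (1 + Euclidean distance) on co-rated items."""
    common = set(model[u]) & set(model[v])
    if not common:
        return 0.0
    dist = math.sqrt(sum((model[u][i] - model[v][i]) ** 2 for i in common))
    return 1.0 / (1.0 + dist)

def neighborhood(model, user, n):
    """User neighborhood: the n most similar other users, best first."""
    others = [(user_similarity(model, user, v), v) for v in model if v != user]
    others.sort(reverse=True)
    return [(v, s) for s, v in others[:n] if s > 0]

def recommend(model, user, n_neighbors=2, top_n=3):
    """User-based recommender: similarity-weighted average of neighbor ratings."""
    totals = defaultdict(float)
    weights = defaultdict(float)
    for v, sim in neighborhood(model, user, n_neighbors):
        for item, rating in model[v].items():
            if item not in model[user]:  # only items the user has not rated
                totals[item] += sim * rating
                weights[item] += sim
    scored = [(totals[i] / weights[i], i) for i in totals]
    scored.sort(reverse=True)
    return [(i, round(s, 2)) for s, i in scored[:top_n]]

if __name__ == "__main__":
    print(recommend(build_model(TRIPLES), 1))
```

In this toy data user 2 rates exactly like user 1 on the shared items, so user 2's opinion of item D dominates the estimate for user 1.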
6. CF - Item Similarity
How are items similar?
• Data Model: #1 User Id, #2 Item Id, #3 Rating
• Similarity Notion
• Item-neighborhood
• Item Based Recommender
Source: http://www.theregister.co.uk/2006/08/15/beer_diapers/
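The item-based variant uses the same data model but compares columns (items) instead of rows (users). The sketch below assumes the same 1 / (1 + Euclidean distance) similarity, applied to the ratings of users who rated both items; the ratings and function names are made up for illustration.

```python
import math
from collections import defaultdict

# Toy (user_id, item_id, rating) triples; values are invented.
TRIPLES = [
    ("u1", "beer", 5.0), ("u1", "diapers", 4.0),
    ("u2", "beer", 4.0), ("u2", "diapers", 5.0), ("u2", "milk", 2.0),
    ("u3", "beer", 1.0), ("u3", "milk", 5.0),
]

def ratings_by_item(triples):
    """Turn triples into item -> {user: rating}."""
    by_item = defaultdict(dict)
    for user, item, rating in triples:
        by_item[item][user] = rating
    return dict(by_item)

def item_similarity(by_item, x, y):
    """Assumed notion: 1 / (1 + Euclidean distance) over users who rated both."""
    common = set(by_item[x]) & set(by_item[y])
    if not common:
        return 0.0
    dist = math.sqrt(sum((by_item[x][u] - by_item[y][u]) ** 2 for u in common))
    return 1.0 / (1.0 + dist)

def item_neighborhood(by_item, item):
    """All other items ordered by similarity to `item`, best first."""
    scored = [(item_similarity(by_item, item, other), other)
              for other in by_item if other != item]
    scored.sort(reverse=True)
    return [(other, s) for s, other in scored]

def predict(by_item, user, item):
    """Estimate a rating as the similarity-weighted mean of the user's own ratings."""
    num = den = 0.0
    for other, sim in item_neighborhood(by_item, item):
        if user in by_item[other]:
            num += sim * by_item[other][user]
            den += sim
    return num / den if den else None

if __name__ == "__main__":
    by_item = ratings_by_item(TRIPLES)
    print(item_neighborhood(by_item, "beer"))
    print(predict(by_item, "u3", "diapers"))
```

Because item-to-item similarities change more slowly than user tastes, they are often precomputed; that design choice is not shown here.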
7. Similarity Notion
• Pearson Correlation - measures the tendency of two series of numbers [user preferences] to move together proportionally. When this tendency is high, the correlation is close to 1.
• Spearman Correlation - Pearson correlation computed on the ranks of user preferences rather than on the raw values
• Euclidean Distance - based on the distance between users. The smaller the distance, the more similar the users.
• Tanimoto Coefficient - based on the number of items in common (the ratio of shared items to all items either user has)
• LogLikelihood Similarity
How to code?
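One way to code four of the notions above in plain Python is sketched here. The function names are hypothetical, the rank helper assumes there are no tied values, and mapping Euclidean distance to a similarity with 1 / (1 + distance) is just one common convention. LogLikelihood similarity is left out.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length preference lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined when one list is constant; 0.0 is an assumption
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Rank positions starting at 1 (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(x, y):
    """Spearman correlation: Pearson applied to ranks instead of raw values."""
    return pearson(ranks(x), ranks(y))

def euclidean_similarity(x, y):
    """Smaller distance gives a value closer to 1."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1.0 / (1.0 + dist)

def tanimoto(items_a, items_b):
    """Shared items divided by all items either user has; ignores rating values."""
    a, b = set(items_a), set(items_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

if __name__ == "__main__":
    print(pearson([1, 2, 3], [2, 4, 6]))
    print(spearman([1, 2, 3], [1, 4, 9]))
    print(euclidean_similarity([5, 3], [4, 3]))
    print(tanimoto({"a", "b", "c"}, {"b", "c", "d"}))
```

Pearson, Spearman and Euclidean need rating values on co-rated items, whereas Tanimoto only needs to know which items each user has.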
8. How Does the Similarity Definition Affect Neighborhood Formation?
Source: http://www.slideshare.net/Cataldo/apache-mahout-tutorial-recommendation-20132014
Mahout In Action
Threshold-based neighborhood
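A threshold-based neighborhood keeps every user whose similarity to the target user reaches a cut-off, so the choice of similarity measure directly changes who gets in. The sketch below uses invented similarity scores under two measures; the scores, user ids and function name are hypothetical.

```python
def threshold_neighborhood(similarities, threshold):
    """Return users whose similarity to the target is >= threshold, best first."""
    chosen = [(s, u) for u, s in similarities.items() if s >= threshold]
    chosen.sort(reverse=True)
    return [u for s, u in chosen]

# Hypothetical similarity of each user to one target user under two measures.
PEARSON_SCORES = {"u2": 0.95, "u3": 0.40, "u4": -0.60, "u5": 0.75}
EUCLIDEAN_SCORES = {"u2": 0.50, "u3": 0.72, "u4": 0.20, "u5": 0.71}

if __name__ == "__main__":
    print(threshold_neighborhood(PEARSON_SCORES, 0.7))
    print(threshold_neighborhood(EUCLIDEAN_SCORES, 0.7))
```

With the same threshold of 0.7 the two measures select different neighbors in this toy data, and a very strict threshold can leave the neighborhood empty, which means no recommendations for that user.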
9. Evaluation
• Evaluate Top n Recommendations
• Precision and Recall
                          Relevant         Non Relevant
Search result shown       True Positive    False Positive
Search result not shown   False Negative   True Negative
Source: https://en.wikipedia.org/wiki/Precision_and_recall
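For a top-n list, precision is TP / (TP + FP), the share of shown items that are relevant, and recall is TP / (TP + FN), the share of relevant items that were shown. A minimal sketch follows; the function name and sample items are hypothetical.

```python
def precision_recall_at_n(recommended, relevant, n):
    """Precision and recall of the first n recommended items.

    recommended: ranked list of item ids (best first)
    relevant: collection of item ids the user actually liked
    """
    top_n = recommended[:n]
    relevant = set(relevant)
    true_positives = sum(1 for item in top_n if item in relevant)
    precision = true_positives / len(top_n) if top_n else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

if __name__ == "__main__":
    ranked = ["a", "b", "c", "d", "e"]   # invented recommendation list
    liked = {"a", "c", "f"}              # invented held-out preferences
    print(precision_recall_at_n(ranked, liked, 3))
    print(precision_recall_at_n(ranked, liked, 5))
```

Growing n from 3 to 5 in this example adds two non-relevant items, so precision drops while recall stays the same.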
10. System Solutioning - More than Algorithm Accuracy
• Business Goal Injection
• Novelty - avoiding repeated recommendations
• Diversity - How diverse are the recommended items? Do they cover all subtopics?
• Positive Feedback
• Negative Feedback
Source: http://www.slideshare.net/Zhenv5/diversity-and-novelty-for-recommendation-system
11. Technology
• Mahout
Hadoop (optional), Java.
Many stable algorithms.
• R
RHadoop
Many statistics packages.
• Spark
Emerging technology.
Algorithms are still being added.
Editor's notes
Recommendation borrows a lot of statistical methods and ML techniques. So the same algo can occur in clustering, or it can be a new one. The algos we discuss today are the frequently used ones.
How many of you like Gmail, Android and Google search? Tech talks, technical blogs, books. I do not know buying behavior, but social behavior.
Facebook news. Tech talks, blogs, reading books. Next certification course.
Baby items. Beer and diapers. Evaluate both User Similarity vs Item Similarity. How to keep items in the same aisle.
School of thought. Preference values. Rank.
Baby items for old age people, or gap between two kids, two laptop purchases. Google searches.