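11. E-commerce recommender system outputs
Suggestion
– Single item
– Unordered list of suggestions
– Ordered list of suggestions
Prediction: the system's overall predicted rating for a given item
Individual rating: the ratings that other individual customers gave the item
Review: the text reviews that other customers wrote about the item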
2009-12-20
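38. User-Based user similarity algorithms
Cosine similarity
\[
\mathrm{sim}(u, v) = \cos(\vec{u}, \vec{v})
= \frac{\vec{u} \cdot \vec{v}}{\lVert \vec{u} \rVert \times \lVert \vec{v} \rVert}
= \frac{\sum_{i=1}^{n} R_{ui} R_{vi}}{\sqrt{\sum_{i=1}^{n} R_{ui}^{2}} \, \sqrt{\sum_{i=1}^{n} R_{vi}^{2}}}
\]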
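The cosine formula can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `cosine_similarity` is hypothetical, both users are assumed to be represented as rating lists of the same length n, and an unrated item is assumed to be encoded as 0.

```python
import math

def cosine_similarity(ratings_u, ratings_v):
    """Cosine of the angle between two users' rating vectors.

    Assumption: both lists cover the same n items in the same order,
    and 0 stands for "not rated".
    """
    if len(ratings_u) != len(ratings_v):
        raise ValueError("rating vectors must have the same length")
    dot = sum(a * b for a, b in zip(ratings_u, ratings_v))
    norm_u = math.sqrt(sum(a * a for a in ratings_u))
    norm_v = math.sqrt(sum(b * b for b in ratings_v))
    if norm_u == 0 or norm_v == 0:
        # A user with no ratings has no direction; treat as "no similarity".
        return 0.0
    return dot / (norm_u * norm_v)
```

Two users whose ratings differ only by a constant factor, such as `[1, 2, 3]` and `[2, 4, 6]`, come out as maximally similar, which is one reason the slides also list mean-centered measures.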
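Correlation similarity (Pearson correlation coefficient)
\[
\mathrm{sim}(u, v) =
\frac{\sum_{i \in I_{uv}} (R_{ui} - \bar{R}_u)(R_{vi} - \bar{R}_v)}
     {\sqrt{\sum_{i \in I_{uv}} (R_{ui} - \bar{R}_u)^{2}} \, \sqrt{\sum_{i \in I_{uv}} (R_{vi} - \bar{R}_v)^{2}}}
\]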
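A minimal Python sketch of the Pearson measure follows. The function name `pearson_similarity` and the dict-based rating layout (item id mapped to rating) are assumptions made for illustration. The sketch also assumes that each user's mean is taken over the co-rated set I_uv; some formulations use the mean over all of the user's ratings instead.

```python
import math

def pearson_similarity(ratings_u, ratings_v):
    """Pearson correlation between two users over their co-rated items.

    Assumption: ratings_u and ratings_v are dicts {item_id: rating},
    and user means are computed over the co-rated items only.
    """
    common = set(ratings_u) & set(ratings_v)  # the set I_uv
    if len(common) < 2:
        return 0.0  # correlation is undefined for fewer than two points
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den_u = math.sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_v = math.sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0  # a user who gives every item the same rating has no variance
    return num / (den_u * den_v)
```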
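Adjusted cosine similarity
\[
\mathrm{sim}(u, v) =
\frac{\sum_{i \in I_{uv}} (R_{ui} - \bar{R}_i)(R_{vi} - \bar{R}_i)}
     {\sqrt{\sum_{i \in I_{uv}} (R_{ui} - \bar{R}_i)^{2}} \, \sqrt{\sum_{i \in I_{uv}} (R_{vi} - \bar{R}_i)^{2}}}
\]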
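The adjusted cosine measure between two users can be sketched as below. The names `item_means` and `adjusted_cosine_similarity` are hypothetical, the nested-dict layout (user id mapped to a dict of item ratings) is an assumption, and each item mean is assumed to be the average over every user who rated that item.

```python
import math

def item_means(ratings):
    """Average rating of each item over all users who rated it."""
    totals, counts = {}, {}
    for user_ratings in ratings.values():
        for item, r in user_ratings.items():
            totals[item] = totals.get(item, 0.0) + r
            counts[item] = counts.get(item, 0) + 1
    return {item: totals[item] / counts[item] for item in totals}

def adjusted_cosine_similarity(ratings, u, v):
    """Cosine of users u and v after subtracting each item's mean rating.

    Assumption: ratings is a dict {user_id: {item_id: rating}}.
    """
    means = item_means(ratings)
    common = set(ratings[u]) & set(ratings[v])  # the set I_uv
    num = sum((ratings[u][i] - means[i]) * (ratings[v][i] - means[i]) for i in common)
    den_u = math.sqrt(sum((ratings[u][i] - means[i]) ** 2 for i in common))
    den_v = math.sqrt(sum((ratings[v][i] - means[i]) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0
    return num / (den_u * den_v)
```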
39. User-Based similarity algorithms: cosine similarity
Similarity between items i and j is computed
by isolating the users who have rated both of them
and then applying a similarity computation
technique.
Cosine-based Similarity – items are vectors
in the m-dimensional user space
(difference in rating scale between users is
not taken into account).
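As a rough sketch of this item view, the snippet below treats each column of a user-by-item matrix as an item vector and restricts the computation to users who rated both items. The function name `item_cosine_similarity`, the list-of-rows layout, and the use of `None` for a missing rating are all assumptions made for illustration.

```python
import math

def item_cosine_similarity(matrix, i, j):
    """Cosine similarity between item columns i and j.

    Assumption: matrix is a list of rows, one row per user, and
    matrix[u][k] is user u's rating of item k, or None if unrated.
    """
    # Isolate the users who have rated both items.
    pairs = [(row[i], row[j]) for row in matrix
             if row[i] is not None and row[j] is not None]
    if not pairs:
        return 0.0
    dot = sum(a * b for a, b in pairs)
    norm_i = math.sqrt(sum(a * a for a, _ in pairs))
    norm_j = math.sqrt(sum(b * b for _, b in pairs))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)
```

Because the raw ratings are used, a user who rates everything high contributes more to the dot product than a user who rates everything low, which is the rating-scale issue the slide mentions.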
40. User-Based similarity algorithms: correlation similarity
Correlation-based Similarity – using the
Pearson-r correlation (used only in cases
where the users rated both item i and item j).
R(u,i) = rating of user u on item i.
R(i) = average rating of the i-th item.
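With these symbols, the correlation-based similarity between items i and j can be written as follows. This is a sketch of the standard Pearson form, not a formula taken from the slide; U, the set of users who rated both item i and item j, is a symbol introduced here for illustration, and R_{u,i} and \bar{R}_i stand for R(u,i) and R(i).

```latex
\[
\mathrm{sim}(i, j) =
\frac{\sum_{u \in U} (R_{u,i} - \bar{R}_i)(R_{u,j} - \bar{R}_j)}
     {\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_i)^{2}} \,
      \sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_j)^{2}}}
\]
```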
41. User-Based similarity algorithms: adjusted cosine similarity
Adjusted Cosine Similarity – each pair in the
co-rated set corresponds to a different user
(takes care of difference in rating scale).
R(u,i) = rating of user u on item i.
R(u) = average of the u-th user.
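Under these definitions, the adjusted cosine similarity between items i and j subtracts each user's own average before comparing the two items. The form below is a sketch of the standard definition, not a formula taken from the slide; U, the set of users who rated both items, is a symbol introduced here for illustration, and R_{u,i} and \bar{R}_u stand for R(u,i) and R(u).

```latex
\[
\mathrm{sim}(i, j) =
\frac{\sum_{u \in U} (R_{u,i} - \bar{R}_u)(R_{u,j} - \bar{R}_u)}
     {\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_u)^{2}} \,
      \sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_u)^{2}}}
\]
```

Subtracting \bar{R}_u is what removes the per-user rating-scale offset: a generous rater and a strict rater who agree on which item is better contribute terms of the same sign.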
48. Common problems of collaborative filtering recommender systems
Cold Start: There needs to be enough other users
already in the system to find a match.
Sparsity: If there are many items to be
recommended, even if there are many users, the
user/ratings matrix is sparse, and it is hard to
find users that have rated the same items.
First Rater: Cannot recommend an item that has not
been previously rated.
– New items
– Esoteric items
Popularity Bias: Cannot recommend items to someone
with unique tastes.
– Tends to recommend popular items.
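Two of these problems are easy to measure on a concrete rating set. The sketch below is only an illustration: the function names `matrix_sparsity` and `never_rated` are hypothetical, and ratings are assumed to be stored as a dict that maps each user id to a dict of item ratings.

```python
def matrix_sparsity(ratings, all_items):
    """Fraction of empty cells in the user/ratings matrix (Sparsity).

    Assumption: ratings is {user_id: {item_id: rating}} and all_items
    lists every item in the catalogue.
    """
    total_cells = len(ratings) * len(all_items)
    if total_cells == 0:
        return 0.0
    filled = sum(len(user_ratings) for user_ratings in ratings.values())
    return 1.0 - filled / total_cells

def never_rated(ratings, all_items):
    """Items nobody has rated yet, which cannot be recommended (First Rater)."""
    rated = set()
    for user_ratings in ratings.values():
        rated.update(user_ratings)
    return set(all_items) - rated
```

For example, with two users, four catalogue items, and three ratings in total, five of the eight cells are empty, so `matrix_sparsity` returns 0.625.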
60. References
Wiki:
– http://en.wikipedia.org/wiki/Collaborative_filtering
– http://en.wikipedia.org/wiki/Web_analytics
– http://en.wikipedia.org/wiki/Recommendation_system
Books
– Programming Collective Intelligence: Building Smart Web 2.0 Applications
– Web Analytics: An Hour a Day
– Data Mining: Concepts and Techniques
– Mining the Web: Transforming Customer Data into Customer Value
– Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management
61. References
Open-source projects
– Open Source Collaborative Filtering Written in Java
– Carrot2 Clustering Engine
– Weka 3: Data Mining Software in Java
– Taste
62. References
Blog
– http://glinden.blogspot.com/
– http://www.kaushik.net/avinash
– http://guwendong.cn/
– http://www.weigend.com/
– http://www.chinawebanalytics.cn/
– The Beauty of Mathematics series (数学之美系列)
– Mining Social Data for Fun and Insight