This document summarizes the development of a collaborative filtering algorithm that recommends skin care products. It is based on over 180,000 reviews written by 32,000 reviewers for about 1,200 products across 80 brands on Sephora.com. The algorithm uses Pearson's correlation coefficients to measure pairwise similarity between products from user reviews. It reaches 86.3% accuracy in cross-validation and reveals patterns of customer loyalty across brands.
3. What makes it so hard? Overwhelming information
So many products… So many reviews…
Reviews can be so long…
So many ingredients…
6. What makes it so hard? Overwhelming information
Time spent
Money wasted
Happiness
7. Collaborative Filter using User Reviews from Sephora.com
• 32k reviewers, each with 2+ reviews
• ~1,200 products from ~80 brands in 8 categories
• 184k reviews, each with a rating [1-5], review text, and a quick take
[Figure: rating matrix with reviewers 1…N as rows and products X, Y, … as columns]
Algorithm:
• Item-centric collaborative filter
• Pearson's correlation coefficients to measure pairwise similarity
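The filter works on the reviewer × product matrix described on this slide. Below is a minimal sketch of how that matrix could be built from raw reviews. It assumes each review arrives as a hypothetical `(reviewer_id, product_id, rating)` tuple, and it implements the "2+ reviews" filter. The dict-of-dicts layout is an assumption chosen because the matrix is very sparse.

```python
from collections import defaultdict


def build_rating_matrix(reviews, min_reviews=2):
    """Build a sparse reviewer x product rating matrix.

    reviews: iterable of (reviewer_id, product_id, rating) tuples, rating in 1..5
    (a hypothetical input format). Returns {reviewer_id: {product_id: rating}},
    keeping only reviewers who rated at least `min_reviews` distinct products.
    Assumption: if a reviewer rated the same product twice, the later rating wins.
    """
    matrix = defaultdict(dict)
    for reviewer, product, rating in reviews:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range: {rating}")
        matrix[reviewer][product] = rating
    return {u: items for u, items in matrix.items() if len(items) >= min_reviews}


# Hypothetical toy data, not taken from the Sephora.com dataset.
reviews = [
    ("u1", "X", 5), ("u1", "Y", 4),
    ("u2", "X", 2), ("u2", "Y", 1), ("u2", "Z", 5),
    ("u3", "Z", 3),  # only one review, so u3 is dropped
]
matrix = build_rating_matrix(reviews)
print(sorted(matrix))  # ['u1', 'u2']
```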
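$$\text{Similarity} = c_{XY} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}}$$
$$\text{recommendation score}_{ui} = \frac{\sum_{j=1}^{M} c_{ij}\, r_{uj}}{\sum_{j=1}^{M} |c_{ij}|}$$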
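Below is a sketch of the similarity formula for one pair of products. The slides do not say how reviewers who rated only one of the two products are handled. This sketch assumes the sums over i run only over reviewers who rated both products ("co-raters"), with the means X̄ and Ȳ taken over those same reviewers. That is a common choice for sparse rating data, but it may differ from the original.

```python
import math


def pearson_similarity(ratings, x, y):
    """Pearson correlation c_XY between products x and y.

    ratings: {reviewer: {product: rating}}.
    Assumption: the sums over i run only over reviewers who rated BOTH
    products, and the means X-bar and Y-bar are taken over those same
    co-raters. Returns 0.0 with fewer than 2 co-raters or zero variance.
    """
    common = [u for u, items in ratings.items() if x in items and y in items]
    n = len(common)
    if n < 2:
        return 0.0
    xs = [ratings[u][x] for u in common]
    ys = [ratings[u][y] for u in common]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    num = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    den = math.sqrt(sum((a - x_bar) ** 2 for a in xs)) * math.sqrt(
        sum((b - y_bar) ** 2 for b in ys)
    )
    return num / den if den else 0.0


# Hypothetical toy ratings from three reviewers who rated both X and Y.
ratings = {
    "u1": {"X": 5, "Y": 4},
    "u2": {"X": 2, "Y": 1},
    "u3": {"X": 3, "Y": 3},
}
print(round(pearson_similarity(ratings, "X", "Y"), 3))  # 0.929
```

Computing this for every product pair gives the M × M similarity matrix used both for recommendations and for the visualization below.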
Cross-validation
• 5-fold over reviewers
• Leave-one-out over products
• Accuracy = 86.3% ± 1%
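The slide does not define "accuracy" or spell out how the two cross-validation levels fit together. The sketch below is one plausible reading. Reviewers are split into 5 folds, and within each held-out reviewer, each rated product is hidden in turn and predicted from the rest. The accuracy rule is hypothetical: a prediction counts as correct when it falls on the same side of a "like" threshold (4 stars) as the true rating. The `mean_of_known` predictor is only a stand-in to make the harness runnable. In practice the recommendation score above would be plugged in.

```python
import random


def cross_validate(ratings, predict, k=5, like_threshold=4, seed=0):
    """k-fold cross-validation over reviewers, leave-one-out over each
    held-out reviewer's rated products.

    ratings: {reviewer: {product: rating}}.
    predict(train, known, product) -> predicted rating, where `train` holds the
    other folds' reviewers and `known` holds the held-out reviewer's remaining ratings.
    Hypothetical accuracy definition: a prediction counts as correct when it
    falls on the same side of `like_threshold` as the actual rating.
    Returns (mean accuracy over folds, standard deviation over folds).
    """
    reviewers = sorted(ratings)
    random.Random(seed).shuffle(reviewers)
    folds = [reviewers[i::k] for i in range(k)]
    accs = []
    for fold in folds:
        held_out = set(fold)
        train = {u: r for u, r in ratings.items() if u not in held_out}
        hits = total = 0
        for u in fold:
            for product, actual in ratings[u].items():
                known = {p: v for p, v in ratings[u].items() if p != product}
                pred = predict(train, known, product)
                hits += (pred >= like_threshold) == (actual >= like_threshold)
                total += 1
        if total:
            accs.append(hits / total)
    if not accs:
        raise ValueError("no ratings to evaluate")
    mean = sum(accs) / len(accs)
    std = (sum((a - mean) ** 2 for a in accs) / len(accs)) ** 0.5
    return mean, std


def mean_of_known(train, known, product):
    # Stand-in predictor for illustration only (NOT the item-centric filter):
    # the held-out reviewer's average over their other ratings.
    return sum(known.values()) / len(known) if known else 3.0


# Hypothetical toy data: five reviewers who like X, Y, Z and five who dislike them.
data = {f"u{i}": {"X": 5, "Y": 4, "Z": 5} for i in range(5)}
data.update({f"v{i}": {"X": 1, "Y": 2, "Z": 1} for i in range(5)})
print(cross_validate(data, mean_of_known))  # (1.0, 0.0)
```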
10. Visualize the similarity matrix
White = high similarity
Black = low similarity
Sorted by brand, alphabetically
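The visualization can be reproduced in two steps. First, reorder the similarity matrix so that products of the same brand sit next to each other. Second, map each coefficient to a gray level. The sketch assumes coefficients in [-1, 1], mapped linearly to pixel values 0..255 (white = high, black = low). The products, brands, and similarity values are hypothetical, and the actual color scale used on the slide is unknown.

```python
def sort_by_brand(products, brand_of, sim):
    """Reorder the similarity matrix so products are grouped by brand.

    products: list of product ids; brand_of: {product: brand};
    sim: {(p, q): c_pq} with c_pq in [-1, 1].
    Brands are sorted alphabetically, and products within a brand by id.
    """
    order = sorted(products, key=lambda p: (brand_of[p], p))
    return order, [[sim[(p, q)] for q in order] for p in order]


def to_grayscale(rows):
    """Map c in [-1, 1] to a pixel value in 0..255: white (255) = high similarity,
    black (0) = low similarity."""
    return [[round((c + 1) / 2 * 255) for c in row] for row in rows]


# Hypothetical toy data: two "Alpha" products and one "Beta" product.
products = ["b1", "a1", "a2"]
brand_of = {"a1": "Alpha", "a2": "Alpha", "b1": "Beta"}
pairs = {("a1", "a2"): 0.6, ("a1", "b1"): -1.0, ("a2", "b1"): -1.0}
sim = {(p, p): 1.0 for p in products}
for (p, q), c in pairs.items():
    sim[(p, q)] = sim[(q, p)] = c

order, rows = sort_by_brand(products, brand_of, sim)
print(order)               # ['a1', 'a2', 'b1']
print(to_grayscale(rows))  # [[255, 204, 0], [204, 255, 0], [0, 0, 255]]
```

After sorting, each brand occupies a square block on the diagonal, which is what makes brand-level structure visible.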
11. There are structures!
A white square = users' reviews are similar for all products in a brand = strong customer loyalty
13. There are structures! For example…
• Estée Lauder: expensive!
• Origins: “Organic & Natural”
• FAB: cost-effective
Actionable insights for Sephora.com:
Send marketing emails to new customers of brands with stronger customer loyalty!
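Acting on this insight requires turning the "white square" reading into a number per brand. One hypothetical way is to take the mean pairwise similarity among a brand's products and rank brands by that score. The slides do not state how loyalty was quantified. The metric, brand names, and values below are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def brand_loyalty(brand_of, sim):
    """Hypothetical loyalty score: mean pairwise similarity among a brand's
    products, i.e. how "white" the brand's square on the diagonal is.

    brand_of: {product: brand}; sim: {(p, q): c_pq}, looked up with p < q.
    Brands with a single product get no score.
    """
    by_brand = defaultdict(list)
    for product, brand in brand_of.items():
        by_brand[brand].append(product)
    scores = {}
    for brand, items in by_brand.items():
        pairs = list(combinations(sorted(items), 2))
        if pairs:
            scores[brand] = sum(sim[pair] for pair in pairs) / len(pairs)
    return scores


# Hypothetical brands and similarities, not the Sephora.com results.
brand_of = {"a1": "Alpha", "a2": "Alpha", "a3": "Alpha", "b1": "Beta", "b2": "Beta", "g1": "Gamma"}
sim = {("a1", "a2"): 0.9, ("a1", "a3"): 0.7, ("a2", "a3"): 0.8, ("b1", "b2"): 0.1}
scores = brand_loyalty(brand_of, sim)
# Rank brands from most to least "loyal", e.g. to decide whom to email first.
print(sorted(scores, key=scores.get, reverse=True))  # ['Alpha', 'Beta']
```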
14. Chang Liu
PhD in Civil Engineering @CMU
J8D8L5@gmail.com
linkedin.com/in/changliucmu
github.com/R4trtry
15. Is the rating a good measure of reviewers’ perspective?
• Trained a Naive Bayes classifier for sentiment analysis
• Training data: 250 thousand reviews from Birchbox.com, a website that sends out free samples from smaller brands and gathers massive numbers of user reviews
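The slide does not say which Naive Bayes variant or library was used. The sketch below is a from-scratch multinomial Naive Bayes with add-one (Laplace) smoothing, a standard choice for bag-of-words sentiment. The tokenizer, the `pos`/`neg` labels, and the training sentences are hypothetical and only illustrate the technique.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer for illustration."""
    return re.findall(r"[a-z'-]+", text.lower())


class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        # labels must be a list so that .count() works
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_likelihood(self, word, c):
        # log P(word | c) with add-one smoothing over the training vocabulary
        return math.log((self.counts[c][word] + 1) / (self.totals[c] + len(self.vocab)))

    def predict(self, doc):
        words = [w for w in tokenize(doc) if w in self.vocab]  # ignore unseen words
        return max(
            self.classes,
            key=lambda c: self.log_prior[c] + sum(self.log_likelihood(w, c) for w in words),
        )


# Hypothetical training examples, not Birchbox.com reviews.
docs = [
    "worth every penny, gorgeous and glowing",
    "love it, effortless glowing skin",
    "garbage, total trash",
    "mediocre and ineffective, worthless",
]
labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(docs, labels)
print(model.predict("glowing skin, worth it"))  # pos
print(model.predict("trash, garbage"))          # neg
```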
Most common words:
Word       Count
skin       91,349
product    82,481
use        64,044
love       55,691
feel       47,879
face       42,615
like       41,427
great      34,155
really     31,672
smell      27,621

Most informative features:
Negative       Positive
re-wash        Penny
garbage        hook
mediocre       gorgeous
ketchup        perk
trash          stock
unimpressive   glowing
survey         splurge
ineffective    effortless
gag            Christmas
worthless      happily
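The two lists differ because frequent words such as "skin" appear equally often in positive and negative reviews, so they tell the classifier little. A sketch of one common way to rank "informative" words follows: compare the smoothed probability of each word under the positive class with its probability under the negative class. It is not necessarily how the slide's list was produced, and the word counts are hypothetical.

```python
from collections import Counter


def most_informative(pos_counts, neg_counts, n=3):
    """Rank words by the ratio of add-one-smoothed P(word | positive) to
    P(word | negative). Returns (most negative-leaning, most positive-leaning)."""
    vocab = set(pos_counts) | set(neg_counts)
    pos_total = sum(pos_counts.values()) + len(vocab)
    neg_total = sum(neg_counts.values()) + len(vocab)
    ratio = {
        w: ((pos_counts[w] + 1) / pos_total) / ((neg_counts[w] + 1) / neg_total)
        for w in vocab
    }
    ranked = sorted(vocab, key=lambda w: (ratio[w], w))  # ascending ratio, ties by word
    return ranked[:n], ranked[::-1][:n]


# Hypothetical word counts, not the Birchbox.com data.
pos = Counter({"gorgeous": 5, "glowing": 4, "skin": 10})
neg = Counter({"garbage": 6, "trash": 3, "skin": 10})
print(most_informative(pos, neg, n=2))  # (['garbage', 'trash'], ['gorgeous', 'glowing'])
```

Here "skin" is the most frequent word yet lands in the middle of the ranking, mirroring the gap between the two lists above.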
Classifier performance:
             Review text   Quick take
Precision    95.3%         85.4%
Recall       89.8%         93.1%
“Worth every penny!”
Another Validation
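For reference, precision and recall for a chosen positive label can be computed as follows. The slide does not state which class was treated as positive or what the reference labels were. The labels below are hypothetical, for example the sentiment implied by the star rating compared with the classifier's output.

```python
def precision_recall(y_true, y_pred, positive="pos"):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN) for the `positive` label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels: reference sentiment vs. classifier output.
y_true = ["pos", "pos", "pos", "neg", "neg"]
y_pred = ["pos", "pos", "neg", "neg", "neg"]
print(precision_recall(y_true, y_pred))  # (1.0, 0.6666666666666666)
```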
17. Algorithm: Item-centric collaborative filter
[Figure: example similarity between Product X and Product Y = 87.4%; rating matrix with reviewers 1…N as rows and products X, Y, … as columns]
M products reviewed by N reviewers
Pairwise similarities are measured
by Pearson's correlation coefficients:
$$c_{XY} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}}$$
Then weight the ratings
based on the correlation coefficients:
$$\text{score}_{ui} = \frac{\sum_{j=1}^{M} c_{ij}\, r_{uj}}{\sum_{j=1}^{M} |c_{ij}|}$$
where r_uj is user u's preference for item j.
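The weighting step can be sketched directly from the formula. The sketch makes three assumptions: the sum runs over the items j that user u has rated, missing similarities count as 0, and the target item i is skipped if the user already rated it. The products and values are hypothetical.

```python
def recommendation_score(user_ratings, item, sim):
    """score_ui = sum_j c_ij * r_uj / sum_j |c_ij| over the items j user u rated.

    user_ratings: {item j: r_uj}; sim: {(i, j): c_ij}.
    Assumptions: the target item i is skipped if the user already rated it,
    missing similarities count as 0, and 0.0 is returned when every c_ij is 0.
    """
    num = den = 0.0
    for j, r_uj in user_ratings.items():
        if j == item:
            continue
        c_ij = sim.get((item, j), 0.0)
        num += c_ij * r_uj
        den += abs(c_ij)
    return num / den if den else 0.0


# Hypothetical example: user u rated Y and Z; we score the unrated product X.
user_ratings = {"Y": 5, "Z": 2}
sim = {("X", "Y"): 0.8, ("X", "Z"): -0.2}
print(round(recommendation_score(user_ratings, "X", sim), 2))  # 3.6
```

Dividing by the sum of |c_ij| rather than c_ij keeps the denominator positive. A product that is negatively correlated with something the user rated highly therefore pulls the score down instead of flipping its sign. Unrated products can then be ranked by this score.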
Speaker notes
Hi, my name is Chang. I created true fit skin care, a web app that recommends skin care products for you. I’m not an expert
As you can see, the background is crowded with skin care products in boxes, bottles and jars. This is what it looks like in our bathroom. My wife
Estée Lauder is not doing so well: it’s a bit expensive, so there are actually very few reviews per product.
FAB, by contrast, is very cost-effective and therefore has pretty good customer loyalty.
Origins, on the other hand, makes products with organic and natural ingredients, so customers who like their products are paying for this natural concept.
And this is me: I just finished a PhD in civil engineering at Carnegie Mellon University, where I studied pipe monitoring using a data-driven approach. The image here shows the transmission pipelines across the US.