Predict which music artists a user has listened to
Yangjun Wang, Yang Zhao
Department of Information and Computer Science
Aalto University, School of Science and Technology
yangjun.wang@aalto.fi, yang.zhao@aalto.fi
May 12, 2014
Algorithms we tried
Content-based algorithm
Collaborative filtering
KNN
Content-based algorithm
Main Idea
There are various genres in music, such as jazz, rock, pop...
A user has a particular preference for certain genres (θ).
Each song has weights for the different genres (x), namely the metadata.
Based on this, we can build a linear regression model
y = θᵀx
where y represents how many times the user has listened to this song.
Content-based algorithm
Example
The weight vector (x) of a song is xᵢ = (0.8, 0.2, 0)ᵀ.
A user's preference for these three genres is θᵢ = (10, 0, 0)ᵀ.
The predicted listen count for this user and song is
yᵢ = θᵢᵀxᵢ = 10·0.8 + 0·0.2 + 0·0 = 8
So the model predicts that this user has listened to this song 8 times.
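As a quick sanity check of the arithmetic above, a minimal numpy version of the example (the variable names are mine):

    import numpy as np

    x_i = np.array([0.8, 0.2, 0.0])       # the song's genre weights from the example
    theta_i = np.array([10.0, 0.0, 0.0])  # the user's genre preferences

    y_i = theta_i @ x_i                   # linear model: y = theta^T x
    print(y_i)                            # 8.0 predicted listens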
Content-based algorithm
More details
How do we deal with missing values?
We ignore them when computing the error.
The complete model is
E = Σᵢ rᵢ (θᵢᵀxᵢ − yᵢ)²
where rᵢ = 1 if that value is not missing, and rᵢ = 0 otherwise.
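A minimal sketch of this masked error in numpy, assuming Theta stacks one preference vector per user, X one weight vector per song, Y the observed listen counts, and R the 0/1 mask from the slide (the names are mine):

    import numpy as np

    def masked_error(Theta, X, Y, R):
        """E = sum over observed entries of (theta_i^T x_i - y_i)^2."""
        pred = Theta @ X.T               # predicted counts, shape (n_users, n_songs)
        return np.sum(R * (pred - Y) ** 2)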
Content-based algorithm
Result
Does this work?
We don't know.
In the metadata, each song has more than 10k weights,
but we have only 4k observations for each user,
so there are far more unknowns than observations: the system is underdetermined and these equations cannot be solved.
Collaborative filtering
Main idea
The genres provided in the metadata may be too many.
Maybe none of the songs actually belongs to a given genre.
Why don't we construct imaginary (latent) metadata instead?
Collaborative filtering
Implementation
Randomize m k-dimensional vectors, one per song: k is the number of latent genres we suppose these songs have, and m is the number of songs.
Randomize n k-dimensional vectors, one per user: each vector is a user's preference, and n is the number of users.
Compute the gradients ∂E/∂Θ and ∂E/∂X.
Update Θ and X separately.
Check whether the error function has converged (a sketch of this loop follows below).
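A minimal sketch of this alternating update loop, assuming the masked squared error above plus L2 regularization; the step size alpha, the factor count k, and the iteration budget are placeholders I chose for illustration, not values from the slides:

    import numpy as np

    def factorize(Y, R, k=10, lam=0.1, alpha=0.001, n_iters=500):
        """Learn user preferences Theta and latent song 'genres' X by gradient descent."""
        n_users, n_songs = Y.shape
        rng = np.random.default_rng(0)
        Theta = rng.normal(size=(n_users, k))    # random preference vector per user
        X = rng.normal(size=(n_songs, k))        # random latent vector per song

        for _ in range(n_iters):
            err = R * (Theta @ X.T - Y)          # residuals on observed entries only
            grad_Theta = err @ X + lam * Theta   # dE/dTheta (constant factor folded into alpha)
            grad_X = err.T @ Theta + lam * X     # dE/dX
            Theta -= alpha * grad_Theta          # update Theta and X separately
            X -= alpha * grad_X
        return Theta, X

A real run would also monitor the regularized error each iteration to decide convergence instead of using a fixed iteration budget.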
Collaborative filtering
More details
To prevent over-fitting, we also added regularization terms to the error function.
Since the final submission requires 0/1 values, we have to set a threshold.
So we have to choose 3 magic numbers: k, the number of latent genres; λ, the regularization coefficient; and t, the threshold value.
Since we know the ratio of 0s to 1s is 4:6, we set the threshold so that the final prediction has this ratio (see the snippet below).
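One simple way to realize the 4:6 rule is to threshold at the 40th percentile of the predicted scores; this quantile trick is an illustration of the idea, not necessarily how the threshold t was actually picked:

    import numpy as np

    def binarize(pred, zero_fraction=0.4):
        """Label the lowest 40% of scores as 0 and the rest as 1."""
        t = np.quantile(pred, zero_fraction)   # threshold giving a 4:6 split of 0s to 1s
        return (pred > t).astype(int)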
Collaborative filtering
Comments
This method builds a profile for each user; it doesn't consider the similarity between users.
The error function focuses on global error, i.e. it assigns equal importance to every observation. However, a user who listened to a song 100 times is very different from one who listened only once.
There are many zeros in the training data, and they interfere strongly with the fit. This method is better suited to predicting something like movie ratings, where a rating can only be 1-5 stars; in other words, where the rating cannot be zero and the ratings share a similar scale.
Collaborative filtering
Result
Logistic Regression
It is similar to Linear Regression.
The cost function and gradient are different.
y = 1 / (1 + e^(−θᵀx))
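A minimal sketch of the sigmoid model and its gradient, assuming 0/1 "listened" labels and a cross-entropy cost (the cost choice is an assumption; the slide only states that the cost and gradient differ from linear regression):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def logistic_gradient(theta, X, y):
        """Gradient of the cross-entropy cost for y_hat = sigmoid(X theta)."""
        pred = sigmoid(X @ theta)        # predicted probability of having listened
        return X.T @ (pred - y) / len(y)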
KNN
Notation
Y : the given user-artist matrix, binarized (Y(Y>1)=1)
K : the size of the neighbor set
Main idea
For each row r of Y, find its K nearest neighbors (ignoring the missing entries).
Set each missing entry in r to the average of the corresponding entries (same column) in its K neighbors (a sketch follows below).
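A minimal sketch of this row-wise KNN imputation, assuming missing entries are stored as NaN and using Euclidean distance over the entries two rows both observe; the distance measure is my assumption, since the slides do not specify one:

    import numpy as np

    def knn_impute(Y, K=5):
        """Fill NaNs in each row of Y with the column-wise mean over its K nearest rows."""
        Y_filled = Y.copy()
        for i, row in enumerate(Y):
            dists = []
            for j, other in enumerate(Y):
                if j == i:
                    continue
                both = ~np.isnan(row) & ~np.isnan(other)   # ignore missing entries
                if both.any():
                    dists.append((np.linalg.norm(row[both] - other[both]), j))
            neighbors = [j for _, j in sorted(dists)[:K]]
            missing = np.isnan(row)
            # average of the corresponding (same-column) entries of the K neighbors
            Y_filled[i, missing] = np.nanmean(Y[neighbors][:, missing], axis=0)
        return Y_filled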