Predict which music artists a user has listened to
Yangjun Wang, Yang Zhao
Department of Information and Computer Science
Aalto University, School of Science and Technology
yangjun.wang@aalto.fi, yang.zhao@aalto.fi
May 12, 2014
Algorithms we tried
Content-based algorithm
Collaborative filtering
KNN
Content-based algorithm
Main Idea
There are various genres in music, such as jazz, rock, pop...
A user has a particular preference for certain genres (θ).
Each song has weights for the different genres (x), namely the metadata.
Based on this, we can build a linear regression model
y = θᵀx
where y represents how many times the user has listened to this song.
Content-based algorithm
Example
The weight vector (x) of a song is xᵢ = (0.8, 0.2, 0)ᵀ.
A user's preference for these three genres is θᵢ = (10, 0, 0)ᵀ.
The predicted listen count for this user and song is
yᵢ = θᵢᵀxᵢ = 10·0.8 + 0·0.2 + 0·0 = 8
So the model predicts that this user has listened to this song 8 times.
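As a quick sanity check of the arithmetic above, a minimal numpy version of the example (the variable names are mine):

    import numpy as np

    x_i = np.array([0.8, 0.2, 0.0])       # the song's genre weights from the example
    theta_i = np.array([10.0, 0.0, 0.0])  # the user's genre preferences

    y_i = theta_i @ x_i                   # linear model: y = theta^T x
    print(y_i)                            # 8.0 predicted listens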
Content-based algorithm
More details
How do we deal with missing values?
We ignore them when computing the error.
The complete model is
E = Σᵢ rᵢ (θᵢᵀxᵢ − yᵢ)²
where rᵢ = 1 if that value is not missing, and rᵢ = 0 otherwise.
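A minimal sketch of this masked error in numpy, assuming Theta stacks one preference vector per user, X one weight vector per song, Y the observed listen counts, and R the 0/1 mask from the slide (the names are mine):

    import numpy as np

    def masked_error(Theta, X, Y, R):
        """E = sum over observed entries of (theta_i^T x_i - y_i)^2."""
        pred = Theta @ X.T               # predicted counts, shape (n_users, n_songs)
        return np.sum(R * (pred - Y) ** 2)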
Content-based algorithm
Result
Does this work?
We don't know.
In the metadata, each song has more than 10k weights,
but we have only 4k observations for each user,
so there are far more unknowns than observations: the system is underdetermined and these equations cannot be solved.
Collaborative filtering
Main idea
The genres provided in the metadata may be too many.
Maybe none of the songs actually belongs to a given genre.
Why don't we construct imaginary (latent) metadata instead?
Collaborative filtering
Implementation
Randomize m k-dimensional vectors, one per song: k is the number of latent genres we suppose these songs have, and m is the number of songs.
Randomize n k-dimensional vectors, one per user: each vector is a user's preference, and n is the number of users.
Compute the gradients ∂E/∂Θ and ∂E/∂X.
Update Θ and X separately.
Check whether the error function has converged (a sketch of this loop follows below).
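A minimal sketch of this alternating update loop, assuming the masked squared error above plus L2 regularization; the step size alpha, the factor count k, and the iteration budget are placeholders I chose for illustration, not values from the slides:

    import numpy as np

    def factorize(Y, R, k=10, lam=0.1, alpha=0.001, n_iters=500):
        """Learn user preferences Theta and latent song 'genres' X by gradient descent."""
        n_users, n_songs = Y.shape
        rng = np.random.default_rng(0)
        Theta = rng.normal(size=(n_users, k))    # random preference vector per user
        X = rng.normal(size=(n_songs, k))        # random latent vector per song

        for _ in range(n_iters):
            err = R * (Theta @ X.T - Y)          # residuals on observed entries only
            grad_Theta = err @ X + lam * Theta   # dE/dTheta (constant factor folded into alpha)
            grad_X = err.T @ Theta + lam * X     # dE/dX
            Theta -= alpha * grad_Theta          # update Theta and X separately
            X -= alpha * grad_X
        return Theta, X

A real run would also monitor the regularized error each iteration to decide convergence instead of using a fixed iteration budget.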
Collaborative filtering
More details
To prevent over-fitting, we also added regularization terms to the error function.
Since the final submission requires 0/1 values, we have to set a threshold.
So we have to choose 3 magic numbers: k, the number of latent genres; λ, the regularization coefficient; and t, the threshold value.
Since we know the ratio of 0s to 1s is 4:6, we set the threshold so that the final prediction has this ratio (see the snippet below).
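One simple way to realize the 4:6 rule is to threshold at the 40th percentile of the predicted scores; this quantile trick is an illustration of the idea, not necessarily how the threshold t was actually picked:

    import numpy as np

    def binarize(pred, zero_fraction=0.4):
        """Label the lowest 40% of scores as 0 and the rest as 1."""
        t = np.quantile(pred, zero_fraction)   # threshold giving a 4:6 split of 0s to 1s
        return (pred > t).astype(int)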
Collaborative filtering
Comments
This method builds a profile for each user; it doesn't consider the similarity between users.
The error function focuses on global error, i.e. it assigns equal importance to every observation. However, a user who listened to a song 100 times is very different from one who listened only once.
There are many zeros in the training data, and they interfere strongly with the fit. This method is better suited to predicting something like movie ratings, where a rating can only be 1-5 stars; in other words, where the rating cannot be zero and the ratings share a similar scale.
Collaborative filtering
Result
Logistic Regression
It is similar to Linear Regression.
The cost function and gradient are different.
y = 1 / (1 + e^(−θᵀx))
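A minimal sketch of the sigmoid model and its gradient, assuming 0/1 "listened" labels and a cross-entropy cost (the cost choice is an assumption; the slide only states that the cost and gradient differ from linear regression):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def logistic_gradient(theta, X, y):
        """Gradient of the cross-entropy cost for y_hat = sigmoid(X theta)."""
        pred = sigmoid(X @ theta)        # predicted probability of having listened
        return X.T @ (pred - y) / len(y)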
KNN
Notation
Y : the given user-artist matrix, binarized (Y(Y>1)=1)
K : the size of the neighbor set
Main idea
For each row r of Y, find its K nearest neighbors (ignoring the missing entries).
Set each missing entry in r to the average of the corresponding entries (same column) in its K neighbors (a sketch follows below).
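A minimal sketch of this row-wise KNN imputation, assuming missing entries are stored as NaN and using Euclidean distance over the entries two rows both observe; the distance measure is my assumption, since the slides do not specify one:

    import numpy as np

    def knn_impute(Y, K=5):
        """Fill NaNs in each row of Y with the column-wise mean over its K nearest rows."""
        Y_filled = Y.copy()
        for i, row in enumerate(Y):
            dists = []
            for j, other in enumerate(Y):
                if j == i:
                    continue
                both = ~np.isnan(row) & ~np.isnan(other)   # ignore missing entries
                if both.any():
                    dists.append((np.linalg.norm(row[both] - other[both]), j))
            neighbors = [j for _, j in sorted(dists)[:K]]
            missing = np.isnan(row)
            # average of the corresponding (same-column) entries of the K neighbors
            Y_filled[i, missing] = np.nanmean(Y[neighbors][:, missing], axis=0)
        return Y_filled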