Algorithmic Music Recommendations at Spotify

Chris Johnson, Engineering Manager - Recommendations and Personalization at Spotify
Algorithmic Music Discovery at Spotify
Chris Johnson
@MrChrisJohnson
January 13, 2014

Monday, January 13, 14
Who am I?
• Chris Johnson

– Machine Learning guy from NYC
– Focused on music recommendations
– Formerly a graduate student at UT Austin

What is Spotify?

• On-demand music streaming service
• “iTunes in the cloud”

Data at Spotify
• 20 Million songs
• 24 Million active users
• 6 Million paying users
• 8 Million daily active users
• 1 TB of compressed data generated from users per day
• 700 node Hadoop Cluster
• 1 Million years’ worth of music streamed
• 1 Billion user generated playlists

Challenge: 20 Million songs... how do we
recommend music to users?

Recommendation Features
• Discover (personalized recommendations)
• Radio
• Related Artists
• Now Playing


How can we find good recommendations?
• Manual Curation
• Manually Tag Attributes
• Audio Content, Metadata, Text Analysis
• Collaborative Filtering

Collaborative Filtering - “The Netflix Prize”

Collaborative Filtering

[Figure: two users comparing the tracks they like]
User A: “Hey, I like tracks P, Q, R, S!”
User B: “Well, I like tracks Q, R, S, T!”
User A: “Then you should check out track P!”
User B: “Nice! Btw, try track T!”

Image via Erik Bernhardsson
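The exchange above is the core intuition of collaborative filtering: users whose listening overlaps recommend each other the tracks the other has not heard. A minimal Python sketch, assuming each user is represented only by a set of liked tracks and using Jaccard overlap as the similarity (an illustrative choice; the slides do not name one). The function names and data are hypothetical.

```python
# Toy user-based collaborative filtering over sets of liked tracks.
# Hypothetical data and helper names; Jaccard overlap is an illustrative choice.


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two users' track sets: 0.0 = disjoint, 1.0 = identical."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user: str, likes: dict[str, set[str]]) -> list[str]:
    """Rank tracks the user hasn't played by the summed similarity of users who like them."""
    scores: dict[str, float] = {}
    for other, tracks in likes.items():
        if other == user:
            continue
        sim = jaccard(likes[user], tracks)
        for track in tracks - likes[user]:
            scores[track] = scores.get(track, 0.0) + sim
    # Highest score first; ties broken alphabetically so the output is stable.
    return sorted((t for t, s in scores.items() if s > 0), key=lambda t: (-scores[t], t))


likes = {"A": {"P", "Q", "R", "S"}, "B": {"Q", "R", "S", "T"}}
print(recommend("A", likes))  # ['T']
print(recommend("B", likes))  # ['P']
```

Neighborhood methods like this compare users directly; the matrix factorization slides replace that comparison with learned latent vectors, which scales better to a catalog of millions of tracks.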
Difference between movie and music recs
• Scale of catalog: 60,000 movies vs. 20,000,000 songs
Difference between movie and music recs
• Repeated consumption
Difference between movie and music recs
• Music is more niche
“The Netflix Problem” vs. “The Spotify Problem”
• Netflix: users explicitly “rate” movies
• Spotify: feedback is implicit through streaming behavior

Explicit Matrix Factorization
• Users explicitly rate a subset of the movie catalog
• Goal: predict how users will rate new movies
[Figure: users × movies ratings matrix, e.g. user “Chris”, movie “Inception”]

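Explicit Matrix Factorization
• Approximate the ratings matrix by the product of low-dimensional user and movie matrices
• Minimize RMSE (root mean squared error) over the observed ratings:

$$\min_{x,y} \sum_{(u,i)\ \text{rated}} \left(r_{ui} - x_u^\top y_i - \beta_u - \beta_i\right)^2 + \lambda\left(\sum_u \lVert x_u\rVert^2 + \sum_i \lVert y_i\rVert^2\right)$$

• $r_{ui}$ = user $u$’s rating for movie $i$
• $x_u$ = user $u$’s latent factor vector
• $y_i$ = item $i$’s latent factor vector
• $\beta_u$ = bias for user $u$
• $\beta_i$ = bias for item $i$
• $\lambda$ = regularization parameter

[Figure: partially observed ratings matrix (“?” = unrated, e.g. Chris × Inception) approximated by the product of user factors X and movie factors Y]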
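To make the objective concrete, here is a small Python sketch that evaluates it and the RMSE on toy data. It assumes ratings are stored as a dict keyed by (user, movie) index pairs that holds only the rated cells, so the “?” entries never enter the sum; the function names and numbers are hypothetical, and in practice X, Y and the biases are learned rather than fixed.

```python
# Regularized squared-error objective for explicit matrix factorization.
# Toy, hypothetical data; in practice X, Y and the biases are learned, not fixed.
from math import sqrt


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def residuals(ratings, X, Y, bu, bi):
    """Yield r_ui - (x_u . y_i + beta_u + beta_i) for the rated (u, i) pairs only."""
    for (u, i), r in ratings.items():
        yield r - dot(X[u], Y[i]) - bu[u] - bi[i]


def explicit_loss(ratings, X, Y, bu, bi, lam):
    sq_err = sum(e * e for e in residuals(ratings, X, Y, bu, bi))
    reg = lam * (sum(dot(x, x) for x in X) + sum(dot(y, y) for y in Y))
    return sq_err + reg


def rmse(ratings, X, Y, bu, bi):
    return sqrt(sum(e * e for e in residuals(ratings, X, Y, bu, bi)) / len(ratings))


# ratings[(user, movie)] = stars; unrated "?" cells are simply absent.
ratings = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 5.0}
X = [[1.0], [2.0]]  # one latent factor per user (f = 1)
Y = [[2.0], [1.0]]  # one latent factor per movie
bu, bi = [0.0, 0.0], [0.0, 0.0]
print(explicit_loss(ratings, X, Y, bu, bi, lam=0.1))  # 2.0
print(rmse(ratings, X, Y, bu, bi))
```

Only the (1, 0) rating is mispredicted (4 instead of 5), so the squared error is 1 and the remaining 1.0 comes from the regularization term.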
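Implicit Matrix Factorization
• Replace stream counts with binary labels
– 1 = streamed, 0 = never streamed
• Minimize weighted RMSE (root mean squared error), using a function of stream counts as weights:

$$\min_{x,y} \sum_{u,i} c_{ui}\left(p_{ui} - x_u^\top y_i - \beta_u - \beta_i\right)^2 + \lambda\left(\sum_u \lVert x_u\rVert^2 + \sum_i \lVert y_i\rVert^2\right)$$

• $p_{ui}$ = 1 if user $u$ streamed track $i$, else 0
• $c_{ui}$ = weight, a function of user $u$’s stream count for track $i$
• $x_u$ = user $u$’s latent factor vector
• $y_i$ = item $i$’s latent factor vector
• $\beta_u$ = bias for user $u$
• $\beta_i$ = bias for item $i$
• $\lambda$ = regularization parameter

[Figure: sparse binary users × tracks matrix of 1s and 0s, approximated by the product of X and Y]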
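The difference from the explicit case is that every user-track pair contributes, with unstreamed pairs pulled toward 0 at low weight. A Python sketch that evaluates the weighted objective, assuming the linear weighting $c_{ui} = 1 + \alpha r_{ui}$ from Hu, Koren & Volinsky (the slide only says “a function of stream counts”); the default `alpha` is a placeholder to tune, and all names are hypothetical.

```python
# Weighted implicit-feedback objective over ALL user-track pairs.
# Assumption: c_ui = 1 + alpha * r_ui (linear confidence, as in Hu, Koren & Volinsky);
# the slide only says "a function of stream counts". alpha's default is a placeholder.


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def preference_and_confidence(r, alpha):
    """Map a raw stream count r to the binary label p_ui and its weight c_ui."""
    p = 1.0 if r > 0 else 0.0
    c = 1.0 + alpha * r
    return p, c


def implicit_loss(counts, X, Y, bu, bi, lam, alpha=40.0):
    """counts[(u, i)] = number of streams; pairs that are absent were never streamed."""
    total = 0.0
    for u, x in enumerate(X):
        for i, y in enumerate(Y):  # unlike the explicit case, every pair contributes
            p, c = preference_and_confidence(counts.get((u, i), 0), alpha)
            total += c * (p - dot(x, y) - bu[u] - bi[i]) ** 2
    return total + lam * (sum(dot(x, x) for x in X) + sum(dot(y, y) for y in Y))


counts = {(0, 0): 2}  # user 0 streamed track 0 twice, never streamed track 1
X, Y = [[0.0]], [[0.0], [0.0]]
print(implicit_loss(counts, X, Y, bu=[0.0], bi=[0.0, 0.0], lam=0.1, alpha=1.0))  # 3.0
```

The double loop grows with users × tracks, which is exactly the cost the Alternating Least Squares slides avoid by exploiting sparsity.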
Alternating Least Squares

• Initialize user and item vectors to random noise

• Fix item vectors and solve for optimal user vectors

– Take the derivative of the loss function with respect to the user’s vector, set it equal to 0, and solve
– Results in a system of linear equations with a closed-form solution!

• Fix user vectors and solve for optimal item vectors
• Repeat until convergence
code: https://github.com/MrChrisJohnson/implicitMF
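The loop above can be sketched in plain Python. This is a simplified illustration, not the code in the linked implicitMF repository: it drops the bias terms, assumes $c_{ui} = 1 + \alpha r_{ui}$, runs a fixed number of iterations instead of testing convergence, and uses a small hand-written Gaussian-elimination solver in place of a linear algebra library. Setting the derivative to zero gives, for each user, $(\sum_i c_{ui} y_i y_i^\top + \lambda I)\,x_u = \sum_i c_{ui} p_{ui} y_i$, and symmetrically for items.

```python
# Alternating least squares for implicit matrix factorization, in plain Python.
# Simplifications (assumptions, not the implicitMF code): no bias terms,
# c_ui = 1 + alpha * r_ui, dense loops and a small Gaussian-elimination solver.
import random


def solve(A, b):
    """Solve A v = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    v = [0.0] * n
    for r in reversed(range(n)):
        v[r] = (M[r][n] - sum(M[r][c] * v[c] for c in range(r + 1, n))) / M[r][r]
    return v


def solve_side(fixed, count, n_rows, f, lam, alpha):
    """With the other side fixed, each row solves (sum_j c y y^T + lam I) v = sum_j c p y."""
    out = []
    for row in range(n_rows):
        A = [[lam if a == b else 0.0 for b in range(f)] for a in range(f)]
        rhs = [0.0] * f
        for j, y in enumerate(fixed):
            r = count(row, j)
            c, p = 1.0 + alpha * r, (1.0 if r > 0 else 0.0)
            for a in range(f):
                rhs[a] += c * p * y[a]
                for b in range(f):
                    A[a][b] += c * y[a] * y[b]
        out.append(solve(A, rhs))
    return out


def loss(counts, X, Y, lam, alpha):
    total = 0.0
    for u, x in enumerate(X):
        for i, y in enumerate(Y):
            r = counts.get((u, i), 0)
            p = 1.0 if r > 0 else 0.0
            pred = sum(a * b for a, b in zip(x, y))
            total += (1.0 + alpha * r) * (p - pred) ** 2
    return total + lam * sum(v * v for vec in X + Y for v in vec)


def als(counts, n_users, n_items, f=2, lam=0.1, alpha=1.0, iters=10, seed=0):
    rng = random.Random(seed)
    # Initialize user and item vectors to random noise.
    X = [[rng.gauss(0, 0.1) for _ in range(f)] for _ in range(n_users)]
    Y = [[rng.gauss(0, 0.1) for _ in range(f)] for _ in range(n_items)]
    history = [loss(counts, X, Y, lam, alpha)]
    for _ in range(iters):
        # Fix item vectors and solve for users; then fix users and solve for items.
        X = solve_side(Y, lambda u, i: counts.get((u, i), 0), n_users, f, lam, alpha)
        Y = solve_side(X, lambda i, u: counts.get((u, i), 0), n_items, f, lam, alpha)
        history.append(loss(counts, X, Y, lam, alpha))
    return X, Y, history


counts = {(0, 0): 3, (0, 1): 1, (1, 1): 2, (1, 2): 5, (2, 0): 1, (2, 2): 2}
X, Y, history = als(counts, n_users=3, n_items=3)
print([round(h, 3) for h in history])  # non-increasing from one iteration to the next
```

Because each half-step is an exact minimization over one side with the other held fixed, the objective can never increase between iterations, which makes the loss history a cheap sanity check.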
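Alternating Least Squares
• Note that: $Y^\top C^u Y = Y^\top Y + Y^\top (C^u - I)\,Y$, where $C^u$ is the diagonal matrix of user $u$’s weights $c_{ui}$; this is the expensive term in the closed-form user update $x_u = (Y^\top C^u Y + \lambda I)^{-1} Y^\top C^u p(u)$ (bias terms omitted)
• Then, we can pre-compute $Y^\top Y$ once per iteration
• $C^u - I$ and $p(u)$ only contain non-zero elements for tracks that the user streamed
• Using sparse matrix operations, we can then compute each user’s vector efficiently in $O(f^2 n_u + f^3)$ time, where $n_u$ is the number of tracks the user streamed and $f$ is the number of latent factors

code: https://github.com/MrChrisJohnson/implicitMF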
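The saving comes from the identity $Y^\top C^u Y = Y^\top Y + Y^\top (C^u - I) Y$: the first term is shared by all users, and the second only involves the tracks user $u$ streamed. The Python sketch below builds one user’s system both ways and can be used to check that they agree. It assumes $c_{ui} = 1 + \alpha r_{ui}$ (so $c_{ui} - 1 = 0$ off the streamed tracks), omits biases, and uses plain lists as stand-ins for sparse matrices; all names are hypothetical.

```python
# Building one user's normal equations with and without the Y^T Y trick.
# Assumptions: c_ui = 1 + alpha * r_ui (so c_ui - 1 = 0 off the streamed tracks),
# no bias terms, plain-Python lists standing in for sparse matrices.


def gram(Y):
    """Y^T Y: computed once per iteration and shared by every user."""
    f = len(Y[0])
    return [[sum(y[a] * y[b] for y in Y) for b in range(f)] for a in range(f)]


def user_system_sparse(YtY, Y, streamed, lam, alpha):
    """A = Y^T Y + Y^T (C^u - I) Y + lam I and b = Y^T C^u p(u), visiting only streamed tracks."""
    f = len(YtY)
    A = [[YtY[a][b] + (lam if a == b else 0.0) for b in range(f)] for a in range(f)]
    rhs = [0.0] * f
    for i, r in streamed.items():
        if r <= 0:
            continue  # only tracks with at least one stream belong here
        c, y = 1.0 + alpha * r, Y[i]
        for a in range(f):
            rhs[a] += c * y[a]  # p_ui = 1 for every streamed track
            for b in range(f):
                A[a][b] += (c - 1.0) * y[a] * y[b]
    return A, rhs


def user_system_dense(Y, streamed, lam, alpha):
    """Same system computed naively: Y^T C^u Y + lam I summed over every track."""
    f = len(Y[0])
    A = [[lam if a == b else 0.0 for b in range(f)] for a in range(f)]
    rhs = [0.0] * f
    for i, y in enumerate(Y):
        r = streamed.get(i, 0)
        c, p = 1.0 + alpha * r, (1.0 if r > 0 else 0.0)
        for a in range(f):
            rhs[a] += c * p * y[a]
            for b in range(f):
                A[a][b] += c * y[a] * y[b]
    return A, rhs


Y = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5], [0.4, 0.4]]  # 4 tracks, f = 2
streamed = {1: 3, 3: 1}  # this user's stream counts
A_fast, b_fast = user_system_sparse(gram(Y), Y, streamed, lam=0.1, alpha=2.0)
A_slow, b_slow = user_system_dense(Y, streamed, lam=0.1, alpha=2.0)
```

The sparse version touches $n_u$ tracks at $O(f^2)$ each; solving the resulting $f \times f$ system adds the $O(f^3)$ term.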
Alternating Least Squares

code: https://github.com/MrChrisJohnson/implicitMF
How do we use the learned vectors?
• User-item score is the dot product $x_u^\top y_i$
• Item-item similarity is the cosine similarity $\frac{y_i^\top y_j}{\lVert y_i\rVert\,\lVert y_j\rVert}$
• Both operations are cheap: linear, O(f), in the number of latent factors f
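Both scoring rules are a single pass over the f latent factors. A short Python sketch with hypothetical 2-dimensional vectors standing in for trained factors (real models use more dimensions, and the vectors come from the trained model):

```python
# Scoring with learned vectors: dot product for user-item, cosine for item-item.
# The 2-dimensional vectors below are hypothetical stand-ins for trained factors.
from math import sqrt


def score(x_u, y_i):
    """User-item score x_u . y_i: one pass over the f latent factors."""
    return sum(a * b for a, b in zip(x_u, y_i))


def cosine(y_i, y_j):
    """Item-item similarity: dot product divided by the product of the norms."""
    norms = sqrt(score(y_i, y_i)) * sqrt(score(y_j, y_j))
    return score(y_i, y_j) / norms if norms else 0.0


user = [1.0, 0.5]
items = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [2.0, 2.0]}
ranking = sorted(items, key=lambda name: score(user, items[name]), reverse=True)
print(ranking)                         # ['c', 'a', 'b']
print(cosine(items["a"], items["b"]))  # 0.0
```

Cosine ignores vector length, so a popular item with a large vector does not automatically look similar to everything; the raw dot product keeps that magnitude, which is often what a user-item score should reflect.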

Latent Factor Vectors in 2 dimensions

Scaling up Implicit Matrix Factorization
with Hadoop

Hadoop at Spotify 2009

Hadoop at Spotify 2014
700 Nodes in our London data center

Implicit Matrix Factorization with Hadoop
[Figure: Map step and Reduce step. All log entries are split into a K × L grid of blocks, block (x, y) holding the entries with u % K = x and i % L = y; each block is joined with the user vectors where u % K = x and the item vectors where i % L = y, and the reduce step groups the results by user partition u % K = 0, 1, …, K-1.]
Figure via Erik Bernhardsson
Implicit Matrix Factorization with Hadoop
[Figure: one map task. Map input: tuples (u, i, count) where u % K = x and i % L = y. Distributed cache: all user vectors where u % K = x, and all item vectors where i % L = y. Mapper → emit contributions → Reducer → new vector!]
Figure via Erik Bernhardsson
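The data flow in the figures can be simulated in a single Python process. This is a sketch under stated assumptions, not the production Hadoop job: it shows only the user half-step (the item half-step is symmetric), assumes $c_{ui} = 1 + \alpha \cdot \text{count}$ with no bias terms, and has each mapper emit per-entry contributions $(c_{ui} - 1)\,y_i y_i^\top$ and $c_{ui}\,y_i$ that a reducer adds to a shared $Y^\top Y + \lambda I$ before solving for the new user vector. All function names are hypothetical. Partitioning by u % K and i % L only changes where the sums are computed, so any K and L should give the same vectors.

```python
# Single-process simulation of the K x L map/reduce user update (user half-step only).
# Assumptions: c_ui = 1 + alpha * count, no bias terms, Y^T Y precomputed and shared.
from collections import defaultdict


def solve_spd(A, b):
    """Gaussian elimination without pivoting; safe because A is symmetric positive definite."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    v = [0.0] * n
    for r in reversed(range(n)):
        v[r] = (M[r][n] - sum(M[r][c] * v[c] for c in range(r + 1, n))) / M[r][r]
    return v


def map_block(entries, Y, alpha):
    """One map task: emit (user, contribution) for each (u, i, count) in the block."""
    for u, i, count in entries:
        c, y = 1.0 + alpha * count, Y[i]  # item vector from this block's cache
        yield u, ([[(c - 1.0) * ya * yb for yb in y] for ya in y], [c * ya for ya in y])


def reduce_user(contribs, YtY, lam):
    """Sum one user's contributions from every block, then solve for the new vector."""
    f = len(YtY)
    A = [[YtY[a][b] + (lam if a == b else 0.0) for b in range(f)] for a in range(f)]
    rhs = [0.0] * f
    for dA, db in contribs:
        for a in range(f):
            rhs[a] += db[a]
            for b in range(f):
                A[a][b] += dA[a][b]
    return solve_spd(A, rhs)


def update_users(log, Y, K, L, lam, alpha):
    """Return new vectors for every user that appears in the log."""
    f = len(Y[0])
    YtY = [[sum(y[a] * y[b] for y in Y) for b in range(f)] for a in range(f)]
    blocks = defaultdict(list)
    for u, i, count in log:  # split all log entries into K x L blocks
        blocks[(u % K, i % L)].append((u, i, count))
    by_user = defaultdict(list)  # the shuffle: group mapper output by user
    for entries in blocks.values():
        for u, contrib in map_block(entries, Y, alpha):
            by_user[u].append(contrib)
    return {u: reduce_user(cs, YtY, lam) for u, cs in by_user.items()}


Y = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5], [0.4, 0.4]]
log = [(0, 0, 3), (0, 2, 1), (1, 1, 2), (2, 0, 1), (2, 3, 4), (3, 2, 2)]
partitioned = update_users(log, Y, K=2, L=3, lam=0.1, alpha=2.0)
single = update_users(log, Y, K=1, L=1, lam=0.1, alpha=2.0)
```

Comparing the K = 2, L = 3 run against the unpartitioned K = L = 1 run is a quick check that the block split does not change the result.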
Implicit Matrix Factorization with Spark
[Figure: Spark vs. Hadoop]
http://www.slideshare.net/Hadoop_Summit/spark-and-shark
Approximate Nearest Neighbors

code: https://github.com/Spotify/annoy
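With millions of item vectors, scoring every candidate exactly for each query is expensive, so approximate methods only score vectors that land in the same region of space. Annoy’s own tree-based index and API are documented in the linked repository. As a self-contained illustration of the idea, here is a Python sketch of a different, simpler technique, random-hyperplane locality-sensitive hashing for cosine similarity; the class and parameter names are hypothetical.

```python
# Approximate nearest neighbours by random-hyperplane hashing (cosine similarity).
# NOTE: a simpler stand-in for illustration, not Annoy's actual tree-based algorithm.
import random
from collections import defaultdict
from math import sqrt


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class HyperplaneLSH:
    def __init__(self, dim, n_planes=4, n_tables=8, seed=0):
        rng = random.Random(seed)
        # Each table hashes a vector to the sign pattern of n_planes random projections.
        self.tables = [
            ([[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)], defaultdict(list))
            for _ in range(n_tables)
        ]
        self.vectors = {}

    @staticmethod
    def _key(planes, v):
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0 for plane in planes)

    def add(self, item_id, v):
        self.vectors[item_id] = v
        for planes, buckets in self.tables:
            buckets[self._key(planes, v)].append(item_id)

    def query(self, v, k=3):
        # Candidates: anything sharing a bucket with v in at least one table.
        cands = set()
        for planes, buckets in self.tables:
            cands.update(buckets.get(self._key(planes, v), []))
        # Exact re-ranking of the (hopefully small) candidate set.
        return sorted(cands, key=lambda c: -cosine(v, self.vectors[c]))[:k]


rng = random.Random(42)
vecs = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
index = HyperplaneLSH(dim=8)
for idx, vec in enumerate(vecs):
    index.add(idx, vec)
print(index.query(vecs[7], k=3))  # item 7 itself comes first
```

More planes per table make buckets smaller (fewer candidates to score, but more missed neighbours); more tables win some of that recall back at the cost of memory.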
Ensemble of Latent Factor Models


Figure via Erik Bernhardsson
A/B Testing Recommendations

Open Problems
• How to go from predictive model to related artists? (learning to rank?)
• How do you learn from user feedback?
• How do you deal with observation bias in the user feedback? (active learning?)
• How to factor in temporal information?
• How much value is there in content-based recommendations?
• How to best evaluate model performance?
• How to best train an ensemble?


Thank You!
