
Machine Learning at PeerIndex

Slides for a talk given at the London Machine Learning Meetup on 29 Feb, covering the machine learning behind measuring people's influence at PeerIndex.

Published in: Technology, Education

  1. Machine Learning at PeerIndex. Ferenc Huszár (@fhuszar). Wednesday, 16 May 12
  2. PeerIndex.com: understand your influence
  3-4. PeerPerks.com: free stuff for influencers
  5-14. Machine Learning @ PeerIndex
      • The usual stuff
          • topic modelling/classification of tweets/statuses/URLs
          • identity resolution across Twitter, Facebook, LinkedIn
          • spambot/fraud detection: identify people gaming the system
          • sentiment classification: happy/sad/neutral
      • The really exciting stuff
          • inferring networks of influence (more about this later)
          • visualise different aspects of influence in an engaging way
          • influence maximisation: submodular optimisation
  15-18. Inferring networks of influence
      • Social network
      • Propagation probabilities $p_{i,j}$
      • Information cascade logs (one shared URL per column; each row is a user ID and the time of sharing):

        http://www.pcworld.com/article/239719      http://techcrunch.com/2011/11/21/...
        1079306  2011-08-25T00:03:06+01:00         259725   2011-10-24T03:32:19+01:00
        4549198  2011-08-25T04:32:25+01:00         76539    2011-10-24T03:32:23+01:00
        2662975  2011-08-25T00:35:11+01:00         1922351  2011-10-24T04:28:47+01:00
        2333224  2011-08-25T01:43:18+01:00         9183     2011-10-24T03:30:57+01:00
        3141371  2011-08-25T01:52:06+01:00         3335398  2011-10-24T03:34:01+01:00
        3482720  2011-08-25T07:18:24+01:00         1616885  2011-10-24T03:48:16+01:00
        1403682  2011-08-25T03:52:58+01:00         82198    2011-10-24T03:48:29+01:00
        4679657  2011-08-25T01:07:48+01:00         906390   2011-10-24T23:13:51+01:00
        32460    2011-08-25T01:11:43+01:00         1051322  2011-10-24T03:40:02+01:00
  19-23. Heuristic approaches to estimate $p_{i,j}$
      • purely based on local network structure: $p_{i,j} \approx \frac{1}{d_{\mathrm{in}}(j)}$
      • trivalency “model” (my personal favourite :)): $p_{i,j}$ drawn at random from $\{0.1, 0.01, 0.001\}$
      • data-driven heuristics: $p_{i,j} \approx \frac{\text{number of items shared by } j \text{ after } i \text{ shared it}}{\text{number of items shared by } i}$
      How do you solve this with machine learning?
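The three heuristics listed on these slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the edge convention `(i, j)` meaning "i can influence j", and the cascade layout (a dict from item to a list of `(user, timestamp)` pairs) are all hypothetical and not taken from the talk. The default trivalency values are passed as a parameter so they can be changed.

```python
import random
from collections import defaultdict


def in_degree_heuristic(edges):
    """Local-structure heuristic: p[(i, j)] = 1 / d_in(j)."""
    in_deg = defaultdict(int)
    for _, j in edges:
        in_deg[j] += 1
    return {(i, j): 1.0 / in_deg[j] for i, j in edges}


def trivalency_heuristic(edges, values=(0.1, 0.01, 0.001), seed=0):
    """Trivalency heuristic: each edge gets one of a few fixed values at random."""
    rng = random.Random(seed)
    return {edge: rng.choice(values) for edge in edges}


def data_driven_heuristic(edges, cascades):
    """Share-count heuristic (assumed cascade layout: item -> [(user, time), ...]).

    p[(i, j)] = (# items j shared after i shared them) / (# items i shared).
    """
    edge_set = set(edges)
    shared_by = defaultdict(int)
    shared_after = defaultdict(int)
    for events in cascades.values():
        # Keep only the first time each user shared this item.
        first_time = {}
        for user, t in events:
            first_time[user] = min(t, first_time.get(user, t))
        for i in first_time:
            shared_by[i] += 1
        for i, t_i in first_time.items():
            for j, t_j in first_time.items():
                if (i, j) in edge_set and t_i < t_j:
                    shared_after[(i, j)] += 1
    return {
        (i, j): shared_after[(i, j)] / shared_by[i] if shared_by[i] else 0.0
        for (i, j) in edges
    }
```

For example, with edges `[(1, 2), (3, 2), (1, 3)]` user 2 has two incoming edges, so the in-degree heuristic assigns 0.5 to each of them.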
  24-35. The likelihood $P(D \mid \theta)$
      • data $D$: an information cascade log, e.g.

        http://www.pcworld.com/article/239719
        1079306  2011-08-25T00:03:06+01:00
        4549198  2011-08-25T04:32:25+01:00
        2662975  2011-08-25T00:35:11+01:00
        2333224  2011-08-25T01:43:18+01:00
        3141371  2011-08-25T01:52:06+01:00
        3482720  2011-08-25T07:18:24+01:00
        1403682  2011-08-25T03:52:58+01:00
        4679657  2011-08-25T01:07:48+01:00
        32460    2011-08-25T01:11:43+01:00

      • parameters $\theta$: the propagation probabilities $p_{i,j}$
      • what’s the probability of the cascade $u_1, u_2, u_3, \ldots, u_n$?
      • for subsequent users in cascade (with $u_0 = 0$, matching the $p_{0,u_1}$ term):
        $$p_{0,u_1} \left(1 - (1 - p_{0,u_2})(1 - p_{u_1,u_2})\right) \cdots = \prod_{i=1}^{n} \left(1 - \prod_{j=0}^{i-1} (1 - p_{u_j,u_i})\right)$$
      • for users that are not in cascade:
        $$\prod_{u \in \{u_1, \ldots, u_n\}} \prod_{v \notin \{u_1, \ldots, u_n\}} (1 - p_{u,v})$$
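A minimal sketch of how the cascade likelihood on these slides could be evaluated in log space. It assumes that index 0 stands for an external source that can reach any user (suggested by the $p_{0,u_1}$ term but not spelled out in the slides), that missing edges have probability 0, and that the cascade is already ordered by time of sharing. The function name and argument layout are hypothetical.

```python
import math


def cascade_log_likelihood(cascade, p, all_users, source=0):
    """Log-likelihood of one cascade under per-edge propagation probabilities.

    cascade   -- user ids in the order they shared the item
    p         -- dict mapping (i, j) to the propagation probability p_ij
    all_users -- every user id in the network (the source is not a user)
    """
    def prob(i, j):
        return p.get((i, j), 0.0)  # assumption: absent edge means p_ij = 0

    log_lik = 0.0

    # Users in the cascade: at least one earlier node (or the source) got through.
    earlier = [source]
    for u in cascade:
        nobody_got_through = 1.0
        for w in earlier:
            nobody_got_through *= 1.0 - prob(w, u)
        p_shared = 1.0 - nobody_got_through
        if p_shared <= 0.0:
            return float("-inf")
        log_lik += math.log(p_shared)
        earlier.append(u)

    # Users outside the cascade: every cascade member failed to reach them.
    in_cascade = set(cascade)
    for u in cascade:
        for v in all_users:
            if v in in_cascade or v == source:
                continue
            p_fail = 1.0 - prob(u, v)
            if p_fail <= 0.0:
                return float("-inf")
            log_lik += math.log(p_fail)

    return log_lik
```

Working in log space avoids underflow when cascades are long; summing this quantity over all logged cascades gives the objective that maximum likelihood would optimise.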
  36-41. Maximum likelihood at scale
      • data too sparse to learn one parameter per edge
      • large-scale gradient-based optimisation is costly
      • Solution: combine an ensemble of heuristics with ML
          • use heuristics to compute probabilities at scale
          • use ML to tune parameters on small-scale data
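The slides do not say how the heuristics and ML are combined, so the following is one possible reading rather than the method used at PeerIndex: treat each heuristic as a cheap per-edge score, form a weighted sum (clipped to 1), and pick the few weights by maximising the cascade log-likelihood on a small sample. The weighted-sum form, the grid search, the external source at index 0, and all names are assumptions made for illustration.

```python
import itertools
import math


def log_likelihood(cascades, p, users, source=0):
    """Summed cascade log-likelihood; probabilities are floored to avoid log(0)."""
    total = 0.0
    for cascade in cascades:
        earlier = [source]
        for u in cascade:
            miss = math.prod(1.0 - p.get((w, u), 0.0) for w in earlier)
            total += math.log(max(1.0 - miss, 1e-12))
            earlier.append(u)
        inside = set(cascade)
        for u in cascade:
            for v in users:
                if v not in inside:
                    total += math.log(max(1.0 - p.get((u, v), 0.0), 1e-12))
    return total


def combine(heuristics, weights):
    """Weighted sum of heuristic edge scores, clipped to a valid probability."""
    edges = set().union(*heuristics)
    return {
        e: min(1.0, sum(w * h.get(e, 0.0) for w, h in zip(weights, heuristics)))
        for e in edges
    }


def tune_weights(heuristics, cascades, users, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Grid search over a handful of weights instead of one parameter per edge."""
    best_w, best_ll = None, -math.inf
    for w in itertools.product(grid, repeat=len(heuristics)):
        ll = log_likelihood(cascades, combine(heuristics, w), users)
        if ll > best_ll:
            best_w, best_ll = w, ll
    return best_w, best_ll
```

The point of the design is that the number of learned parameters equals the number of heuristics, not the number of edges, so a small sample of cascades can be enough to fit them while the heuristics themselves are computed over the full graph.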
  42-48. Influence maximisation
      • Select a set of users to maximise outreach
      • Influence of people combines non-linearly
      • In many models it combines submodularly:
        $$A \subseteq B \implies f(A \cup \{x\}) - f(A) \ge f(B \cup \{x\}) - f(B)$$
      • these functions are fun to optimise
      • submodularity pops up many times in machine learning
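To make the submodularity point concrete, here is a sketch of the standard greedy algorithm for choosing `k` seed users, with the expected outreach estimated by Monte Carlo simulation. The talk does not name a diffusion model or an optimiser; the independent cascade model, the greedy strategy, and all function names here are assumptions chosen for illustration. For monotone submodular objectives, greedy selection is known to reach at least a (1 - 1/e) fraction of the optimum, although with sampled estimates that guarantee only holds approximately.

```python
import random


def build_adjacency(p):
    """Turn {(i, j): p_ij} into {i: [(j, p_ij), ...]} for fast traversal."""
    adj = {}
    for (i, j), prob in p.items():
        adj.setdefault(i, []).append((j, prob))
    return adj


def simulate_spread(seeds, adj, rng):
    """One independent-cascade run; returns how many users end up activated."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for i in frontier:
            for j, prob in adj.get(i, []):
                # Each newly active user gets one chance to activate each neighbour.
                if j not in active and rng.random() < prob:
                    active.add(j)
                    next_frontier.append(j)
        frontier = next_frontier
    return len(active)


def expected_spread(seeds, adj, runs=200, seed=0):
    """Monte Carlo estimate of the expected number of activated users."""
    rng = random.Random(seed)
    return sum(simulate_spread(seeds, adj, rng) for _ in range(runs)) / runs


def greedy_seed_selection(candidates, p, k, runs=200):
    """Repeatedly add the candidate with the largest estimated marginal gain."""
    adj = build_adjacency(p)
    chosen, current = [], 0.0
    for _ in range(k):
        best, best_gain = None, -1.0
        for x in candidates:
            if x in chosen:
                continue
            gain = expected_spread(chosen + [x], adj, runs) - current
            if gain > best_gain:
                best, best_gain = x, gain
        if best is None:
            break
        chosen.append(best)
        current += best_gain
    return chosen, current
```

Calling `greedy_seed_selection(users, p, k=2)` returns the selected seeds together with the estimated outreach. Diminishing returns show up directly: a user whose audience is already covered by an earlier seed contributes little marginal gain and is skipped.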
  49-56. Wrap up
      • two lines of ‘data’ products: PeerIndex, PeerPerks
      • lots of ‘standard’ machine learning tasks
      • some uniquely exciting problems
          • inferring propagation probabilities
          • computing the expected number of users one reaches out to
          • putting all aspects together into a single number, and visualising it
          • influence maximisation
  57. Thanks. We’re hiring ML scientists, interns and engineers... @fhuszar fh@peerindex.com
