
Hadoop World 2011: Leveraging Hadoop to Transform Raw Data to Rich Features at LinkedIn - Abhishek Gupta & Adil Aijaz, LinkedIn

1,684 views

This presentation focuses on the design and evolution of the LinkedIn recommendations platform. It currently computes more than 100 billion personalized recommendations every week, powering an ever-growing assortment of products, including Jobs You May Be Interested In, Groups You May Like, News Relevance, and Ad Targeting. We will describe how we leverage Hadoop to transform raw data into rich features using knowledge aggregated from LinkedIn's base of 100 million members, how we use Lucene to do real-time recommendations, and how we marshal Lucene on Hadoop to bridge offline analysis with user-facing services.

Published in: Technology, Business

  1. Recommendations @ LinkedIn
  2. Think Platform. Leverage Hadoop.
  3. The world's largest professional network. Over 50% of members are now international. 135M+*; 75% of Fortune 100 Companies use LinkedIn to hire**; >2M Company Pages**; ~2/sec New Members joining. (*as of Nov 4, 2011; **as of June 30, 2011)
  4. Recommendations Opportunity
  5-10. (no transcribed text)
  11. The Recommendations Opportunity: Pandora Search for People; Groups browse maps; Events You May Be Interested In
  12. 50%
  13. Positions; Education; Summary; Experience; Skills
  14. Are all titles the same? Software Engineer; Technical Yahoo; Member Technical Staff; Software Development Engineer; SDE
  15. Are all companies the same? 'IBM' has 8000+ variations: ibm – ireland; ibm research; T J Watson Labs; International Bus. Machines
  16. Recommendation Trade-offs. The need for a common platform: Real Time vs. Time Independent
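
The title and company variations on slides 14 and 15 suggest mapping many surface forms to one canonical entity. The following is only a minimal sketch of that idea, assuming a hand-built alias table; the `ALIASES` mapping and the `normalize` and `canonicalize` helpers are hypothetical names, and treating all five titles as the same role is an assumption, not something the slides confirm.

```python
import re

# Hypothetical alias table: normalized surface form -> canonical name.
# The surface forms come from the slides; the canonical names are assumed.
ALIASES = {
    "software engineer": "Software Engineer",
    "technical yahoo": "Software Engineer",
    "member technical staff": "Software Engineer",
    "software development engineer": "Software Engineer",
    "sde": "Software Engineer",
    "ibm": "IBM",
    "ibm ireland": "IBM",
    "ibm research": "IBM",
    "t j watson labs": "IBM",
    "international bus machines": "IBM",
}


def normalize(text):
    """Lowercase, turn runs of punctuation/whitespace into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def canonicalize(text):
    """Return the canonical name, or the input unchanged if it is unknown."""
    return ALIASES.get(normalize(text), text)
```

A lookup table like this only covers forms someone has already listed; with 8000+ variations for a single company, a production system would likely need fuzzy or learned matching on top.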
  17. Recommendation Trade-offs. The need for a common platform: Content Analysis vs. Collaborative
  18. Recommendation Trade-offs. The need for a common platform: Precision vs. Recall
  19. Matching (diagram). Feature list, shown on both sides of the match: Seniority; Skills; Title; Specialty; Education; Experience; Location; Industry; Related Titles; Related Companies; Related Industries. Feature pairs compared: Specialty -> Specialty; Skills -> Skills; Title -> Title; Seniority -> Seniority; Summary -> Summary; Title -> Related Title; Education -> Education. Match types: Binary (exact match; exact match in bucket) and Soft Match (v1 = tf * idf; cos θ = (v1 · v2) / (|v1| * |v2|)). Scores shown: 0.58, 0.94, 0.26, 0.18, 0.98, ..., 0.16, 0.40.
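
The soft match on the matching slide weights terms by tf * idf and compares two vectors by the cosine of the angle between them. A minimal sketch of that computation, assuming sparse vectors stored as dicts and idf defined as log(N / df); the function names and the tiny skills corpus are illustrative, not taken from the talk.

```python
import math
from collections import Counter


def tfidf_vector(tokens, doc_freq, n_docs):
    """Sparse tf * idf vector, with idf = log(n_docs / document frequency)."""
    counts = Counter(tokens)
    return {t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}


def cosine(v1, v2):
    """cos(theta) = (v1 . v2) / (|v1| * |v2|); 0.0 if either vector is zero."""
    dot = sum(w * v2.get(t, 0.0) for t, w in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)


# Illustrative corpus of skill lists (made up for this example).
docs = {
    "member": ["java", "hadoop", "lucene"],
    "job": ["hadoop", "lucene", "pig"],
    "other": ["sales", "marketing"],
}
doc_freq = Counter(t for tokens in docs.values() for t in set(tokens))
vectors = {name: tfidf_vector(tokens, doc_freq, len(docs))
           for name, tokens in docs.items()}
similarity = cosine(vectors["member"], vectors["job"])
```

Terms shared by many documents get a small idf, so the two shared skills here contribute less to the similarity than a rare shared skill would.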
  20. Importance weight vector (Skills -> Skills: 0.70); Similarity score vector (Skills -> Skills: 0.94); Feedback; Normalization, Scoring & Ranking; Filtering: Location, Company, Industry
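
One plausible reading of the scoring slide is that each candidate's similarity score vector is combined with an importance weight vector, then candidates are filtered and ranked. A minimal sketch under that assumption: only the Skills -> Skills values (weight 0.70, similarity 0.94) come from the slide, while the other weights, the candidate records, and the function names are invented for illustration.

```python
# Importance weights per feature pair. Only the skills weight (0.70) appears
# on the slide; the title and seniority weights are assumed.
WEIGHTS = {"skills": 0.70, "title": 0.20, "seniority": 0.10}


def weighted_score(similarities, weights=WEIGHTS):
    """Weighted average of per-feature similarities (normalized by total weight)."""
    total = sum(weights.values())
    return sum(w * similarities.get(name, 0.0)
               for name, w in weights.items()) / total


def recommend(candidates, allowed_locations, weights=WEIGHTS):
    """Filter candidates by location, then rank by descending weighted score."""
    kept = [c for c in candidates if c["location"] in allowed_locations]
    return sorted(kept,
                  key=lambda c: weighted_score(c["similarities"], weights),
                  reverse=True)


# Hypothetical candidates; similarity values are illustrative.
candidates = [
    {"id": "job-1", "location": "US",
     "similarities": {"skills": 0.94, "title": 0.58, "seniority": 0.40}},
    {"id": "job-2", "location": "US",
     "similarities": {"skills": 0.26, "title": 0.98, "seniority": 0.16}},
    {"id": "job-3", "location": "EU",
     "similarities": {"skills": 0.98, "title": 0.98, "seniority": 0.98}},
]
ranked = recommend(candidates, allowed_locations={"US"})
```

The slide also shows a feedback arrow into the weight vector; learning the weights from feedback is outside the scope of this sketch.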
  21. Technologies
  22. Hadoop Case Studies: Scaling; Blending Recommendation Algorithms; Grandfathering; Model Selection; A/B Testing; Tracking and Reporting
  23. Scaling. Billions of Recommendations: Latency > 1 sec. Minhashing: Latency < 1 sec, Recall = Low. Third option (unlabeled in the transcript): Latency < 1 sec, Recall = High.
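
The scaling slide names minhashing as a way to get sub-second latency at the cost of recall. As a rough illustration of the general technique only (not the speakers' implementation), the sketch below builds MinHash signatures and estimates Jaccard similarity from them; the seeded SHA-1 hash family, the signature length, and the function names are all assumptions.

```python
import hashlib


def _hash(seed, item):
    """One member of a seeded hash family, built from SHA-1 (64-bit output)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


def minhash_signature(items, num_hashes=256):
    """Signature = minimum hash value of the set under each hash function."""
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree approximates |A∩B| / |A∪B|."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Two overlapping sets of hypothetical skill ids; true Jaccard is 50/150.
set_a = {f"skill-{i}" for i in range(100)}
set_b = {f"skill-{i}" for i in range(50, 150)}
estimate = estimate_jaccard(minhash_signature(set_a), minhash_signature(set_b))
```

Comparing fixed-length signatures is much cheaper than comparing full feature sets, which is where the latency gain comes from; the estimate is approximate, which is consistent with the lower recall noted on the slide.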
  24. Hadoop Case Studies: Scaling ✔; Blending Recommendation Algorithms; Grandfathering; Model Selection; A/B Testing; Tracking and Reporting
  25. Blending Recommendation Algorithms. Co-View: Impact Latency ~ Minutes, Complexity = High. Co-View: Impact Latency ~ Hours, Complexity = Low.
  26. Hadoop Case Studies: Scaling ✔; Blending Recommendation Algorithms ✔; Grandfathering; Model Selection; A/B Testing; Tracking and Reporting
  27. Grandfathering. Adding and Changing Features. Next Profile Edit: No Time Guarantees, Minimal Disruption. Parallel Feature Extraction Pipeline: Time ~ Week, Significant Systems Work. Third option (unlabeled in the transcript): Time ~ Hour, Minimal Disruption, Grandfather When Ready.
  28. Hadoop Case Studies: Scaling ✔; Blending Recommendation Algorithms ✔; Grandfathering ✔; Model Selection; A/B Testing; Tracking and Reporting
  29. Model Selection: Features; Models; Parameters. Decision Trees; SVM; Logistic Regression; Content, Collaborative; L1+L2 Regularization.
  30. Hadoop Case Studies: Scaling ✔; Blending Recommendation Algorithms ✔; Grandfathering ✔; Model Selection ✔; A/B Testing; Tracking and Reporting
  31. A/B Testing. Is Option A Better Than Option B? Let's Test. A: New Model, 10% of traffic. B: Old Model, 90%. Example: send 10% of members who have more than 100 connections AND who have logged in within the past week AND who are based in Europe.
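
The A/B testing slide describes routing 10% of a targeted member segment to the new model. A minimal sketch of one common way to do this, assuming deterministic hash-based bucketing; the member record fields, the experiment name, and the function names are hypothetical, and the talk does not say how assignment was actually implemented.

```python
import hashlib
from datetime import datetime, timedelta


def in_segment(member, now):
    """Segment from the slide: >100 connections, recent login, based in Europe."""
    return (member["connections"] > 100
            and now - member["last_login"] <= timedelta(days=7)
            and member["region"] == "Europe")


def bucket(member_id, experiment, treatment_pct=10):
    """Deterministically map a member to 'A' (new model) or 'B' (old model)."""
    digest = hashlib.md5(f"{experiment}:{member_id}".encode("utf-8")).hexdigest()
    return "A" if int(digest, 16) % 100 < treatment_pct else "B"


def assign(member, experiment, now):
    """Members outside the segment always get the old model (B)."""
    if not in_segment(member, now):
        return "B"
    return bucket(member["id"], experiment)
```

Hashing the experiment name together with the member id keeps each member's assignment stable across requests while letting different experiments split traffic independently.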
  32. Hadoop Case Studies: Scaling ✔; Blending Recommendation Algorithms ✔; Grandfathering ✔; Model Selection ✔; A/B Testing ✔; Tracking and Reporting
  33. Tracking and Reporting. K-way joins across billions of rows. Option 1: up-to-the-minute reporting; nearsightedness; k-way join complexity. Option 2: lacks up-to-the-minute reporting; simple k-way joins.
  34. Think Platform. Leverage Hadoop.
  35. Come work with us at LinkedIn. You: Applied Research Engineer, LinkedIn.
