Hadoop World 2011: Leveraging Hadoop to Transform Raw Data to Rich Features at LinkedIn - Abhishek Gupta & Adil Aijaz, LinkedIn

This presentation focuses on the design and evolution of the LinkedIn recommendations platform. It currently computes more than 100 billion personalized recommendations every week, powering an ever-growing assortment of products, including Jobs You May Be Interested In, Groups You May Like, News Relevance, and Ad Targeting. We will describe how we leverage Hadoop to transform raw data into rich features using knowledge aggregated from LinkedIn's 100-million-member base, how we use Lucene to do real-time recommendations, and how we marshal Lucene on Hadoop to bridge offline analysis with user-facing services.


  1. Recommendations @ LinkedIn
  2. Think Platform, Leverage Hadoop
  3. The world’s largest professional network
     - Over 50% of members are now international
     - 135M+ members (*)
     - ~2 new members joining per second (**)
     - >2M Company Pages (**)
     - 75% of Fortune 100 companies use LinkedIn to hire
     - Chart: LinkedIn members (millions) by year: 2004: 2, 2005: 4, 2006: 8, 2007: 17, 2008: 32, 2009: 55, 2010: 90
     - (*) as of Nov 4, 2011; (**) as of June 30, 2011
  4. Recommendations Opportunity
  5. (image-only slide)
  6. (image-only slide)
  7. (image-only slide)
  8. (image-only slide)
  9. (image-only slide)
  10. (image-only slide)
  11. The Recommendations Opportunity: Pandora, Search for People, Groups browse maps, Events You May Be Interested In
  12. (image-only slide)
  13. Positions, Education, Summary, Experience, Skills
  14. Are all titles the same?
      - Software Engineer
      - Technical Yahoo
      - Member Technical Staff
      - Software Development Engineer
      - SDE
  15. Are all companies the same? ‘IBM’ has 8000+ variations:
      - ibm – ireland
      - ibm research
      - T J Watson Labs
      - International Bus. Machines
  16. Recommendation Trade-offs: The need for a common platform (Real Time vs. Time Independent)
  17. Recommendation Trade-offs: The need for a common platform (Content Analysis vs. Collaborative)
  18. Recommendation Trade-offs: The need for a common platform (Precision vs. Recall)
  19. Matching across fields: Title, Seniority, Specialty, Skills, Education, Related Titles, Experience, Related Companies, Location, Related Industries, Industry
      - Match types: Binary, Exact match, Exact match in bucket, Soft match
      - Soft match: $v_1 = \mathrm{tf} \cdot \mathrm{idf}$, $\cos\theta = \dfrac{v_1 \cdot v_2}{|v_1|\,|v_2|}$
      - Field-pair scores: Specialty -> Specialty 0.58, Skills -> Skills 0.94, Title -> Title 0.26, Seniority -> Seniority 0.18, Summary -> Summary 0.98, Title -> Related Title 0.16, Education -> Education 0.40
  20. Similarity score vector (Skills -> Skills: 0.94) and importance weight vector from feedback (Skills -> Skills: 0.70)
      - Normalization, Scoring
      - Filtering & Ranking: Location, Company, Industry
  21. Technologies
  22. Hadoop Case Studies: Scaling • Blending Recommendation Algorithms • Grandfathering • Model Selection • A/B Testing • Tracking and Reporting
  23. Scaling Billions of Recommendations
      - Latency > 1 sec
      - Minhashing: Latency < 1 sec, Recall = Low
      - Latency < 1 sec, Recall = High
  24. Hadoop Case Studies: Scaling ✔ • Blending Recommendation Algorithms • Grandfathering • Model Selection • A/B Testing • Tracking and Reporting
  25. Blending Recommendation Algorithms
      - Co-View: impact latency ~ minutes, complexity = high
      - Co-View: impact latency ~ hours, complexity = low
  26. Hadoop Case Studies: Scaling ✔ • Blending Recommendation Algorithms ✔ • Grandfathering • Model Selection • A/B Testing • Tracking and Reporting
  27. Grandfathering: Adding and Changing Features
      - Next profile edit: no time guarantees, minimal disruption
      - Parallel feature extraction pipeline: time ~ week, significant systems work
      - Time ~ hour, minimal disruption, grandfather when ready
  28. Hadoop Case Studies: Scaling ✔ • Blending Recommendation Algorithms ✔ • Grandfathering ✔ • Model Selection • A/B Testing • Tracking and Reporting
  29. Model Selection (SVM): Features • Models • Parameters
  30. Hadoop Case Studies: Scaling ✔ • Blending Recommendation Algorithms ✔ • Grandfathering ✔ • Model Selection ✔ • A/B Testing • Tracking and Reporting
  31. A/B Testing: Is Option A Better Than Option B? Let’s Test
      - A: New model, 10% of traffic
      - B: Old model, 90% of traffic
      - Send 10% of members who have more than 100 connections AND who have logged in within the past week AND who are based in Europe
  32. Hadoop Case Studies: Scaling ✔ • Blending Recommendation Algorithms ✔ • Grandfathering ✔ • Model Selection ✔ • A/B Testing ✔ • Tracking and Reporting
  33. Tracking and Reporting: K-way joins across billions of rows; up-to-the-minute reporting
      - Nearsightedness; k-way join complexity
      - Lacks up-to-the-minute reporting; simple k-way joins
  34. Think Platform, Leverage Hadoop
  35. You: Come work with us at LinkedIn (Applied Research, Engineer)
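
Slides 14 and 15 motivate standardizing titles and company names before any matching happens. The talk does not say how LinkedIn does this; the sketch below assumes the simplest possible approach, a hand-maintained alias table keyed on a normalized string. `COMPANY_ALIASES`, `normalize`, and `standardize_company` are hypothetical names, and a table covering 8000+ variants would not realistically be written by hand.

```python
import re
from typing import Optional

# Hypothetical alias table: normalized variant -> canonical company name.
COMPANY_ALIASES = {
    "ibm": "IBM",
    "ibm ireland": "IBM",
    "ibm research": "IBM",
    "t j watson labs": "IBM",
    "international bus machines": "IBM",
    "international business machines": "IBM",
}


def normalize(raw: str) -> str:
    """Lowercase, turn punctuation (including dashes) into spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", raw.lower())
    return " ".join(text.split())


def standardize_company(raw: str) -> Optional[str]:
    """Map a free-text company name to its canonical form, or None if unknown."""
    return COMPANY_ALIASES.get(normalize(raw))


if __name__ == "__main__":
    for variant in ["ibm – ireland", "IBM Research", "International Bus. Machines", "Acme Corp"]:
        print(repr(variant), "->", standardize_company(variant))
```

Normalizing first keeps the table small: punctuation and spacing variants collapse onto one key before the lookup.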
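
The soft match on slide 19 weights terms by tf·idf and compares two vectors with cosine similarity. A minimal pure-Python sketch, assuming whitespace tokenization and the unsmoothed idf $\log(N / \mathrm{df})$ (the slide specifies neither); `tfidf_vectors` and `cosine` are illustrative names.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build one sparse tf*idf vector per document, with idf = log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({term: count * math.log(n_docs / df[term]) for term, count in tf.items()})
    return vectors


def cosine(v1: dict[str, float], v2: dict[str, float]) -> float:
    """cos(theta) = (v1 . v2) / (|v1| * |v2|); returns 0.0 if either vector is all zeros."""
    dot = sum(weight * v2.get(term, 0.0) for term, weight in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)


if __name__ == "__main__":
    summaries = [
        "hadoop java distributed systems",
        "java hadoop machine learning",
        "sales marketing management",
    ]
    vecs = tfidf_vectors(summaries)
    print(round(cosine(vecs[0], vecs[1]), 3), cosine(vecs[0], vecs[2]))
```

Because idf down-weights terms shared by many documents, the two engineering summaries score well below 1 even though they share two terms, while summaries with no terms in common score 0.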
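
Slide 20 combines a per-field similarity score vector with an importance weight vector derived from feedback, then normalizes, scores, filters, and ranks. The sketch below assumes the score is a weighted average of field-pair similarities (normalized by the total weight) and that filtering is an exact location match; the `Candidate` type, the field-pair keys, and the weights are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    item_id: str
    location: str
    similarity: dict[str, float]  # field pair -> similarity, e.g. {"skills->skills": 0.94}


def weighted_score(similarity: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of field-pair similarities, normalized by the total weight."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(w * similarity.get(pair, 0.0) for pair, w in weights.items()) / total


def filter_and_rank(candidates: list[Candidate], weights: dict[str, float],
                    location: Optional[str] = None, top_k: int = 10) -> list[tuple[str, float]]:
    """Drop candidates outside `location` (if given), then sort by score, highest first."""
    kept = [c for c in candidates if location is None or c.location == location]
    scored = [(c.item_id, weighted_score(c.similarity, weights)) for c in kept]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    weights = {"skills->skills": 0.70, "title->title": 0.30}
    candidates = [
        Candidate("job1", "Bay Area", {"skills->skills": 0.94, "title->title": 0.26}),
        Candidate("job2", "Bay Area", {"skills->skills": 0.40, "title->title": 0.90}),
        Candidate("job3", "New York", {"skills->skills": 0.99, "title->title": 0.99}),
    ]
    print(filter_and_rank(candidates, weights, location="Bay Area"))
```

Dividing by the total weight keeps scores on the same 0 to 1 scale as the individual similarities, so changing the feedback weights does not shift the overall score range.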
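
Slide 23 credits minhashing with bringing latency under one second at the cost of recall. The sketch below shows the standard MinHash technique on sets of features: the fraction of signature positions on which two sets agree estimates their Jaccard similarity, and splitting the signature into bands yields bucket keys for fast candidate lookup. The seeded `blake2b` hash family, the feature sets, and the band size are assumptions, not details from the talk.

```python
import hashlib


def seeded_hash(token: str, seed: int) -> int:
    """Deterministic 64-bit hash; each seed acts as a different hash function."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(features: set[str], num_hashes: int = 128) -> list[int]:
    """Keep, for every hash function, the minimum hash value over the set."""
    if not features:
        raise ValueError("MinHash is undefined for an empty set")
    return [min(seeded_hash(f, seed) for f in features) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Share of positions where the minima agree; estimates |A & B| / |A | B|."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def band_keys(signature: list[int], rows_per_band: int) -> list[tuple[int, ...]]:
    """Split the signature into bands; sets sharing any band key become candidates."""
    if len(signature) % rows_per_band != 0:
        raise ValueError("signature length must be a multiple of rows_per_band")
    return [
        (band,) + tuple(signature[band * rows_per_band:(band + 1) * rows_per_band])
        for band in range(len(signature) // rows_per_band)
    ]


def exact_jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    a = {"hadoop", "java", "lucene", "pig", "hive"}
    b = {"hadoop", "java", "lucene", "pig", "scala"}
    sig_a, sig_b = minhash_signature(a, 256), minhash_signature(b, 256)
    print(exact_jaccard(a, b), estimated_jaccard(sig_a, sig_b))
```

The band index is part of each key so that equal slices from different bands never collide. Fewer rows per band finds more candidate pairs (higher recall) at the cost of more comparisons, which is the same latency versus recall trade-off the slide describes.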
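
Slide 25 contrasts two ways of letting co-view signals affect recommendations (impact latency of minutes versus hours). To make "co-view" concrete, the sketch below counts how many members viewed each pair of items, using a log of hypothetical `(member_id, item_id)` view events. This is the kind of aggregation a batch Hadoop job can compute. The linear `blend` with weight `alpha` is an assumed blending rule, not one the talk specifies.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_view_counts(view_events: list[tuple[str, str]]) -> Counter:
    """Count, for each unordered pair of items, how many members viewed both."""
    items_by_member: dict[str, set[str]] = defaultdict(set)
    for member_id, item_id in view_events:
        items_by_member[member_id].add(item_id)  # repeat views by one member count once
    counts: Counter = Counter()
    for items in items_by_member.values():
        # Sorting makes each pair key independent of view order.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts


def blend(content_score: float, co_view_score: float, alpha: float = 0.5) -> float:
    """Hypothetical linear blend of a content-based score and a co-view score."""
    return alpha * content_score + (1 - alpha) * co_view_score


if __name__ == "__main__":
    events = [("m1", "jobA"), ("m1", "jobB"), ("m2", "jobA"),
              ("m2", "jobB"), ("m2", "jobC"), ("m3", "jobC")]
    print(co_view_counts(events).most_common())
```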
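
Slide 31 sends a new model to 10% of a targeted population (more than 100 connections, logged in within the past week, based in Europe) and keeps the old model for everyone else. One common way to do this, sketched here under stated assumptions, is to hash each member ID together with an experiment name into a stable bucket so a member always sees the same variant. The `Member` fields, the experiment name, and the 100-bucket scheme are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Member:
    member_id: int
    connections: int
    last_login: datetime
    region: str


def in_target(member: Member, now: datetime) -> bool:
    """Slide 31's rule: >100 connections AND logged in within a week AND based in Europe."""
    return (member.connections > 100
            and now - member.last_login <= timedelta(weeks=1)
            and member.region == "Europe")


def bucket(member_id: int, experiment: str, buckets: int = 100) -> int:
    """Stable bucket in [0, buckets): the same member and experiment always map the same way."""
    key = f"{experiment}:{member_id}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % buckets


def assign_variant(member: Member, now: datetime, experiment: str = "new-model-test",
                   treatment_percent: int = 10) -> str:
    """'A' (new model) for treatment_percent% of targeted members, otherwise 'B' (old model)."""
    if in_target(member, now) and bucket(member.member_id, experiment) < treatment_percent:
        return "A"
    return "B"
```

Salting the hash with the experiment name means different experiments split members independently, so the same 10% of members do not end up in every treatment group.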
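
Slide 33 weighs k-way joins across billions of rows against up-to-the-minute reporting, and its second option lists "simple k-way joins". A standard way to express a k-way join in MapReduce is a reduce-side join. Mappers tag each record with its source, the shuffle groups records by key, and each reducer emits the combinations for its key. The sketch below emulates those phases in memory for an inner join. It illustrates the technique rather than LinkedIn's pipeline, and the dataset and field names are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Any


def reduce_side_join(datasets: dict[str, list[tuple[Any, dict]]]) -> list[dict]:
    """Inner-join k datasets on their key, emulating map, shuffle, and reduce."""
    # Map + shuffle: key -> source name -> records from that source.
    grouped: dict = defaultdict(lambda: defaultdict(list))
    for source, rows in datasets.items():
        for key, record in rows:
            grouped[key][source].append(record)

    sources = list(datasets)
    joined = []
    # Reduce: one group per key.
    for key, by_source in grouped.items():
        if len(by_source) < len(sources):
            continue  # inner join: key missing from at least one source
        for combo in product(*(by_source[s] for s in sources)):
            row = {"key": key}
            for source, record in zip(sources, combo):
                # Prefix fields with the source name to avoid collisions.
                row.update({f"{source}.{field}": value for field, value in record.items()})
            joined.append(row)
    return joined


if __name__ == "__main__":
    impressions = [(1, {"item": "jobA"}), (2, {"item": "jobB"}), (3, {"item": "jobC"})]
    clicks = [(1, {"clicked": True}), (2, {"clicked": False})]
    profiles = [(1, {"region": "Europe"}), (2, {"region": "Asia"}), (4, {"region": "Europe"})]
    for row in reduce_side_join({"imp": impressions, "click": clicks, "profile": profiles}):
        print(row)
```

All k inputs are joined in a single shuffle, which is why this pattern suits batch reporting over very large tables. Because the join runs as a batch job, its results trail the newest data, matching the "lacks up-to-the-minute reporting" trade-off on the slide.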
