
Which Algorithms Really Matter

This is the position talk that I gave at CIKM. Included are 4 algorithms that I feel don't get much academic attention, but which are very important industrially. It isn't necessarily true that these algorithms *should* get academic attention, but I do feel that it is true that they are quite important pragmatically speaking.

Which Algorithms Really Matter

  1. Which Algorithms Really Matter?
  2. Me, Us
     • Ted Dunning, Chief Application Architect, MapR
       – Committer and PMC member: Mahout, Zookeeper, Drill
       – Bought the beer at the first HUG
     • MapR
       – Distributes more open source components for Hadoop
       – Adds major technology for performance, HA, industry-standard APIs
     • Info
       – Hash tag: #mapr
       – See also: @ApacheMahout, @ApacheDrill, @ted_dunning, and @mapR
  3. Topic For Today
     • What is important? What is not?
     • Why?
     • What is the difference from academic research?
     • Some examples
  4. What is Important?
     • Deployable
     • Robust
     • Transparent
     • Skillset and mindset matched?
     • Proportionate
  5. What is Important?
     • Deployable – clever prototypes don’t count if they can’t be standardized
     • Robust
     • Transparent
     • Skillset and mindset matched?
     • Proportionate
  6. What is Important?
     • Deployable – clever prototypes don’t count
     • Robust – mishandling is common
     • Transparent – will degradation be obvious?
     • Skillset and mindset matched?
     • Proportionate
  7. What is Important?
     • Deployable – clever prototypes don’t count
     • Robust – mishandling is common
     • Transparent – will degradation be obvious?
     • Skillset and mindset matched? – how long will your fancy data scientist enjoy doing standard ops tasks?
     • Proportionate – where is the highest value per minute of effort?
  8. Academic Goals vs Pragmatics
     • Academic goals
       – Reproducible
       – Isolate theoretically important aspects
       – Work on novel problems
     • Pragmatics
       – Highest net value
       – Available data is constantly changing
       – Diligence and consistency have larger impact than cleverness
       – Many systems feed themselves; exploration and exploitation are both important
       – Engineering constraints on budget and schedule
  9. Example 1: Making Recommendations Better
  10. Recommendation Advances
     • What are the most important algorithmic advances in recommendations over the last 10 years?
     • Cooccurrence analysis?
     • Matrix completion via factorization?
     • Latent factor log-linear models?
     • Temporal dynamics?
  11. The Winner – None of the Above
     • What are the most important algorithmic advances in recommendations over the last 10 years?
       1. Result dithering
       2. Anti-flood
  12. The Real Issues
     • Exploration
     • Diversity
     • Speed
     • Not the last fraction of a percent
  13. Result Dithering
     • Dithering is used to re-order recommendation results
       – Re-ordering is done randomly
     • Dithering is guaranteed to make off-line performance worse
     • Dithering also has a near-perfect record of making actual performance much better
  14. Result Dithering
     • Dithering is used to re-order recommendation results
       – Re-ordering is done randomly
     • Dithering is guaranteed to make off-line performance worse
     • Dithering also has a near-perfect record of making actual performance much better
     • “Made more difference than any other change”
  15. Simple Dithering Algorithm
     • Generate a synthetic score from log rank plus Gaussian noise: s = log r + N(0, ε)
     • Pick the noise scale ε to provide the desired level of mixing: Δr ∝ r exp(ε)
     • Typically ε ∈ [0.4, 0.8]
     • Oh… and use floor(t / T) as the random seed
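To make the dithering recipe concrete, here is a minimal sketch in Python. The epsilon range and the per-epoch seeding via floor(t / T) follow the slide; the function name, default period, and list handling are illustrative assumptions rather than a reference implementation.

```python
import math
import random

def dither(results, epsilon=0.5, t=0, period=600):
    """Randomly re-order ranked results by adding Gaussian noise to log rank.

    results : items already ranked best-first
    epsilon : noise scale, typically in [0.4, 0.8]
    t, period : current time and epoch length; floor(t / period) seeds the RNG
        so the ordering is stable within an epoch but changes between epochs
    """
    rng = random.Random(math.floor(t / period))
    # synthetic score: s = log(rank) + N(0, epsilon); smaller is better
    scored = [(math.log(rank) + rng.gauss(0.0, epsilon), item)
              for rank, item in enumerate(results, start=1)]
    return [item for _, item in sorted(scored)]

# Example: the top-8 recommendations get gently shuffled, differently in each epoch
print(dither(list("ABCDEFGH"), epsilon=0.5, t=1234, period=600))
```

With ε around 0.5 the head of the list mostly survives, but items from deeper in the ranking occasionally surface, which is exactly the exploration the next two slides illustrate.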
  16. Example … ε = 0.5
     • [Table: sample dithered result orderings; the top few original ranks stay near the front, while deeper items are occasionally mixed in]
  17. Example … ε = log 2 = 0.69
     • [Table: sample dithered result orderings with stronger mixing; deeper items reach the first page more often]
  18. Exploring The Second Page
  19. Lesson 1: Exploration is good
  20. Example 2: Bayesian Bandits
  21. Bayesian Bandits
     • Based on Thompson sampling
     • Very general sequential test
     • Near optimal regret
     • Trade-off exploration and exploitation
     • Possibly the best known solution for exploration/exploitation
     • Incredibly simple
  22. Thompson Sampling
     • Select each shell according to the probability that it is the best
     • The probability that option i is the best can be computed from the posterior:
       P(i is best) = ∫ I[ E[r_i | θ] = max_j E[r_j | θ] ] P(θ | D) dθ
     • But I promised a simple answer
  23. Thompson Sampling – Take 2
     • Sample θ from the posterior: θ ~ P(θ | D)
     • Pick i to maximize reward: i = argmax_j E[r_j | θ]
     • Record the result from using i
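As a hedged illustration of this two-step recipe, here is Thompson sampling for the simplest case: Bernoulli rewards with Beta posteriors. (The convergence plot on the next slide uses a Gamma-Normal model instead; the class and variable names below are mine.)

```python
import random

class BernoulliThompson:
    """Thompson sampling over k options with 0/1 rewards and Beta(1, 1) priors."""

    def __init__(self, k):
        self.wins = [1] * k    # alpha: prior + observed successes
        self.losses = [1] * k  # beta: prior + observed failures

    def choose(self):
        # Sample theta ~ P(theta | D) for each option, then pick the argmax
        samples = [random.betavariate(a, b) for a, b in zip(self.wins, self.losses)]
        return max(range(len(samples)), key=samples.__getitem__)

    def record(self, i, reward):
        # Record the result from using option i
        if reward:
            self.wins[i] += 1
        else:
            self.losses[i] += 1

# Example: option 2 secretly pays off most often, so it soon gets most of the traffic
true_rates = [0.04, 0.05, 0.08]
bandit = BernoulliThompson(len(true_rates))
for _ in range(10_000):
    i = bandit.choose()
    bandit.record(i, random.random() < true_rates[i])
print(bandit.wins)
```

The whole exploration/exploitation trade-off lives in the posterior sampling: uncertain options get sampled high often enough to be tried, and confidently bad ones almost never win the argmax.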
  24. Fast Convergence
     • [Plot: regret versus n (up to roughly 1,000 trials) for ε-greedy with ε = 0.05 and for a Bayesian Bandit with a Gamma-Normal model; the Bayesian Bandit’s regret falls much faster]
  25. Thompson Sampling on Ads
     • [Results from “An Empirical Evaluation of Thompson Sampling”, Chapelle and Li, 2011]
  26. Bayesian Bandits versus Result Dithering
     • Many useful systems are difficult to frame in fully Bayesian form
     • Thompson sampling cannot be applied without posterior sampling
     • Can still do useful exploration with dithering
     • But better to use Thompson sampling if possible
  27. Lesson 2: Exploration is pretty easy to do and pays big benefits.
  28. Example 3: On-line Clustering
  29. The Problem
     • K-means clustering is useful for feature extraction or compression
     • At scale and at high dimension, the desirable number of clusters increases
     • A very large number of clusters may require more passes through the data
     • Super-linear scaling is generally infeasible
  30. The Solution
     • Sketch-based algorithms produce a sketch of the data
     • Streaming k-means uses adaptive dp-means to produce this sketch in the form of many weighted centroids which approximate the original distribution
     • The size of the sketch grows very slowly with increasing data size
     • Many operations such as clustering are well behaved on sketches
     • References: Fast and Accurate k-means For Large Datasets, Michael Shindler, Alex Wong, Adam Meyerson; Revisiting k-means: New Algorithms via Bayesian Nonparametrics, Brian Kulis, Michael Jordan.
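The deck does not show code for this, so the following is only a rough sketch of the adaptive dp-means idea behind streaming k-means (not Mahout's actual StreamingKMeans): each point either folds into its nearest centroid or, with probability proportional to its squared distance, opens a new centroid, and the facility cost is inflated whenever the sketch grows too large. All parameter values and names here are illustrative assumptions.

```python
import math
import random

def streaming_sketch(points, sketch_size, beta=1.3):
    """One-pass sketch of `points` as a list of (centroid, weight) pairs.

    A point opens a new centroid with probability min(1, d^2 / f); otherwise it
    is folded into its nearest centroid. When the sketch exceeds `sketch_size`,
    the facility cost f is inflated so fewer centroids get created. (Real
    implementations also collapse an oversized sketch by re-clustering it;
    that step is omitted here to keep the sketch short.)
    """
    centroids, weights = [], []
    f = 1e-6  # initial facility cost; grows as the stream is consumed
    for p in points:
        if not centroids:
            centroids.append(list(p)); weights.append(1.0); continue
        d2, j = min((sum((a - b) ** 2 for a, b in zip(p, c)), j)
                    for j, c in enumerate(centroids))
        if random.random() < min(1.0, d2 / f):
            centroids.append(list(p)); weights.append(1.0)
        else:
            w = weights[j]
            centroids[j] = [(w * a + b) / (w + 1) for a, b in zip(centroids[j], p)]
            weights[j] = w + 1
        if len(centroids) > sketch_size:
            f *= beta  # make it harder to open new centroids
    return list(zip(centroids, weights))

# Example: 10,000 2-d points collapse to a few hundred weighted centroids
data = [(random.gauss(i % 5, 0.1), random.gauss(i % 3, 0.1)) for i in range(10_000)]
sketch = streaming_sketch(data, sketch_size=int(20 * math.log(10_000)))
print(len(sketch))
```

The point of the exercise is the next few slides: once the data is reduced to weighted centroids, clustering (or anything else) runs on the small sketch instead of the full stream.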
  31. An Example
  32. An Example
  33. The Cluster Proximity Features
     • Every point can be described by the nearest cluster
       – 4.3 bits per point in this case
       – Significant error that can be decreased (to a point) by increasing the number of clusters
     • Or by the proximity to the 2 nearest clusters (2 x 4.3 bits + 1 sign bit + 2 proximities)
       – Error is negligible
       – Unwinds the data into a simple representation
     • Or we can increase the number of clusters (an n-fold increase adds log n bits per point and decreases error by sqrt(n))
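A small sketch of that encoding, assuming centroids are already available (for instance from the streaming sketch above): each point is labeled with the ids of its k nearest centroids, which costs about log2(number of centroids) bits each, plus the matching proximities. The function and variable names are mine.

```python
import numpy as np

def proximity_features(points, centroids, k=2):
    """Describe each point by its k nearest centroids and the distances to them.

    Returns (indices, distances): indices[i] holds the ids of the k nearest
    centroids for point i, distances[i] the matching proximities. k=1 gives the
    coarse "nearest cluster" code; k=2 adds the proximities that make the
    representation nearly lossless.
    """
    d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]          # ids of the k nearest centroids
    dist = np.take_along_axis(d, idx, axis=1)   # distances to those centroids
    return idx, dist

# Example: 1,000 points in 10-d against 20 centroids (log2(20) ≈ 4.3 bits per id)
rng = np.random.default_rng(0)
points, centroids = rng.normal(size=(1000, 10)), rng.normal(size=(20, 10))
ids, prox = proximity_features(points, centroids, k=2)
print(ids[0], prox[0])
```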
  34. Diagonalized Cluster Proximity
  35. Lots of Clusters Are Fine
  36. Typical k-means Failure
     • Selecting two seeds here cannot be fixed with Lloyd’s algorithm
     • The result is that these two clusters get glued together
  37. Streaming k-means Ideas
     • By using a sketch with lots (k log N) of centroids, we avoid pathological cases
     • We still get a very good result if the sketch is created
       – in one pass
       – with approximate search
     • In fact, adaptive dp-means works just fine
     • In the end, the sketch can be used for clustering or …
  38. Lesson 3: Sketches make big data small.
  39. Example 4: Search Abuse
  40. Recommendations
     • Alice got an apple and a puppy
     • Charles got a bicycle
  41. Recommendations
     • Alice got an apple and a puppy
     • Bob got an apple
     • Charles got a bicycle
  42. Recommendations
     • What else would Bob like?
  43. Log Files
     • [Diagram: raw log entries pairing users with items – Alice, Charles, Charles, Alice, Alice, Bob, Bob]
  44. History Matrix: Users by Items
     • [Table: one row per user (Alice, Bob, Charles), one column per item, with a ✔ wherever that user interacted with that item]
  45. Co-occurrence Matrix: Items by Items
     • How do you tell which co-occurrences are useful?
     • [Table: item-by-item counts of how often each pair of items appears together in user histories]
  46. Co-occurrence Binary Matrix
     • [Table: the co-occurrence matrix reduced to binary form – each entry marked as anomalous or not]
  47. Indicator Matrix: Anomalous Co-Occurrence
     • Result: the marked row will be added to the indicator field in the item document…
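The slides don't show how the anomalous co-occurrences get picked, but the usual way (and what Mahout does) is the log-likelihood ratio (LLR) test on the 2x2 contingency table for each item pair. Here is a hedged sketch of that step; the threshold value and the helper names are my own choices, not anything from the deck.

```python
import math
from collections import Counter
from itertools import combinations

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio score for a 2x2 contingency table of counts."""
    def entropy(*counts):
        total = sum(counts)
        return sum(c * math.log(c / total) for c in counts if c > 0)
    return 2 * (entropy(k11, k12, k21, k22)
                - entropy(k11 + k12, k21 + k22)
                - entropy(k11 + k21, k12 + k22))

def indicators(histories, threshold=2.0):
    """histories: one item set per user. Returns {item: set of indicator items}."""
    item_count = Counter(i for h in histories for i in set(h))
    pair_count = Counter(p for h in histories
                         for p in combinations(sorted(set(h)), 2))
    n = len(histories)
    out = {}
    for (a, b), k11 in pair_count.items():
        k12 = item_count[a] - k11          # a without b
        k21 = item_count[b] - k11          # b without a
        k22 = n - k11 - k12 - k21          # neither
        if llr(k11, k12, k21, k22) > threshold:
            out.setdefault(a, set()).add(b)
            out.setdefault(b, set()).add(a)
    return out

# Example: only the apple/puppy co-occurrence is anomalous enough to survive
histories = [{"apple", "puppy"}, {"apple", "puppy"}, {"apple"},
             {"bicycle"}, {"pony"}, {"pony", "bicycle"}]
print(indicators(histories))
```

The surviving pairs are exactly the sparse "indicator" entries that the next slide pushes into the item documents.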
  48. Indicator Matrix
     • That one row from the indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine:
       id: t4
       title: puppy
       desc: The sweetest little puppy ever.
       keywords: puppy, dog, pet
       indicators: (t1)
     • Note: data for the indicator field is added directly to the meta-data for a document in the Solr index. You don’t need to create a separate index for the indicators.
  49. Internals of the Recommender Engine
  50. Internals of the Recommender Engine
  51. Looking Inside LucidWorks
     • Real-time recommendation query and results: evaluation
     • What to recommend if a new user listened to 2122: Fats Domino & 303: Beatles?
     • Recommendation is “1710: Chuck Berry”
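In this deployment pattern the recommendation query is just an ordinary Solr search, with the new user's recent history as query terms against the indicators field. The sketch below is illustrative only: the collection URL is hypothetical, and the field names follow the example document shown two slides back.

```python
import requests

def recommend(history_ids, solr_url="http://localhost:8983/solr/items/select", rows=10):
    """Recommend items by searching the user's history against the indicators field."""
    # e.g. history_ids = ["2122", "303"]  ->  q = indicators:(2122 303)
    query = "indicators:(%s)" % " ".join(history_ids)
    params = {"q": query, "rows": rows, "fl": "id,title,score", "wt": "json"}
    response = requests.get(solr_url, params=params)
    return response.json()["response"]["docs"]

# A new user who listened to 2122 (Fats Domino) and 303 (The Beatles)
for doc in recommend(["2122", "303"]):
    print(doc["id"], doc.get("title"))
```

Because the ranking is just search scoring over indicator fields, the same index serves both ordinary search and recommendations, which is the "search abuse" of the lesson below.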
  52. Real-life example
  53. Lesson 4: Recursive search abuse pays
     • Search can implement recs
     • Which can implement search
  54. Summary
  55. [image-only slide]
  56. Me, Us
     • Ted Dunning, Chief Application Architect, MapR
       – Committer and PMC member: Mahout, Zookeeper, Drill
       – Bought the beer at the first HUG
     • MapR
       – Distributes more open source components for Hadoop
       – Adds major technology for performance, HA, industry-standard APIs
     • Info
       – Hash tag: #mapr
       – See also: @ApacheMahout, @ApacheDrill, @ted_dunning, and @mapR

Editor's notes

  • A history of what everybody has done. Obviously this is just a cartoon, because large numbers of users and interactions with items would be required to build a recommender. The next step will be to predict what a new user might like…
  • Bob is the “new user”, and getting an apple is his history.
  • Here is where the recommendation engine needs to go to work… Note to trainer: you might see if the audience calls out the answer before revealing the next slide…
  • Note to trainer: this is a situation similar to the one in which we started, with three users in our history. The difference is that now everybody got a pony. Bob has an apple and a pony but not a puppy… yet.
  • The binary matrix is stored sparsely.
  • Convert by MapReduce into a binary matrix. Note to trainer: whether to consider an apple to have co-occurred with itself is an open question.
  • Old joke: all the world can be divided into 2 categories, Scotch tape and non-Scotch tape… This is a way to think about the co-occurrence.
  • The only important co-occurrence is that puppy follows apple.
  • Take that row of the matrix and combine it with all the meta-data we might have… The important thing to get from the co-occurrence matrix is this indicator. Cool thing: this is analogous to what a lot of recommendation engines do. This row forms the indicator field in a Solr document containing meta-data (you do NOT have to build a separate index for the indicators). Find the useful co-occurrences and get rid of the rest: sparsify and keep the anomalous co-occurrences.
  • Note to trainer: take a little time to explore this here and on the next couple of slides. Details are enlarged on the next slide.
  • This indicator field is where the output of the Mahout recommendation engine is stored (the row from the indicator matrix that identified significant or interesting co-occurrences). Keep in mind that this recommendation indicator data is added to the same original document in the Solr index that contains the meta-data for the item in question.
  • This is a diagnostics window in the LucidWorks Solr index (not the web interface a user would see). It’s a way for the developer to do a rough evaluation (laugh test) of the choices offered by the recommendation engine. In other words, do these indicator artists, represented by their indicator IDs, make reasonable recommendations? Note to trainer: artist 303 happens to be The Beatles. Is that a good match for Chuck Berry?
