Measuring Relevance in the Negative Space

Trey Grainger's presentation at the Southern Data Science Conference, 2019.

1. 1. Measuring Relevance in the Negative Space Trey Grainger Chief Algorithms Officer, Lucidworks @treygrainger
  2. 2. Trey Grainger Chief Algorithms Officer • Previously: SVP of Engineering @ Lucidworks; Director of Engineering @ CareerBuilder • Georgia Tech – MBA, Management of Technology • Furman University – BA, Computer Science, Business, & Philosophy • Stanford University – Information Retrieval & Web Search Other fun projects: • Co-author of Solr in Action, plus numerous research publications • Advisor to Presearch, the decentralized search engine • Lucene / Solr contributor About Me
  3. 3. Agenda • Fraudulent AI • Adversarial Machine Learning • Cancer • War • Bikinis • Brainwashing • Alt-right • White Supremacism • Time Travel • Avengers Endgame Spoilers • Negative Space • Dark Data • Pornography • Global Warming • Algorithmic Bias • Diet & Exercise • Self-crashing Cars • Racism • Sexism
4. 4. Who are we? 230 customers across the Fortune 1000. 400+ employees. Offices in San Francisco, CA (HQ); Raleigh-Durham, NC; Cambridge, UK; Bangalore, India; Hong Kong. Company behind the Search & AI Conference. Develop & support Apache Solr: we employ about 40% of the active committers on the Solr project and contribute over 70% of Solr's open source codebase.
5. 5. The standard for enterprise search: 90% of the Fortune 500 uses Solr.
  6. 6. Industry’s most powerful Intelligent Search & Discovery Platform.
7. 7. Let the most respected analysts in the world speak on our behalf. [Gartner Magic Quadrant chart plotting Completeness of Vision vs. Ability to Execute, with Challengers, Leaders, Visionaries, and Niche Players quadrants; vendors shown include Dassault Systèmes, Mindbreeze, Coveo, Microsoft, Attivio, Expert System, Smartlogic, Sinequa, IBM, IHS Markit, Funnelback, and Micro Focus.] Source: June 2018 Gartner Magic Quadrant report on Insight Engines. © Gartner, Inc.
  8. 8. Goals of this Talk 1. Help identify patterns for uncovering overlooked data hidden in plain sight 2. Point out current failures and dangers of overlooking this negative space. 3. Discuss applications to my field (information retrieval) and how my company is working to overcome some of these failures in our own technology.
9. 9. So what is the Negative Space?
10. 10. Negative Space in Data Science • Definition: “The missing or hidden data that gives shape to the data you do have” • If you think of your data within a vector space, it’s very analogous to negative space in art (art is just usually projected onto two dimensions) • “Negative” is a polysemous word: it can mean “undesirable/bad” or it can mean “taken away/not there”. • This talk intentionally uses both senses to make the point that not leveraging missing or hidden data often leads to bad/undesirable outcomes.
11. 11. [Platform diagram] Data (system-, human-, and application-generated) feeds content indexing; faceting, topic & cluster analysis; query rules; matching; natural language; machine learning; boosted results; and signals, which power Search & Discovery, Customer Analytics, and Digital Commerce.
  12. 12. 40% of the S&P 500 will be extinct in 10 years
  13. 13. Filling in the Negative Space aka: connecting the dots, or traversing the knowledge graph
  14. 14. https://svs.gsfc.nasa.gov/30919 What is this a picture of?
15. 15. Stars in the Sky? Lights on a Map? Mouse Brain with Dementia? Jellyfish Larvae?
  16. 16. https://svs.gsfc.nasa.gov/30919 Any idea?
  17. 17. How about now?
  18. 18. If we zoom out a little bit…
  19. 19. And if we keep zooming out… We see a map of all lights in the world
  20. 20. And similar patterns emerge in other contexts… Let’s explore airline flight patterns…
  21. 21. https://xkcd.com/1138/ Heatmap
22. 22. Watson: “You appeared to [see a good deal] which was quite invisible to me” Sherlock: “Not invisible but unnoticed, Watson. You did not know where to look, and so you missed all that was important.” The Adventures of Sherlock Holmes, Adventure III, “A Case of Identity”, Sir Arthur Conan Doyle
23. 23. Head? Pipe? Coat Collar? Back of Hat? Hat? Smoke? Nose? → Abstract concept of a detective with a pipe → Specific hypothesis from experience (leveraging the social cue that this is probably a well-known answer): Detective (Deerstalker) Hat! → Final answer + conceptual context
  24. 24. Fighting Algorithmic Bias aka: slapping ourselves in the face for a bit
  25. 25. Ok, Google… Is Agave Nectar good for you?
  26. 26. So I bought a few…
  27. 27. …and then one day I checked again… !
  28. 28. Ok, so AI can definitely be wrong, but can it be malicious?
  29. 29. Racist Algorithms? Sexist Algorithms? Creepy Algorithms? Negligent Algorithms? Fraudulent Algorithms? Malicious Algorithms?
  30. 30. Adversarial Machine Learning
31. 31. “Adversarial Patch”, Tom B. Brown, et al., 2017.
  32. 32. Racist Algorithms? Sexist Algorithms? Creepy Algorithms? Negligent Algorithms? Fraudulent Algorithms? Malicious Algorithms?
  33. 33. Fraudulent Algorithms?
34. 34. “Adversarial Attacks on Medical Machine Learning”, Samuel G. Finlayson, et al., 2019.
  35. 35. Fraudulent Algorithms?
  36. 36. Negligent Algorithms?
  37. 37. Negligent Algorithms?
  38. 38. Racist Algorithms? Sexist Algorithms?
  39. 39. Racist Algorithms? Sexist Algorithms?
  40. 40. Sexist Algorithms? Creepy Algorithms?
  41. 41. Manual Override By Facebook Still Available through Query Variations
  42. 42. Sexist Algorithms? Creepy Algorithms?
  43. 43. Malicious Algorithms?
  44. 44. Malicious Algorithms?
  45. 45. Racist Algorithms? Sexist Algorithms? Creepy Algorithms? Negligent Algorithms? Fraudulent Algorithms? Malicious Algorithms?
  46. 46. Biased Algorithms!
47. 47. Why the bias? YouTube: Relevance = “Most likely to capture attention” (ads). Facebook: Relevance = “Most likely to capture attention” (ads). Amazon: Relevance = “Satisfied Customer Purchases” (purchases). Lucidworks: Relevance = “Whatever our customers want it to be…”
  48. 48. So how can we help our customers avoid these pitfalls?
  49. 49. Search-Driven Everything Customer Service Customer Insights Fraud Surveillance Research Portal Online Retail Digital Content
50. 50. Significance of Feedback Loops: user searches → user sees results → user takes an action → users’ actions inform system improvements. (Illustrated with the query “Southern Data Science”)
51. 51. Signal Boosting: user searches → user sees results → user takes an action → users’ actions inform system improvements.
Searches (User, Query, Results): Alonzo: pizza → doc10, doc22, doc12, …; Elena: soup → doc84, doc2, doc17, …; Ming: pizza → doc10, doc22, doc12, …
Signals (User, Action, Document): Alonzo click doc22; Elena click doc17; Ming click doc12; Alonzo purchase doc22; Ming click doc22; Ming purchase doc22; Elena click doc2; …
Aggregated boosts (Query, Document, Signal Boost): pizza, doc22, 54,321; pizza, doc12, 987; soup, doc17, 1,234; soup, doc2, 2,345; …
Query “pizza” ⌕ → boost: doc22^54321, doc12^987
ƒ(x) = Σ(click * click_weight * time_decay) + Σ(purchase * purchase_weight * time_decay) + other_factors (sketched below)
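To make the aggregation concrete, here is a minimal Python sketch of the slide's ƒ(x). The signal schema, the click/purchase weights, and the exponential half-life are illustrative assumptions, not Lucidworks' actual values:

import time
from collections import defaultdict

# Assumed weights and decay half-life (hypothetical values).
CLICK_WEIGHT = 1.0
PURCHASE_WEIGHT = 25.0
HALF_LIFE_DAYS = 30.0

def time_decay(signal_ts, now=None):
    """Exponential decay: a signal loses half its weight every HALF_LIFE_DAYS."""
    now = time.time() if now is None else now
    age_days = (now - signal_ts) / 86400.0
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

def aggregate_boosts(signals):
    """signals: iterable of (user, query, action, doc_id, timestamp) tuples.
    Returns {(query, doc_id): boost}, usable as boost params like doc22^54321."""
    boosts = defaultdict(float)
    weights = {"click": CLICK_WEIGHT, "purchase": PURCHASE_WEIGHT}
    for user, query, action, doc_id, ts in signals:
        if action in weights:
            boosts[(query, doc_id)] += weights[action] * time_decay(ts)
    return dict(boosts)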
  52. 52. Search ipad
  53. 53. Search
  54. 54. Search ipad
  55. 55. • 200%+ increase in click-through rates • 91% lower TCO • 50,000 fewer support tickets • Increased customer satisfaction
56. 56. Signal Boosting
• Benefits: dramatically improves relevance (increased conversions; the most popular documents/answers at the top)
• Risks:
  • Reinforces current biases: documents already at the top are more likely to be clicked on, purchased, or interacted with, so diversity is harder to achieve.
    • Solution: Learning to Rank: learn relevance patterns and feature weights from aggregate behavior instead of overfitting to specific documents.
  • Subject to manipulation: once users realize their behaviors (searches, clicks, etc.) influence the ranking, they can manipulate the engine with fake, adversarial actions to boost or bury content.
    • Solutions (sketched below):
      • Session-filtering: limit to one action, per type, per user; further limit by IP address, browser fingerprint, etc. if necessary.
      • Quantity vs. Quality Weighting: for users acting on lots of queries or documents, reduce the weight of each action in proportion to their total actions. The more actions taken per user, the less each counts toward the aggregate.
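A rough Python sketch of those two mitigations, assuming each signal is a dict with "user", "action", and "doc" keys (the schema and the 1/total normalization are illustrative choices):

from collections import defaultdict

def dedupe_sessions(signals):
    """Session-filtering: keep at most one action per (user, action, doc) triple.
    A stricter variant could also key on IP address or browser fingerprint."""
    seen = set()
    for sig in signals:
        key = (sig["user"], sig["action"], sig["doc"])
        if key not in seen:
            seen.add(key)
            yield sig

def quantity_vs_quality(signals):
    """Down-weight prolific users: each action counts 1 / total_actions(user),
    so a burst of activity from one user sums to a bounded influence."""
    signals = list(signals)
    totals = defaultdict(int)
    for sig in signals:
        totals[sig["user"]] += 1
    for sig in signals:
        yield {**sig, "weight": 1.0 / totals[sig["user"]]}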
57. 57. Learning to Rank (LTR)
● Applies machine learning techniques to discover the combination of features that provides the best ranking.
● Requires a labeled set of documents with relevancy scores for a given set of queries.
● Features used for ranking are usually more computationally expensive than the ones used for matching.
● Typically re-ranks a subset of the matched documents (e.g. the top 1,000). A toy sketch follows.
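Here is a toy pointwise sketch in Python using a linear model from scikit-learn. The feature values and judgments are invented, and this is not the Solr LTR plugin's API; production systems typically use pairwise or listwise models such as LambdaMART:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training rows: one per (query, doc) judgment, with ranking
# features such as BM25 score, title-match flag, and popularity.
X_train = np.array([[12.1, 1.0, 350],
                    [ 4.3, 0.0,  12],
                    [ 9.8, 1.0,  90],
                    [ 1.1, 0.0,   5]])
y_train = np.array([1.0, 0.0, 0.7, 0.1])  # relevancy judgments

model = LinearRegression().fit(X_train, y_train)

def rerank(top_docs, features):
    """Re-rank only the already-matched top docs (e.g. the top 1,000):
    cheap features retrieve candidates, expensive features re-rank them."""
    scores = model.predict(np.asarray(features))
    order = np.argsort(-scores)
    return [top_docs[i] for i in order]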
58. 58. # Run Searches
http://localhost:8983/solr/techproducts/select?q=ipod
59. 59. # Supply User Relevancy Judgements
nano contrib/ltr/example/user_queries.txt
# Format: query | doc id | relevancy judgement | source

# Train and Upload Model
./train_and_upload_demo_model.py -c config.json
60. 60. # Re-run Searches using Machine-learned Ranking Model
http://localhost:8984/solr/techproducts/browse?q=ipod&rq={!ltr model=exampleModel reRankDocs=100 efi.user_query=$q}
61. 61. Collaborative Filtering (Recommendations): user searches → user sees results → user takes an action → users’ actions inform system improvements.
Searches (User, Query, Results): Alonzo: pizza → doc10, doc22, doc12, …; Elena: soup → doc84, doc2, doc17, …; Ming: pizza → doc10, doc22, doc12, …
Signals (User, Action, Document): Alonzo click doc22; Elena click doc17; Ming click doc12; Alonzo purchase doc22; Ming click doc22; Ming purchase doc12; Elena click doc2; …
User-item weights (via matrix factorization): Alonzo: doc22 → 1.0, doc12 → 0.4, …; Ming: doc12 → 0.9, doc22 → 0.6, …
Recommendations for Alonzo: doc22 “Pepperoni Pizza”, doc12 “Cheese Pizza”, …
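A hedged sketch of the matrix-factorization step, using a truncated SVD from NumPy as a stand-in for the ALS-style factorization typically used on implicit feedback; the toy weight matrix mirrors the slide's table, with Elena's values assumed:

import numpy as np

users = ["Alonzo", "Elena", "Ming"]
docs = ["doc22", "doc12", "doc17", "doc2"]

# User-item interaction weights (rows: users, cols: docs), from the slide's
# toy data plus assumed values for Elena.
R = np.array([[1.0, 0.4, 0.0, 0.0],
              [0.0, 0.0, 0.7, 0.5],
              [0.6, 0.9, 0.0, 0.0]])

# Factorize into low-rank user and item factors, then reconstruct scores.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2  # latent dimensions
scores = (U[:, :k] * s[:k]) @ Vt[:k, :]

def recommend(user, n=2):
    """Return the user's top-n docs by reconstructed preference score."""
    ranked = np.argsort(-scores[users.index(user)])
    return [docs[j] for j in ranked[:n]]

print(recommend("Alonzo"))  # ['doc22', 'doc12'], matching the slide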
62. 62. Collaborative Filtering
• Benefits: crowd-sources related content discovery based on real user interactions, with no a priori understanding of the content required
• Risks:
  • Reinforces biases: people interact with what they are recommended, so those same items get recommended to the next person ad infinitum.
    • Solutions:
      • Combine with content-based features: multi-modal recommendations enable mixing in non-behavior-based matches and overcome the cold-start problem.
      • Only count explicit actions: if content is on “autoplay”, don’t assume an interaction is positive; only count explicit clicks, likes, dislikes, etc.
      • Inject conceptual diversity: use techniques like concept clustering or the Semantic Knowledge Graph to determine key conceptual differences between content, and ensure the results returned represent diverse viewpoints rather than identical ones.
  • Subject to manipulation: same concerns as Signals Boosting.
    • Solutions: same as Signals Boosting (session-filtering, Quantity vs. Quality Weighting).
  63. 63. What is the Negative Space between two words?
  64. 64. What’s in the Negative Space Between the words “Jean Grey” and “In Love”? Jean Grey In Love
  65. 65. Semantic Knowledge Graph
  66. 66. Content-based Recommendations http://localhost:8983/solr/job-postings/skg
67. 67. Scoring of Node Relationships (Edge Weights): Foreground vs. Background Analysis
Every term is scored against its context: the more commonly the term appears within its foreground context versus its background context, the more relevant it is to the specified foreground context.
z = (countFG(x) - totalDocsFG * probBG(x)) / sqrt(totalDocsFG * probBG(x) * (1 - probBG(x)))
We are essentially boosting terms which are more related to some known feature (and ignoring terms which are equally likely to appear in the background corpus).
Foreground query: "Hadoop" → Knowledge Graph response:
{ "type": "keywords",
  "values": [
    { "value": "hive", "relatedness": 0.9773, "popularity": 369 },
    { "value": "java", "relatedness": 0.9236, "popularity": 15653 },
    { "value": ".net", "relatedness": 0.5294, "popularity": 17683 },
    { "value": "bee", "relatedness": 0.0, "popularity": 0 },
    { "value": "teacher", "relatedness": -0.2380, "popularity": 9923 },
    { "value": "registered nurse", "relatedness": -0.3802, "popularity": 27089 }
  ] }
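The z formula above is a one-sample z-test of a term's foreground count against its expected background rate. A small Python sketch follows; the example counts are invented, and the relatedness values in the JSON response are a scaled/normalized form of this raw z-score:

import math

def relatedness_z(count_fg, total_docs_fg, count_bg, total_docs_bg):
    """z = (countFG - totalDocsFG * probBG) / sqrt(totalDocsFG * probBG * (1 - probBG))"""
    prob_bg = count_bg / total_docs_bg
    expected = total_docs_fg * prob_bg
    std = math.sqrt(total_docs_fg * prob_bg * (1.0 - prob_bg))
    return (count_fg - expected) / std if std > 0 else 0.0

# e.g. how related is "hive" to a foreground of docs matching "Hadoop"?
# (hypothetical counts)
print(relatedness_z(count_fg=350, total_docs_fg=369,
                    count_bg=500, total_docs_bg=100000))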
  68. 68. Techniques like the Semantic Knowledge Graph can be used to score “diversity” across content, which can aid in reducing the bias of Signals and Collaborative Filtering.
  69. 69. So, can we go back in time and fix our mistakes?
  70. 70. No, but we do have a wizard….
71. 71. User searches → user sees results → user takes an action. Well, today, most of us run A/B experiments to test hypotheses, to “limit” the unknown negative impact to a subset of users.
72. 72. What if we could use the negative space to view alternate futures… …and then make only the specific choices that will achieve the desired outcomes?
73. 73. Imagine if we could simulate user reactions to changes before having to expose real users to those changes?
74. 74. User searches → user sees results → user takes an action → users’ actions inform system improvements.
Searches (User, Query, Results): Alonzo: pizza → doc10, doc22, doc12, …; Elena: soup → doc84, doc2, doc17, …; Ming: pizza → doc10, doc22, doc12, …
Signals (User, Action, Document): Alonzo click doc22; Elena click doc17; Ming click doc10; Alonzo purchase doc22; Ming click doc22; Ming purchase doc22; Elena click doc2; …
We DO have historical user behavior, but it’s biased toward the current algorithm: the click and purchase counts are all higher for docs that are already ranked higher, since they’re seen more often…
75. 75. User searches → user sees results → user takes an action → users’ actions inform system improvements.
Searches (User, Query, Results): Alonzo: pizza → doc10, doc22, doc12, …; Elena: soup → doc84, doc2, doc17, …; Ming: pizza → doc10, doc22, doc12, …
Signals (User, Action, Document): Alonzo click doc22; Elena click doc17; Ming click doc10; Alonzo purchase doc22; Ming click doc22; Ming purchase doc22; Elena click doc2; …
What other data do we have available that we’re not leveraging?
76. 76. Searches (User, Query, Results): Alonzo: pizza → doc10, doc22, doc12, …; Elena: soup → doc84, doc2, doc17, …; Ming: pizza → doc10, doc22, doc12, …
Signals (User, Action, Document): Alonzo click doc22; Elena click doc17; Ming click doc10; Alonzo purchase doc22; Ming click doc22; Ming purchase doc22; Elena click doc2; …
What we already know:
• What the user searched
• What the user interacted with (click, purchase)
• Results returned to the user
What would we ideally like to know?
• Which documents are relevant (user liked)
• Which documents are irrelevant (user didn’t like)
• What is the ideal ranking of documents?
Can we use the Negative Space to connect the dots?
77. 77. How to infer relevance?
Ranking shown for the query: 1. Doc1, 2. Doc2, 3. Doc3, 4. Doc4
Click Graph (Query → clicks): Doc1: 0, Doc2: 1, Doc3: 1
Skip Graph (Query → skips): Doc1: 1, Doc2: 0, Doc3: 0
Doc4: ?
  78. 78. From this click-skip graph, we can generate a ground truth data set mapping known queries to an ideal ranking of documents.
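One plausible way to encode that inference in Python; the policy choices here (clicked docs are relevant, docs skipped above the last click are irrelevant, everything below is unjudged) are illustrative assumptions rather than a fixed standard:

def infer_judgments(ranked_docs, clicked):
    """Map a result list plus the set of clicked docs to graded judgments."""
    last_click = max((i for i, d in enumerate(ranked_docs) if d in clicked),
                     default=-1)
    judgments = {}
    for i, doc in enumerate(ranked_docs):
        if doc in clicked:
            judgments[doc] = 1  # clicked: relevant
        elif i < last_click:
            judgments[doc] = 0  # seen and skipped: irrelevant
    return judgments            # docs below the last click stay unjudged

print(infer_judgments(["Doc1", "Doc2", "Doc3", "Doc4"], clicked={"Doc2", "Doc3"}))
# {'Doc1': 0, 'Doc2': 1, 'Doc3': 1}  (Doc4 is unjudged)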
79. 79. How to Measure Relevance?
A = retrieved documents, C = relevant documents, B = their intersection (retrieved and relevant).
Precision = B / A. Recall = B / C.
Problem: suppose Precision = 90% and Recall = 100%, but the 10% of retrieved documents that are irrelevant were ranked at the top of the result list. Is that OK?
80. 80. Discounted Cumulative Gain
Given ranking (Rank: Relevancy): 1: 0.95, 2: 0.65, 3: 0.80, 4: 0.85
Ideal ranking (Rank: Relevancy): 1: 0.95, 2: 0.85, 3: 0.80, 4: 0.65
• Position is considered in quantifying relevancy.
• A labeled dataset is required.
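A minimal Python implementation of DCG over the slide's toy ranking, plus NDCG, the normalized variant obtained by dividing by the ideal ranking's DCG:

import math

def dcg(relevancies):
    """Graded relevance discounted by log2 of the 1-based rank position."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevancies))

given = [0.95, 0.65, 0.80, 0.85]     # ranking the engine produced
ideal = sorted(given, reverse=True)  # ideal ranking: sorted by relevancy

print(dcg(given), dcg(ideal), dcg(given) / dcg(ideal))  # NDCG is in [0, 1]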
81. 81. Relevance Backtesting Simulation
Searches (User, Query, Results): Alonzo: pizza → doc10, doc22, doc12, …; Elena: soup → doc84, doc2, doc17, …; Ming: pizza → doc10, doc22, doc12, …
Signals (User, Action, Document): Alonzo click doc22; Elena click doc17; Ming click doc10; Alonzo purchase doc22; Ming click doc22; Ming purchase doc22; Elena click doc2; …
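Putting the pieces together, a backtest can replay logged queries through a candidate ranker and score it offline, instead of exposing real users via an A/B test. This sketch reuses the infer_judgments and dcg helpers from the sketches above; the history record format and the simple per-query NDCG average are assumptions for illustration:

def backtest(history, candidate_ranker):
    """history: records like {"query": ..., "results": [...], "clicked": {...}}.
    candidate_ranker(query) returns the new ranking to evaluate."""
    ndcgs = []
    for record in history:
        judgments = infer_judgments(record["results"], record["clicked"])
        if not judgments:
            continue  # no click/skip evidence for this query
        new_order = candidate_ranker(record["query"])
        gains = [judgments.get(doc, 0.0) for doc in new_order]
        ideal = sorted(judgments.values(), reverse=True)
        if dcg(ideal) > 0:
            ndcgs.append(dcg(gains) / dcg(ideal))
    return sum(ndcgs) / len(ndcgs) if ndcgs else 0.0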
  82. 82. Did we cover our Agenda? • Fraudulent AI • Adversarial Machine Learning • Cancer • War • Bikinis • Brainwashing • Alt-right • White Supremacism • Time Travel • Avengers Endgame Spoilers • Negative Space • Dark Data • Pornography • Global Warming • Algorithmic Bias • Diet & Exercise • Self-crashing Cars • Racism • Sexism
  83. 83. Goals of this Talk 1. Help identify patterns for uncovering overlooked data hidden in plain sight 2. Point out current failures and dangers of overlooking this negative space. 3. Discuss applications to my field (information retrieval) and how my company is working to overcome some of these failures in our own technology.
84. 84. Thank you! Trey Grainger · trey.grainger@lucidworks.com · @treygrainger · Other presentations: http://www.treygrainger.com · Solr in Action: http://solrinaction.com (discount code: ctwdsc19) · Book signing 3:00 pm today (coffee break) @ Registration Desk
