
AI, Search, and the Disruption of Knowledge Management


Trey Grainger's Presentation from the DOD & Federal Knowledge Management Symposium 2019.


  1. 2019.05.15. Reflected Intelligence: AI, Search, and the Disruption of KM. DOD and KM Symposium 2019: What is the Future? Trey Grainger, Chief Algorithms Officer
  2. About Me: Trey Grainger, Chief Algorithms Officer
     • Previously: SVP of Engineering @ Lucidworks; Director of Engineering @ CareerBuilder
     • Georgia Tech: MBA, Management of Technology
     • Furman University: BA, Computer Science, Business, & Philosophy
     • Stanford University: Information Retrieval & Web Search
     Other fun projects:
     • Co-author of Solr in Action, plus numerous research publications
     • Advisor to Presearch, the decentralized search engine
     • Advisor to several startups
     • Open Source Apache Lucene / Solr contributor
  3. Who are we? The company behind the Search & AI Conference.
     • 230 customers across the Fortune 1000
     • 400+ employees
     • Offices in San Francisco, CA (HQ); Raleigh-Durham, NC; Cambridge, UK; Bangalore, India; Hong Kong
     • Employ about 40% of the active committers on the Solr project
     • Contribute over 70% of Solr's open source codebase
     • Develop & support Apache Solr
  4. Industry's most powerful Intelligent Search & Discovery Platform.
  5. What is the Goal of Knowledge Management?
  6. Creating, Sharing, Using, and Managing Knowledge & Information. Why: Achieve Organizational Objectives
  7. Creating, Sharing, Using, and Managing Knowledge & Information. Why: Achieve Organizational Objectives
  8. What?: Creating, Sharing, Using, and Managing Knowledge & Information. Why?: Achieve Organizational Objectives. How?: Right Answers + Right People + Right Time
  9. What?: Creating, Sharing, Using, and Managing Knowledge & Information. Why?: Achieve Organizational Objectives. How?: Right Answers + Right People + Right Time
  10. Search has become today's de facto user interface for delivering, and for seeking, knowledge & information.
  11. Previous attempts have failed: the Search Appliance.
  12. • Proudly built with open-source tech at its core: Apache Solr & Apache Spark
      • Personalizes work with applied machine learning
      • Proven on the biggest corporate & government information systems
  13. Let the most respected analysts in the world speak on our behalf. [Gartner Magic Quadrant chart: axes are Completeness of Vision × Ability to Execute; quadrants are Leaders, Challengers, Visionaries, and Niche Players. Vendors plotted include Dassault Systèmes, Mindbreeze, Coveo, Microsoft, Attivio, Expert System, Smartlogic, Sinequa, IBM, IHS Markit, Funnelback, and Micro Focus.] Source: June 2018 Gartner Magic Quadrant report on Insight Engines. © Gartner, Inc.
  14. What do you mean by "Search"?
  15. 20 Years Ago: Search was navigating a Taxonomy of Relationships
  16. 10 Years Ago: Search was finding 10 Blue Links
  17. Today's Search Is:
      • Domain-aware
      • Assistive
      • Contextual & Personalized (location, last search, profile)
      • Conversational
      • Multi-modal (Text, Voice, Images, Event/Push-based)
      • Smart (AI-powered)
      • Beyond links and information to Answers and Action
  18. Search Intelligence Spectrum:
      • Basic Keyword Search (inverted index, tf-idf, bm25, multilingual text analysis, query formulation, etc.)
      • Self-learning Taxonomies / Entity Extraction (entity recognition, taxonomies, ontologies, business rules, synonyms, etc.)
      • Automated Relevancy Tuning (signals, A/B testing, multi-armed bandits, back-testing, genetic algorithms, deep learning, learning to rank)
      • Query Intent (query classification, semantic query parsing, semantic knowledge graphs, concept expansion, automatic query rewrites, clustering, classification, personalization, question/answer systems, virtual assistants)
  19. Key Query Intent Components:
      • Apache Solr
      • Solr Text Tagger
      • Semantic Knowledge Graph
      • Statistical Phrase Identifier
      • Fusion Semantic Query Pipelines
      • Fusion AI Synonyms Job
      • Fusion AI Token & Phrase Spell Correction Job
      • Fusion AI Head/Tail Analysis Job
      • Fusion AI Phrase Identification Job
      • Fusion Query Rules Engine
  20. Through these tools, the engine self-learns domain-specific semantic relationships
  21. … and enables domain experts to easily accept or adjust the built-in AI: completely deferring to the AI, or trusting it above a certain confidence level, or even manually approving every suggestion.
  22. Fusion AI Jobs
  23. [Diagram: Understanding User Intent. Traditional Keyword Search → Semantic Search (Domain-aware Matching) → Personalized Search (Recommendations) → Augmented Search (User Intent)]
  24. What is Reflected Intelligence?
  25. Importance of Feedback Loops: user searches → user sees results → user takes an action → users' actions inform system improvements.
  26. Signal Boosting. Feedback loop: user searches → user sees results → user takes an action → users' actions inform system improvements.
      Query logs:
        User    Query  Results
        Alonzo  pizza  doc10, doc22, doc12, …
        Elena   soup   doc84, doc2, doc17, …
        Ming    pizza  doc10, doc22, doc12, …
      Signal logs:
        User    Action    Document
        Alonzo  click     doc22
        Elena   click     doc17
        Ming    click     doc12
        Alonzo  purchase  doc22
        Ming    click     doc22
        Ming    purchase  doc22
        Elena   click     doc2
      Aggregated boosts:
        Query  Document  Signal Boost
        pizza  doc22     54,321
        pizza  doc12     987
        soup   doc17     1,234
        soup   doc2      2,345
      A search for "pizza" then runs as: query: pizza, boost: doc22^54321, boost: doc12^987
      ƒ(x) = Σ(click × click_weight × time_decay) + Σ(purchase × purchase_weight × time_decay) + other_factors
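The aggregation formula on this slide can be sketched in a few lines of Python. The weights, half-life, and example signals below are illustrative assumptions, not any product's actual defaults:

```python
from collections import defaultdict

# Illustrative weights and decay; real systems tune these per domain.
CLICK_WEIGHT, PURCHASE_WEIGHT, HALF_LIFE_DAYS = 1.0, 10.0, 30.0

def time_decay(age_days, half_life=HALF_LIFE_DAYS):
    """Exponential decay: a signal loses half its value every half_life days."""
    return 0.5 ** (age_days / half_life)

def aggregate_boosts(signals):
    """signals: iterable of (query, doc, action, age_days) tuples.
    Returns {query: {doc: boost}}, mirroring the slide's
    f(x) = sum(click * click_weight * time_decay) + sum(purchase * ...)."""
    weights = {"click": CLICK_WEIGHT, "purchase": PURCHASE_WEIGHT}
    boosts = defaultdict(lambda: defaultdict(float))
    for query, doc, action, age_days in signals:
        boosts[query][doc] += weights.get(action, 0.0) * time_decay(age_days)
    return boosts

signals = [
    ("pizza", "doc22", "click", 0),
    ("pizza", "doc22", "purchase", 0),
    ("pizza", "doc12", "click", 60),  # an older click decays to 0.25
    ("soup", "doc17", "click", 30),
]
boosts = aggregate_boosts(signals)
```

At query time the aggregated values become boost parameters on the search request, e.g. boost: doc22^11 for the query "pizza" under these toy weights.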
  27. ipad [screenshot: search results for the query "ipad"]
  28. ipad [screenshot: search results for the query "ipad"]
  29. • 200%+ increase in click-through rates
      • 91% lower TCO
      • 50,000 fewer support tickets
      • Increased customer satisfaction
  30. Learning to Rank. Same feedback loop: user searches → user sees results → user takes an action → users' actions inform system improvements. Query and signal logs as in the Signal Boosting example.
      Learned feature weights:
        Feature                Weight
        title_match_any_terms  15.25
        is_known_category      10
        popularity             9.5
        content_age            9.2
      Query "pizza": Initial Results: 1) doc1, 2) doc2, 3) doc3 → Build Ranking Classifier (from Implicit Relevance Judgements) → Final Results: 1) doc3, 2) doc1, 3) doc2
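A minimal pointwise sketch of the idea: treat clicked results as positives and skipped results as negatives, fit feature weights, then rerank. The features echo the slide's table, but the training data and model are made up for illustration (production LTR typically uses pairwise or listwise models such as LambdaMART):

```python
import math

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train(examples, lr=0.1, epochs=200):
    """Pointwise logistic-regression training on (features, clicked) pairs."""
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            p = 1.0 / (1.0 + math.exp(-dot(w, x)))  # predicted click probability
            w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
    return w

# Features: (title_match_any_terms, is_known_category, popularity).
# These implicit judgments are invented; real ones come from signal logs.
training = [
    ([1.0, 1.0, 0.2], 1),  # shown and clicked
    ([0.0, 1.0, 0.9], 0),  # shown, skipped
    ([1.0, 0.0, 0.1], 1),
    ([0.0, 0.0, 0.8], 0),
]
weights = train(training)

candidates = {"doc1": [0.0, 1.0, 0.9], "doc2": [1.0, 0.0, 0.3], "doc3": [1.0, 1.0, 0.1]}
reranked = sorted(candidates, key=lambda d: dot(weights, candidates[d]), reverse=True)
```

The learned model, not per-query popularity, drives the final ordering; that is what lets it generalize to queries with no signal history.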
  31. Collaborative Filtering (Recommendations). Same feedback loop; query and signal logs as before (here Ming purchases doc12).
      Learned user-item weights (via Matrix Factorization):
        User    Item   Weight
        Alonzo  doc22  1.0
        Alonzo  doc12  0.4
        Ming    doc12  0.9
        Ming    doc22  0.6
      Recommendations for Alonzo:
      • doc22: "Pepperoni Pizza"
      • doc12: "Cheese Pizza"
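Matrix factorization, as named on the slide, can be sketched with plain SGD: learn a small latent vector per user and per item so their dot product approximates the observed interaction weight, then score unseen items. Interaction weights echo the slide's table; users, docs, and hyperparameters are illustrative:

```python
import random

interactions = {
    ("Alonzo", "doc22"): 1.0, ("Alonzo", "doc12"): 0.4,
    ("Ming", "doc12"): 0.9, ("Ming", "doc22"): 0.6,
    ("Elena", "doc17"): 1.0, ("Elena", "doc2"): 0.8,
}

def factorize(ratings, k=2, lr=0.1, reg=0.01, epochs=2000, seed=7):
    """Learn k-dimensional user/item vectors so dot(U[u], V[i]) ~ weight."""
    rnd = random.Random(seed)
    U = {u: [rnd.uniform(0.01, 0.1) for _ in range(k)] for u, _ in ratings}
    V = {i: [rnd.uniform(0.01, 0.1) for _ in range(k)] for _, i in ratings}
    for _ in range(epochs):
        for (u, i), r in ratings.items():
            err = r - sum(a * b for a, b in zip(U[u], V[i]))
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                U[u][f] += lr * (err * vf - reg * uf)  # gradient step w/ L2 reg
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V

def predict(U, V, user, item):
    return sum(a * b for a, b in zip(U[user], V[item]))

U, V = factorize(interactions)

def recommend(user, seen, n=2):
    """Rank unseen items for a user by predicted interaction weight."""
    scores = {i: predict(U, V, user, i) for i in V if i not in seen}
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

With only six interactions the latent factors are toy-sized, but the mechanics (fit observed weights, score the rest) are the same at scale.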
  32. Summary of Signal-based Ranking Models:
      • Signals Boosting: ensures the most popular specific content/answers for a query return first
      • Learning to Rank: learns a model of which combinations of features generally matter most across users, and ranks all content/answers using that model
      • Collaborative Filtering: learns which items are best to recommend to a given user (or related to a given item) based on the behavior of other users who have previously interacted with similar items
  33. Today, many organizations run A/B experiments to test hypotheses, "limiting" the unknown negative impact to a subset of users. (User searches → user sees results → user takes an action.)
  34. But what if we could peer into millions of alternate futures… and then make only the specific choices that will achieve the desired outcomes?
  35. In other words, imagine if we could simulate users' interactions with changes before ever having to expose real users to those changes?
  36. Relevance Simulation (backtesting): replaying historical query logs (user, query, results) and signal logs (user, action, document) against proposed changes.
  37. Example Use Cases
  38. How do you tell this story?
  39. Goal: Enable our users to independently turn data into information into knowledge
  40. Digital Commerce
  41. The Curation Challenge: Gracefully combining human and machine intelligence to deliver relevance
  42. Personalization
  43. [Screenshots: regular search results vs. personalized search results for the same user]
  44. Digital Workplace ("Enterprise Search")
  45. [Architecture diagram: Digital Workplace Data (system generated, human generated, application generated) flows into a Content Index plus Signals; the Solution layer applies Machine Learning, Query Rule Matching, Natural Language, Facet/Topic & Cluster analysis, and Boosted Results.]
  46. Digital Workplace Solution: connect users to insights precisely at their moment of need, any format, any platform. [Component diagram: Connectors, ETL Pipelines, Search Engine & Data Processing, SQL Engine, Rules Engine, Scheduling & Alerting, Query Pipelines, Query Intent Detector, Automatic Relevancy, Signals & Query Analytics, Recommenders, A/B Testing; NLP (NER, phrases, POS), Document Classification, Anomaly Detection, Clustering, Topic Detection; scalable operations (cloud-scalable, CDCR, security); extensible (modular components, stateless architecture); user-focused experience (geospatial mapping, results preview, rapid prototyping).]
  47. What is a Knowledge Graph? (vs. Ontology vs. Taxonomy vs. Synonyms, etc.)
  48. Overly Simplistic Definitions:
      • Ontology: defines relationships between types of things [animal eats food; human is animal]
      • Knowledge Graph: an instantiation of an Ontology (contains the things that are related) [john is human; john eats food]
      • Taxonomy: classifies things into categories [john is Human; Human is Mammal; Mammal is Animal]
      • Synonyms List: provides substitute words that can be used to represent the same or very similar things [human => homo sapien, mankind; food => sustenance, meal]
      • Alternative Labels: substitute words with identical meanings [CTO => Chief Technology Officer; specialise => specialize]
      In practice, there is significant overlap between all of these.
  49. What kind of Knowledge Graph can help us with the kinds of problems we encounter in Search use cases?
  50. Challenges of building a traditional knowledge graph. Because current knowledge bases / ontology learning systems typically require explicitly modeling nodes and edges into a graph ahead of time, such a knowledge graph has several limitations:
      • Entities not modeled explicitly as nodes have no known relationships to any other entities.
      • Edges exist between nodes, but not between arbitrary combinations of nodes, so the graph is not ideal for representing the nuanced meanings an entity takes on in different contexts, as is common in natural language.
      • Substantial meaning is encoded in the linguistic representation of the domain and is lost when the underlying textual representation is not preserved: phrases, interaction of concepts through actions (i.e. verbs), positional ordering of entities and the phrases containing them, variations in spelling and other representations of entities, the use of adjectives to modify entities into more complex concepts, and aggregate frequencies of occurrence for different representations of entities relative to other representations.
      • It can be an arduous process to create robust ontologies, map a domain into a graph representing those ontologies, and ensure the generated graph is compact, accurate, comprehensive, and kept up to date.
      Source: Trey Grainger, Khalifeh AlJadda, Mohammed Korayem, Andries Smith. "The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain". DSAA 2016.
  51. "Unstructured data": most often used in reference to "free text"
  52. But… unstructured data is really more like "hyper-structured" data. It is a graph that contains much more structure than typical "structured data."
  53. Structured Data (discrete values, continuous values, foreign keys):
      Employees Table:
        id     name           company  start_date
        lw100  Trey Grainger  1234     2016-02-01
        dis2   Mickey Mouse   9123     1928-11-28
        tsla1  Elon Musk      5678     2003-07-01
      Companies Table:
        id    name        start_date
        1234  Lucidworks  2016-02-01
        5678  Tesla       1928-11-28
        9123  Disney      2003-07-01
  54. Unstructured Data. Trey's Voicemail: "Trey Grainger works at Lucidworks. He is speaking at the 2019 DOD & Federal KM Symposium. #KMSymposium is being held in Baltimore May 14-16, 2019. Trey got his masters from Georgia Tech."
  55. Unstructured Data. Trey's Voicemail: "Trey Grainger works for Lucidworks. He is speaking at the DOD & Federal KM Symposium 2019. #KMSymposium (DOD & Federal KM Symposium) is being held in Baltimore May 14-16, 2019. Trey got his masters degree from Georgia Tech."
  56. Foreign Key? (same voicemail text)
  57. Fuzzy Foreign Key? (Entity Resolution) (same voicemail text)
  58. Fuzzier Foreign Key? (metadata, latent features) (same voicemail text)
  59. Fuzzier Foreign Key? (metadata, latent features). Not so fast! (same voicemail text)
  60. Giant Graph of Relationships... (same voicemail text)
  61. Semantic Knowledge Graph
  62. [Diagram: the Semantic Knowledge Graph as a graph-traversal data structure. Data structure view: traversals alternate inverted-index lookups (term → documents) and forward-index lookups (document → terms) across docs 1-6 and fields such as skill (Java, Scala, Hibernate, Oncology) and job_title (Software Engineer, Data Scientist, Java Developer). Graph view: Java has_related_skill Scala and Hibernate; Java has_related_job_title Software Engineer, Data Scientist, and Java Developer.] Source: Trey Grainger, Khalifeh AlJadda, Mohammed Korayem, Andries Smith. "The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain". DSAA 2016.
  63. Scoring of Node Relationships (Edge Weights): Foreground vs. Background Analysis. Every term is scored against its context: the more commonly the term appears within its foreground context versus its background context, the more relevant it is to the specified foreground context.
             countFG(x) - totalDocsFG * probBG(x)
      z = ---------------------------------------------------
           sqrt(totalDocsFG * probBG(x) * (1 - probBG(x)))
      Foreground query: "Hadoop"
        { "type": "keywords",
          "values": [
            { "value": "hive",             "relatedness": 0.9773,  "popularity": 369 },
            { "value": "java",             "relatedness": 0.9236,  "popularity": 15653 },
            { "value": ".net",             "relatedness": 0.5294,  "popularity": 17683 },
            { "value": "bee",              "relatedness": 0.0,     "popularity": 0 },
            { "value": "teacher",          "relatedness": -0.2380, "popularity": 9923 },
            { "value": "registered nurse", "relatedness": -0.3802, "popularity": 27089 } ] }
      We are essentially boosting terms which are more related to some known feature (and ignoring terms which are equally likely to appear in the background corpus).
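The formula above is a standard z-score: it compares a term's observed frequency in the foreground document set against what its background probability predicts. A direct transcription (note: Solr's skg facet further normalizes this into the bounded relatedness values shown in the JSON; the counts below are made up):

```python
import math

def relatedness(count_fg, total_docs_fg, prob_bg):
    """z = (countFG(x) - totalDocsFG * probBG(x)) /
           sqrt(totalDocsFG * probBG(x) * (1 - probBG(x)))"""
    expected = total_docs_fg * prob_bg
    return (count_fg - expected) / math.sqrt(total_docs_fg * prob_bg * (1 - prob_bg))

# Foreground: 1,000 docs matching "Hadoop" (illustrative counts).
z_hive = relatedness(count_fg=200, total_docs_fg=1000, prob_bg=0.002)  # over-represented
z_teacher = relatedness(count_fg=1, total_docs_fg=1000, prob_bg=0.01)  # under-represented
```

Terms that merely track the background corpus score near zero; "hive" scores very high in the Hadoop foreground despite low absolute popularity, while "teacher" scores negative.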
  64. Knowledge Graph
  65. Knowledge Graph
  66. Related term vector (for query concept expansion): http://localhost:8983/solr/stack-exchange-health/skg
  67. Content-based Recommendations (More Like This on Steroids): http://localhost:8983/solr/job-postings/skg
  68. Who's in Love with Jean Grey?
  69. Named Entity Recognition (NER) automatically translates:
        Barack Obama was the president of the United States of America. Before that, Obama was a senator.
      into:
        <person id="barack_obama">Barack Obama</person> was the <role>president</role> of the <country id="usa">United States of America</country>. Before that, <person id="barack_obama">Obama</person> was a <role>senator</role>.
      In the search engine, this would become:
        text: Barack Obama was the president of the United States of America. Before that, Obama was a senator.
        person: Barack Obama
        country: United States of America
        role: [ president, senator ]
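A toy dictionary-based tagger shows the shape of this text-to-fields transformation. The hardcoded entity dictionary is an illustrative stand-in for a trained NER model or the Solr Text Tagger run against a large entity list, and this sketch indexes canonical ids rather than surface forms:

```python
# Hypothetical entity dictionary: surface form -> (field, canonical id)
ENTITIES = {
    "Barack Obama": ("person", "barack_obama"),
    "Obama": ("person", "barack_obama"),
    "United States of America": ("country", "usa"),
    "president": ("role", "president"),
    "senator": ("role", "senator"),
}

def extract(text):
    """Return {field: sorted canonical ids} for dictionary entries found in text."""
    fields = {}
    for surface, (field, canonical) in ENTITIES.items():
        if surface in text:
            fields.setdefault(field, set()).add(canonical)
    return {f: sorted(v) for f, v in fields.items()}

doc = ("Barack Obama was the president of the United States of America. "
       "Before that, Obama was a senator.")
fields = extract(doc)
```

Because "Obama" and "Barack Obama" resolve to the same canonical id, the person field gets one entity, matching the slide's fielded document.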
  70. Differentiating related terms:
      • Misspellings: managr => manager
      • Synonyms: cpa => certified public accountant; rn => registered nurse; r.n. => registered nurse
      • Ambiguous Terms*: driver => driver (trucking) ~80% likelihood; driver => driver (software) ~20% likelihood
      • Related Terms: r.n. => nursing, bsn; hadoop => mapreduce, hive, pig
      *differentiated based upon user and query context
  71. Use Case: Query Disambiguation
        Example    Related Keywords (representing multiple meanings)
        driver     truck driver, linux, windows, courier, embedded, cdl, delivery
        architect  autocad drafter, designer, enterprise architect, java architect, designer, architectural designer, data architect, oracle, java, architectural drafter, autocad, drafter, cad, engineer
      Source: M. Korayem, C. Ortiz, K. AlJadda, T. Grainger. "Query Sense Disambiguation Leveraging Large Scale User Behavioral Data". IEEE Big Data 2015.
  72. (repeat of slide 71)
  73. A few methodologies: 1) Query Log Mining, 2) Semantic Knowledge Graph
  74. Semantic Knowledge Graph: Discovering ambiguous phrases
      1) Use a document classification field (i.e. category) as the first level of a graph, and the related terms as the second level to which you traverse.
      2) This has the benefit that you don't need query logs to mine, but it will be representative of your data, as opposed to your users' intent, so the quality depends on how clean and representative your documents are.
      Additional benefit: multi-dimensional disambiguation and dynamic materialization of categories. Effectively a dynamically-materialized probabilistic graphical model.
  75. Disambiguation by Category:
      Meaning 1: Restaurant => bbq, brisket, ribs, pork, …
      Meaning 2: Outdoor Equipment => bbq, grill, charcoal, propane, …
  76. Disambiguated meanings (represented as term vectors):
        Example    Related Keywords (Disambiguated Meanings)
        architect  1: enterprise architect, java architect, data architect, oracle, java, .net
                   2: architectural designer, architectural drafter, autocad, autocad drafter, designer, drafter, cad, engineer
        driver     1: linux, windows, embedded
                   2: truck driver, cdl driver, delivery driver, class b driver, cdl, courier
        designer   1: design, print, animation, artist, illustrator, creative, graphic artist, graphic, photoshop, video
                   2: graphic, web designer, design, web design, graphic design, graphic designer
                   3: design, drafter, cad designer, draftsman, autocad, mechanical designer, proe, structural designer, revit
      Source: M. Korayem, C. Ortiz, K. AlJadda, T. Grainger. "Query Sense Disambiguation Leveraging Large Scale User Behavioral Data". IEEE Big Data 2015.
  77. Using the disambiguated meanings. In a situation where a user searches for an ambiguous phrase, what information can we use to pick the correct underlying meaning?
      1. Any pre-existing knowledge about the user:
         • User is a software engineer
         • User has previously run searches for "c++" and "linux"
      2. Context within the query: the user searched for "windows AND driver" vs. "courier OR driver"
      3. If all else fails (and there is no context), use the most commonly occurring meaning.
      driver => 1: linux, windows, embedded; 2: truck driver, cdl driver, delivery driver, class b driver, cdl, courier
      Source: M. Korayem, C. Ortiz, K. AlJadda, T. Grainger. "Query Sense Disambiguation Leveraging Large Scale User Behavioral Data". IEEE Big Data 2015.
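The three-step fallback can be sketched directly. The sense inventory, related terms, and priors below are illustrative (the 80%/20% driver split echoes an earlier slide):

```python
# Illustrative sense inventory for one ambiguous term.
SENSES = {
    "driver": [
        {"sense": "software", "terms": {"linux", "windows", "embedded"}, "prior": 0.2},
        {"sense": "trucking", "terms": {"truck driver", "cdl", "courier", "delivery"}, "prior": 0.8},
    ]
}

def disambiguate(term, query_terms=(), user_terms=()):
    """Pick a sense by (1) user-profile and (2) query-context overlap with each
    sense's related terms, falling back to (3) the most common meaning."""
    context = {t.lower() for t in query_terms} | {t.lower() for t in user_terms}
    best = max(SENSES[term],
               key=lambda s: (len(s["terms"] & context), s["prior"]))
    return best["sense"]

in_query = disambiguate("driver", query_terms=["windows"])        # query context wins
from_profile = disambiguate("driver", user_terms=["linux", "c++"])  # profile wins
no_context = disambiguate("driver")                                # prior wins
```

Sorting on the (overlap, prior) tuple encodes the fallback order: context overlap dominates whenever it exists, and the prior only decides ties.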
  78. Thought Exercise: What do you think of when I say the word "Facebook"?
  79. Every term or phrase is a context-dependent cluster of meaning with an ambiguous label
  80. What does "love" mean? http://localhost:8983/solr/thesaurus/skg
  81. What does "love" mean in the context of "hug"? http://localhost:8983/solr/thesaurus/skg ("embrace")
  82. What does "love" mean in the context of "child"? http://localhost:8983/solr/thesaurus/skg
  83. So what's my end goal here?
      User's query:
        machine learning research and development Portland, OR software engineer AND hadoop, java
      Traditional query parsing:
        (machine AND learning AND research AND development AND portland) OR (software AND engineer AND hadoop AND java)
      Semantic query parsing:
        "machine learning" AND "research and development" AND "Portland, OR" AND "software engineer" AND hadoop AND java
      Semantically expanded query:
        ("machine learning"^10 OR "data scientist" OR "data mining" OR "artificial intelligence")
        AND ("research and development"^10 OR "r&d")
        AND ("Portland, OR"^10 OR "Portland, Oregon" OR {!geofilt pt=45.512,-122.676 d=50 sfield=geo})
        AND ("software engineer"^10 OR "software developer")
        AND (hadoop^10 OR "big data" OR hbase OR hive)
        AND (java^10 OR j2ee)
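The parse-then-expand pipeline can be sketched in miniature: chunk known phrases, then OR each chunk with its related concepts while boosting the original. The phrase list and expansion map below are hardcoded assumptions standing in for what the semantic knowledge graph and synonyms jobs would learn:

```python
# Hypothetical learned vocabularies.
PHRASES = {"machine learning", "software engineer"}
EXPANSIONS = {
    "machine learning": ["data mining", "artificial intelligence"],
    "software engineer": ["software developer"],
    "hadoop": ["big data", "hive"],
}

def parse_phrases(query):
    """Greedy two-word phrase chunking against the known-phrase list."""
    tokens, chunks, i = query.lower().split(), [], 0
    while i < len(tokens):
        two_word = " ".join(tokens[i:i + 2])
        if two_word in PHRASES:
            chunks.append(two_word)
            i += 2
        else:
            chunks.append(tokens[i])
            i += 1
    return chunks

def expand(chunks, boost=10):
    """Each chunk becomes (original^boost OR expansions), ANDed together."""
    parts = []
    for c in chunks:
        term = f'"{c}"' if " " in c else c
        alts = EXPANSIONS.get(c, [])
        if alts:
            parts.append("(" + " OR ".join([f"{term}^{boost}"] + [f'"{a}"' for a in alts]) + ")")
        else:
            parts.append(term)
    return " AND ".join(parts)

q = expand(parse_phrases("machine learning software engineer hadoop"))
```

Boosting the original phrase (^10) keeps exact matches on top while the expansions increase recall, the same shape as the slide's expanded query.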
  84. Example Query
  85. Why this Semantic Nuance Matters
  86. Search Intelligence Spectrum:
      • Basic Keyword Search (inverted index, tf-idf, bm25, multilingual text analysis, query formulation, etc.)
      • Self-learning Taxonomies / Entity Extraction (entity recognition, taxonomies, ontologies, business rules, synonyms, etc.)
      • Automated Relevancy Tuning (signals, A/B testing, multi-armed bandits, back-testing, genetic algorithms, deep learning, learning to rank)
      • Query Intent (query classification, semantic query parsing, semantic knowledge graphs, concept expansion, automatic query rewrites, clustering, classification, personalization, question/answer systems, virtual assistants)
  87. Summary:
      • Search is today's de facto user experience for delivering knowledge and information.
      • Reflected Intelligence uses content + signals to constantly gain intelligence about your domain, your content, and your users through continuous feedback loops.
      • Your content already IS a hyper-structured knowledge graph. Smart search technology makes this graph usable so you don't have to build it all again yourself.
      • The nuance of natural language really matters. Though all models are wrong, make sure yours are "useful."
      • AI and Search represent an evolution in Knowledge Management. They will disrupt some current practices, but ultimately serve as a highly complementary tool set for most practitioners.
  88. Questions?
  89. Thank you! Trey Grainger, trey@lucidworks.com, @treygrainger, http://solrinaction.com. Other presentations: http://www.treygrainger.com. Discount code: 39grainger
