What Is Semantic Search?
And Why Is It Important?
Bob Kasenchak
Access Innovations
@taxobob
bob_kasenchak@accessinn.com
NISO Webinar
“Discovery and Online search:
Personalized Content, Personal Data”
Outline
Semantic Search
● What Is It? (Basics)
● Why Do We Need It? (Why Does Search Fail?)
● So…What Is It? (Specifics)
● Examples and Implementations
What Is Semantic Search?
Semantic Search goes beyond
keyword searches
to examine
Context
Google says “Things Not Strings”
Why “Basic” Search Fails
Why does search fail?
● Simple search simply matches text strings
● Language is ambiguous
● There is a *lot* of content
Discovery
Google Scholar
Discovery
Specialized Repositories
Why “Basic” Search Fails
Specialized Repositories
Discovery
Why “Basic” Search Fails
Search fails because simple string matching is not adequate
for large, specialized repositories of content with technical
language that evolves over time.
(Also, language is ambiguous.)
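The failure described above can be reproduced with a toy inverted index. This is only a sketch: the titles are invented for illustration, and real search platforms do considerably more than split on whitespace. It shows that when the index keys are exact strings, "horse" and "horses" (or "uav" and "unmanned aerial vehicles") lead to unrelated result sets.

```python
from collections import defaultdict

# Hypothetical document titles, invented for this illustration.
DOCS = {
    1: "Nutrition of the working horse",
    2: "Wild horses of the steppe",
    3: "Unmanned aerial vehicles in agriculture",
    4: "UAV path planning",
}


def build_index(docs):
    """Map each lowercased token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


INDEX = build_index(DOCS)


def search(term):
    """Exact string lookup: no stemming, no synonyms, no fuzzy matching."""
    return sorted(INDEX.get(term.lower(), set()))


print(search("horse"))   # [1]
print(search("horses"))  # [2]
print(search("uav"))     # [4]
```

Documents 1 and 2 are about the same animal, and documents 3 and 4 about the same technology, but the index has no way of knowing that.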
What Is Semantic Search?
Semantic Search goes beyond
keyword string-matching
to examine
Context
using a variety of means
Google says “Things Not Strings”
What Is Semantic Search?
Semantic Search
Examines the semantic context of the search query to drive
relevant results.
This can include: taxonomies, lexical variants, location, your
previous searches, previous similar searches, ontologies,
knowledge graphs, and other strategies.
Allow Lexical Variants
“Fuzzy Matching” and Similar Techniques
● Use Levenshtein distance (or similar) to match
misspellings and variants
● Stem words for search
● Instead of exact string matches
● This can cause noise, so be careful!
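As a minimal sketch of the fuzzy matching idea on this slide, the following computes Levenshtein distance with the standard dynamic-programming recurrence and uses it to match a query against a small vocabulary. The function names, the vocabulary, and the `max_distance` threshold are assumptions chosen for illustration; production search engines typically ship their own tuned implementations.

```python
def levenshtein(a, b):
    """Count the single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query, vocabulary, max_distance=2):
    """Return vocabulary terms within max_distance edits of the query,
    closest first. The threshold of 2 is an arbitrary assumption."""
    query = query.lower()
    scored = [(levenshtein(query, term.lower()), term) for term in vocabulary]
    return [term for dist, term in sorted(scored) if dist <= max_distance]


print(levenshtein("Bob", "Rob"))  # 1
print(fuzzy_match("gastrointestinl", ["gastrointestinal", "stromal", "neoplasms"]))
# ['gastrointestinal']
print(fuzzy_match("cat", ["car", "cot", "dog"]))
# ['car', 'cot']
```

The last call shows the noise the slide warns about: a loose threshold matches "cat" to "car" and "cot", which are near in spelling but unrelated in meaning.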
Query Parsing
Contextual Search: Location
Contextual Search: User Activity
Contextual Search: User Activity
Google Knowledge Graph
Google (again): Things not Strings
The Google Knowledge Graph connects search with e.g.
known facts about entities
(Driven by a big old ontology)
Google Knowledge Graph
Google Knowledge Graph
Google Knowledge Graph
Taxonomies and Tagging
Controlling Vocabularies
● Search tags before free text (search engine tuning)
● Allow users to browse (in addition to querying)
● Suggest topics using type-ahead or “did you mean”
● Leverage synonymy to deliver same relevant results from
various inputs
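The last bullet can be sketched as a lookup that maps every variant of a concept to one preferred term before searching the tags applied to documents. The thesaurus entries, document ids, and tags here are hypothetical; a real controlled vocabulary would hold thousands of terms and be maintained by taxonomists.

```python
# Hypothetical mini-thesaurus: each variant maps to one preferred term.
SYNONYMS = {
    "uav": "unmanned aerial vehicles",
    "uavs": "unmanned aerial vehicles",
    "drone": "unmanned aerial vehicles",
    "drones": "unmanned aerial vehicles",
    "unmanned aerial vehicle": "unmanned aerial vehicles",
}

# Hypothetical documents, each tagged with preferred terms only.
TAGGED_DOCS = {
    "doc-1": {"unmanned aerial vehicles", "agriculture"},
    "doc-2": {"unmanned aerial vehicles"},
    "doc-3": {"horses"},
}


def normalize(query):
    """Lowercase, collapse whitespace, then swap a known variant
    for its preferred term."""
    cleaned = " ".join(query.lower().split())
    return SYNONYMS.get(cleaned, cleaned)


def search_tags(query):
    """Return ids of documents tagged with the query's preferred term."""
    term = normalize(query)
    return sorted(doc for doc, tags in TAGGED_DOCS.items() if term in tags)


print(search_tags("UAV"))                       # ['doc-1', 'doc-2']
print(search_tags("unmanned aerial vehicles"))  # ['doc-1', 'doc-2']
print(search_tags("Drones"))                    # ['doc-1', 'doc-2']
```

Because matching happens on tags rather than on words in the title, the acronym and the spelled-out phrase return the same result set.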
Taxonomies and Tagging
The Irony of Document Categorization
● We’re interested in concepts
● Words are ambiguous
● But words in the text are all we have to go on
○ Unless we apply good subject metadata
PLOS
● 9000+ term thesaurus
● And ~4000 Synonyms (!)
● Applied to documents
● Exposed in browse (!)
● Used to redirect search queries for synonyms
● Exposed at article level to user
○ Crowdsourced QC!
Taxonomies and Tagging
Discovery
PLOS
Discovery
PLOS
JSTOR
● Document becomes the search query (!)
● Combination of taxonomy and naive classification
● Suggests related content for research, bibliography
● Experimental, successful, also very cool
labs.jstor.org
Other Novel Approaches
Discovery
JSTOR
● Simple things:
● Using existing search software tuning/options
● Enable fuzzy matching
● Configure how Booleans are automatically applied
● Weight fields, doc types, etc. where appropriate
● Use dates to deliver recent results
Implementation: Kind of Easy
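To illustrate the "weight fields" bullet, here is a rough sketch of field-weighted scoring. The weights, field names, and sample records are assumptions made up for this example; most search platforms expose comparable boosts as configuration rather than code.

```python
# Assumed weights: a hit in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "abstract": 1.5, "body": 1.0}

# Hypothetical records, invented for illustration.
DOCS = [
    {"id": "A", "title": "Horse nutrition", "abstract": "Feeding the working horse"},
    {"id": "B", "title": "Farm animals", "body": "the horse and the cow"},
    {"id": "C", "title": "Pizza ovens", "body": "wood fired baking"},
]


def score(doc, query):
    """Sum weight * term frequency over every weighted field."""
    terms = query.lower().split()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        tokens = doc.get(field, "").lower().split()
        for term in terms:
            total += weight * tokens.count(term)
    return total


def rank(docs, query):
    """Return ids of matching documents, highest score first."""
    scored = [(score(doc, query), doc["id"]) for doc in docs]
    scored.sort(key=lambda pair: -pair[0])
    return [doc_id for value, doc_id in scored if value > 0]


print(score(DOCS[0], "horse"))  # 4.5
print(rank(DOCS, "horse"))      # ['A', 'B']
```

Document A outranks B because its matches fall in the title and abstract; the same idea extends to boosting by document type or publication date.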
● Next level:
● Taxonomy
● And tagging
● Knowledge Graphs
● User profiles, user behavior, other targeted means
Implementation: More Complex
Thanks!

Editor's notes

  1. Good afternoon. My name is Bob Kasenchak – I’m a taxonomist and director of business development at Access Innovations. Today I’d like to talk about Semantic Search, why it’s important, ways to implement it, and other related topics. My talk will outline the topic and introduce some concepts; the subsequent talks will, I’m sure, go into more detail.
  2. Here’s a brief outline of my talk. My goal is that by the end of the talk you have some idea of what Semantic Search is – or, at least, about the sundry approaches that people mean when they say “semantic search.” This will set up Duane and Travis to go into specifics and details. There is (in theory, at least) ample time in this block for questions and discussion at the end.
  3. Ironically (since we’re talking about semantics) people use the term “semantic search” to refer to a variety of things – but they all have something in common: trying to extend or amplify or improve search beyond matching keyword strings to – using some method or methods – determine the context of the search. This takes a number of forms, about which more shortly. Google has popularized the tagline “things not strings” to explain its semantic search, which involves a knowledge graph (that is, an ontology) and some other stuff. First, though, I think it will be helpful to investigate the problem we’re trying to solve. So: what problem are we trying to solve?
  4. What’s the problem with “regular” or “basic” search? And specifically, I think, I mean: in the context of scholarly publishing. Most of the time, search is limited to whatever default search is available on the platforms used by scholarly publishers – there are almost always some options, and they are almost never used; further, some platforms have limitations. And most are based on “regular” string-based search. When I say “regular” or “basic” search: I mean that you enter a keyword in a box, and the search application (using an inverted index) tries to match that string with documents. Sometimes fuzzy matching is used – to catch misspellings and whatnot – but in essence “regular” search looks, literally, for WORDS in DOCUMENTS. That is to say: it matches text strings. The problem for publishers is that in large, specialized content sets “basic” search fails because (1) it’s simply looking for text strings, and does not have the detailed kinds of indexing that e.g. Google does on a constant basis; (2) because language is ambiguous; and (3) because specialized repositories tend to have very, very large content sets with extraordinarily detailed and specialized vocabularies that change over time.
  5. In essence, simple search just looks for the words in the query. In this example from Google Scholar, I have searched for the word “horse”. Two things are noteworthy here: (1) I got over 3 million results, and (2) the logic of the algorithm must prioritize author names -- since, as you can see (I hope), the first paper listed is not about horses; rather, the author’s name is “Red-Horse.” This is, in my analysis, sub-optimal. Why not have a place to specifically search for Author – or omit Author names from search?
  6. Further -- without using any fancy synonymy or acronyms or other semantic trickery -- if I search instead for the string “horses” I get 1.7 million results. This seems to be triggered by the form of the word in the title. In other words: Google Scholar doesn’t even recognize simple English plurals as the same word -- and by extension, the same result set. It’s literally – and merely – looking for instances of the exact string typed in the box. Again: the search simply tries to match the word (or words) in the box in some place in a very large set of documents, with some priority seemingly given to the field (title, author, abstract) in which the word appears. Google search -- not Google Scholar -- seems to have figured this out long ago, but for some reason the Google Scholar search...um, has not. Some simple NLP would go a long way here, such as recognizing plurals, not to mention other types of synonyms, instead of just words.
  7. Another example, this time from a specialized repository (content from a scholarly publisher): the same concept can be expressed in multiple ways (besides morphological variants of words like plurals and adjectives), and this problem is most acute with acronyms. While many acronyms have a number of possible meanings, within many fields some acronyms are so ubiquitous that they should be accounted for in search. This example shows search results for “unmanned aerial vehicles” in the library of a mid-sized academic publisher...for which, as you can see, 171 results were found…
  8. ...whereas a search for “uav” (the acronym for the same concept) returns over three times as many results. Again, it’s obvious from the results listings of the respective queries that the way the concept is expressed in the title has a major bearing on the content returned. Let me say that one more time: The way the author chooses to represent the main topic of an article in the title has a major bearing on the search results. This, again, to me, seems to be sub-optimal.
  9. So. Why does search fail? It fails because simple string matching is not adequate for large, specialized repositories of content with technical language that evolves over time. Also: it goes without saying I think that language is ambiguous.
  10. So: the basic idea is that Semantic Search goes beyond simple keyword text-string matching. This takes a variety of forms, the common denominator of which, I think, is that:
  11. Semantic Search does not merely try to find keywords but examines the semantic context of the search query to drive relevance. This can include synonyms and lexical variants as well as things like location of the user, or involve graph databases and related concepts as well as a host of other methods. Some of these are quite simple, and others quite complex and involved. Accordingly, some are simple to implement – and some are, well, not. So. Some examples will I think help.
  12. One basic kind of semantic search is designed to help people find what they’re looking for without knowing the exact or specific language or terminology in the content. The idea is to allow the search engine to match near-matches instead of just exact strings. This can cause noise – so use caution. The upside is that someone might not know how to spell “Gastrointestinal stromal neoplasms” – but if they can get close, it might still match. Levenshtein distance (speaking of things I have to look up to spell correctly) is a commonly used metric to determine the “distance” between words that essentially depends on how many changes you’d have to make to get from Word A to Word B. So, for example, “Bob” and “Rob” are very close in Levenshtein terms, while “Bob” and “Antidisestablishmentarianism” are far, far apart. In short, it’s a kind of “fuzzy” matching algorithm.
  13. Another use of the knowledge graph stems from user behavior. We are no longer trained to use old-fashioned library searches with keywords and Boolean operators; instead, when we use Google, we often type in a query – not just a keyword. This also relies on the knowledge graph. For example, Googling “Harrison Ford” and “When is Harrison Ford’s Birthday” picks up two different results. The first resembles a keyword search; the second is a natural language query, which requires parsing. Once parsed, the Knowledge Graph can deliver a targeted result with the exact information sought. Note that the info box on the right (the knowledge graph results) is the same – it’s referencing the graph to pull out information based on the parsed query.
  14. Another set of methods falls into a category called “contextual search”. This moniker applies to a variety of techniques that use information gathered about the user, location, or recent searches to drive relevance. For example, some applications – notably Google, but also other map-based applications – use your location to deliver relevant local results. This is usually done either by using your IP address (which can be mapped to an approximate location) or by allowing an application to access GPS data, usually from your phone or other devices. For example, if I Google “pizza” the first results aren’t definitions and Wikipedia pages about what pizza is; rather, I get suggestions for “pizza near me”. In fact, the entire first page of Google results consists of pizza restaurants. Note, however, that the Google Knowledge Graph in the right sidebar does provide the generic definition (Wikipedia, again) as well as – interestingly – information about pizza stocks to invest in.
  15. Also from Google (can you see a trend here?) we can see another kind of contextual search based on recent activity or user profiles. This works particularly well if you’re logged into Google, of course. If I search for “jaguar” – what do I expect: cars or cats? *I* get results for cars – although I can’t quite say why. Someone who regularly searches for animals would get results for large cats. So Google stores logged-in user searches and delivers results based on previous activity. This can be useful for publishers and societies – if, and only if, users log in when they come to search your site – or they persistently “stay” logged in (the browser remembers them or whatever). In this way, if you’re, say, a cancer research organization and your members have some kind of indication about whether they’re doctors or researchers or patients or pharma reps – they can get relevant content delivered based on their member profile. Naturally, there’s some work to do to set this up. But it can be very valuable.
  16. Incidentally, I also Googled “jaguars” plural – expecting to get the big cats. However, probably since I regularly read NFL content, my results were for the Jacksonville Jaguars. (I would like to note that I use the same Google profile across my devices, so this does not mean I’m reading NFL news AT WORK.)
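The “jaguar” behavior can be sketched very roughly as follows. This is a toy illustration under stated assumptions, not Google's method: the `SENSES` table, its cue words, and the `pick_sense` function are all hypothetical, and the "profile" is just a list of the user's previous search strings.

```python
from collections import Counter

# Hypothetical sense inventory: each sense of an ambiguous query
# is described by a few cue words we might see in past searches.
SENSES = {
    "jaguar": {
        "car": {"car", "cars", "sedan", "dealership", "horsepower"},
        "animal": {"cat", "cats", "wildlife", "habitat", "zoo"},
        "nfl team": {"nfl", "football", "jacksonville", "playoffs"},
    }
}

def pick_sense(query, history):
    """Choose the sense whose cue words appear most often in the history.

    Returns None if the query is not ambiguous (not in SENSES) or if the
    history gives no evidence for any sense.
    """
    senses = SENSES.get(query.lower())
    if not senses:
        return None
    seen = Counter(word for past in history for word in past.lower().split())
    scores = {sense: sum(seen[cue] for cue in cues)
              for sense, cues in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

With a history such as `["nfl scores", "football playoffs schedule"]`, `pick_sense("jaguar", history)` selects the NFL sense; with an empty history it returns `None`, and the search would fall back to its default ranking.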
  17. The Google Knowledge Graph is an example that’s becoming very commonplace; I’ll show a screengrab in a minute. Briefly, when a query strikes some node in the graph, a sidebar appears alongside the usual web page results with other information related to the search. This works particularly well with entities, less well with concepts. Here’s an example:
  18. So. Searching for “Empire State Building” brings up, predictably, the website, Twitter feed, and (below, as we shall see) the Wikipedia page. But over on the right side is a bunch of information from the Google Knowledge Graph. Let’s take a closer look:
  19. (I grabbed the stuff downscreen and put it side by side.) So we get some pictures of the Empire State Building – with a link to more – and links to the website, Google Maps for directions, a blurb (which comes from Wikipedia), address, statistics, Questions and Answers, Reviews, Popular times, stuff about the movie of the same name, links to social media, and links to “other people also search for”. Pretty cool – and a much richer experience than just the web page results. It is also pretty intuitively presented, which is nice. How does this work?
  20. Basically, there’s a ginormous ontology – specifically, a knowledge graph – and if you hit a node it returns a bunch of other information it has associated with that node using the semantic web. Here I’ve shown a totally made up but plausible facsimile of what the knowledge graph behind the information box in the previous slides must look like. It is a lot of work to build knowledge graphs like this – but they are extraordinarily powerful. And the industry is moving in this direction.
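In the same made-up-but-plausible spirit, a knowledge graph can be sketched as a list of subject, predicate, object triples, with the information box built by collecting everything attached to one node. The triples, predicate names, and the `info_box` function below are illustrative assumptions; they say nothing about how Google actually stores or queries its graph.

```python
from collections import defaultdict

# A tiny hand-built graph of (subject, predicate, object) triples.
TRIPLES = [
    ("Empire State Building", "is a", "Skyscraper"),
    ("Empire State Building", "located in", "New York City"),
    ("Empire State Building", "architect", "Shreve, Lamb & Harmon"),
    ("Empire State Building", "opened", "1931"),
    ("New York City", "located in", "United States"),
]

def info_box(entity):
    """Collect every fact whose subject is the given entity.

    Returns a dict mapping each predicate to a list of objects,
    or an empty dict if the query does not hit a node in the graph.
    """
    facts = defaultdict(list)
    for subject, predicate, obj in TRIPLES:
        if subject == entity:
            facts[predicate].append(obj)
    return dict(facts)
```

`info_box("Empire State Building")` returns the facts for the sidebar, and each object (such as "New York City") is itself a node that can be looked up the same way, which is what makes the structure a graph rather than a flat record.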
  21. A good way to approach semantic search is to, basically, try to control the semantics. Taxonomies can help in a number of ways: tuning the search to prioritize tags over free text, allowing users to browse a taxonomy of subjects, using taxonomy terms to drive type-ahead or “did you mean”-style redirection, and using synonymy to drive the same relevant results for a number of string inputs. This allows for both improvements in search and improvements in interfaces – how users interact with the data.
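As a minimal sketch of using synonymy to drive the same results for several string inputs: every variant is mapped to one preferred taxonomy term, and retrieval runs against document tags instead of free text. The terms, variants, document IDs, and function names here are invented for illustration.

```python
# Hypothetical taxonomy: preferred term -> known variants
# (synonyms, abbreviations, acronyms).
TAXONOMY = {
    "Myocardial infarction": ["heart attack", "MI", "cardiac infarction"],
    "Magnetic resonance imaging": ["MRI", "NMR imaging"],
}

# Reverse lookup: any variant (or the preferred term itself), lowercased,
# points back to the preferred term.
VARIANT_TO_TERM = {
    label.lower(): term
    for term, variants in TAXONOMY.items()
    for label in [term, *variants]
}

# Hypothetical tagged documents: document ID -> set of taxonomy terms.
DOCS = {
    "doc-1": {"Myocardial infarction"},
    "doc-2": {"Magnetic resonance imaging"},
    "doc-3": {"Myocardial infarction", "Magnetic resonance imaging"},
}

def preferred_term(query):
    """Map a user's query string to its preferred taxonomy term, if any."""
    return VARIANT_TO_TERM.get(query.strip().lower())

def search_by_tag(query):
    """Return IDs of documents tagged with the query's preferred term."""
    term = preferred_term(query)
    if term is None:
        return []
    return sorted(doc_id for doc_id, tags in DOCS.items() if term in tags)
```

Here `search_by_tag("heart attack")`, `search_by_tag("MI")`, and `search_by_tag("myocardial infarction")` all return the same documents. The same reverse lookup could feed type-ahead or “did you mean” suggestions.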
  22. On tagging: the irony of document categorization is that we’re not interested in words -- we’re interested in concepts; but the only window into the concepts we have available is the words. This is why we use subject metadata to describe things (using a taxonomy or other controlled vocabulary), and then make that metadata available to the search engine. This is not news, of course. But it illustrates why good subject categorization with a controlled vocabulary helps to organize the data to make it more efficient for search. The previous examples showing failed searches using abbreviations, synonyms, acronyms, and other lexical variants can be solved with a robust, well-formed taxonomy and document tagging program. This is not new to the industry – but it merits discussion in a talk about semantic search.
  23. What can good document categorization achieve? As an example, let’s take a look at the PLOS One platform (full disclosure -- PLOS is a client and I have worked with their taxonomy team). PLOS uses a very large thesaurus and automatically tags each article with up to 8 terms from the vocabulary for search; they also expose the full hierarchy for browsing.
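To give a feel for automatic tagging with a cap on the number of terms, here is a deliberately simple sketch. It is not PLOS's actual pipeline, which is not described here; the vocabulary, the phrase lists, and the `tag_document` function are assumptions, and real systems use far richer rules than phrase counting.

```python
import re
from collections import Counter

# Hypothetical controlled vocabulary: term -> phrases that signal it.
VOCABULARY = {
    "Gene expression": ["gene expression", "expressed genes"],
    "Protein folding": ["protein folding", "misfolded protein"],
    "Zebrafish": ["zebrafish", "danio rerio"],
}

def tag_document(text, max_tags=8):
    """Return up to max_tags vocabulary terms, most frequently signalled first."""
    lowered = text.lower()
    counts = Counter()
    for term, phrases in VOCABULARY.items():
        # Count whole-phrase occurrences of every signal phrase for this term.
        hits = sum(
            len(re.findall(r"\b" + re.escape(phrase) + r"\b", lowered))
            for phrase in phrases
        )
        if hits:
            counts[term] = hits
    return [term for term, _ in counts.most_common(max_tags)]
```

The returned terms are what would be stored as subject metadata and handed to the search engine, and also what a reader would see and click on in the article view.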
  24. Here’s the browse interface; in addition to the hierarchy, you can see how many articles are attached to each term (and, of course, launch a search by clicking).
  25. And here’s a sample PLOS One article; in the lower right-hand corner you can see the terms applied -- and, moreover, click on one to launch a search on a topic. This is great -- you don’t have to guess how they phrase the term you’re interested in! Even cooler -- the little buttons next to each yellow bar are to flag the article if you think a term has been misapplied. These pretty standard applications of metadata, tagging, taxonomy, and search technologies make the user experience much improved. The same principle applies to tagging content for machine learning!
  26. To illustrate the direction semantic search is heading, next I want to say a little bit about the work JSTOR -- and in particular, JSTOR Labs -- has been doing around their search experience, which relies on a combination of traditional taxonomy-based tagging and inferential (that is, naive) topic modeling to create new search experiences.
  27. In the JSTOR Labs Text Analyzer, any document you can OCR, upload, or take a picture of with your phone becomes your search. The document is analyzed both in terms of the massive JSTOR taxonomy (again, full disclosure -- I worked on this project!) as well as other words and entities that appear in the document -- and recommends related content in JSTOR (some 7 million articles, last I checked)! You can curate the results to make the algorithms more accurate. This very cool beta project takes a new perspective on search using both traditional metadata, taxonomy, and tagging applications as well as ML-based technologies.
  28. So, practically speaking, how do we get this done? If you want to implement semantic search: where do you start? Any existing search platform has a back-end – which you may need a developer to access and change – that can be configured to take advantage of built-in features. This can include using fuzzy matching (or not), changing other settings like automatic Booleans in strings, using dates to rank relevancy of results, or prioritizing certain fields (for structured content, of course) to, say, prefer keyword strings found in titles first.
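The kinds of settings mentioned above (fuzzy matching on or off, preferring matches in titles) can be illustrated with a small scoring function. This is a sketch, not the configuration syntax of any particular search platform: the field weights, the 0.8 threshold, and the function names are assumptions, and Python's standard-library `difflib` similarity ratio stands in for whatever fuzzy-matching method a real engine uses.

```python
from difflib import SequenceMatcher

# Assumed field weights: a hit in the title counts three times
# as much as a hit in the abstract.
FIELD_WEIGHTS = {"title": 3.0, "abstract": 1.0}

def term_matches(query_term, word, fuzzy=True, threshold=0.8):
    """Exact match, or (if fuzzy is on) a close-enough similarity ratio."""
    if query_term == word:
        return True
    return fuzzy and SequenceMatcher(None, query_term, word).ratio() >= threshold

def score(doc, query, fuzzy=True):
    """Sum field weights for every query term found in each field."""
    total = 0.0
    for query_term in query.lower().split():
        for field, weight in FIELD_WEIGHTS.items():
            words = doc.get(field, "").lower().split()
            if any(term_matches(query_term, w, fuzzy) for w in words):
                total += weight
    return total

# Illustrative documents only.
DOC_A = {"title": "Semantic search basics", "abstract": "An overview"}
DOC_B = {"title": "An overview", "abstract": "Notes on semantic search"}
```

For the query "semantic", `DOC_A` outscores `DOC_B` because the match is in the title. The misspelling "semantc" still finds `DOC_A` with fuzzy matching on and finds nothing with it off, which is the trade-off between recall and noise that these settings control.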
  29. Beyond configuring search to improve results, consider taxonomies and tagging – both for retrieval and interface options like type-ahead and “did you mean”. The next level is a knowledge graph – which is a considerable effort, but a very powerful tool. And understanding the user – whether considering previous searches or some kind of user profile with useful information – is yet another avenue to pursue.
  30. Thank you.