Ted Talk
Ted Sullivan
(Well Before Back in the Day - 2018)
- Ted Sullivan, PhD
“(old Phuddy Duddy)”
“Senior (very much so I’m afraid)
Solutions (I hope)
Architect (and sometime plumber)”
- Ted Sullivan
When is my search app done?

“How do you get there grasshopper? Add semantic
intelligence to the engine!”
In his own words...
For the past 15 or so years now I have been building search applications, first with Verity K2 for a project
with a publishing company H.W. Wilson, then with most of the vendor products in the search space,
Ultraseek, Fast, Autonomy, Endeca, Vivisimo, MarkLogic and Exalead. I watched Lucene grow and
develop from an interesting little search engine to a major force in the search technology business. Before
that, I was building collaborative battlefield planning applications for the U.S. Army and before that I was
working on Internet stuff back in the dawn of the Web (well almost - 1994). I have been programming in
Java since 1995 and professionally since 1996 or so. I was learning JavaScript when Netscape was still
developing it, but only recently have begun to truly understand its power! John Resig and Bear Bibeault's
book "Secrets of the JavaScript Ninja" is a must read for anyone that wants to follow this path. Currently, I
am struggling up the AngularJS learning curve.
Before my work in the web with my friend Jim Spatz at Spatz Computer Graphics, I published some Math
games for kids on the original Mac OS, and before that, I did science - Auditory Neuroscience to be more
precise. I studied the auditory system of 'fly-by-night' critters, bats and owls first at Washington University in
St Louis, then at Caltech and Princeton. I was pretty good at Science but didn't like the writing part as
much as I should have. I had much more fun writing code (C, FORTRAN and PDP 8/11 assembler).
Currently, I am enjoying becoming part of the Open Source Revolution working at Lucidworks. Back in
1995 when Linux came out, I had a bet with my boss Jim Spatz about its future - I'm happy to say now that
I lost that bet. I would aspire to be an Open Source evangelist but there are enough of those already. I'll
settle for Solr Evangelist.
The Search Curmudgeon
• Learned
• Wise
• Pragmatic
• Caring
Random Rants from the
Search Curmudgeon
• https://lucidworks.com/2015/03/09/random-rants-search-curmudgeon/
• Search vs. Information Access
Data Science for
Dummies
• https://lucidworks.com/2016/09/06/data-science-for-dummies/
• "A conditional probability is like the probability
that you are a moron if you text while driving
(pretty high it turns out – and would be a good
source of Darwin awards except for the innocent
people that also suffer from this lunacy.)"
The Twilight of the Vengine Gods
(Die Göttervenginedämmerung) or
Die Hard with A Vengines!!!
• https://lucidworks.com/2016/10/18/the-twilight-of-the-vengine-gods-die-gottervenginedammerung/
• "The Curmudgeon doesn’t dispense news, he just
tells you what information, new or old sucks or
what pisses him off and then rants about it. "
Where did all the
Librarians go?
• https://lucidworks.com/2017/11/21/where-did-all-the-librarians-go/
• "You’ve probably gotten tired of me by now, that’s
OK because I’m tired of me too."
Search Legacy
• Blogs: as Search Curmudgeon and himself
• Lucidworks: heavy duty implementations
• Techniques: autophrasing and query autofiltering
• Presentations: Revolutions and inaugural Haystack
Automatic Phrase Tokenization:
Improving Lucene Search Precision
by More Precise Linguistic Analysis
• https://lucidworks.com/2014/07/02/automatic-phrase-tokenization-improving-lucene-search-precision-by-more-precise-linguistic-analysis/
• Takeaway: moving from bag of words towards bag
of things
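A minimal sketch of the "bag of things" idea, assuming a hand-supplied phrase list: known multi-word phrases are merged into single tokens before indexing or querying. This is not the auto-phrase-tokenfilter code itself. The function name `autophrase`, the underscore separator, and the sample phrases are hypothetical choices made for illustration.

```python
from typing import Iterable


def autophrase(tokens: list[str], phrases: Iterable[str], sep: str = "_") -> list[str]:
    """Greedily merge known multi-word phrases into single tokens."""
    # Store each phrase as a tuple of lowercase words for fast lookup.
    known = {tuple(p.lower().split()) for p in phrases}
    max_len = max((len(p) for p in known), default=1)

    out: list[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest window first so longer phrases win over shorter ones.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            window = tuple(t.lower() for t in tokens[i:i + n])
            if window in known:
                out.append(sep.join(window))
                i += n
                break
        else:
            # No phrase starts here: emit the single word unchanged (lowercased).
            out.append(tokens[i].lower())
            i += 1
    return out


tokens = "seat cushions shipped to New York".split()
print(autophrase(tokens, ["seat cushions", "New York"]))
# ['seat_cushions', 'shipped', 'to', 'new_york']
```

With phrases treated as single tokens, a search for "seat cushions" no longer matches documents that merely contain "seat" and "cushions" somewhere.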
Solution for Multi-term Synonyms in
Lucene/Solr Using the Auto
Phrasing TokenFilter
• https://lucidworks.com/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
• LUCENE-2605 & Friends resolved over two years
later
• split on whitespace = false
The Well Tempered Search
Application – Prelude
• https://lucidworks.com/2015/01/27/well-tempered-search-application-prelude/
• Semantic Search, linguistics, context
• Best Bets (landing pages / rules)
• Synonyms, stemming, lemmatization, taxonomy, ontology, machine learning
/ classification, NLP/AI
The Well Tempered Search
Application – Fugue
• https://lucidworks.com/2015/02/03/well-tempered-search-application-fugue/
• autophrasing
• "red sofa" problem
• Takeaway: ahead of its time (evolving into Solr Text Tagger and query
rewriting)
• "seed crystals of knowledge": SME tagging
Introducing Query
Autofiltering
• https://lucidworks.com/2015/02/17/introducing-query-autofiltering/
• "autotagging of the incoming query where the knowledge source is the
search index itself"
• we already have the information that we need to “do the right thing”
we just don’t use it
• "Another approach that was suggested by Erik Hatcher, is to have a
separate collection that is specialized as a knowledge store and query it to
get the categories with which to autofilter on the content collection."
• The key is that in both cases, we are using the search index itself as a
knowledge source that we can use for intelligent query introspection
and thus powerful inferential search!!
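The idea of using the index itself as the knowledge source can be sketched roughly as follows. This is an illustration under stated assumptions, not the query-autofiltering-component: the in-memory `DOCS` list stands in for a Solr index, and the field names, the `build_value_map` and `autofilter` helpers, and the longest-match strategy are all hypothetical.

```python
from collections import defaultdict

# Toy stand-in for the search index; documents and field names are made up.
DOCS = [
    {"color": "red", "product_type": "sofa", "brand": "acme"},
    {"color": "blue", "product_type": "socks", "brand": "red lion"},
]


def build_value_map(docs):
    """Map each known field value to the set of fields it occurs in."""
    value_to_fields = defaultdict(set)
    for doc in docs:
        for field, value in doc.items():
            value_to_fields[value.lower()].add(field)
    return value_to_fields


def autofilter(query, value_to_fields, max_words=3):
    """Split a query into (field, value) filters and leftover free-text words."""
    words = query.lower().split()
    filters, leftover = [], []
    i = 0
    while i < len(words):
        # Prefer the longest run of words that is a known field value.
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in value_to_fields:
                # A value found in several fields yields one filter per field.
                for field in sorted(value_to_fields[candidate]):
                    filters.append((field, candidate))
                i += n
                break
        else:
            leftover.append(words[i])
            i += 1
    return filters, leftover


value_map = build_value_map(DOCS)
print(autofilter("red sofa", value_map))
# ([('color', 'red'), ('product_type', 'sofa')], [])
print(autofilter("blue red lion socks", value_map))
# ([('color', 'blue'), ('brand', 'red lion'), ('product_type', 'socks')], [])
```

Trying the longest match first is what keeps "red lion" together as a brand instead of turning "red" into a color filter.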
Thoughts on “Search vs. Discovery”
• https://lucidworks.com/2015/03/02/thoughts-search-vs-discovery/
• "findability", facets, aboutness, relatedness
• "However if a document is not appropriately tagged, it
may become invisible..."; Data quality really matters here!
• Auto classification and manual subject matter expert
tagging
• Visualization, search driven analytics
Query Autofiltering Revisited
– Let's be more precise!!!
• https://lucidworks.com/2015/05/13/query-autofiltering-revisited-can-precise/
• "blue red lion socks"
Query Autofiltering Extended –
On Language and Logic in Search
• https://lucidworks.com/2015/06/06/query-autofiltering-extended-language-logic-search/
• If you've got metadata, use (autofilter) it. If you've
got known multi-word phrases, use them.
• Language usage understanding of AND vs. OR
Focusing on Search Quality at
Lucene/Solr Revolution 2015
• https://lucidworks.com/2015/10/19/focusing-on-search-quality-at-lucenesolr-revolution-2015/
• "Again, the “knowledge base” ... can be the Solr/Lucene index itself!"
• “On-The-Fly Predictive Analytics” – as we say in
the search quality biz – it’s ALL about context!
Query Autofiltering IV:
A Novel Approach to NLP
• https://lucidworks.com/2015/11/19/query-autofiltering-chapter-4-a-novel-approach-to-natural-language-processing/
• Verbs
• Bob Dylan cover tunes
• Query Introspection: inferring user intent
• POS mapped to query fields
Pivoting to the Query: Using Pivot
Facets to build a Multi-Field
Suggester
• https://lucidworks.com/2016/08/12/pivoting-to-the-query-using-pivot-facets-to-build-a-multi-field-suggester/
• Pivot facets: "Think of it as a way of generating a facet
value “taxonomy” – on the fly."
• Facet Phrases
• Once we commit to building a special Solr collection (also
known as a ‘sidecar’ collection) just for typeahead, there
are other powerful search features that we now have to
work with. One of them is contextual metadata. [!!!]
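As a rough illustration of the pivot-facet idea above, the sketch below builds a two-level "taxonomy" of field values with counts from a toy document list, then turns each parent/child pair into a candidate suggestion phrase. In Solr the nested counts would come from a pivot facet request (the `facet.pivot` parameter). Here they are computed in memory, and the `DOCS` data, the `pivot` and `facet_phrases` helpers, and the phrase format are hypothetical.

```python
from collections import Counter, defaultdict

# Toy documents standing in for a Solr collection; values are made up.
DOCS = [
    {"genre": "rock", "artist": "Bob Dylan"},
    {"genre": "rock", "artist": "Bob Dylan"},
    {"genre": "rock", "artist": "Jimi Hendrix"},
    {"genre": "folk", "artist": "Bob Dylan"},
]


def pivot(docs, parent, child):
    """Count child-field values nested under each parent-field value."""
    tree = defaultdict(Counter)
    for doc in docs:
        if parent in doc and child in doc:
            tree[doc[parent]][doc[child]] += 1
    return tree


def facet_phrases(tree):
    """Flatten the pivot tree into (phrase, count) pairs, most frequent first."""
    phrases = []
    for parent_value, children in tree.items():
        for child_value, count in children.items():
            phrases.append((f"{child_value} {parent_value}", count))
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(phrases, key=lambda pc: (-pc[1], pc[0]))


print(facet_phrases(pivot(DOCS, "genre", "artist")))
# [('Bob Dylan rock', 2), ('Bob Dylan folk', 1), ('Jimi Hendrix rock', 1)]
```

Phrases like these could then be indexed into a separate typeahead collection, with the originating field names kept alongside as contextual metadata.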
Building a Subject Classifier using
Automatically Discovered Keyword
Clusters, Part I
• https://lucidworks.com/2017/02/28/building-a-subject-classifier-using-automatically-discovered-keyword-clusters-part-i/
• subject classifier that uses automatically discovered
key term “clusters” that can then be used to classify
documents
• autophrasing + /terms....
• blah blah relatedness(...) blah blah
Why Facets are Even More
Fascinating than you Might Have
Thought
• https://lucidworks.com/2017/09/22/why-facets-are-even-more-fascinating-than-you-might-have-thought/
• Context matters!
• Spatial metaphor: N-Dimensional hyperspace
• "Paul McCartney" => "John Lennon"
• contextual usage of first result to boost second
• Facets and UI
• This is “surfin’ the meta-informational universe” that is your Solr collection.
• The Facet Theorem
When Worlds Collide – Artificial
Intelligence Meets Search
• https://lucidworks.com/2018/04/30/when-worlds-collide-artificial-intelligence-meets-search/
• The Search Loop: questions, answers, then more questions
• Inferring User Intent: NLP, POS, head-tail analysis, directed pattern-based
• Information Spaces: conceptually near
• Knowledge Spaces and Semantic Reference Frames
• Word Embedded Vectors
• Knowledge Graphs: taxonomies and ontologies
-Ted says
“Sh*t...”
“the Curmudgeon doesn’t dispense
news, he just tells you what
information, new or old sucks or
what pisses him off and then rants
about it. ”
“You may be thinking – "Who’s this
Search Curmudgeon guy? He’s a real
jerk". No argument there.”
“hey IT guys – Buy More Memory for
chrissake! Thanks to Moore’s Law it’s
pretty cheap now so don’t be such a
tight-ass”
“And the role of DBA will likely be
staffed by curmudgeons like me – so
be nice to them – they can save your
ass. We’ve seen our share of techno
cliff jumpers – it doesn’t end well.”
“what we old guys know is that some
of the hot things that you whiz kids
are doing now were done before, i.e.,
`back in the day`. ”
“You are not as smart as you think
you are kiddies – dual quad core, 3
GHz CPUs and 512 GB of RAM can
hide lots of coding sins. ”
“When I was your age sonny, we had
to walk three miles through snow to
submit our box of punch cards … talk
about crappy BAUD rates!)”
“....because in my opinion (notice that
I didn’t say ‘humble’ because that is
one thing that the Curmudgeon is
definitely NOT)...”
“I’m a humanist believe it or not – I
like humans even if they don’t like
me sometimes – I EARNED my
nickname of ‘curmudgeon’ you
know.”
“proper care and feeding of these
"analysis chains" can make you
some serious money – especially you
eCommerce guys”
“You’ve probably gotten tired of me
by now, that’s OK because I’m tired
of me too. Believe me, you don’t have
to live with me – I do.”
Ted on...
• IDOL: "should really be spelled IDLE"
• Fast vs. Solr: "One is named Fast, the other actually is fast"
• Endeca: "what took several hours in Endeca indexed in
about 10 minutes in Solr"
• elidedsearch: "The name of the company is like the material
that is used to hold up my Jockey Shorts (hint, hint)", Fruit-of-the-Loom
Finders, Tightie Whitie Quest, RubberBand Finders, Brain Splitters,
BungeeSeek
-Search Curmudgeon
Big Data: “50 foot tall Brent Spiner”
Ted's Big Adventure
• Semantics: bag of things, not bag of words
• synonyms, autophrasing, lemmatization
• "in text search – semantics matter"
• Linguistics: noun phrases, POS, NLP
• Facets
• autofiltering
• The Facet Theorem
• Relatedness
• Knowledge Space, Semantic Reference Frames
• Context matters
The Facet Theorem
• Lemma 1: Similar things tend to occur in similar
contexts
• Lemma 2: Facets are a tool for exploring meta-informational contexts
• It therefore follows that:
• Theorem: Facets can be used to find similar things.
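One way to make the theorem concrete, as a sketch only: treat the facet values that co-occur with an item as its context, and rank other items by how much of that context they share. The Jaccard overlap used here is a swapped-in similarity measure chosen for simplicity, not a method taken from the blog posts. The `DOCS` data and the `contexts` and `similar` helpers are hypothetical.

```python
from collections import defaultdict

# Toy documents; in practice these would be facet values from a Solr collection.
DOCS = [
    {"person": "Paul McCartney", "band": "The Beatles", "genre": "rock"},
    {"person": "John Lennon", "band": "The Beatles", "genre": "rock"},
    {"person": "Paul McCartney", "band": "Wings", "genre": "rock"},
    {"person": "Miles Davis", "band": "Miles Davis Quintet", "genre": "jazz"},
]


def contexts(docs, target_field):
    """Collect, for each target value, the (field, value) pairs it co-occurs with."""
    ctx = defaultdict(set)
    for doc in docs:
        for field, value in doc.items():
            if field != target_field:
                ctx[doc[target_field]].add((field, value))
    return ctx


def similar(seed, docs, target_field):
    """Rank other target values by Jaccard overlap of their facet contexts."""
    ctx = contexts(docs, target_field)
    seed_ctx = ctx[seed]
    scores = []
    for other, other_ctx in ctx.items():
        if other == seed:
            continue
        union = seed_ctx | other_ctx
        score = len(seed_ctx & other_ctx) / len(union) if union else 0.0
        scores.append((other, score))
    # Highest overlap first; ties broken alphabetically.
    return sorted(scores, key=lambda s: (-s[1], s[0]))


ranked = similar("Paul McCartney", DOCS, "person")
print(ranked[0][0])
# John Lennon
```

The seed and its nearest neighbor share a band and a genre (Lemma 1), and those shared values are exactly what the facets expose (Lemma 2).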
PubTed
• https://github.com/lucidworks/
• auto-phrase-tokenfilter
• query-autofiltering-component (also SOLR-7539)
• https://github.com/detnavillus/
• multifield_suggester_code

Ted Talk

  • 2. Ted Sullivan (Well Before Back in the Day - 2018)
  • 3. - Ted Sullivan, PhD “(old Phuddy Duddy)” “Senior (very much so I’m afraid)
 Solutions (I hope)
 Architect (and sometime plumber)”
  • 4. - Ted Sullivan When is my search app done?
 “How do you get there grasshopper? Add semantic intelligence to the engine!”
  • 5. In his own words... For the past 15 or so years now I have been building search applications, first with Verity K2 for a project with a publishing company H.W. Wilson, then with most of the vendor products in the search space, Ultraseek, Fast, Autonomy, Endeca, Vivissimo, MarkLogic and Exalead. I watched Lucene grow and develop from an interesting little search engine to a major force in the search technology business. Before that, I was building collaborative battlefield planning applications for the U.S. Army and before that I was working on Internet stuff back in the dawn of the Web (well almost - 1994). I have been programming in Java since 1995 and professionally since 1996 or so. I was learning JavaScript when Netscape was still developing it, but only recently have begun to truly understand its power! John Resig and Bear Bibeault's book "Secrets of the JavaScript Ninja" is a must read for anyone that wants to follow this path. Currently, I am struggling up the AngularJS learning curve. Before my work in the web with my friend Jim Spatz at Spatz Computer Graphics, I published some Math games for kids on the original Mac OS, and before that, I did science - Auditory Neuroscience to be more precise. I studied the auditory system of 'fly-by-night' critters, bats and owls first at Washington University in St Louis, then at Caltech and Princeton. I was pretty good at Science but didn't like the writing part as much as I should have. I had much more fun writing code (C, FORTRAN and PDP 8/11 assembler). Currently, I am enjoying becoming part of the Open Source Revolution working at Lucidworks. Back in 1995 when Linux came out, I had a bet with my boss Jim Spatz about its future - I'm happy to say now that I lost that bet. I would aspire to be an Open Source evangelist but there are enough of those already. I'll settle for Solr Evangelist. I'll settle for Solr Evangelist.
  • 6. The Search Curmudgeon • Learned • Wise • Pragmatic • Caring
  • 7. Random Rants from the Search Curmudgeon • https://lucidworks.com/2015/03/09/random- rants-search-curmudgeon/ • Search vs. Information Access
  • 8. Data Science for Dummies • https://lucidworks.com/2016/09/06/data- science-for-dummies/ • "A conditional probability is like the probability that you are a moron if you text while driving (pretty high it turns out – and would be a good source of Darwin awards except for the innocent people that also suffer from this lunacy.)"
  • 9. The Twilight of the Vengine Gods (Die Göttervenginedämmerung) or Die Hard with A Vengines!!! •  https://lucidworks.com/2016/10/18/the- twilight-of-the-vengine-gods-die- gottervenginedammerung/ • "The Curmudgeon doesn’t dispense news, he just tells you what information, new or old sucks or what pisses him off and then rants about it. "
  • 10. Where did all the Librarians go? • https://lucidworks.com/2017/11/21/where-did- all-the-librarians-go/ • "You’ve probably gotten tired of me by now, that’s OK because I’m tired of me too."
  • 11. Search Legacy • Blogs: as Search Curmudgeon and himself • Lucidworks: heavy duty implementations • Techniques: autophrasing and query autofiltering • Presentations: Revolutions and inaugural Haystack
  • 12. Automatic Phrase Tokenization: Improving Lucene Search Precision by More Precise Linguistic Analysis • https://lucidworks.com/2014/07/02/automatic- phrase-tokenization-improving-lucene-search- precision-by-more-precise-linguistic-analysis/ • Takeaway: moving from bag of words towards bag of things
  • 13. Solution for Multi-term Synonyms in Lucene/Solr Using the Auto Phrasing TokenFilter • https://lucidworks.com/2014/07/12/solution-for- multi-term-synonyms-in-lucenesolr-using-the-auto- phrasing-tokenfilter/ • LUCENE-2605 & Friends resolved over two years later • split on whitespace = false
  • 14. The Well Tempered Search Application – Prelude • https://lucidworks.com/2015/01/27/well-tempered-search-application- prelude/ • Semantic Search, linguistics, context • Best Bets (landing pages / rules) • Synonyms, stemming, lemmatization, taxonomy, ontology, machine learning / classification, NLP/AI
  • 15. The Well Tempered Search Application – Fugue • https://lucidworks.com/2015/02/03/well-tempered-search-application-fugue/ • autophrasing • "red sofa" problem • Takeaway: ahead of its time (evolving into Solr Text Tagger and query rewriting) • "seed crystals of knowledge": SME tagging
  • 16. Introducing Query Autofiltering • https://lucidworks.com/2015/02/17/introducing-query-autofiltering/ • "autotagging of the incoming query where the knowledge source is the search index itself" • we already have the information that we need to “do the right thing” we just don’t use it • "Another approach that was suggested by Erik Hatcher, is to have a separate collection that is specialized as a knowledge store and query it to get the categories with which to autofilter on the content collection." • The key is that in both cases, we are using the search index itself as a knowledge source that we can use for intelligent query introspection and thus powerful inferential search!!
  • 17. Thoughts on 
 “Search vs. Discovery” • https://lucidworks.com/2015/03/02/thoughts-search- vs-discovery/ • "findability", facets, aboutness, relatedness • "However if a document is not appropriately tagged, it may become invisible..."; Data quality really matters here! • Auto classification and manual subject matter expert tagging • Visualization, search driven analytics
  • 18. Query Autofiltering Revisited – Lets be more precise!!! • https://lucidworks.com/2015/05/13/query-autofiltering- revisited-can-precise/ • "blue red lion socks"
  • 19. Query Autofiltering Extended – On Language and Logic in Search • https://lucidworks.com/2015/06/06/query- autofiltering-extended-language-logic-search/ • If you've got metadata, use (autofilter) it. If you've got known multi-word phrases, use them. • Language usage understanding of AND vs. OR
  • 20. Focusing on Search Quality at Lucene/Solr Revolution 2015 • https://lucidworks.com/2015/10/19/focusing-on- search-quality-at-lucenesolr-revolution-2015/ • "Again, the “knowledge base” ... can be the Solr/ Lucene index itself!" • “On-The-Fly Predictive Analytics” – as we say in the search quality biz – its ALL about context!
  • 21. Query Autofiltering IV: A Novel Approach to NLP • https://lucidworks.com/2015/11/19/query- autofiltering-chapter-4-a-novel-approach-to- natural-language-processing/ • Verbs • Bob Dylan cover tunes • Query Introspection: inferring user intent • POS mapped to query fields
  • 22. Pivoting to the Query: Using Pivot Facets to build a Multi-Field Suggester • https://lucidworks.com/2016/08/12/pivoting-to-the- query-using-pivot-facets-to-build-a-multi-field-suggester/ • Pivot facets: "Think of it as a way of generating a facet value “taxonomy” – on the fly." • Facet Phrases • Once we commit to building a special Solr collection (also known as a ‘sidecar’ collection) just for typeahead, there are other powerful search features that we now have to work with. One of them is contextual metadata. [!!!]
  • 23. Building a Subject Classifier using Automatically Discovered Keyword Clusters, Part I • https://lucidworks.com/2017/02/28/building-a-subject-classifier-using-automatically-discovered-keyword-clusters-part-i/ • subject classifier that uses automatically discovered key term “clusters” that can then be used to classify documents • autophrasing + /terms.... • blah blah relatedness(...) blah blah
  • 24. Why Facets are Even More Fascinating than you Might Have Thought • https://lucidworks.com/2017/09/22/why-facets-are-even-more-fascinating-than-you-might-have-thought/ • Context matters! • Spatial metaphor: N-Dimensional hyperspace • "Paul McCartney" => "John Lennon" • contextual usage of first result to boost second • Facets and UI • This is “surfin’ the meta-informational universe” that is your Solr collection. • The Facet Theorem
  • 25. When Worlds Collide – Artificial Intelligence Meets Search • https://lucidworks.com/2018/04/30/when-worlds-collide-artificial-intelligence-meets-search/ • The Search Loop: questions, answers, then more questions • Inferring User Intent: NLP, POS, head-tail analysis, directed pattern-based • Information Spaces: conceptually near • Knowledge Spaces and Semantic Reference Frames • Word Embedded Vectors • Knowledge Graphs: taxonomies and ontologies
  • 27. “the Curmudgeon doesn’t dispense news, he just tells you what information, new or old sucks or what pisses him off and then rants about it.”
  • 28. “You may be thinking – "Who’s this Search Curmudgeon guy? He’s a real jerk". No argument there.”
  • 29. “hey IT guys – Buy More Memory for chrissake! Thanks to Moore’s Law it’s pretty cheap now so don’t be such a tight-ass”
  • 30. “And the role of DBA will likely be staffed by curmudgeons like me – so be nice to them – they can save your ass. We’ve seen our share of techno cliff jumpers – it doesn’t end well.”
  • 31. “what we old guys know is that some of the hot things that you whiz kids are doing now were done before, i.e., `back in the day`.”
  • 32. “You are not as smart as you think you are kiddies – dual quad core, 3 GHz CPUs and 512 GB of RAM can hide lots of coding sins.”
  • 33. “When I was your age sonny, we had to walk three miles through snow to submit our box of punch cards … talk about crappy BAUD rates!)”
  • 34. “....because in my opinion (notice that I didn’t say ‘humble’ because that is one thing that the Curmudgeon is definitely NOT)...”
  • 35. “I’m a humanist believe it or not – I like humans even if they don’t like me sometimes – I EARNED my nickname of ‘curmudgeon’ you know.”
  • 36. “proper care and feeding of these "analysis chains" can make you some serious money – especially you eCommerce guys”
  • 37. “You’ve probably gotten tired of me by now, that’s OK because I’m tired of me too. Believe me, you don’t have to live with me – I do.”
  • 38. Ted on... • IDOL: "should really be spelled IDLE" • Fast vs. Solr: "One is named Fast, the other actually is fast" • Endeca: "what took several hours in Endeca indexed in about 10 minutes in Solr" • elidedsearch: "The name of the company is like the material that is used to hold up my Jockey Shorts (hint, hint)", Fruit-of-the-Loom Finders, Tightie Whitie Quest, RubberBand Finders, Brain Splitters, BungeeSeek
  • 39. Big Data: “50 foot tall Brent Spiner” - Search Curmudgeon
  • 40. Ted's Big Adventure • Semantics: bag of things, not bag of words • synonyms, autophrasing, lemmatization • "in text search – semantics matter" • Linguistics: noun phrases, POS, NLP • Facets • autofiltering • The Facet Theorem • Relatedness • Knowledge Space, Semantic Reference Frames • Context matters
  • 41. The Facet Theorem • Lemma 1: Similar things tend to occur in similar contexts • Lemma 2: Facets are a tool for exploring meta-informational contexts • It therefore follows that: • Theorem: Facets can be used to find similar things.
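One way to read the Facet Theorem operationally: collect the facet values that surround a seed item, then rank other items by how many of those facet values they share. The sketch below does this over a tiny in-memory list of records. The records, the field names, and the `similar_by_facets` helper are all hypothetical toy material for illustration; a Solr-based version would presumably use facet queries against the collection instead of Python loops.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical toy records; field names are assumptions for illustration.
DOCS: List[Dict[str, str]] = [
    {"artist": "Paul McCartney", "band": "The Beatles", "genre": "rock"},
    {"artist": "John Lennon", "band": "The Beatles", "genre": "rock"},
    {"artist": "Mick Jagger", "band": "The Rolling Stones", "genre": "rock"},
    {"artist": "Miles Davis", "band": "Miles Davis Quintet", "genre": "jazz"},
]


def similar_by_facets(
    seed: str,
    field: str,
    docs: Sequence[Dict[str, str]],
    context_fields: Sequence[str],
) -> List[str]:
    """Rank other values of `field` by facet values shared with `seed`."""
    # Lemma 2: the facet values of the seed's documents describe its context.
    context: Set[Tuple[str, str]] = set()
    for doc in docs:
        if doc.get(field) == seed:
            for f in context_fields:
                if f in doc:
                    context.add((f, doc[f]))

    # Lemma 1: things that occur in the same contexts are likely similar.
    scores: Counter = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is None or value == seed:
            continue
        for f in context_fields:
            if f in doc and (f, doc[f]) in context:
                scores[value] += 1

    # Items sharing no context never get a score, so they are left out.
    return [value for value, _ in scores.most_common()]


if __name__ == "__main__":
    print(similar_by_facets("Paul McCartney", "artist", DOCS, ["band", "genre"]))
```

With the toy data, the seed "Paul McCartney" ranks "John Lennon" first because the two share both band and genre, which mirrors the "Paul McCartney" => "John Lennon" example from slide 24. Counting shared facet values is the simplest possible scoring; weighting by facet counts or rarity would be a natural refinement.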
  • 42. PubTed • https://github.com/lucidworks/ • auto-phrase-tokenfilter • query-autofiltering-component (also SOLR-7539) • https://github.com/detnavillus/ • multifield_suggester_code