Search Intelligence &
MarkLogic Search API
MarkLogic World 2012
Will Thompson
wthompson@jonesmcclure.com
Search API Resources
• 5-minute Guide to the Search API
• MarkLogic Search Developer's Guide
• developer.marklogic.com
• MarkMail.org
• MarkLogic Developer Listserv
Code
GitHub: https://github.com/wthoolihan/MLUC-2012-Examples
Search Intelligence
• Get the most out of our XML in search
– Approach 1: GUI
– Approach 2: Syntax
– Approach 3: Facets, constraints, filters
– Infer (Search Intelligence)
Enrich Your Query!
• Infer
– Use knowledge about the user
– Look for meaning in search terms
• Enrich
– Translate into more complex query
– Gain speed, accuracy
Enrich Your Query!
• Strategies
– Custom term handling
• Works well for single term transformations
• See: http://developer.marklogic.com/try/ninja/page13
– Roll your own parser
• A lot of work (see Michael Blakeley’s xqysp)
– Work between parse and search steps
Search API Overview
• The Search API is an XQuery library module designed to
simplify creating search applications:
o Parser
o Constraints
o Faceting
o Snippets
• High performance, scalability
• Extensible
Search API Extensibility
• Search API provides several points to hook in
• Hooks are defined in Search API options XML node
o Custom constraints
o Custom grammar
o Custom snippets
o Custom term handling
o Search operators
Search API Basics
• Search API module:
import module namespace search = "http://marklogic.com/appservices/search"
at "/MarkLogic/appservices/search/search.xqy";
• Main entry point: search:search()
• parses $qtext with given $options
• executes search
• returns <search:response>
o set of <search:result>s
o facets
o snippets
o metrics and other info
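A minimal sketch of calling the main entry point, assuming a MarkLogic app server and some searchable content. The option values and the query text are illustrative choices, not taken from the slides.

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
    at "/MarkLogic/appservices/search/search.xqy";

(: Illustrative options: skip facets, keep the default snippeting. :)
let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <return-facets>false</return-facets>
    <transform-results apply="snippet"/>
  </options>

(: search:search() parses the query text, runs the search, and returns a search:response. :)
let $response := search:search("1996 United States Olympics", $options)

(: List the URI of each search:result in the response. :)
for $result in $response/search:result
return fn:string($result/@uri)
```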
Search API Basics
• Search API options:
Search API Extensibility
• Snippet:
• Constraint:
Search API Extensibility
• Term handler:
• Parser:
let $custom-parser-output :=
my:parse($qtext)
return
search:resolve(
$custom-parser-output,
$options
)
Search API Basics
• Search API parser: search:parse()
o 1st half of search:search()
o returns annotated cts:query XML
• Execute search: search:resolve()
o 2nd half of search:search()
o accepts cts:query XML as input
search:parse() Strategy
1. Call search:parse()
2. Analyze and enrich the query XML
3. Call search:resolve()
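The three steps above can be sketched roughly as follows. `local:enrich` is a hypothetical placeholder that returns the query unchanged; a real enrichment step would rewrite the query XML before it is resolved.

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
    at "/MarkLogic/appservices/search/search.xqy";

(: Hypothetical enrichment step: a pass-through that stands in for real logic. :)
declare function local:enrich($q as element()) as element()
{
  $q
};

let $qtext := "1996 United States Olympics"
let $options := <options xmlns="http://marklogic.com/appservices/search"/>

(: 1. Parse the query text into annotated cts:query XML. :)
let $parsed := search:parse($qtext, $options)

(: 2. Analyze and enrich the query XML. :)
let $enriched := local:enrich($parsed)

(: 3. Execute the enriched query and return a search:response. :)
return search:resolve($enriched, $options)
```

Keeping the same `$options` for both calls means constraints and operators are interpreted consistently in the parse and resolve halves.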
Our Use Case
• O’Connor’s Online
– Search portal built on MarkLogic
– Legal rules and commentaries content
– Problem
• Users will enter citation numbers, abbreviations, etc. expecting complete results
• Text editorial content follows different conventions
– Solution
• Detect special cases pre-search and enrich query
Example: detect year
• Content:
– MarkLogic database of news/op-ed articles
• Organized into year directories:
/content/1990
/content/1991
/content/1992
...
/content/2012
• Year is in directory structure, not article text
– But users will still include year in search terms
How to transform query?
• Recursive typeswitch (function mapping on):
do-stuff-here($q)
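A minimal sketch of the recursive typeswitch pattern over cts:query XML. `local:transform` is a hypothetical name, and `local:do-stuff-here` is only a pass-through stand-in for the hook named on the slide. The loop is written out explicitly, so this version does not depend on function mapping.

```xquery
xquery version "1.0-ml";
declare namespace cts = "http://marklogic.com/cts";

(: Hypothetical per-term hook; here it returns the word query unchanged. :)
declare function local:do-stuff-here($q as element(cts:word-query)) as node()*
{
  $q
};

(: Walk the query XML, rebuild every element, and hand word queries to the hook. :)
declare function local:transform($nodes as node()*) as node()*
{
  for $n in $nodes
  return
    typeswitch ($n)
      case element(cts:word-query) return local:do-stuff-here($n)
      case element() return
        element { fn:node-name($n) } { $n/@*, local:transform($n/node()) }
      default return $n
};

(: Hand-written stand-in for search:parse() output. :)
local:transform(
  <cts:and-query>
    <cts:word-query><cts:text>United</cts:text></cts:word-query>
    <cts:word-query><cts:text>States</cts:text></cts:word-query>
  </cts:and-query>
)
```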
Example: detect year
let $terms := "1996 United States Olympics"
return local:detect-year(search:parse($terms))
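One possible shape for `local:detect-year`, offered as a sketch rather than the speaker's implementation. It assumes that each parsed term arrives as a `cts:word-query` with a `cts:text` child, that the year directories have URIs of the form `/content/1996/` (the trailing slash is an assumption), and that only the years 1990 to 2012 listed above should be detected.

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
    at "/MarkLogic/appservices/search/search.xqy";
declare namespace cts = "http://marklogic.com/cts";

declare function local:detect-year($nodes as node()*) as node()*
{
  for $n in $nodes
  return
    typeswitch ($n)
      case element(cts:word-query) return
        let $text := fn:string-join($n/cts:text, " ")
        return
          (: Only terms that are exactly a year from 1990 to 2012 are rewritten. :)
          if (fn:matches($text, "^(199[0-9]|200[0-9]|201[0-2])$"))
          then
            (: Build the directory query with the constructor, then take its XML form. :)
            <x>{ cts:directory-query(fn:concat("/content/", $text, "/"), "infinity") }</x>/*
          else $n
      case element() return
        element { fn:node-name($n) } { $n/@*, local:detect-year($n/node()) }
      default return $n
};

let $terms := "1996 United States Olympics"
return local:detect-year(search:parse($terms))
```

The year term is replaced rather than kept alongside the directory query, because the year is stated to be absent from the article text.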
Example: detect year
• Strategy depends on your content model
• Other possibilities
– date detection
– date ranges
– locations
– etc.
search:parse() Strategy
• Weakness
– Limited to single word token
• Similar to custom term handling
• What about multiple tokens?
– Analyze querystring text directly using regex
• Dangerous
– Transform cts:query XML into intermediate form
• Preserve Boolean logic & grouping
• Preserve phrases
• Preserve constraints
Building Intermediate Query
• The hack
– Basically, undoing some of the parser's work
– Text "run" concept
• Similar to WordprocessingML
Building Intermediate Query
• Intermediate query strategy
1. Flatten query
2. Join sibling words in <run>
3. Transform <run>s
4. Convert <run>s back to word queries
Example: multi-word thesaurus
• Content:
– Same MarkLogic database of news/op-ed articles from detect-year() example
• Query:
– Same as before: "1996 United States Olympics"
– Start with the search:parse() output
Example: multi-word thesaurus
• Intermediate query strategy
1. Flatten query
2. Join sibling words in <run>
3. Transform <run>s
4. Convert <run>s back to word queries
Example: multi-word thesaurus
1. Flatten query
– remove implicit and-queries from search:parse() output:
– XML should look more like cts:query string representation:
cts:and-query(
(cts:word-query("1996", "lang=en", 1),
cts:word-query("United", "lang=en", 1),
cts:word-query("States", "lang=en", 1),
cts:word-query("Olympics", "lang=en", 1)),
())
1. Flatten query
• Typeswitch on cts:and-query:
1. Check and-queries for a parent and-query
2. Remove the nested ones, copy through anything else
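A sketch of what a flattening function along these lines could look like. The name `local:unnest-ands` matches the calls shown in the slides, but the body is an assumption, and the input is a hand-written stand-in for parser output.

```xquery
xquery version "1.0-ml";
declare namespace cts = "http://marklogic.com/cts";

declare function local:unnest-ands($nodes as node()*) as node()*
{
  for $n in $nodes
  return
    typeswitch ($n)
      case element(cts:and-query) return
        if ($n/parent::cts:and-query)
        (: Nested and-query: drop the wrapper and keep its processed children. :)
        then local:unnest-ands($n/node())
        else element { fn:node-name($n) } { $n/@*, local:unnest-ands($n/node()) }
      case element() return
        element { fn:node-name($n) } { $n/@*, local:unnest-ands($n/node()) }
      default return $n
};

local:unnest-ands(
  <cts:and-query>
    <cts:word-query><cts:text>1996</cts:text></cts:word-query>
    <cts:and-query>
      <cts:word-query><cts:text>United</cts:text></cts:word-query>
      <cts:word-query><cts:text>States</cts:text></cts:word-query>
    </cts:and-query>
  </cts:and-query>
)
```

Unwrapping is safe here because an and-query nested directly inside another and-query is logically equivalent to listing its children in the parent.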
Example: multi-word thesaurus
1. Flatten query
– Typeswitch function output:
Example: multi-word thesaurus
• Intermediate query strategy
1. Flatten query
2. Join sibling words in <run>
3. Transform <run>s
4. Convert <run>s back to word queries
Example: multi-word thesaurus
2. Join sibling words in <run>:
• Typeswitch on cts:word-query:
1. Ignore phrases
2. Delete if query is not the first.
3. Take first word-query in sequence and join with its following siblings into a <run>
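A sketch of the run-building step under some assumptions: a phrase is recognized by whitespace inside its `cts:text`, only word queries that sit directly inside an and-query are joined (so OR'ed alternatives stay separate), and `local:is-run-word` is a hypothetical helper. The function name `local:create-runs` follows the slides, but the body is not the speaker's code.

```xquery
xquery version "1.0-ml";
declare namespace cts = "http://marklogic.com/cts";

(: True for a single-word word query that is a direct child of an and-query. :)
declare function local:is-run-word($e as element()) as xs:boolean
{
  $e instance of element(cts:word-query)
    and fn:exists($e/parent::cts:and-query)
    and fn:not(fn:matches(fn:string-join($e/cts:text, " "), "\s"))
};

declare function local:create-runs($nodes as node()*) as node()*
{
  for $n in $nodes
  return
    typeswitch ($n)
      case element(cts:word-query) return
        (: 1. Leave phrases (and words outside an and-query) untouched. :)
        if (fn:not(local:is-run-word($n))) then $n
        (: 2. Delete a word that is not the first of its sequence. :)
        else if ($n/preceding-sibling::*[1][local:is-run-word(.)]) then ()
        (: 3. First word: join it with the run words that follow it directly. :)
        else
          let $stop := ($n/following-sibling::*[fn:not(local:is-run-word(.))])[1]
          let $rest :=
            if (fn:exists($stop))
            then $n/following-sibling::*[. << $stop]
            else $n/following-sibling::*
          return
            <run>{
              fn:string-join(
                for $w in ($n, $rest) return fn:string-join($w/cts:text, " "),
                " ")
            }</run>
      case element() return
        element { fn:node-name($n) } { $n/@*, local:create-runs($n/node()) }
      default return $n
};

local:create-runs(
  <cts:and-query>
    <cts:word-query><cts:text>1996</cts:text></cts:word-query>
    <cts:or-query>
      <cts:word-query><cts:text>sprint</cts:text></cts:word-query>
      <cts:word-query><cts:text>marathon</cts:text></cts:word-query>
    </cts:or-query>
    <cts:word-query><cts:text>United</cts:text></cts:word-query>
    <cts:word-query><cts:text>States</cts:text></cts:word-query>
    <cts:word-query><cts:text>Olympics</cts:text></cts:word-query>
  </cts:and-query>
)
```

With this input the or-query acts as a boundary: "1996" should end up in one run and "United States Olympics" in another, while the two OR'ed words stay as word queries.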
2. Join sibling words in <run>:
• Input:
– search:parse("1996 United States Olympics")/local:unnest-ands(.)/local:create-runs(.)
• Output:
2. Join sibling words in <run>:
• Input:
– search:parse("1996 (sprint OR marathon) United States Olympics")/local:unnest-ands(.)/local:create-runs(.)
• Output:
Example: multi-word thesaurus
• Intermediate query strategy
1. Flatten query
2. Join sibling words in <run>
3. Transform <run>s
4. Convert <run>s back to word queries
Example: multi-word thesaurus
3. Transform <run>s:
1. Store terms in thesaurus
2. Build cts:or-query of thesaurus terms
3. Using cts:or-query of terms, cts:highlight() <run>s,
and replace with thesaurus synonyms
3. Transform <run>s:
1. store terms in
thesaurus
Example: multi-word thesaurus
3. Transform <run>s:
2. build cts:or-query of thesaurus terms:
Example: multi-word thesaurus
3. Transform <run>s:
3. replace matches with synonyms:
– cts:highlight() - powerful cts:query-based find/replace
Example: multi-word thesaurus
3. Transform <run>s:
3. replace matches with synonyms:
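A sketch of synonym replacement with `cts:highlight()`, under stated assumptions. The in-memory `$thesaurus`, the helper `local:synonyms`, and the `<syn>`/`<term>` wrapper elements are all hypothetical; the slides read the thesaurus from `doc("thesaurus.xml")`, and the original intermediate markup is not shown in the transcript.

```xquery
xquery version "1.0-ml";
declare namespace thsr = "http://marklogic.com/xdmp/thesaurus";

(: Hypothetical in-memory thesaurus with a single entry. :)
declare variable $thesaurus :=
  <thsr:thesaurus>
    <thsr:entry>
      <thsr:term>United States</thsr:term>
      <thsr:synonym><thsr:term>USA</thsr:term></thsr:synonym>
    </thsr:entry>
  </thsr:thesaurus>;

(: Look up synonyms for a matched term, ignoring case. :)
declare function local:synonyms($term as xs:string) as xs:string*
{
  for $e in $thesaurus//thsr:entry
  where fn:lower-case(fn:string-join($e/thsr:term, " ")) eq fn:lower-case($term)
  return
    for $s in $e/thsr:synonym/thsr:term return fn:string($s)
};

declare function local:thsr-expand($nodes as node()*, $q-thsr as cts:query) as node()*
{
  for $n in $nodes
  return
    typeswitch ($n)
      case element(run) return
        (: Replace each thesaurus match in the run with the term plus its synonyms. :)
        cts:highlight($n, $q-thsr,
          <syn>{
            for $t in ($cts:text, local:synonyms($cts:text))
            return <term>{ $t }</term>
          }</syn>)
      case element() return
        element { fn:node-name($n) } { $n/@*, local:thsr-expand($n/node(), $q-thsr) }
      default return $n
};

(: One word query per thesaurus term, OR'ed together. :)
let $q-thsr :=
  cts:or-query(
    for $t in $thesaurus//thsr:entry/thsr:term
    return cts:word-query(fn:string($t)))
return
  local:thsr-expand(
    <cts:and-query xmlns:cts="http://marklogic.com/cts">
      <run>1996 United States Olympics</run>
    </cts:and-query>,
    $q-thsr)
```

Because the run is a single text node, a multi-word thesaurus term can match as a phrase, which is the point of joining the words first.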
Example: multi-word thesaurus
3. Transform <run>s:
Input:
Example: multi-word thesaurus
let $q-thsr :=
cts:or-query(
doc("thesaurus.xml")
//thsr:entry/thsr:term/cts:word-query(string(.)))
let $runs :=
search:parse("1996 United States Olympics")
/local:unnest-ands(.)/local:create-runs(.)
return local:thsr-expand($runs, $q-thsr)
3. Transform <run>s:
Output:
Example: multi-word thesaurus
• Intermediate query strategy
1. Flatten query
2. Join sibling words in <run>
3. Transform <run>s
4. Convert <run>s back to word queries
4. Convert <run>s back to word queries
– Typeswitch:
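A sketch of the final conversion step. It assumes, hypothetically, that step 3 left each synonym group inside a run as `<syn><term>…</term></syn>`; the actual intermediate markup is not shown in the transcript. Plain text becomes one word query per word and each synonym group becomes an or-query. Parser annotations from the original word queries are not restored in this version.

```xquery
xquery version "1.0-ml";
declare namespace cts = "http://marklogic.com/cts";

declare function local:resolve-runs($nodes as node()*) as node()*
{
  for $n in $nodes
  return
    typeswitch ($n)
      case element(run) return
        for $child in $n/node()
        return
          typeswitch ($child)
            (: A synonym group becomes an or-query over its terms. :)
            case element(syn) return
              <cts:or-query>{
                for $t in $child/term
                return <cts:word-query><cts:text>{ fn:string($t) }</cts:text></cts:word-query>
              }</cts:or-query>
            (: Remaining text is split back into single-word queries. :)
            case text() return
              for $w in fn:tokenize(fn:normalize-space($child), " ")[. ne ""]
              return <cts:word-query><cts:text>{ $w }</cts:text></cts:word-query>
            default return ()
      case element() return
        element { fn:node-name($n) } { $n/@*, local:resolve-runs($n/node()) }
      default return $n
};

local:resolve-runs(
  <cts:and-query>
    <run>1996 <syn><term>United States</term><term>USA</term></syn> Olympics</run>
  </cts:and-query>
)
```

A multi-word term such as "United States" stays in a single `cts:text`, so it is treated as a phrase when the query is executed.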
Example: multi-word thesaurus
4. Convert <run>s back to word queries
Input:
Example: multi-word thesaurus
let $q-thsr :=
cts:or-query(
doc("thesaurus.xml")
//thsr:entry/thsr:term/cts:word-query(string(.)))
let $runs := search:parse("1996 United States Olympics")
/local:unnest-ands(.)/local:create-runs(.)
let $expanded := local:thsr-expand($runs, $q-thsr)
return local:resolve-runs($expanded)
4. Convert <run>s back to word queries
Output:
Example: multi-word thesaurus
Combining Examples
local:thsr-expand-runs($runs, $q-thsr)
/local:resolve-runs(.)/local:detect-year(.)
Enrich Your Query!
• Takeaway
1. No added GUI
2. Didn't ask the user for additional input
3. Able to build more robust query before
executing search
Search API Hacking
• Many potential applications:
– Ad-hoc weighting:
local:q-add-weights(
search:parse("bananas"),
(<element ns="{$ns}" name="p" weight="1"/>,
<element ns="{$ns}" name="b" weight="2"/>,
<element ns="{$ns}" name="title" weight="3.5"/>)
)
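One way a function like `local:q-add-weights` might work, as a sketch only: each word query is replaced by an or-query of element word queries, one per configured element, each carrying its weight. The function body and the value of `$ns` are assumptions, and whether the original also kept the unweighted word query is not shown in the slides.

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
    at "/MarkLogic/appservices/search/search.xqy";
declare namespace cts = "http://marklogic.com/cts";

(: Hypothetical namespace of the content elements. :)
declare variable $ns := "http://www.w3.org/1999/xhtml";

declare function local:q-add-weights(
  $nodes as node()*,
  $weights as element(element)*
) as node()*
{
  for $n in $nodes
  return
    typeswitch ($n)
      case element(cts:word-query) return
        let $text := fn:string-join($n/cts:text, " ")
        return
          (: Build the weighted queries with constructors, then take their XML form. :)
          <x>{
            cts:or-query(
              for $w in $weights
              return
                cts:element-word-query(
                  fn:QName(fn:string($w/@ns), fn:string($w/@name)),
                  $text, (), xs:double($w/@weight)))
          }</x>/*
      case element() return
        element { fn:node-name($n) } { $n/@*, local:q-add-weights($n/node(), $weights) }
      default return $n
};

local:q-add-weights(
  search:parse("bananas"),
  (<element ns="{$ns}" name="p" weight="1"/>,
   <element ns="{$ns}" name="b" weight="2"/>,
   <element ns="{$ns}" name="title" weight="3.5"/>)
)
```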
Search API Hacking
• Many potential applications:
– Automatic spell correction:
Search API Hacking
• Many potential applications:
– Detect entities
• Transform text into element-based query
• Fewer false positives and exclusions
• Leverage indexes:
"New York Times"
Search API Hacking
• Other ideas
– Regex unparsed query string
• apply constraints, operators, etc. as configured in Search API based on keywords/patterns
– Custom term handler
• single-term transformations
– Combine with data enrichment on ingestion
• MarkLogic Entity Framework
• Linguistic processing
Hazards
• Chaos
– Daisy-chained transformations can have unintended consequences
– Performance
• Pre-search transformations need to be fast
• Make sure to leverage indexes as much as possible
• Larger queries do take longer
Questions
