Advanced Query Parsing Techniques
Aruna Kumar Pamulapati (Arun)
Technical Consultant
Search Technologies Overview
Formed June 2005
Over 100 employees and growing
Over 500 customers worldwide
Presence in US, Latin America, UK & Germany
Deep enterprise search expertise
Consistent revenue growth and profitability
Search Engine Independent

The expert in the search space
Lucene Relevancy: Simple Operators
term(A) → TF(A) * IDF(A)
Implemented with DefaultSimilarity / TermQuery
TF(A) = sqrt(termInDocCount)
IDF(A) = log(totalDocsInCollection/(docsWithTermCount+1)) + 1.0

and(A, B) → A * B
Implemented with BooleanQuery()

or(A, B) → A + B
Implemented with BooleanQuery()

max(A, B) → max(A, B)
Implemented with DisjunctionMaxQuery()
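
A minimal Python sketch of the simplified scoring model on this slide, useful for checking the arithmetic by hand. It models only the formulas as written here, not Lucene's actual scoring code; the function names are illustrative and a score of 0.0 is assumed to mean "no match".

```python
import math

def tf(term_in_doc_count):
    # TF(A) = sqrt(termInDocCount)
    return math.sqrt(term_in_doc_count)

def idf(total_docs_in_collection, docs_with_term_count):
    # IDF(A) = log(totalDocsInCollection / (docsWithTermCount + 1)) + 1.0
    return math.log(total_docs_in_collection / (docs_with_term_count + 1)) + 1.0

def term_score(term_in_doc_count, total_docs_in_collection, docs_with_term_count):
    # term(A) -> TF(A) * IDF(A)
    return tf(term_in_doc_count) * idf(total_docs_in_collection, docs_with_term_count)

def and_op(*scores):
    # and(A, B) -> A * B
    return math.prod(scores)

def or_op(*scores):
    # or(A, B) -> A + B
    return sum(scores)

def max_op(*scores):
    # max(A, B) -> max(A, B)
    return max(scores)

if __name__ == "__main__":
    combined = and_op(or_op(0.1, 0.2), max_op(0.0, 0.9))
    print(round(combined, 2))
```

Keeping each operator as a plain function over scores makes it easy to nest them the same way the query tree nests.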

Simple Operators - Example
Query: and(or(george, martha), max(washington, custis))
Term scores: george 0.10, martha 0.20, washington 0.60, custis 0.90
or: 0.1 + 0.2 = 0.30
max: max(0, 0.9) = 0.90
and: 0.3 * 0.9 = 0.27

Less Used Operators
boost(f, A) → (A * f)
Implemented with Query.setBoost(f)

constant(f, A) → if(A) then f else 0.0
Implemented with ConstantScoreQuery()

boostPlus(A, B) → if(A) then (A + B) else 0.0
Implemented with BooleanQuery()

boostMul(f, A, B) → if(B) then (A * f) else A
Implemented with BoostingQuery()
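
The four definitions above can be sketched as plain Python functions over scores. This is an illustration of the formulas on the slide only, assuming that "if(A)" means "A matched" and that a non-matching clause scores 0.0; the snake_case names are not part of any library.

```python
def boost(f, a):
    # boost(f, A) -> A * f
    return a * f

def constant(f, a):
    # constant(f, A) -> f if A matched, else 0.0
    return f if a > 0.0 else 0.0

def boost_plus(a, b):
    # boostPlus(A, B) -> A + B if A matched, else 0.0
    # (B can only add to the score; it can never produce a match on its own)
    return a + b if a > 0.0 else 0.0

def boost_mul(f, a, b):
    # boostMul(f, A, B) -> A * f if B matched, else A
    # (B never filters results; it only rescales the score of A)
    return a * f if b > 0.0 else a

if __name__ == "__main__":
    print(boost_plus(0.3, 0.2), boost_plus(0.0, 0.2))
    print(boost_mul(0.5, 0.4, 0.1), boost_mul(0.5, 0.4, 0.0))
```

The difference between the last two is the useful part: boostPlus requires A and treats B as optional extra credit, while boostMul keeps every match of A and multiplies by f only where B also matches.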

Problem: Need for More Flexibility
Difficult / impossible to use all operators
Many not available in standard query parsers

Complex expressions = string manipulation
This is messy

Query construction is in the application layer
Your UI programmer is creating query expressions?
Seriously?

Hard to create and use new operators
Requires modifying query parsers - yuck

Query Processing Language

[Diagram: User Interface, and Solr containing the QPL Engine, a QPL Script, and Search]

Introducing: QPL
Query Processing Language
Domain Specific Language for Constructing Queries
Built on Groovy
https://wiki.searchtechnologies.com/index.php/QPL_Home_Page

Solr Plug-Ins
Query Parser
Search Component

“The 4GL for Text Search Query Expressions”
Server-side Solr Access
Cores, Analyzers, Embedded Search, Results XML

Solr Plug-Ins

QPL Configuration – solrconfig.xml
Query Parser Configuration:
<queryParser name="qpl"
    class="com.searchtechnologies.qpl.solr.QPLSolrQParserPlugin">
  <str name="scriptFile">parser.qpl</str>
  <str name="defaultField">text</str>
</queryParser>

Search Component Configuration:
<searchComponent name="qplSearchFirst"
    class="com.searchtechnologies.qpl.solr.QPLSearchComponent">
  <str name="scriptFile">search.qpl</str>
  <str name="defaultField">text</str>
  <str name="isProcessScript">false</str>
</searchComponent>
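
Once a query parser is registered under a name, Solr lets a request select it with the standard defType parameter. The sketch below builds such a request URL in Python; the host, port, and core name ("collection1") are placeholders, and it assumes the parser name "qpl" from the configuration above.

```python
from urllib.parse import urlencode

def build_select_url(base_url, user_query, parser_name="qpl"):
    # defType tells Solr which registered query parser should handle "q".
    params = {"q": user_query, "defType": parser_name, "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"

if __name__ == "__main__":
    # Hypothetical local Solr instance and core name.
    print(build_select_url("http://localhost:8983/solr/collection1",
                           "george washington"))
```

The user interface only passes the raw user text in q; all query construction stays in the server-side script.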

QPL Example #1
Tokenize:
myTerms = solr.tokenize(query);
Phrase Query:
phraseQ = phrase(myTerms);
And Query:
andQ = and(myTerms);
Or Query:
orQ = (myTerms.size() <= 2) ? null :
    orMin( (myTerms.size()+1)/2, myTerms);

Put It All Together:
return phraseQ^3.0 | andQ^2.0 | orQ;
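
The three tiers in this example (exact phrase, all terms, at least about half the terms) can be illustrated with a small Python sketch over token lists. It assumes that orMin(n, terms) means "at least n of the terms match" and uses integer division for the threshold; both are readings of the slide, not confirmed QPL semantics.

```python
def min_should_match(term_count):
    # (size + 1) / 2, taken here as integer division: 3 -> 2, 4 -> 2, 5 -> 3
    return (term_count + 1) // 2

def phrase_matches(terms, doc_tokens):
    # True if the terms occur contiguously and in order.
    n = len(terms)
    return any(doc_tokens[i:i + n] == terms
               for i in range(len(doc_tokens) - n + 1))

def and_matches(terms, doc_tokens):
    return all(t in doc_tokens for t in terms)

def or_min_matches(min_count, terms, doc_tokens):
    return sum(1 for t in terms if t in doc_tokens) >= min_count

def best_tier(terms, doc_tokens):
    # Mirrors the boost order: the phrase tier outranks "and", which outranks "orMin".
    if phrase_matches(terms, doc_tokens):
        return "phrase"
    if and_matches(terms, doc_tokens):
        return "and"
    # The slide skips the orMin clause for queries of two terms or fewer.
    if len(terms) > 2 and or_min_matches(min_should_match(len(terms)),
                                         terms, doc_tokens):
        return "orMin"
    return None

if __name__ == "__main__":
    query = ["george", "martha", "washington"]
    print(best_tier(query, "martha and george washington".split()))
    print(best_tier(query, "george washington carver".split()))
```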

Thesaurus Example #2
Tokenize:
myTerms = solr.tokenize(query);
Load Thesaurus: (cached)
thes = Thesaurus.load("thesaurus.xml")

Thesaurus Expansion:
thesQ = thes.expand(0.8f,
    solr.tokenizer("text"), myTerms);
Original Query: bathroom humor
thesQ: [or(bathroom, loo^0.8, wc^0.8), or(humor, jokes^0.8)]
Put It All Together:
return and(thesQ);
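
A minimal Python sketch of what the expansion step does to the example query. The thesaurus here is an in-memory dict holding only the synonyms shown on the slide, and expand and render are illustrative names, not the QPL API.

```python
# Hypothetical thesaurus content, limited to the entries in the example.
THESAURUS = {"bathroom": ["loo", "wc"], "humor": ["jokes"]}

def expand(weight, terms, thesaurus=THESAURUS):
    # Each term becomes an "or" of the original (weight 1.0) and its
    # synonyms, which are down-weighted so exact wording still ranks first.
    expanded = []
    for term in terms:
        alternatives = [(term, 1.0)]
        alternatives += [(syn, weight) for syn in thesaurus.get(term, [])]
        expanded.append(("or", alternatives))
    return expanded

def render(clause):
    op, alternatives = clause
    parts = [t if w == 1.0 else f"{t}^{w}" for t, w in alternatives]
    return f"{op}({', '.join(parts)})"

if __name__ == "__main__":
    for clause in expand(0.8, ["bathroom", "humor"]):
        print(render(clause))
```

Wrapping the resulting list in and(...) then requires one alternative from every group, which is why a document about "loo jokes" can match "bathroom humor".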

More Operators
Boolean Query Parser:
pQ = parseQuery("(george or martha) near/5 washington")

Relevancy Ranking Operators:
q1 = boostPlus(query, optionalQ)
q2 = boostMul(0.5, query, optionalQ)
q3 = constant(0.5, query)
Composite Queries:
compQ = and(compositeMax(
    ["title":1.5, "body":0.8],
    "george", "washington"))

News Feed Use Case
Order  Documents           Date
1      markets+terms       Today
2      markets             Today
3      terms               Today
4      companies           Today
5      markets+terms       Yesterday
6      markets             Yesterday
7      terms               Yesterday
8      companies           Yesterday
9      markets, companies  older

News Feed Use Case – Step 1
Segments:
markets = split(solr.markets, "\\s*;\\s*")
marketsQ = field("markets", or(markets));
Terms:
terms = solr.tokenize(query);
termsQ = field("body",
    or(thesaurus.expand(0.9f, terms)))
Companies:
compIds = split(solr.compIds, "\\s*;\\s*")
compIdsQ = field("companyIds", or(compIds))
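
The split pattern is a semicolon with optional surrounding whitespace. A quick Python check of how such a pattern behaves on a request parameter; the sample value is made up, and split_values is an illustrative helper rather than QPL's split.

```python
import re

def split_values(raw, pattern=r"\s*;\s*"):
    # Split on semicolons, swallowing whitespace around each separator,
    # and drop empty entries caused by leading or trailing separators.
    return [value for value in re.split(pattern, raw.strip()) if value]

if __name__ == "__main__":
    print(split_values("equities ; bonds;fx "))
```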

News Feed Use Case – Step 2
import java.text.SimpleDateFormat

sdf = new SimpleDateFormat("yyyy-MM-dd")
c = Calendar.getInstance()

Today:
todayDate = sdf.format(c.getTime())
todayQ = field("date_s",todayDate)
Yesterday:
c.add(Calendar.DAY_OF_MONTH, -1)
yesterdayDate = sdf.format(c.getTime())
yesterdayQ = field("date_s",yesterdayDate)
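
The same date arithmetic, sketched in Python with the standard library. The function takes an optional reference date so the result can be checked deterministically; feed_dates is an illustrative name.

```python
from datetime import date, timedelta

def feed_dates(today=None):
    # Returns (today, yesterday) formatted as yyyy-MM-dd strings,
    # matching the string values stored in the date_s field.
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    return today.strftime("%Y-%m-%d"), yesterday.strftime("%Y-%m-%d")

if __name__ == "__main__":
    print(feed_dates(date(2024, 3, 1)))
```

Subtracting a timedelta handles month and leap-year boundaries, which is the same job Calendar.add does in the script.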

News Feed Use Case – Step 3
Weighted Subject Queries:
sq1 = constant(4.0, and(marketsQ, termsQ))
sq2 = constant(3.0, marketsQ)
sq3 = constant(2.0, termsQ)
sq4 = constant(1.0, compIdsQ)
subjectQ = max(sq1, sq2, sq3, sq4)
Weighted Time Queries:
tq1 = constant(10.0, todayQ)
tq2 = constant(1.0, yesterdayQ)
timeQ = max(tq1, tq2)
Put it All Together:
recentQ = and(subjectQ, timeQ)
return max(recentQ, or(marketsQ,compIdsQ)^0.01)
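
To see why these weights produce the ordering from the use-case table, the scoring can be simulated in Python. The sketch uses the simplified operator model from earlier in the deck (and as a product, or as a sum, constant as a fixed score on match) and assumes, purely for illustration, that an unweighted marketsQ or compIdsQ match scores 1.0.

```python
def constant(value, matched):
    return value if matched else 0.0

def score(doc):
    markets, terms, companies = doc["markets"], doc["terms"], doc["companies"]
    subject = max(constant(4.0, markets and terms),
                  constant(3.0, markets),
                  constant(2.0, terms),
                  constant(1.0, companies))
    time = max(constant(10.0, doc["date"] == "today"),
               constant(1.0, doc["date"] == "yesterday"))
    recent = subject * time  # and(A, B) -> A * B
    # Assumption: raw marketsQ / compIdsQ matches score 1.0 each.
    fallback = 0.01 * ((1.0 if markets else 0.0) + (1.0 if companies else 0.0))
    return max(recent, fallback)

def make_doc(name, markets, terms, companies, date):
    return {"name": name, "markets": markets, "terms": terms,
            "companies": companies, "date": date}

DOCS = [
    make_doc("older markets, companies", True, False, True, "older"),
    make_doc("yesterday companies", False, False, True, "yesterday"),
    make_doc("yesterday terms", False, True, False, "yesterday"),
    make_doc("yesterday markets", True, False, False, "yesterday"),
    make_doc("yesterday markets+terms", True, True, False, "yesterday"),
    make_doc("today companies", False, False, True, "today"),
    make_doc("today terms", False, True, False, "today"),
    make_doc("today markets", True, False, False, "today"),
    make_doc("today markets+terms", True, True, False, "today"),
]

def ranked_names(docs):
    return [d["name"] for d in sorted(docs, key=score, reverse=True)]

if __name__ == "__main__":
    for name in ranked_names(DOCS):
        print(name)
```

Because the time weight for today (10.0) exceeds the largest subject weight (4.0), every document from today outranks every document from yesterday, and the 0.01 boost keeps older documents below both groups.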

BT RLP Tokenizer Use Case – Step 1
Define field type:
<tokenizer
    class="com.basistech.rlp.solr.RLPTokenizerFactory"
    rlpContext="<PATH>rlp-context-bl1.xml"
    postAltLemmas="false"
    lang="eng"
    postPartOfSpeech="false"/>

QPL Expansion:
finalExpandedQuery = transform(queryTerms,
    [ TERM: { ctx ->
        def btCustomTokens = solr.tokenize("subject_bt", ctx.op.term)
        // First token keeps a 1.5 boost; the remaining tokens are OR'ed in unboosted
        if (btCustomTokens.size() > 1)
            return or(term(btCustomTokens[0])^1.5, or(btCustomTokens[1..-1]));
        else
            return ctx.op;
    } ]
);

BT RLP Tokenizer Use Case – Step 2
Original User Query:

following is "presentation on QPL"

QPL Parsed:
and(and(term(following),term(is)),
phrase(term(presentation),term(on),term(QPL)))

BT Expansion + QPL Transformation :
and(and(or(term(following)^1.5,term(follow)),or(term(is)^1.5,term(be))),
phrase(term(presentation),term(on),term(QPL)))
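
The recursive transformation can be sketched in Python with the query tree modeled as nested tuples. The LEMMAS table is a hypothetical stand-in for the tokens the BT analyzer would return, and the nested or of variants is flattened into a single or for readability; neither is taken from the QPL implementation.

```python
# Hypothetical stand-in for solr.tokenize("subject_bt", term).
LEMMAS = {"following": ["following", "follow"], "is": ["is", "be"]}

def tokenize(term):
    return LEMMAS.get(term, [term])

def expand_term(node):
    # TERM handler: boost the first token, OR in the remaining variants.
    tokens = tokenize(node[1])
    if len(tokens) > 1:
        boosted = ("boost", 1.5, ("term", tokens[0]))
        return ("or", boosted, *[("term", t) for t in tokens[1:]])
    return node

def transform(node, handlers):
    # Walk the tree depth-first and let a handler rewrite each term node.
    op = node[0]
    if op == "term":
        handler = handlers.get("TERM")
        return handler(node) if handler else node
    if op == "boost":
        return ("boost", node[1], transform(node[2], handlers))
    return (op, *[transform(child, handlers) for child in node[1:]])

def render(node):
    op = node[0]
    if op == "term":
        return f"term({node[1]})"
    if op == "boost":
        return f"{render(node[2])}^{node[1]}"
    return f"{op}({','.join(render(child) for child in node[1:])})"

PARSED = ("and",
          ("and", ("term", "following"), ("term", "is")),
          ("phrase", ("term", "presentation"), ("term", "on"), ("term", "QPL")))

if __name__ == "__main__":
    print(render(transform(PARSED, {"TERM": expand_term})))
```

Terms for which the analyzer returns a single token pass through unchanged, which is why the phrase clause looks the same before and after.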

BT RLP Tokenizer Use Case – Step 3
[Diagram: the expanded query as a tree: and( and( or(Following^1.5, follow), or(is^1.5, be) ), phrase(Presentation on QPL) )]

Embedded Search Example #1
qTerms = solr.tokenize(query);

Execute an Embedded Search:
results = solr.search('subjectsCore', or(qTerms), 50)

Create a query from the results:
subjectsQ = or(results*.subjectId)

Put it all together:
return field("title", and(qTerms)) | subjectsQ^0.9;
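
The two-phase pattern (search one index, then turn its results into a clause of the main query) can be sketched in Python with an in-memory list standing in for the subjects core. The sample documents, the search helper, and the reading of | as an "or" are all assumptions for illustration.

```python
# Hypothetical contents of the 'subjectsCore' index.
SUBJECTS_CORE = [
    {"subjectId": "S1", "text": "american revolution generals"},
    {"subjectId": "S2", "text": "us presidents"},
    {"subjectId": "S3", "text": "gardening"},
]

def search(core, terms, rows):
    # Stand-in for an embedded search with or(terms), capped at `rows` hits.
    hits = [doc for doc in core
            if any(term in doc["text"].split() for term in terms)]
    return hits[:rows]

def build_query(q_terms):
    results = search(SUBJECTS_CORE, q_terms, 50)
    subject_ids = [doc["subjectId"] for doc in results]  # results*.subjectId
    title_q = ("field", "title", ("and", *q_terms))
    subjects_q = ("boost", 0.9, ("or", *subject_ids))
    return ("or", title_q, subjects_q)

if __name__ == "__main__":
    print(build_query(["presidents", "generals"]))
```

The final query is built entirely on the server, so the application never sees the intermediate subject IDs.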

Embedded Search Example #2
qTerms = solr.tokenize(query);

Execute an Embedded Search:
results = solr.search('categories', and(qTerms), 10)

Create a Solr named list:
myList = solr.newList();
myList.add("relatedCategories", results*.title);

Add it to the XML response:
solr.addResponse(myList)

Other Features
Embedded Grouping Queries
Oh yes they did!

Proximity operators
ADJ, NEAR/#, BEFORE/#

Reverse Lemmatizer
Prefers exact matches over variants

Transformer
Applies transformations recursively to query trees
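
For readers unfamiliar with proximity operators, here is a Python sketch of one common reading of them over token positions: NEAR/n as "within n positions in either order", BEFORE/n as "within n positions, in order", and ADJ as "immediately followed by". These definitions are assumptions; QPL's exact semantics are not given on the slide.

```python
def positions(tokens, term):
    return [i for i, token in enumerate(tokens) if token == term]

def near(tokens, a, b, distance):
    # a and b occur within `distance` positions of each other, any order.
    return any(i != j and abs(i - j) <= distance
               for i in positions(tokens, a)
               for j in positions(tokens, b))

def before(tokens, a, b, distance):
    # a occurs before b, at most `distance` positions apart.
    return any(0 < j - i <= distance
               for i in positions(tokens, a)
               for j in positions(tokens, b))

def adj(tokens, a, b):
    # a is immediately followed by b.
    return before(tokens, a, b, 1)

if __name__ == "__main__":
    tokens = "george and martha washington".split()
    print(near(tokens, "george", "washington", 5))
    print(before(tokens, "washington", "george", 5))
    print(adj(tokens, "martha", "washington"))
```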

Query Processing Language
[Diagram: the Application Dev Team's User Interface passes data as entered by the user to the Search Team's Solr, where the QPL Engine and QPL Script produce a Boolean Query Expression for Search]

Query Processing Language
[Diagram: User Interface and Solr (QPL Engine, QPL Script, Search), with an RDBMS, other indexes, and a thesaurus as additional sources]

More on QPL…

http://www.searchtechnologies.com/query-parsing-language.html

THANK YOU
Contact: apamulapati@searchtechnologies.com
www.searchtechnologies.com

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 

Advanced Query Parsing Techniques

  • 1. Advanced Query Parsing Techniques
       Aruna Kumar Pamulapati (Arun), Technical Consultant
  • 2. Search Technologies Overview
       Formed June 2005
       Over 100 employees and growing
       Over 500 customers worldwide
       Presence in US, Latin America, UK & Germany
       Deep enterprise search expertise
       Consistent revenue growth and profitability
       Search Engine Independent
       The expert in the search space
  • 3. Lucene Relevancy: Simple Operators
       term(A) → TF(A) * IDF(A)
         Implemented with DefaultSimilarity / TermQuery
         TF(A) = sqrt(termInDocCount)
         IDF(A) = log(totalDocsInCollection / (docsWithTermCount + 1)) + 1.0
       and(A, B) → A * B
         Implemented with BooleanQuery()
       or(A, B) → A + B
         Implemented with BooleanQuery()
       max(A, B) → max(A, B)
         Implemented with DisjunctionMaxQuery()
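The term formula on this slide can be checked with a few lines of arithmetic. The following is a minimal sketch in Python (not QPL or Lucene code); the function names and the sample counts are hypothetical, and the logarithm is taken as the natural log, which is what Java's Math.log computes.

```python
import math

def tf(term_in_doc_count):
    """Term frequency factor: square root of the term's count in the document."""
    return math.sqrt(term_in_doc_count)

def idf(total_docs, docs_with_term):
    """Inverse document frequency as written on the slide (natural log assumed)."""
    return math.log(total_docs / (docs_with_term + 1)) + 1.0

def term_score(term_in_doc_count, total_docs, docs_with_term):
    """term(A) -> TF(A) * IDF(A)."""
    return tf(term_in_doc_count) * idf(total_docs, docs_with_term)

# Hypothetical numbers: the term occurs 4 times in the document,
# and 9 of the 1000 documents in the collection contain it.
example = term_score(4, 1000, 9)
```

Rarer terms get a larger IDF, so a match on an uncommon term contributes more to the score than a match on a common one.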
  • 4. Simple Operators - Example
       [Diagram: score tree with "and" at the root, "or" over george and martha, and "max" over washington and custis]
       Term scores: george 0.10, martha 0.20, washington 0.60, custis 0.90
       or: 0.1 + 0.2 = 0.30
       max: max(0, 0.9) = 0.90
       and: 0.3 * 0.9 = 0.27
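The arithmetic of this example can be replayed with the deck's simplified operator model (and multiplies, or adds, max takes the larger input). This is a sketch in Python with hypothetical function names; the inputs are the numbers printed on the slide, including the 0 that the slide feeds into max.

```python
def and_(a, b):
    """and(A, B) -> A * B in the deck's simplified model."""
    return a * b

def or_(a, b):
    """or(A, B) -> A + B."""
    return a + b

def max_(a, b):
    """max(A, B) -> the larger of the two scores."""
    return max(a, b)

or_score = or_(0.10, 0.20)            # george, martha
max_score = max_(0.0, 0.90)           # inputs as printed on the slide
and_score = and_(or_score, max_score)
```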
  • 5. Less Used Operators
       boost(f, A) → (A * f)
         Implemented with Query.setBoost(f)
       constant(f, A) → if(A) then f else 0.0
         Implemented with ConstantScoreQuery()
       boostPlus(A, B) → if(A) then (A + B) else 0.0
         Implemented with BooleanQuery()
       boostMul(f, A, B) → if(B) then (A * f) else A
         Implemented with BoostingQuery()
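The four definitions on this slide translate directly into small scoring functions. The sketch below is Python, not QPL; it assumes that a score of 0.0 stands for "did not match", which is a modelling choice made here and not something the slide states.

```python
def boost(f, a):
    """boost(f, A) -> A * f."""
    return a * f

def constant(f, a):
    """constant(f, A) -> f if A matched, else 0.0."""
    return f if a > 0 else 0.0

def boost_plus(a, b):
    """boostPlus(A, B) -> A + B if A matched, else 0.0 (B alone never matches)."""
    return a + b if a > 0 else 0.0

def boost_mul(f, a, b):
    """boostMul(f, A, B) -> A * f if B matched, else A unchanged."""
    return a * f if b > 0 else a
```

Note the asymmetry: boost_plus rewards documents that also match B, while boost_mul with f below 1.0 demotes them.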
  • 6. Problem: Need for More Flexibility
       Difficult / impossible to use all operators
         Many not available in standard query parsers
       Complex expressions = string manipulation
         This is messy
       Query construction is in the application layer
         Your UI programmer is creating query expressions? Seriously?
       Hard to create and use new operators
         Requires modifying query parsers - yuck
  • 8. Introducing: QPL
       Query Processing Language
         Domain Specific Language for Constructing Queries
         Built on Groovy
         https://wiki.searchtechnologies.com/index.php/QPL_Home_Page
       Solr Plug-Ins
         Query Parser
         Search Component
       “The 4GL for Text Search Query Expressions”
       Server-side Solr Access
         Cores, Analyzers, Embedded Search, Results XML
  • 9. Solr Plug-Ins
  • 10. QPL Configuration – solrconfig.xml
       Query Parser Configuration:
         <queryParser name="qpl"
                      class="com.searchtechnologies.qpl.solr.QPLSolrQParserPlugin">
           <str name="scriptFile">parser.qpl</str>
           <str name="defaultField">text</str>
         </queryParser>
       Search Component Configuration:
         <searchComponent name="qplSearchFirst"
                          class="com.searchtechnologies.qpl.solr.QPLSearchComponent">
           <str name="scriptFile">search.qpl</str>
           <str name="defaultField">text</str>
           <str name="isProcessScript">false</str>
         </searchComponent>
  • 11. QPL Example #1
       Tokenize:
         myTerms = solr.tokenize(query);
       Phrase Query:
         phraseQ = phrase(myTerms);
       And Query:
         andQ = and(myTerms);
       Or Query:
         orQ = (myTerms.size() <= 2) ? null :
               orMin( (myTerms.size()+1)/2, myTerms);
       Put It All Together:
         return phraseQ^3.0 | andQ^2.0 | orQ;
  • 12. Thesaurus Example #2
       Tokenize:
         myTerms = solr.tokenize(query);
       Load Thesaurus: (cached)
         thes = Thesaurus.load("thesaurus.xml")
       Thesaurus Expansion:
         thesQ = thes.expand(0.8f, solr.tokenizer("text"), myTerms);
       Put It All Together:
         return and(thesQ);
       Original Query: bathroom humor
       Expanded: [or(bathroom, loo^0.8, wc^0.8), or(humor, jokes^0.8)]
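The expansion shown on this slide can be imitated with a dictionary lookup. This is a rough sketch in Python, not the QPL Thesaurus class: the THESAURUS dictionary, the expand function and the rule that terms without synonyms pass through unchanged are all assumptions made for illustration.

```python
# Hypothetical in-memory thesaurus; the deck loads its entries from thesaurus.xml.
THESAURUS = {"bathroom": ["loo", "wc"], "humor": ["jokes"]}

def expand(weight, terms, thesaurus=THESAURUS):
    """Wrap each term and its down-weighted synonyms in an or(...) expression."""
    expanded = []
    for t in terms:
        synonyms = thesaurus.get(t, [])
        if synonyms:
            parts = [t] + [f"{s}^{weight}" for s in synonyms]
            expanded.append("or(" + ", ".join(parts) + ")")
        else:
            expanded.append(t)  # assumption: unknown terms are left as-is
    return expanded

thes_q = expand(0.8, ["bathroom", "humor"])
final_q = "and(" + ", ".join(thes_q) + ")"
```

Each synonym carries a weight below 1.0, so documents containing the user's own word outrank documents that only contain a variant.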
  • 13. More Operators
       Boolean Query Parser:
         pQ = parseQuery("(george or martha) near/5 washington")
       Relevancy Ranking Operators:
         q1 = boostPlus(query, optionalQ)
         q2 = boostMul(0.5, query, optionalQ)
         q3 = constant(0.5, query)
       Composite Queries:
         compQ = and(compositeMax(
                   ["title":1.5, "body":0.8], "george", "washington"))
  • 14. News Feed Use Case
       Order   Documents            Date
       1       markets+terms        Today
       2       markets              Today
       3       terms                Today
       4       companies            Today
       5       markets+terms        Yesterday
       6       markets              Yesterday
       7       terms                Yesterday
       8       companies            Yesterday
       9       markets, companies   older
  • 15. News Feed Use Case – Step 1
       Segments:
         markets = split(solr.markets, "\\s*;\\s*")
         marketsQ = field("markets", or(markets));
       Terms:
         terms = solr.tokenize(query);
         termsQ = field("body", or(thesaurus.expand(0.9f, terms)))
       Companies:
         compIds = split(solr.compIds, "\\s*;\\s*")
         compIdsQ = field("companyIds", or(compIds))
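The split pattern on this slide appears to be the regular expression \s*;\s* (the backslashes seem to have been lost in extraction): a semicolon with optional whitespace on either side. A minimal Python sketch of the same split follows; split_list and the sample value are hypothetical, and the strip() call is an addition not present on the slide.

```python
import re

def split_list(raw):
    """Split a semicolon-separated parameter, ignoring whitespace around ';'."""
    return re.split(r"\s*;\s*", raw.strip())

markets = split_list("energy ; metals;fx")
```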
  • 16. News Feed Use Case – Step 2
       sdf = new SimpleDateFormat("yyyy-MM-dd")
       cal = Calendar.getInstance()
       Today:
         todayDate = sdf.format(cal.getTime())
         todayQ = field("date_s", todayDate)
       Yesterday:
         cal.add(Calendar.DAY_OF_MONTH, -1)
         yesterdayDate = sdf.format(cal.getTime())
         yesterdayQ = field("date_s", yesterdayDate)
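For comparison, the same two date strings can be produced in Python. This is only an illustrative sketch: day_strings is a hypothetical helper, and it takes the reference date as a parameter so the result is reproducible.

```python
from datetime import date, timedelta

def day_strings(today):
    """Return (today, yesterday) formatted as yyyy-MM-dd, matching the date_s field."""
    yesterday = today - timedelta(days=1)
    return today.isoformat(), yesterday.isoformat()

# Subtracting one day rolls back across month boundaries, as Calendar.add does.
today_str, yesterday_str = day_strings(date(2024, 3, 1))
```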
  • 17. News Feed Use Case – Step 3
       Weighted Subject Queries:
         sq1 = constant(4.0, and(marketsQ, termsQ))
         sq2 = constant(3.0, marketsQ)
         sq3 = constant(2.0, termsQ)
         sq4 = constant(1.0, compIdsQ)
         subjectQ = max(sq1, sq2, sq3, sq4)
       Weighted Time Queries:
         tq1 = constant(10.0, todayQ)
         tq2 = constant(1.0, yesterdayQ)
         timeQ = max(tq1, tq2)
       Put it All Together:
         recentQ = and(subjectQ, timeQ)
         return max(recentQ, or(marketsQ,compIdsQ)^0.01)
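To see why these weights yield the ordering in the News Feed Use Case table, the scoring can be simulated under the deck's simplified model (and multiplies, or adds). The Python sketch below makes two assumptions that are not on the slides: each document is described by three match flags plus a day label, and an unweighted match of marketsQ or compIdsQ scores 1.0.

```python
def constant(f, matched):
    """constant(f, A): f if A matched, else 0.0."""
    return f if matched else 0.0

def feed_score(markets, terms, companies, day):
    subject = max(
        constant(4.0, markets and terms),
        constant(3.0, markets),
        constant(2.0, terms),
        constant(1.0, companies),
    )
    time = max(constant(10.0, day == "today"), constant(1.0, day == "yesterday"))
    recent = subject * time  # and(subjectQ, timeQ)
    # or(marketsQ, compIdsQ)^0.01, assuming each raw match scores 1.0
    fallback = 0.01 * (float(markets) + float(companies))
    return max(recent, fallback)

# One document per table row: (markets, terms, companies, day)
ROWS = [
    (True, True, False, "today"),
    (True, False, False, "today"),
    (False, True, False, "today"),
    (False, False, True, "today"),
    (True, True, False, "yesterday"),
    (True, False, False, "yesterday"),
    (False, True, False, "yesterday"),
    (False, False, True, "yesterday"),
    (True, False, True, "older"),
]
scores = [feed_score(*row) for row in ROWS]
```

The time weight of 10.0 puts every match from today above every match from yesterday, and the 0.01 boost keeps older market and company matches in the results but at the bottom.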
  • 18. BT RLP Tokenizer Use Case – Step 1
       Define field type:
         <tokenizer class="com.basistech.rlp.solr.RLPTokenizerFactory"
                    rlpContext="<PATH>rlp-context-bl1.xml"
                    postAltLemmas="false"
                    lang="eng"
                    postPartOfSpeech="false"/>
       QPL Expansion:
         finalExpandedQuery = transform(queryTerms, [
           TERM:{ ctx ->
             def btCustomTokens = solr.tokenize("subject_bt", ctx.op.term)
             if (btCustomTokens.size() > 1)
               return or( term(btCustomTokens[0])^1.5, or(btCustomTokens[1..-1]));
             else
               return ctx.op;
           }
         ]);
  • 19. BT RLP Tokenizer Use Case – Step 2
       Original User Query:
         following is "presentation on QPL"
       QPL Parsed:
         and(and(term(following),term(is)),
             phrase(term(presentation),term(on),term(QPL)))
       BT Expansion + QPL Transformation:
         and(and(or(term(following)^1.5,term(follow)),or(term(is)^1.5,term(be))),
             phrase(term(presentation),term(on),term(QPL)))
  • 20. BT RLP Tokenizer Use Case – Step 3
       [Diagram: tree of the expanded query: and(and(or(Following^1.5, follow), or(is^1.5, be)), phrase(Presentation, on, QPL))]
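The transformation in this use case is a recursive rewrite of the query tree in which each term node is replaced by an or of its boosted surface form and its lemmas. The sketch below imitates that in Python; it is not the QPL transform() implementation. The dictionary-based node shape, the LEMMAS table (a stand-in for what the RLP-backed subject_bt field would return) and all function names are hypothetical.

```python
# Hypothetical stand-in for tokenizing a word against the RLP-backed field:
# the surface form comes first, followed by any alternative lemmas.
LEMMAS = {"following": ["follow"], "is": ["be"]}

def tokenize(word):
    return [word] + LEMMAS.get(word, [])

def term(text, boost=None):
    return {"op": "term", "text": text, "boost": boost}

def node(op, *children):
    return {"op": op, "children": list(children)}

def expand_term(t):
    """Handler for term nodes: or(surface^1.5, lemma, ...) when lemmas exist."""
    tokens = tokenize(t["text"])
    if len(tokens) > 1:
        return node("or", term(tokens[0], 1.5), *[term(x) for x in tokens[1:]])
    return t

def transform(q, handlers):
    """Apply the handler for a node's operator, otherwise recurse into children."""
    if q["op"] in handlers:
        return handlers[q["op"]](q)
    if "children" in q:
        return node(q["op"], *[transform(c, handlers) for c in q["children"]])
    return q

def render(q):
    if q["op"] == "term":
        suffix = f"^{q['boost']}" if q["boost"] is not None else ""
        return f"term({q['text']})" + suffix
    return q["op"] + "(" + ",".join(render(c) for c in q["children"]) + ")"

parsed = node(
    "and",
    node("and", term("following"), term("is")),
    node("phrase", term("presentation"), term("on"), term("QPL")),
)
expanded = render(transform(parsed, {"term": expand_term}))
```

With this lemma table the phrase terms have no alternatives, so they pass through unchanged and the rendered result matches the expression on the Step 2 slide.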
  • 21. Embedded Search Example #1
       qTerms = solr.tokenize(qTerms);
       Execute an Embedded Search:
         results = solr.search('subjectsCore', or(qTerms), 50)
       Create a query from the results:
         subjectsQ = or(results*.subjectId)
       Put it all together:
         return field("title", and(qTerms)) | subjectsQ^0.9;
  • 22. Embedded Search Example #2
       qTerms = solr.tokenize(qTerms);
       Execute an Embedded Search:
         results = solr.search('categories', and(qTerms), 10)
       Create a Solr named list:
         myList = solr.newList();
         myList.add("relatedCategories", results*.title);
       Add it to the XML response:
         solr.addResponse(myList)
  • 23. Other Features
       Embedded Grouping Queries
         Oh yes they did!
       Proximity operators
         ADJ, NEAR/#, BEFORE/#
       Reverse Lemmatizer
         Prefers exact matches over variants
       Transformer
         Applies transformations recursively to query trees
  • 24. Query Processing Language
       [Diagram: the Application Dev Team's User Interface passes data as entered by the user to the Search Team's Solr instance, where the QPL Engine runs a QPL Script and sends a Boolean query expression to Search]