Scorers, Collectors and
Custom Queries
Mikhail Khludnev
Custom
Queries

Match Spotting

http://nlp.stanford.edu/IR-book/
Custom Queries
...hmm, what for?
denim dress
qf=STYLE TYPE

DisjunctionMaxQuery((
  (STYLE:denim OR TYPE:denim) |
  (STYLE:dress OR TYPE:dress)
))

denim dress
qf=STYLE TYPE

(
  DisjunctionMaxQuery(( STYLE:denim | TYPE:denim ))
) OR (
  DisjunctionMaxQuery(( STYLE:dress | TYPE:dress ))
)
Inverted Index
T[0] = "it is what it is"
T[1] = "what is it"
T[2] = "it is a banana"

term dictionary    postings list
"a":               {2}
"banana":          {2}
"is":              {0, 1, 2}
"it":              {0, 1, 2}
"what":            {0, 1}
index/_1.tis    index/_1.frq
"a"             {2}
"banana"        {2}
"is"            {0, 1, 2}
→ "it"          {0, 1, 2}
"what"          {0, 1}
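The term dictionary and postings lists above can be reproduced with a few lines of plain Java. This is only a sketch of the data structure, not Lucene's on-disk format: the class name InvertedIndexSketch and the method build are made up for illustration, and tokenization is a naive split on spaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class InvertedIndexSketch {
    /** Maps every term to the increasing list of docIDs that contain it. */
    static Map<String, List<Integer>> build(String[] docs) {
        // TreeMap keeps terms sorted, like a term dictionary
        Map<String, List<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.length; docId++) {
            for (String term : docs[docId].split(" ")) {
                List<Integer> postings = index.computeIfAbsent(term, t -> new ArrayList<>());
                // docIDs arrive in increasing order, so a repeated term
                // within one document shows up as the last entry
                if (postings.isEmpty() || postings.get(postings.size() - 1) != docId) {
                    postings.add(docId);
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        String[] docs = {"it is what it is", "what is it", "it is a banana"};
        build(docs).forEach((term, postings) -> System.out.println(term + ": " + postings));
    }
}
```

Because documents are added in docID order, every postings list comes out sorted, which is what the scorers on the following slides rely on.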
http://www.lib.rochester.edu/index.cfm?PAGE=489
What is a Scorer?
"a":      {2}
"banana": {2}
"is":     {0, 1, 2}
"it":     {0, 1, 2}
"what":   {0, 1}
while ((doc = nextDoc()) != NO_MORE_DOCS) {
  println("found " + doc + " with score " + score());
}
2783 issues
Note: Weight is omitted for the sake of compactness
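To make the loop above concrete, here is a minimal stand-in for a scorer over a single postings list. It is a sketch, not Lucene's Scorer class: TermScorerSketch and drain are hypothetical names, the score is a constant 1, and Weight is left out just as on the slide. NO_MORE_DOCS is set to Integer.MAX_VALUE, the same sentinel value Lucene uses.

```java
import java.util.ArrayList;
import java.util.List;

class TermScorerSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] postings; // increasing docIDs of one term
    private int pos = -1;         // -1 means "not started yet"

    TermScorerSketch(int... postings) {
        this.postings = postings;
    }

    /** Moves to the next posting and returns its docID, or NO_MORE_DOCS. */
    int nextDoc() {
        if (pos < postings.length) {
            pos++;
        }
        return docID();
    }

    int docID() {
        if (pos < 0) {
            return -1;
        }
        return pos < postings.length ? postings[pos] : NO_MORE_DOCS;
    }

    /** Constant score: every matching term counts as 1 (an assumption of this sketch). */
    float score() {
        return 1f;
    }

    /** The loop from the slide, returning the lines instead of printing them. */
    static List<String> drain(TermScorerSketch scorer) {
        List<String> lines = new ArrayList<>();
        int doc;
        while ((doc = scorer.nextDoc()) != NO_MORE_DOCS) {
            lines.add("found " + doc + " with score " + scorer.score());
        }
        return lines;
    }

    public static void main(String[] args) {
        drain(new TermScorerSketch(0, 1)).forEach(System.out::println); // postings of "what"
    }
}
```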
Doc-at-time search
what OR is OR a OR banana

"a":      {2}
"banana": {2}
"is":     {0, 1, 2}
"it":     {0, 1, 2}
"what":   {0, 1}

The posting lists of the query terms ("is", "what", "a", "banana") are walked in parallel, one docID at a time (Collector entries are docID×score):

collect(0)  score():2  Collector: 0×2
collect(1)  score():2  Collector: 0×2, 1×2
collect(2)  score():3  Collector: 0×2, 1×2, 2×3
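The doc-at-time walk above can be sketched with a priority queue of cursors, one per posting list, ordered by current docID. This is an illustration only: DocAtATimeSketch, Cursor and search are hypothetical names, and the score is simply the number of matching terms, as in the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class DocAtATimeSketch {
    /** Cursor over one non-empty, increasing posting list. */
    static final class Cursor {
        final int[] docs;
        int pos = 0;

        Cursor(int[] docs) {
            this.docs = docs;
        }

        int doc() {
            return docs[pos];
        }

        /** Moves to the next posting; false when the list is exhausted. */
        boolean advance() {
            return ++pos < docs.length;
        }
    }

    /** Returns {docID, score} pairs in increasing docID order. */
    static List<int[]> search(int[][] postingLists) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<Cursor>((a, b) -> Integer.compare(a.doc(), b.doc()));
        for (int[] postings : postingLists) {
            if (postings.length > 0) {
                heap.add(new Cursor(postings));
            }
        }
        List<int[]> collected = new ArrayList<>();
        while (!heap.isEmpty()) {
            int doc = heap.peek().doc();
            int score = 0;
            // every cursor sitting on this doc contributes 1, then moves on
            while (!heap.isEmpty() && heap.peek().doc() == doc) {
                Cursor c = heap.poll();
                score++;
                if (c.advance()) {
                    heap.add(c);
                }
            }
            collected.add(new int[] {doc, score}); // collect(doc) with score()
        }
        return collected;
    }

    public static void main(String[] args) {
        // what, is, a, banana
        int[][] lists = {{0, 1}, {0, 1, 2}, {2}, {2}};
        for (int[] hit : search(lists)) {
            System.out.println(hit[0] + "×" + hit[1]);
        }
    }
}
```

Only the heads of the posting lists live in memory at any time, and the collector sees hits in docID order.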
Term-at-time search
what OR is OR a OR banana

"a":      {2}
"banana": {2}
"is":     {0, 1, 2}
"it":     {0, 1, 2}
"what":   {0, 1}

Each posting list is consumed whole before the next one starts; an accumulator holds the running score per docID:

after "what":    Accumulator ... 0×1 ... 1×1 ...
after "is":      Accumulator ... 0×2 ... 1×2 ... 2×1 ...
after "a":       Accumulator ... 0×2 ... 1×2 ... 2×2 ...
after "banana":  Accumulator ... 0×2 ... 1×2 ... 2×3 ...

Collector
2×3
0×2
1×2
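The accumulator steps above amount to a per-document counter that is updated one posting list at a time. A minimal sketch, assuming a TreeMap as the accumulator and the hypothetical names TermAtATimeSketch and accumulate:

```java
import java.util.Map;
import java.util.TreeMap;

class TermAtATimeSketch {
    /** Returns docID -> number of matching terms, built one posting list at a time. */
    static Map<Integer, Integer> accumulate(int[][] postingLists) {
        Map<Integer, Integer> accumulator = new TreeMap<>();
        for (int[] postings : postingLists) {      // one term after another
            for (int doc : postings) {             // the whole list before the next term
                accumulator.merge(doc, 1, Integer::sum);
            }
        }
        return accumulator;
    }

    public static void main(String[] args) {
        // what, is, a, banana
        int[][] lists = {{0, 1}, {0, 1, 2}, {2}, {2}};
        System.out.println(accumulate(lists));
    }
}
```

Nothing can be handed to the collector until the last list has been read, so the accumulator has to hold an entry for every matching document.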
O(n)

http://nlp.stanford.edu/IR-book/

[Figure: a stream of n hits feeds a binary heap that keeps the top k as docID×score entries (1×9, 7×9, 2×7, 2×5, 9×5, 6×4); the remaining hits score ≤4. Each heap update costs log k.]

        6×4
    9×5     2×4
  2×7 7×9 1×9

http://en.wikipedia.org/wiki/Binary_heap
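Keeping the best k of n hits with a heap of size k, as in the figure above, might look like the following. This is a sketch with made-up names (TopKSketch, topK); ties on score are broken in favour of the smaller docID, which is an assumption of this example and makes the result independent of the order in which hits arrive.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopKSketch {
    /** hits are {docID, score} pairs in any order; returns the top k, best first. */
    static List<int[]> topK(int[][] hits, int k) {
        List<int[]> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        // "weaker" sorts first: lower score, and on equal score the larger docID
        Comparator<int[]> weakestFirst = (a, b) ->
            a[1] != b[1] ? Integer.compare(a[1], b[1]) : Integer.compare(b[0], a[0]);
        PriorityQueue<int[]> heap = new PriorityQueue<>(weakestFirst);
        for (int[] hit : hits) {
            if (heap.size() < k) {
                heap.add(hit);                       // O(log k)
            } else if (weakestFirst.compare(hit, heap.peek()) > 0) {
                heap.poll();                         // evict the weakest kept hit
                heap.add(hit);
            }
        }
        while (!heap.isEmpty()) {
            result.add(0, heap.poll());              // weakest comes out first
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] hits = {{0, 2}, {1, 2}, {2, 3}};
        for (int[] hit : topK(hits, 2)) {
            System.out.println(hit[0] + "×" + hit[1]);
        }
    }
}
```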
Figure labels: p marks the postings of the query terms, q the query terms (what OR is OR a OR banana), n the hits, k the size of the top-hits heap.

              doc at time             term at time
complexity    O(p log q + n log k)    O(p + n log k)
memory        q + k                   n
BooleanScorer
org.apache.lucene.search.BooleanScorer
"a":      {2}
"banana": {2}
"is":     {0, 1, 2}
"it":     {0, 1, 2}
"what":   {0, 1}

chunk: Hashtable[2], slots 0 and 1

chunk 1:  slot 0: ×1 → ×2    slot 1: ×1 → ×2    Collector: 0×2, 1×2
chunk 2:  slot 0: ×1 → ×2 → ×3                  Collector: 2×3, 0×2, 1×2

Linked Open Hash [2K]

[Figure: slots 0 to 7, some of them holding the counts ×2, ×3, ×1, ×1, ×5, ×2]
if (collector.acceptsDocsOutOfOrder() &&
    topScorer &&
    required.size() == 0 &&
    minNrShouldMatch == 1) {
  new BooleanScorer     // term-at-time
} else {
  new BooleanScorer2    // doc-at-time
}
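The chunked variant shown in the BooleanScorer slides can be approximated as follows: the docID space is cut into windows of a fixed size, every posting list is drained into a small bucket table for the current window, and the table is flushed to the collector before the next window starts. This is a simplified sketch, not the code of org.apache.lucene.search.BooleanScorer; the names ChunkedScorerSketch and search are hypothetical, and a plain int array stands in for the bucket table.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkedScorerSketch {
    /** Returns {docID, score} pairs; chunk is the window size and must be positive. */
    static List<int[]> search(int[][] postingLists, int chunk) {
        int[] pos = new int[postingLists.length]; // one cursor per term
        int[] buckets = new int[chunk];           // score per slot of the current window
        List<int[]> collected = new ArrayList<>();
        boolean more = true;
        for (int base = 0; more; base += chunk) {
            more = false;
            // term at time, but only within the window [base, base + chunk)
            for (int t = 0; t < postingLists.length; t++) {
                int[] postings = postingLists[t];
                while (pos[t] < postings.length && postings[pos[t]] < base + chunk) {
                    buckets[postings[pos[t]] - base]++;
                    pos[t]++;
                }
                if (pos[t] < postings.length) {
                    more = true;                  // this term has postings in later windows
                }
            }
            // flush the window to the collector and reset it
            for (int slot = 0; slot < chunk; slot++) {
                if (buckets[slot] > 0) {
                    collected.add(new int[] {base + slot, buckets[slot]});
                    buckets[slot] = 0;
                }
            }
        }
        return collected;
    }

    public static void main(String[] args) {
        // what, is, a, banana with a window of 2 docs, as in the slides
        int[][] lists = {{0, 1}, {0, 1, 2}, {2}, {2}};
        for (int[] hit : search(lists, 2)) {
            System.out.println(hit[0] + "×" + hit[1]);
        }
    }
}
```

Memory stays bounded by the window size instead of growing with the number of hits, which is the trade-off against plain term-at-time accumulation.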
q=village operations years disaster visit
q=village operations years disaster visit etc
map seventieth peneplains tussock sir
memory character campaign author public
wonder forker middy vocalize enable race
object signal symptom deputy where typhous
rectifiable
polygamous
originally
look
generation ultimately reasonably ratio numb
apposing enroll manhood problem suddenly
definitely corp event material affair diploma
would dimout speech notion engine artist
hotel text field hashed rottener impeding i
cricket virtually valley sunday rock come
observes gallnuts vibrantly prize involve
q=+village +operations +years +disaster +visit
Conjunction
(+, MUST)
"a":      {2, 3}
"banana": {2, 3}
"is":     {0, 1, 2, 3}
"it":     {0, 1, 3}
"what":   {0, 1, 3}

what AND is AND a AND it

Collector
3×4
http://www.flickr.com/photos/fatniu/184615348/
Ω(n q + n log k)
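For the conjunction, the posting lists leapfrog each other: whenever a list has no entry for the current candidate, its next docID becomes the new candidate and the other lists have to catch up. A minimal sketch under these assumptions: LeapfrogSketch and intersect are hypothetical names, and advancing is a linear scan here, where a real index would use skip data.

```java
import java.util.ArrayList;
import java.util.List;

class LeapfrogSketch {
    /** Intersects increasing posting lists; returns the docIDs present in all of them. */
    static List<Integer> intersect(int[][] lists) {
        List<Integer> result = new ArrayList<>();
        int n = lists.length;
        if (n == 0) {
            return result;
        }
        for (int[] list : lists) {
            if (list.length == 0) {
                return result;
            }
        }
        int[] pos = new int[n];
        int candidate = lists[0][0];
        int agreed = 0; // how many lists in a row are positioned on the candidate
        int i = 0;
        while (true) {
            int[] list = lists[i];
            // advance list i to the first docID >= candidate
            while (pos[i] < list.length && list[pos[i]] < candidate) {
                pos[i]++;
            }
            if (pos[i] == list.length) {
                return result;                 // one list is exhausted: no more matches
            }
            if (list[pos[i]] == candidate) {
                agreed++;
                if (agreed == n) {             // all lists agree: a hit
                    result.add(candidate);
                    agreed = 0;
                    candidate++;
                }
            } else {
                candidate = list[pos[i]];      // leap: this list dictates the new candidate
                agreed = 1;
            }
            i = (i + 1) % n;
        }
    }

    public static void main(String[] args) {
        // what, is, a, it
        int[][] lists = {{0, 1, 3}, {0, 1, 2, 3}, {2, 3}, {0, 1, 3}};
        System.out.println(intersect(lists));
    }
}
```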
Wrap-up
● doc-at-time vs term-at-time
● conjunction and leapfrog
complexity

O(n)

memory

O(const)
Custom Queries
● Sample Coverage Query
● Deeply Branched vs Flat
● minShouldMatch
● Filtering
● Performance Problem
silver jeans dress

"silver" "jeans" "dress"
"silver jeans dress"
"silver jeans" "dress"
"silver" "jeans dress"

"silver" "dress"
"silver jeans" "jeans"
"silver jeans"
"jeans" "dress"

Note: "foo bar" is not a phrase query, just a string
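boolean verifyMatch() {
  int sumLength = 0;
  for (Scorer child : getChildren()) {
    // count only the sub-scorers positioned on the current document
    if (child.docID() == docID()) {
      TermQuery tq = (TermQuery) child.weight.query;
      sumLength += tq.term.text.length();
    }
  }
  // the matched strings together must be at least as long as the user query
  return sumLength >= expectedLength;
}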
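The first four combinations listed for "silver jeans dress" are the ways to cut the query into contiguous strings. One way to enumerate them, as a sketch with hypothetical names (SegmentationSketch, segmentations) and without any claim that the talk generates its queries this way:

```java
import java.util.ArrayList;
import java.util.List;

class SegmentationSketch {
    /** Returns every way to cut the tokens into contiguous, space-joined strings. */
    static List<List<String>> segmentations(String[] tokens) {
        List<List<String>> out = new ArrayList<>();
        split(tokens, 0, new ArrayList<String>(), out);
        return out;
    }

    private static void split(String[] tokens, int start,
                              List<String> prefix, List<List<String>> out) {
        if (start == tokens.length) {
            out.add(new ArrayList<>(prefix)); // every token is covered exactly once
            return;
        }
        StringBuilder phrase = new StringBuilder();
        for (int end = start; end < tokens.length; end++) {
            if (end > start) {
                phrase.append(' ');
            }
            phrase.append(tokens[end]);
            prefix.add(phrase.toString());          // take tokens[start..end] as one string
            split(tokens, end + 1, prefix, out);
            prefix.remove(prefix.size() - 1);       // backtrack
        }
    }

    public static void main(String[] args) {
        for (List<String> s : segmentations(new String[] {"silver", "jeans", "dress"})) {
            System.out.println(s);
        }
    }
}
```

A query of m tokens has 2^(m-1) such segmentations, which is one reason the generated query trees on the next slides grow quickly.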
Deeply Branched vs Flat
(+"silver jeans" +"dress")
ORmax
(+"silver jeans dress")
ORmax
(+"silver" +(
(+"jeans" +"dress")
ORmax
+"jeans dress"
)
)
ORmax is DisjunctionMaxQuery
("silver jeans" "dress")
ORmax
("silver jeans dress")
ORmax
("silver" (
("jeans" "dress")
ORmax
"jeans dress"
)
)
ORmax is DisjunctionMaxQuery
B - BRAND
T - TYPE
S - STYLE

(
  +( B:"silver jeans" ORmax T:"silver jeans" ORmax S:"silver jeans" )
  +( B:"dress" ORmax T:"dress" ORmax S:"dress" )
)
ORmax
( B:"silver jeans dress" ORmax T:"silver jeans dress" ORmax S:"silver jeans dress" )
ORmax
(
  +( B:"silver" ORmax T:"silver" ORmax S:"silver" )
  +(
    (
      +( B:"jeans" ORmax T:"jeans" ORmax S:"jeans" )
      +( B:"dress" ORmax T:"dress" ORmax S:"dress" )
    )
    ORmax
    ( B:"jeans dress" ORmax T:"jeans dress" ORmax S:"jeans dress" )
  )
)
B:"silver"               T:"silver"               S:"silver"
B:"jeans"                T:"jeans"                S:"jeans"
B:"dress"                T:"dress"                S:"dress"
B:"silver jeans"         T:"silver jeans"         S:"silver jeans"
B:"silver jeans dress"   T:"silver jeans dress"   S:"silver jeans dress"
B:"jeans dress"          T:"jeans dress"          S:"jeans dress"
Steadiness problem
AFAIK 3.x only.
{1, 3, 7, 10, 27,30,..}
{3, 5, 10, 27,32,..}
{2,3, 27,31,..}
{..., 20, 27,32,..}
{..., 30, 31,32,..}
{..., 30,37,..}
3
3 20
3 30 30
{3, 5, 10, 27,32,..}
{1, 3, 7, 10, 27,30,..}
{2,3, 27,31,..}
{..., 20, 27,32,..}
{..., 30, 31,32,..}
{..., 30,37,..}
docID=

3

5
7 20
27 30 30

3.x
minShouldMatch
straight silver jeans

minShouldMatch=2
straight jeans
silver jeans
silver jeans straight
jeans
silver
org.apache.lucene.search.DisjunctionSumScorer
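int nextDoc() {
  while (true) {
    // move every sub-scorer that sits on the current doc to its next doc
    while (subScorers[0].docID() == doc) {
      if (subScorers[0].nextDoc() != NO_MORE_DOCS) {
        heapAdjust(0);
      } else {
        ....
      }
    }
    ...
    // stop only on a doc matched by enough sub-scorers
    if (nrMatchers >= minimumNrMatchers) {
      break;
    }
  }
  return doc;
}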
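The effect of minShouldMatch on a doc-at-time disjunction can be sketched as a heap walk that counts the matching sub-scorers per document and keeps only documents that reach the threshold. The names MinShouldMatchSketch, Cursor and search are hypothetical, and the posting lists in main are invented for illustration; note that, like the loop above, this version still visits every candidate document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class MinShouldMatchSketch {
    /** Cursor over one non-empty, increasing posting list. */
    static final class Cursor {
        final int[] docs;
        int pos = 0;

        Cursor(int[] docs) {
            this.docs = docs;
        }

        int doc() {
            return docs[pos];
        }

        boolean advance() {
            return ++pos < docs.length;
        }
    }

    /** Returns the docIDs matched by at least minShouldMatch of the posting lists. */
    static List<Integer> search(int[][] postingLists, int minShouldMatch) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<Cursor>((a, b) -> Integer.compare(a.doc(), b.doc()));
        for (int[] postings : postingLists) {
            if (postings.length > 0) {
                heap.add(new Cursor(postings));
            }
        }
        List<Integer> matches = new ArrayList<>();
        while (!heap.isEmpty()) {
            int doc = heap.peek().doc();
            int nrMatchers = 0;
            while (!heap.isEmpty() && heap.peek().doc() == doc) {
                Cursor c = heap.poll();
                nrMatchers++;
                if (c.advance()) {
                    heap.add(c);
                }
            }
            if (nrMatchers >= minShouldMatch) {
                matches.add(doc);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // invented postings for: straight, silver, jeans
        int[][] lists = {{1, 4}, {2, 4, 5}, {1, 2, 3, 4}};
        System.out.println(search(lists, 2));
    }
}
```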
Let’s filter!
By the way, what is it?
RANDOM_ACCESS_FILTER_STRATEGY
LEAP_FROG_FILTER_FIRST_STRATEGY
LEAP_FROG_QUERY_FIRST_STRATEGY
QUERY_FIRST_FILTER_STRATEGY
http://localhost:8983/solr/collection1/select
?q=village operations years disaster visit etc map
seventieth peneplains tussock sir memory character
campaign author public wonder forker middy vocalize
enable race object signal symptom deputy where
generation ultimately reasonably ratio numb apposing
enroll manhood problem suddenly definitely corp event
gallnuts vibrantly prize involve explanation module&
qf=text_all&defType=edismax&
fq= id:yes_49912894 id:nurse_30134968&
mm=32&
{1, 3, 7, 10, 27,30,..}
{3, 5, 10, 27,32,..}
{ 20,27,31,..}
mm=3

{ 30,37,..}
Custom
Queries

Match Spotting

http://nlp.stanford.edu/IR-book/
silver jeans dress

[Figure, animated over several slides: product cards, each with BRAND, TYPE and STYLE values, for example BRAND:"style&co" TYPE:"jeans dress" STYLE:"silver" and BRAND:"chaloree" TYPE:"jeans dress" STYLE:"silver". The cards are reduced step by step to the field values that match the query and then grouped by match pattern, with a count per group.]

BRAND:"silver jeans" TYPE:"dress" (4)
TYPE:"dress" STYLE:"silver","jeans" (3)
TYPE:"jeans dress" STYLE:"silver" (2)
http://goo.gl/7LJFi

Scorers, Collectors and
Custom Queries
http://google.com/+MikhailKhludnev
Appendixes
● Drill Sideways Facets
● Collectors
Appendix D

Drill Sideways Facets
+CATEGORY: Denim +FIT: Straight +WASH: Dark&B
+CATEGORY: Denim +WASH: Dark&B
+CATEGORY: Denim +FIT: Straight

+CATEGORY: Denim
FIT: Straight
WASH: Dark&Black
...
/minShouldMatch=Ndrilldowns-1

[Figure: overlapping sets +CAT: Denim, FIT: Straight and WASH: Dark. The region covered by all 3 is labeled totalHits; the regions covered by only 2 are labeled near miss.]
Doc at time
base query is highly selective
+CAT:D..{1, 7, 9, 15 }
FIT:S.. {2, 7, 8, 9, 10,12}
WASH:D..{2, 7, 11,13,15}
...

TopDocsCollector
Term at time
drilldown queries are highly selective
+CAT:D..{1, 7, 9, 15 }
FIT:S.. {2, 7, 8, 9, 10,12}
WASH:D..{2, 7, 11,13,15}
...

[Figure, animated over several slides: an array indexed by docID (1, 2, ..., 7, 8, 9, 10, 11, 12, 13, 15) holds a hits count and the missed dimension (Fit, Wash, Cat) for each document while the posting lists are processed one after another. What remains for TopDocsCollector: doc 7 with hits 3 and no miss, plus two near misses with hits 2, one missing Fit and one missing Wash.]
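The bookkeeping behind these slides can be sketched by walking the base query and checking each drill-down dimension: a document that passes all of them is a hit, and a document that fails exactly one is a near miss for that dimension. This is a simplified illustration with hypothetical names (DrillSidewaysSketch, search), using binary search on sorted arrays in place of real scorers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DrillSidewaysSketch {
    /**
     * base and every dims[d] are increasing docID arrays.
     * result.get(0) holds the hits; result.get(d + 1) holds the near misses
     * that fail only dimension d.
     */
    static List<List<Integer>> search(int[] base, int[][] dims) {
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i <= dims.length; i++) {
            result.add(new ArrayList<Integer>());
        }
        for (int doc : base) {
            int missCount = 0;
            int missedDim = -1;
            // stop checking once two dimensions fail: the doc is neither hit nor near miss
            for (int d = 0; d < dims.length && missCount < 2; d++) {
                if (Arrays.binarySearch(dims[d], doc) < 0) {
                    missCount++;
                    missedDim = d;
                }
            }
            if (missCount == 0) {
                result.get(0).add(doc);
            } else if (missCount == 1) {
                result.get(missedDim + 1).add(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] cat = {1, 7, 9, 15};
        int[][] dims = {
            {2, 7, 8, 9, 10, 12},  // FIT
            {2, 7, 11, 13, 15}     // WASH
        };
        System.out.println(search(cat, dims));
    }
}
```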
Collector

DocSetCollector

TopDocsCollector

TopFieldCollector
TopScoreDocsCollector
DocSet or DocList?
long [952045] = { 0, 0, 0, 0, 2050, 0, 0, 8, 0, 0, 0,... }

int [2079] = {4, 12, 45, 67, 103, 673, 5890, 34103,...}

int [100] = {8947, 7498,1, 230, 2356, 9812, 167,....}
                          DocList/TopDoc           DocSet
Size                      k (numHits or rows)      N (maxDocs)
Ordered by                score or field           docID
Out-of-order collecting   allows*                  almost could allow (No)
?×4

6×4
9×5 2×4
2×7 7×9 1×9
http://www.flickr.com/photos/jbagley/4303976811/sizes/o/
class OutOfOrderTopScoreDocCollector {
  boolean acceptsDocsOutOfOrder() { return true; }
  ..
  void collect(int doc) {
    float score = scorer.score();
    ...
    if (score == pqTop.score && doc > pqTop.doc) {
      ...
    }
  }
}
UML
http://www.flickr.com/photos/kristykay/2922670979/lightbox/
Lucene Search Essentials: Scorers, Collectors and Custom Queries