October 11-14, 2016 • Boston, MA
The Evolution of Lucene & Solr Numerics
from Strings to Points
Steve Rowe
Senior Software Engineer, Lucidworks
@steven_a_rowe
Agenda
1. {Long time ago, yesterday}: History
2. Today: Benchmarks
3. Tomorrow: Future developments
Not on the agenda: geospatial; stats/analytics; streaming expressions
Yesterday
Lucene 0.01, March 2000: Modified UTF-8 terms
Lucene 1.2, June 2002
Lucene 1.4, July 2004: FieldCache
Lucene 1.9, Feb. 2006: NumberTools
Solr 1.1, Dec. 2006: BCDTypeField, SortableTypeField, TypeField
Lucene 2.4, Oct. 2008: UTF-8 terms
Lucene 2.9, Sept. 2009 / Solr 1.4, Nov. 2009: Trie numerics
Lucene/Solr 4.0, Oct. 2012: Flexible indexing, Binary terms, DocValues, -=NumberTools
Solr 5.0, Feb. 2015: -=BCDTypeField, -=SortableTypeField, -=TypeField
Lucene/Solr 5.2, June 2015: Auto-prefix terms
Lucene/Solr 6.0, Apr. 2016: Dimensional Points, Trie deprecated
Lucene/Solr 6.2, Aug. 2016: -=Auto-prefix terms
(-= marks a removal.)
Binary terms
1. Modified UTF-8: null (U+0000) is encoded as the 2 bytes C0 80; each UTF-16 surrogate code unit is encoded separately as 3 bytes; length is recorded in UTF-16 chars
2. String -> byte sequence
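The two Modified UTF-8 quirks listed above can be seen in a small sketch. This is an illustration only, not Lucene's code; the function name `modified_utf8` is made up for this example, and the length prefix is left out.

```python
def modified_utf8(text: str) -> bytes:
    """Encode text the way Java's Modified UTF-8 does (no length prefix).

    Hypothetical helper for illustration: works on UTF-16 code units, so a
    supplementary character becomes two surrogates of 3 bytes each.
    """
    raw = text.encode("utf-16-be")  # two bytes per UTF-16 code unit
    out = bytearray()
    for i in range(0, len(raw), 2):
        unit = (raw[i] << 8) | raw[i + 1]
        if unit == 0:
            out += b"\xc0\x80"  # null is written as 2 bytes, never as 0x00
        elif unit < 0x80:
            out.append(unit)  # plain ASCII, 1 byte
        elif unit < 0x800:
            out += bytes([0xC0 | (unit >> 6), 0x80 | (unit & 0x3F)])
        else:
            # Includes surrogate code units, which standard UTF-8 forbids.
            out += bytes([
                0xE0 | (unit >> 12),
                0x80 | ((unit >> 6) & 0x3F),
                0x80 | (unit & 0x3F),
            ])
    return bytes(out)


# A character outside the BMP: 4 bytes in UTF-8, 6 bytes in Modified UTF-8.
emoji = "\U0001F600"
print(len(emoji.encode("utf-8")), len(modified_utf8(emoji)))
```

For characters in the Basic Multilingual Plane other than null, the output is the same as standard UTF-8.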
TypeField
1. In the beginning, everything was a String
2. Solr Int/Long/etc.: base 10 variable-width String
3. To make string-encoded integers sortable, left-zero-pad to fixed width, e.g. 15 -> 000015
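A minimal sketch of why the padding matters, assuming non-negative integers and a width of 6 as in the example above (the helper name `pad` is invented here):

```python
def pad(n: int, width: int = 6) -> str:
    """Left-zero-pad a non-negative integer to a fixed width."""
    text = str(n)
    if n < 0 or len(text) > width:
        raise ValueError("value does not fit the fixed width")
    return text.zfill(width)


numbers = [15, 9, 100]

# Variable-width strings sort lexicographically, which is not numeric order.
unpadded = sorted(str(n) for n in numbers)

# Fixed-width strings sort lexicographically in numeric order.
padded = sorted(pad(n) for n in numbers)

print(unpadded)
print(padded)
```

Negative numbers need extra handling (a sign prefix or an offset), which this sketch leaves out.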
NumberTools, BCDTypeField, SortableTypeField
1. NumberTools: base 36 long
2. BCD: base 10k int/long
3. Sortable Int/Float/Long/Double/Date: 32-bit = 12 bits/char; 64-bit = 14 bits/char
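The base 36 idea can be sketched as follows. This is a simplified illustration for non-negative 63-bit values only, not the exact NumberTools format; `to_base36` and the fixed width of 13 are assumptions of this example (13 digits because 36**13 exceeds 2**63).

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"
WIDTH = 13  # smallest width whose base 36 range covers 0 .. 2**63 - 1


def to_base36(n: int) -> str:
    """Encode a non-negative long as a fixed-width base 36 string."""
    if not 0 <= n < 2**63:
        raise ValueError("only non-negative 63-bit values are handled here")
    chars = []
    while n:
        n, remainder = divmod(n, 36)
        chars.append(DIGITS[remainder])
    return "".join(reversed(chars)).rjust(WIDTH, "0")


values = [0, 35, 36, 1295, 2**40]
encoded = [to_base36(v) for v in values]
# Digits sort before letters in ASCII, so string order matches numeric order.
print(encoded == sorted(encoded))
```

Compared with zero-padded base 10, the higher radix packs the same range into fewer characters per term.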
FieldCache: uninverted per-doc array of native field values, constructed at search time
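As a rough sketch of what "uninverting" means, assume a toy inverted index that maps each numeric term to the documents containing it. The names `uninvert` and `postings` are illustrative and are not Lucene API.

```python
def uninvert(postings, max_doc, missing=0):
    """Turn term -> doc ids into a per-document array of values.

    Assumes a single-valued field; documents without a value get `missing`.
    """
    values = [missing] * max_doc
    for term, doc_ids in postings.items():
        for doc in doc_ids:
            values[doc] = term
    return values


# Toy inverted index: value 10 appears in docs 0 and 3, and so on.
postings = {10: [0, 3], 25: [1], 40: [2]}
per_doc = uninvert(postings, max_doc=4)

# With the per-doc array in hand, sorting documents by value is a plain lookup.
by_value = sorted(range(4), key=per_doc.__getitem__)
print(per_doc, by_value)
```

The cost of this walk over every term and posting is paid at search time, on first use.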
Trie numerics
From http://www.thetaphi.de/share/Schindler-TrieRange.ppt:
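[Figure: trie over the indexed terms 421, 423, 445, 446, 448, 521, 522, 632, 633, 634, 641, 642, 644, with prefix nodes 42, 44, 52, 63, 64 and 4, 5, 6; the query range is marked.]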
1. Fast range queries
2. Fewer terms required than term range queries
3. 7-bit encoded to minimize disk footprint
4. Adjustable “precisionStep”: number of bits to shift when generating synthetic terms
5. Synthetic prefix terms created by stripping low bits and prepending the shift amount in the first byte
   1. E.g.: For 423, synthetic terms 42 and 4 are also indexed
   2. When searching range [423, 642]: the lowest-precision terms covering the range are used: 423, 44, 5, 63, 641, 642 (6 terms), versus 11 terms required by a term range query.
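The range example can be reproduced with a small sketch. It is a decimal-digit analogy only: real trie numerics strip bits according to precisionStep rather than decimal digits, and the helper names `index_terms` and `covering_terms` are invented for this illustration. The term list is the one shown in the figure.

```python
def index_terms(values, width=3):
    """Index each value plus every shorter decimal prefix of it."""
    terms = set()
    for value in values:
        digits = str(value).zfill(width)
        for length in range(1, width + 1):
            terms.add(digits[:length])
    return terms


def covering_terms(terms, lo, hi, width=3):
    """Collect the lowest-precision indexed terms that cover [lo, hi]."""
    found = []

    def visit(prefix):
        pad = width - len(prefix)
        low = int(prefix + "0" * pad)   # smallest value under this prefix
        high = int(prefix + "9" * pad)  # largest value under this prefix
        if high < lo or low > hi:
            return  # prefix lies entirely outside the query range
        if prefix and lo <= low and high <= hi:
            if prefix in terms:
                found.append(prefix)  # one term stands in for the subtree
            return
        for digit in "0123456789":
            visit(prefix + digit)  # partial overlap: go one level deeper

    visit("")
    return found


values = [421, 423, 445, 446, 448, 521, 522, 632, 633, 634, 641, 642, 644]
cover = covering_terms(index_terms(values), 423, 642)
plain = [v for v in values if 423 <= v <= 642]
print(cover, len(plain))
```

The cover contains 6 terms, while the plain term range has to visit all 11 full-precision terms in the range.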
DocValues
DocValues: field cache constructed at index time
Flexible indexing
Flexible indexing: simplify/enable new index formats via modularization
1. Auto-prefix terms: generalization of trie numeric strategy, in block tree terms dictionary.
2. Intended to replace trie numerics: LUCENE-5966
3. Removed in favor of points.
Dimensional Points
1. All point values in a field have the same fixed width (max 128 bits)
2. 1D - 8D (1-8 dimensions, 1-16 bytes per dimension)
3. Block k-d tree
4. Points are sorted; recursively partitioned along the longest dimension; then at a target cardinality, the “leaf block” is written out.
5. An in-memory binary tree index points to the leaf blocks.
6. Adaptive optimal partitioning (versus trie numerics, which generate terms irrespective of local density).
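The recursive partitioning can be sketched in a few lines. This is a simplified in-memory illustration, not Lucene's BKD writer: it reads "longest dimension" as the dimension with the widest value range, it omits the binary tree index over the leaves, and `build_leaf_blocks` and `max_leaf_size` are names chosen for this example.

```python
def build_leaf_blocks(points, max_leaf_size=2):
    """Split points along the widest dimension until blocks are small enough."""
    if max_leaf_size < 1:
        raise ValueError("max_leaf_size must be at least 1")
    if len(points) <= max_leaf_size:
        return [sorted(points)]  # target cardinality reached: emit a leaf block

    dims = len(points[0])

    def spread(d):
        column = [p[d] for p in points]
        return max(column) - min(column)

    split_dim = max(range(dims), key=spread)  # widest dimension
    ordered = sorted(points, key=lambda p: p[split_dim])
    mid = len(ordered) // 2  # split at the median so both halves stay balanced
    return (build_leaf_blocks(ordered[:mid], max_leaf_size)
            + build_leaf_blocks(ordered[mid:], max_leaf_size))


points = [(1, 9), (2, 1), (3, 7), (8, 2), (9, 8), (10, 3)]
blocks = build_leaf_blocks(points)
for block in blocks:
    print(block)
```

Because each split is placed at the median of the data, dense regions get more and smaller cells, which is the adaptive behaviour contrasted with trie numerics above.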
Dimensional Points
1. Lucene-only - no Solr support yet
2. Optimized for query types: range, distance, nearest-neighbor, and point-in-polygon
3. Multi-valued support
4. Not supported: value retrieval (store if you need this)
5. Not supported: sorting or faceting (use DocValues for these)
Dimensional Points
1D Native
  Implementations: LongPoint, IntPoint, DoublePoint, FloatPoint, BinaryPoint
  Supported queries: any in set, exact, range
1D 128-bit
  Implementations: BigIntegerPoint, InetAddressPoint
  Supported queries: any in set, exact, range
1D-4D Range
  Implementations: LongRangeField, IntRangeField, DoubleRangeField, FloatRangeField
  Supported queries (given a range): intersects, contains, within
2D Geospatial
  Implementations: LatLonPoint
  Supported queries: within box, within distance, within polygon, nearest neighbor
3D Geospatial
  Implementations: Geo3DPoint
  Supported queries: within shape
Today
Mike McCandless benchmarked pre-6.0 1D points and found*:
1. Points were substantially faster at both index and query time than the equivalent Trie numeric type.
2. Index size was smaller with points.
3. Query-time heap usage with points was much lower.
Adrien Grand re-ran Mike’s benchmark against a Lucene 6.2 snapshot**, and drew similar conclusions: “36% faster at query time, 71% faster at index time and used 66% less disk and 85% less memory”
* https://www.elastic.co/blog/lucene-points-6.0
** https://www.elastic.co/blog/searching-numb3rs-in-5.0
Today
I benchmarked fixed range queries against trie and point long, int and double fields in 25 million NYC taxi trips using modified tools from luceneutil.
I created an index with three versions of each long, int and double field:
1. Trie numerics with the default precision step
2. Point fields
3. Trie numerics with a precision step the same width as the numbers, so no synthetic prefix terms are indexed - this should provide a maximum performance threshold for String ranges.
Today
Type                     Indexing time   Index size
Points                   31s             1.2GiB
Trie                     53s             1.6GiB
Single-precision trie    19s             0.7GiB
The index has 24 fields defined: 6 string fields, 1 text field, 2 long fields, 1 int field, and 14 double fields.
Today
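field               cardinality   hits    type            query time
passenger_count     10            7.5M    IntPoint        86ms
passenger_count     10            7.5M    TrieInt/8       114ms
passenger_count     10            7.5M    TrieInt/32      116ms
pick_up_date_time   4.1M          10.4M   LongPoint       69ms
pick_up_date_time   4.1M          10.4M   TrieLong/16     105ms
pick_up_date_time   4.1M          10.4M   TrieLong/64     365ms
trip_distance       4,754         9.6M    DoublePoint     116ms
trip_distance       4,754         9.6M    TrieDouble/16   92ms
trip_distance       4,754         9.6M    TrieDouble/64   105ms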
Tomorrow
1. Add support for PointFields in Solr: SOLR-8396
2. David Smiley will be working on adding a Solr adaptor for LatLonPoint in the near future.
3. Trie numerics will be removed from Lucene in 7.0, but Solr may take ownership to provide a longer backcompat timeframe.
4. FieldCache may be removed from Lucene / moved to Solr: LUCENE-7283
References
1. Numeric Range Queries with Lucene TrieRange: http://www.thetaphi.de/share/Schindler-TrieRange.ppt
2. Generic XML-based Framework for Metadata Portals: http://epic.awi.de/17813/1/Sch2007br.pdf
3. Fun with flexible indexing: http://blog.mikemccandless.com/2010/10/fun-with-flexible-indexing.html
4. Searching numb3rs in 5.0: https://www.elastic.co/blog/searching-numb3rs-in-5.0
5. Multi-dimensional points, coming in Apache Lucene 6.0: https://www.elastic.co/blog/lucene-points-6.0
6. Bkd-tree: A Dynamic Scalable kd-tree: http://www.madalgo.au.dk/~large/Papers/bkdsstd03.ps
7. Luceneutil: https://github.com/mikemccand/luceneutil/

The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - Europe
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 Research
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
 

Recently uploaded

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 

Recently uploaded (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 

The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by Steve Rowe, Lucidworks

  • 1. October 11-14, 2016 • Boston, MA
  • 2. The Evolution of Lucene & Solr Numerics from Strings to Points. Steve Rowe, Senior Software Engineer, Lucidworks (@steven_a_rowe)
  • 3. Agenda 1. {Long time ago, yesterday}: History 2. Today: Benchmarks 3. Tomorrow: Future developments. Not on the agenda: geospatial; stats/analytics; streaming expressions
  • 4. Yesterday (timeline; “-=X” marks the removal of X)
     Lucene 0.01 (March 2000)
     Lucene 1.2 (June 2002)
     Lucene 1.4 (July 2004): FieldCache
     Lucene 1.9 (Feb. 2006): NumberTools
     Solr 1.1 (Dec. 2006): BCDTypeField; SortableTypeField; TypeField
     Lucene 2.4 (Oct. 2008): UTF-8 terms (replacing Modified UTF-8 terms)
     Lucene 2.9 (Sept. 2009) and Solr 1.4 (Nov. 2009): Trie numerics
     Lucene/Solr 4.0 (Oct. 2012): Flexible indexing; Binary terms; DocValues; -=NumberTools
     Solr 5.0 (Feb. 2015): -=BCDTypeField; -=SortableTypeField; -=TypeField
     Lucene/Solr 5.2 (June 2015): Auto-prefix terms
     Lucene/Solr 6.0 (Apr. 2016): Dimensional Points; Trie deprecated
     Lucene/Solr 6.2 (Aug. 2016): -=Auto-prefix terms
  • 5. Yesterday: Binary terms 1. Modified UTF-8: the null character is 2 bytes (C0 80); each UTF-16 surrogate code unit is 3 bytes; length is counted in UTF-16 chars 2. String -> byte sequence
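The modified UTF-8 quirks named on this slide can be observed with the JDK alone. The following is a minimal sketch, assuming the JDK's `DataOutputStream.writeUTF` as a stand-in for Lucene's old term encoding; the class and method names are hypothetical. Note one difference: `writeUTF` prefixes the data with a 2-byte length counted in bytes, whereas the slide says Lucene's length was counted in UTF-16 chars.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ModifiedUtf8Demo {
    // Encode with the JDK's modified UTF-8 (DataOutputStream.writeUTF),
    // dropping the 2-byte length prefix that writeUTF emits first.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Render bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String nul = "\0";
        // One supplementary code point, stored as two UTF-16 surrogate chars.
        String supplementary = new String(Character.toChars(0x1F600));

        System.out.println(hex(modifiedUtf8(nul)));                                // C0 80
        System.out.println(hex(nul.getBytes(StandardCharsets.UTF_8)));             // 00
        System.out.println(modifiedUtf8(supplementary).length);                    // 6 (2 surrogates x 3 bytes)
        System.out.println(supplementary.getBytes(StandardCharsets.UTF_8).length); // 4
    }
}
```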
  • 6. Yesterday: TypeField 1. In the beginning, everything was a String 2. Solr Int/Long/etc.: base 10 variable-width String 3. To make string-encoded integers sortable, left-zero-pad to fixed width, e.g. 15 -> 000015
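The effect of left-zero-padding on sort order can be illustrated in a few lines. This is a sketch only: the class name and the width of 6 are taken from the slide's example or invented for illustration, and it handles non-negative values only.

```java
import java.util.Arrays;

public class ZeroPadSortDemo {
    // Left-zero-pad a non-negative integer to a fixed width so that
    // lexicographic String order matches numeric order.
    static String pad(int value, int width) {
        if (value < 0) throw new IllegalArgumentException("sketch handles non-negative values only");
        return String.format("%0" + width + "d", value);
    }

    public static void main(String[] args) {
        // Variable-width strings sort lexicographically, not numerically.
        String[] raw = {"15", "9", "100"};
        Arrays.sort(raw);
        System.out.println(Arrays.toString(raw));    // [100, 15, 9]

        // Fixed-width strings sort in numeric order.
        String[] padded = {pad(15, 6), pad(9, 6), pad(100, 6)};
        Arrays.sort(padded);
        System.out.println(Arrays.toString(padded)); // [000009, 000015, 000100]
    }
}
```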
  • 7. Yesterday: NumberTools, BCDTypeField, SortableTypeField 1. NumberTools: base 36 long 2. BCD: base 10k int/long 3. Sortable Int/Float/Long/Double/Date: 32-bit=12 bits/char; 64-bit=14 bits/char
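A denser radix shortens the fixed-width string. The sketch below shows the general idea of a base 36 long encoding using only `Long.toString(value, 36)`; it is not NumberTools' actual algorithm (which also had to order negative values), and the class name and the width constant are illustrative assumptions.

```java
public class Base36Demo {
    // Long.MAX_VALUE needs 13 base-36 digits, so 13 is the fixed width here.
    static final int WIDTH = 13;

    // Encode a non-negative long as a fixed-width base-36 string whose
    // lexicographic order matches numeric order ('0'-'9' sort before 'a'-'z').
    static String encode(long value) {
        if (value < 0) throw new IllegalArgumentException("sketch handles non-negative values only");
        String digits = Long.toString(value, Character.MAX_RADIX); // radix 36
        StringBuilder sb = new StringBuilder(WIDTH);
        for (int i = digits.length(); i < WIDTH; i++) sb.append('0');
        return sb.append(digits).toString();
    }

    static long decode(String term) {
        return Long.parseLong(term, Character.MAX_RADIX);
    }

    public static void main(String[] args) {
        System.out.println(encode(35));         // 000000000000z
        System.out.println(encode(36));         // 0000000000010
        System.out.println(decode(encode(15))); // 15
    }
}
```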
  • 8. Yesterday: FieldCache: uninverted per-doc array of native field values, constructed at search time
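"Uninverting" means turning the inverted index (value -> documents) back into a per-document array (document -> value). A toy sketch of that idea follows; the postings map, the class name, and the convention that documents without a value keep 0 are all assumptions made for illustration, not Lucene's FieldCache implementation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class UninvertDemo {
    // Build a per-document array from an inverted index (term value -> doc ids).
    // Single-valued fields only; documents without a value keep 0.
    static long[] uninvert(Map<Long, int[]> postings, int maxDoc) {
        long[] values = new long[maxDoc];
        for (Map.Entry<Long, int[]> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                values[doc] = entry.getKey();
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Map<Long, int[]> postings = new LinkedHashMap<>();
        postings.put(15L, new int[] {0, 2}); // value 15 occurs in docs 0 and 2
        postings.put(42L, new int[] {1});    // value 42 occurs in doc 1
        System.out.println(Arrays.toString(uninvert(postings, 4))); // [15, 42, 15, 0]
    }
}
```

Doing this walk at search time is what the slide's FieldCache entry describes; computing the same per-document values while indexing avoids repeating it for every new searcher.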
  • 9. Yesterday: Trie deprecated
  • 10. Trie numerics [Figure, from http://www.thetaphi.de/share/Schindler-TrieRange.ppt: trie over the indexed terms 421 to 644 and their prefix terms, with the query range marked] 1. Fast range queries 2. Fewer terms required than term range queries 3. 7-bit encoded to minimize disk footprint 4. Adjustable “precisionStep”: number of bits to shift when generating synthetic terms 5. Synthetic prefix terms created by stripping low bits and prepending the shift amount in the first byte. E.g.: for 423, synthetic terms 42 and 4 are also indexed. When searching range [423, 642], the lowest-precision terms covering the range are used: 423, 44, 5, 63, 641, 642 (6 terms), versus 11 terms required by a term range query.
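The range [423, 642] example can be reproduced with a decimal analogue of the trie scheme. The sketch below is an illustration under stated assumptions, not Lucene's implementation: it strips decimal digits rather than bits, handles non-negative 3-digit values only, and all class and method names are hypothetical. The set of indexed values is read off the slide's figure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class DecimalTrieSketch {
    static final int MAX_LEVEL = 2; // 3-digit values: level 0 = full value, 1 and 2 = prefixes

    // Index every value plus its synthetic prefix terms (423 -> 423, 42, 4).
    static List<TreeSet<Integer>> index(int[] values) {
        List<TreeSet<Integer>> levels = new ArrayList<>();
        for (int l = 0; l <= MAX_LEVEL; l++) levels.add(new TreeSet<>());
        for (int v : values) {
            int prefix = v;
            for (int l = 0; l <= MAX_LEVEL; l++) {
                levels.get(l).add(prefix);
                prefix /= 10; // strip one low digit per level
            }
        }
        return levels;
    }

    // Split [lo, hi] (inclusive, at the given level) into sub-ranges {level, lo, hi},
    // pushing the aligned middle part up to the next, coarser level.
    static void split(int lo, int hi, int level, List<int[]> out) {
        boolean hasLower = lo % 10 != 0; // lo is not the first child of its parent
        boolean hasUpper = hi % 10 != 9; // hi is not the last child of its parent
        int upLo = hasLower ? lo / 10 + 1 : lo / 10;
        int upHi = hasUpper ? hi / 10 - 1 : hi / 10;
        if (level == MAX_LEVEL || upLo > upHi) {
            out.add(new int[] {level, lo, hi});
            return;
        }
        if (hasLower) out.add(new int[] {level, lo, lo / 10 * 10 + 9});
        if (hasUpper) out.add(new int[] {level, hi / 10 * 10, hi});
        split(upLo, upHi, level + 1, out);
    }

    // Collect the indexed terms (at any level) that together cover [lo, hi].
    static List<Integer> matchingTerms(List<TreeSet<Integer>> levels, int lo, int hi) {
        List<int[]> ranges = new ArrayList<>();
        split(lo, hi, 0, ranges);
        List<Integer> terms = new ArrayList<>();
        for (int[] r : ranges) {
            terms.addAll(levels.get(r[0]).subSet(r[1], true, r[2], true));
        }
        return terms;
    }

    public static void main(String[] args) {
        int[] values = {421, 423, 445, 446, 448, 521, 522, 632, 633, 634, 641, 642, 644};
        List<TreeSet<Integer>> levels = index(values);
        System.out.println(matchingTerms(levels, 423, 642));                   // [423, 641, 642, 44, 63, 5]
        System.out.println(levels.get(0).subSet(423, true, 642, true).size()); // 11 full-precision terms
    }
}
```

The coarser the level a sub-range reaches, the more full-precision terms a single prefix term replaces, which is where the 6-versus-11 saving comes from.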
  • 11. Yesterday: DocValues: field cache constructed at index time
  • 12. Yesterday: Flexible indexing: simplify/enable new index formats via modularization
  • 13. Yesterday: Auto-prefix terms 1. Generalization of the trie numeric strategy, in the block tree terms dictionary. 2. Intended to replace trie numerics: LUCENE-5966 3. Removed in favor of points.
  • 14. Yesterday: Dimensional Points
  • 15. Dimensional Points (1-8 dimensions, 1-16 bytes per dimension) 1. All point values in a field have the same fixed width (max 128 bits per dimension) 2. 1D - 8D 3. Block k-d tree 4. Points are sorted, then recursively partitioned along the longest dimension; at a target cardinality, the “leaf block” is written out. 5. An in-memory binary tree index points to the leaf blocks. 6. Adaptive optimal partitioning (versus trie numerics, which generate terms irrespective of local density).
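The recursive partitioning into leaf blocks can be sketched in plain Java. This is a simplified in-memory illustration, not Lucene's BKD writer: it assumes "longest dimension" means the dimension with the widest value span, splits at the median, uses `int` coordinates, and keeps leaves in a list instead of writing them to disk. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LeafBlockSketch {
    // Partition points into leaf blocks of at most maxLeafSize points each.
    static List<int[][]> buildLeaves(int[][] points, int maxLeafSize) {
        if (maxLeafSize < 1) throw new IllegalArgumentException("maxLeafSize must be >= 1");
        int[][] copy = points.clone(); // reorder a shallow copy, not the caller's array
        List<int[][]> leaves = new ArrayList<>();
        partition(copy, 0, copy.length, maxLeafSize, leaves);
        return leaves;
    }

    private static void partition(int[][] pts, int from, int to, int maxLeafSize, List<int[][]> leaves) {
        if (to - from <= maxLeafSize) {
            leaves.add(Arrays.copyOfRange(pts, from, to)); // target cardinality reached: emit a leaf block
            return;
        }
        final int dim = widestDimension(pts, from, to);
        // Sort this slice along the chosen dimension, then split at the median.
        Arrays.sort(pts, from, to, Comparator.comparingInt((int[] p) -> p[dim]));
        int mid = (from + to) >>> 1;
        partition(pts, from, mid, maxLeafSize, leaves);
        partition(pts, mid, to, maxLeafSize, leaves);
    }

    // Pick the dimension whose values span the widest range in pts[from, to).
    private static int widestDimension(int[][] pts, int from, int to) {
        int best = 0;
        long bestSpan = -1;
        for (int d = 0; d < pts[from].length; d++) {
            int min = Integer.MAX_VALUE;
            int max = Integer.MIN_VALUE;
            for (int i = from; i < to; i++) {
                min = Math.min(min, pts[i][d]);
                max = Math.max(max, pts[i][d]);
            }
            long span = (long) max - min;
            if (span > bestSpan) {
                bestSpan = span;
                best = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[][] points = {{1, 9}, {2, 3}, {3, 7}, {4, 1}, {5, 8}, {6, 2}, {7, 6}, {8, 4}};
        for (int[][] leaf : buildLeaves(points, 2)) {
            System.out.println(Arrays.deepToString(leaf));
        }
    }
}
```

Because each split is at the median of the points actually present, dense regions end up with more (smaller-extent) leaves and sparse regions with fewer, which is the adaptivity the slide contrasts with trie numerics.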
  • 16. Dimensional Points 1. Lucene-only - no Solr support yet 2. Optimized for query types: range, distance, nearest-neighbor, and point-in-polygon 3. Multi-valued support 4. Not supported: value retrieval (store if you need this) 5. Not supported: sorting or faceting (use DocValues for these)
  • 17. Dimensional Points: implementations and supported queries
     1D Native: LongPoint, IntPoint, DoublePoint, FloatPoint, BinaryPoint. Supported queries: 1. any in set 2. exact 3. range
     1D 128-bit: BigIntegerPoint, InetAddressPoint. Supported queries: 1. any in set 2. exact 3. range
     1D-4D Range: LongRangeField, IntRangeField, DoubleRangeField, FloatRangeField. Supported queries (given a range): 1. intersects 2. contains 3. within
     2D Geospatial: LatLonPoint. Supported queries: 1. within box 2. within distance 3. within polygon 4. nearest neighbor
     3D Geospatial: Geo3DPoint. Supported queries: 1. within shape
  • 18. Today: Mike McCandless benchmarked pre-6.0 1D points and found*: 1. Points were substantially faster at both index- and query-time than the equivalent Trie numeric type. 2. Index size was smaller with points. 3. Query-time heap usage with points was much lower. Adrien Grand re-ran Mike’s benchmark against a Lucene 6.2 snapshot**, and drew similar conclusions: “36% faster at query time, 71% faster at index time and used 66% less disk and 85% less memory” * https://www.elastic.co/blog/lucene-points-6.0 ** https://www.elastic.co/blog/searching-numb3rs-in-5.0
  • 19. Today: I benchmarked fixed range queries against trie and point long, int and double fields in 25 million NYC taxi trips, using modified tools from luceneutil. I created an index with three versions of each long, int and double field: 1. Trie numerics with the default precision step 2. Point fields 3. Trie numerics with a precision step the same width as the numbers; this should provide a maximum performance threshold for String ranges.
  • 20. Today: indexing time and index size
     Points: indexing time 31s; index size 1.2GiB
     Trie: indexing time 53s; index size 1.6GiB
     Single-precision trie: indexing time 19s; index size 0.7GiB
     The index has 24 fields defined: 6 string fields, 1 text field, 2 long fields, 1 int field, and 14 double fields.
  • 21. Today: query time by field and type
     passenger_count (cardinality 10; 7.5M hits): IntPoint 86ms; TrieInt/8 114ms; TrieInt/32 116ms
     pick_up_date_time (cardinality 4.1M; 10.4M hits): LongPoint 69ms; TrieLong/16 105ms; TrieLong/64 365ms
     trip_distance (cardinality 4,754; 9.6M hits): DoublePoint 116ms; TrieDouble/16 92ms; TrieDouble/64 105ms
  • 22. Tomorrow 1. Add support for PointFields in Solr: SOLR-8396 2. David Smiley will be working on adding a Solr adaptor for LatLonPoint in the near future. 3. Trie numerics will be removed from Lucene in 7.0, but Solr may take ownership to provide a longer backcompat timeframe. 4. FieldCache may be removed from Lucene / moved to Solr: LUCENE-7283
  • 23. References
     1. Numeric Range Queries with Lucene TrieRange: http://www.thetaphi.de/share/Schindler-TrieRange.ppt
     2. Generic XML-based Framework for Metadata Portals: http://epic.awi.de/17813/1/Sch2007br.pdf
     3. Fun with flexible indexing: http://blog.mikemccandless.com/2010/10/fun-with-flexible-indexing.html
     4. Searching numb3rs in 5.0: https://www.elastic.co/blog/searching-numb3rs-in-5.0
     5. Multi-dimensional points, coming in Apache Lucene 6.0: https://www.elastic.co/blog/lucene-points-6.0
     6. Bkd-tree: A Dynamic Scalable kd-tree: http://www.madalgo.au.dk/~large/Papers/bkdsstd03.ps
     7. Luceneutil: https://github.com/mikemccand/luceneutil/