LUCENE 4 SPATIAL
2012 Basis Technology
Open Source Search Conference
Presented by David Smiley, MITRE




                                   © 2012 The MITRE Corporation. All rights reserved.
About David Smiley
• Working at MITRE for 12 years
  • web development, Java, search
  • 3 Solr apps, 1 Endeca
• Published 1st book on Solr; then 2nd edition (2009, 2011)
• Apache Lucene / Solr committer (2012)
  • Specializing in spatial
• Presented at Lucene Revolution (2010) & Basis O.S.
  Search Conference (2011)
• Taught Solr classes at MITRE (2010, 2011, 2012)
• Solr search consultant within MITRE and its sponsors,
  and privately via OpenSource Connections

What is Spatial Search?
Primary features:
  • Spatial filter query
  • Spatial distance sorting
  • Spatial distance relevancy (i.e. spatial query score)
  NOT “geocoding” (resolving “Boston” to its latitude and longitude)


Typical use-case:
1. Index a location for each Lucene document given a
   latitude & longitude
2. Then search for matching documents by a circle
   (point-radius) or bounding box
3. Then sort results by distance
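
The three steps above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Lucene code: the function names, the sample documents, and the Earth radius constant are assumptions made for the example, and a real index would avoid scanning every document.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def search_by_circle(docs, center_lat, center_lon, radius_km):
    """Return (distance_km, doc_id) pairs inside the circle, nearest first."""
    hits = []
    for doc_id, (lat, lon) in docs.items():
        d = haversine_km(center_lat, center_lon, lat, lon)
        if d <= radius_km:          # step 2: spatial filter (point-radius)
            hits.append((d, doc_id))
    hits.sort()                     # step 3: sort by distance
    return hits


# Step 1: "index" one location per document (hypothetical sample data).
docs = {
    "boston": (42.3601, -71.0589),
    "cambridge": (42.3736, -71.1097),
    "new_york": (40.7128, -74.0060),
}
hits = search_by_circle(docs, 42.3601, -71.0589, 50.0)
```

Here `hits` holds the Boston and Cambridge documents in that order; New York falls outside the 50 km circle.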
History of Spatial for Lucene & Solr
• 2007: Local-Lucene
   • by Patrick O’Leary (AOL)
• 2009-09: LL -> Lucene spatial contrib in Lucene 2.9.0
   • Local-Lucene graduates to an official Lucene contrib module
• 2009-12: Spatial Search Plugin (SSP) for Solr
   • by Chris Male (JTeam -> Orange11, ElasticSearch)
• 2010-10: SOLR-2155 a geohash prefix tree filter
   • by David Smiley (MITRE)
• 2011-01: Lucene Spatial Playground (LSP)
   • by Ryan McKinley (Voyager GIS), David, and Chris
• 2011-03: Solr 3.1 new spatial features
   • by Grant Ingersoll and Yonik Seeley (LucidWorks)
• 2012-03: LSP -> Lucene 4 spatial module + Spatial4j
   • replaces former Lucene spatial contrib module

Lucene Spatial Committers
• David Smiley, MITRE
  • Bedford, MA




• Chris Male, ElasticSearch
  • New Zealand




• Ryan McKinley, Voyager GIS
  • Oakland, CA



Breakdown of Spatial Components

• Spatial4j: 43%
• Lucene spatial: 35%
• Misc: 16%
• Solr adapters: 6%




Total: 4,781 Non-Comment Source Statements (without javadocs or tests)
Spatial4j: It’s all about the shapes
• Shapes
  • Types: Point, Rectangle, Circle, Polygon
  • Geospatial & Euclidean/2D implementations
  • Intersection: within, contains, intersects, disjoint
• Distance and area math utilities
• Input/Output serialization to Well Known Text (WKT)
   • Ex: POLYGON ((30 10, 10 20, 20 40, 40 40, 30 10))
• ASL licensed project independent of Apache on GitHub
• Requires JTS (3rd party LGPL) for polygon & WKT support
• Ported to .NET as Spatial4n and used by RavenDB
  • by Itamar Syn-Hershko
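
To make the four shape relations concrete, here is a minimal sketch for rectangles in a flat (Euclidean) plane. It is not Spatial4j's API: the `Rect` class and `relate` function are hypothetical names for this example, and the geospatial case (dateline wrap, poles) is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; a hypothetical stand-in for a shape class."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def relate(a: Rect, b: Rect) -> str:
    """Relation of rectangle a to rectangle b in a flat plane."""
    # No shared area or edge at all.
    if (a.max_x < b.min_x or b.max_x < a.min_x
            or a.max_y < b.min_y or b.max_y < a.min_y):
        return "DISJOINT"
    # a lies entirely inside b (equal rectangles also report WITHIN here).
    if (a.min_x >= b.min_x and a.max_x <= b.max_x
            and a.min_y >= b.min_y and a.max_y <= b.max_y):
        return "WITHIN"
    # b lies entirely inside a.
    if (b.min_x >= a.min_x and b.max_x <= a.max_x
            and b.min_y >= a.min_y and b.max_y <= a.max_y):
        return "CONTAINS"
    # Partial overlap.
    return "INTERSECTS"


world = Rect(-180, -90, 180, 90)
europe = Rect(-10, 35, 40, 70)
atlantic = Rect(-60, 0, 0, 60)
```

With these sample rectangles, `relate(europe, world)` is `"WITHIN"`, `relate(world, europe)` is `"CONTAINS"`, and `relate(europe, atlantic)` is `"INTERSECTS"`.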


Lucene 4 Spatial Module
• There isn’t one best way to implement spatial indexing for
 all use-cases
  • Index just points, or other shapes too? Which?
  • Multiple shapes per field?
  • Query by Intersection? Contains? Within? Equals? Disjoint? …
  • Distance sorting? Query boost by distance?
    • Or more exotic shape relevancy like overlap percentage?
  • Tradeoff shape precision for speed?
• Multiple SpatialStrategy implementations:
  • RecursivePrefixTreeStrategy and TermQueryPrefixTreeStrategy
  • PointVectorStrategy
  • BBoxStrategy (currently in trunk, not 4x)
  • JtsGeoStrategy (in Spatial4j/LSP)
• Note: names subject to change!

Strategy: PointVector
• Similar to Solr’s PointType / LatLonType
  • X & Y trie double fields; caching via FieldCache
• Characteristics
  • Indexes points (only)
  • Single-valued field (no multi)
  • Query by rectangle or circle (only)
     • Circle uses FieldCache (requires memory)
     • Circle does bbox pre-filter for performance
     • Relations: Intersects, Within (only)
  • Exact precision for x & y coordinates and query shape
  • Distance sort
     • Uses FieldCache (requires memory)
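
The bounding-box pre-filter for circle queries can be illustrated as follows. This is a sketch of the general technique, not the strategy's actual implementation: all names are hypothetical, and the box calculation assumes the circle neither touches a pole nor crosses the dateline.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def circle_bbox(lat, lon, radius_km):
    """(min_lat, min_lon, max_lat, max_lon) enclosing the circle.

    Assumes the circle does not touch a pole or cross the dateline.
    """
    ang = radius_km / EARTH_RADIUS_KM  # angular radius in radians
    dlat = math.degrees(ang)
    dlon = math.degrees(math.asin(math.sin(ang) / math.cos(math.radians(lat))))
    return (lat - dlat, lon - dlon, lat + dlat, lon + dlon)


def circle_filter(points, lat, lon, radius_km):
    """Names of points inside the circle, using a bbox test before the exact one."""
    min_lat, min_lon, max_lat, max_lon = circle_bbox(lat, lon, radius_km)
    hits = []
    for name, (p_lat, p_lon) in points.items():
        # Cheap rectangle test first; it rejects most far-away points.
        if not (min_lat <= p_lat <= max_lat and min_lon <= p_lon <= max_lon):
            continue
        # Exact distance test only for points that survive the pre-filter.
        if haversine_km(lat, lon, p_lat, p_lon) <= radius_km:
            hits.append(name)
    return hits
```

The rectangle comparison maps naturally onto two numeric range queries (one per coordinate), which is why it is cheap compared with computing a distance for every document.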



Strategy: RecursivePrefixTree
• Grid / Tile / Trie / Prefix-Tree based
  • With recursive descent algorithm
  • Or TermQueryPrefixTree alternative
• Choose Geohash (geo only) or Quad tree
• The most mature strategy to date
• The current evolution of SOLR-2155
• Potential rename to GridFilterSpatialStrategy
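
For readers unfamiliar with geohashes, the following sketch shows the standard encoding and why it suits a prefix tree: each extra character narrows the cell, so a coarse cell's hash is a prefix of every finer cell inside it. The function name is an assumption for this example; it is not taken from Lucene or Spatial4j.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, length):
    """Encode a lat/lon point as a geohash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits are interleaved, starting with longitude
    while len(chars) < length:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


coarse = geohash_encode(42.605, -5.603, 5)   # "ezs42"
fine = geohash_encode(42.605, -5.603, 9)     # starts with "ezs42"
```

Because of this prefix property, testing whether an indexed point falls in a grid cell reduces to a term prefix match on its hash.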

Strategy: RecursivePrefixTree
• Characteristics:
  • Indexes all shapes
    • Variable precision of shape edges
       • Highly precise shapes other than point won’t scale
       • LineStrings possibly not precise enough for your needs
  • Multi-valued field support
  • Query by any shape
    • Variable precision for query shape
       • Highest precision usually scales
    • Relations: Intersects (only)
  • Distance sort (w/ multi-value support)
    • Warning: immature, won’t scale
    • Uses significant amounts of memory
  • Fast spatial filtering; no cache needed
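
The precision trade-off can be seen in a small sketch of recursive descent over a quad tree: the query rectangle is covered by grid cells, and descent stops either when a cell lies wholly inside the query or when a maximum level is reached. The cell labels (A to D), the function name, and the treatment of touching edges as non-overlapping are all assumptions of this illustration, not the module's actual code.

```python
def cover(query, cell=(-180.0, -90.0, 180.0, 90.0), prefix="", max_level=3):
    """Return quad-tree cell ids covering query = (min_x, min_y, max_x, max_y)."""
    c_min_x, c_min_y, c_max_x, c_max_y = cell
    q_min_x, q_min_y, q_max_x, q_max_y = query
    # Prune: the cell shares no area with the query (touching edges do not count).
    if (c_max_x <= q_min_x or c_min_x >= q_max_x
            or c_max_y <= q_min_y or c_min_y >= q_max_y):
        return []
    inside = (q_min_x <= c_min_x and c_max_x <= q_max_x
              and q_min_y <= c_min_y and c_max_y <= q_max_y)
    # Stop at cells fully inside the query, or at the precision limit.
    if inside or len(prefix) == max_level:
        return [prefix]
    mid_x = (c_min_x + c_max_x) / 2
    mid_y = (c_min_y + c_max_y) / 2
    children = {
        "A": (c_min_x, mid_y, mid_x, c_max_y),   # top-left
        "B": (mid_x, mid_y, c_max_x, c_max_y),   # top-right
        "C": (c_min_x, c_min_y, mid_x, mid_y),   # bottom-left
        "D": (mid_x, c_min_y, c_max_x, mid_y),   # bottom-right
    }
    cells = []
    for label, child in children.items():
        cells.extend(cover(query, child, prefix + label, max_level))
    return cells


query = (10.0, 10.0, 20.0, 20.0)
coarse_cells = cover(query, max_level=2)   # ["BC"]
finer_cells = cover(query, max_level=4)    # ["BCCA", "BCCC"]
```

A lower `max_level` yields fewer, larger cells (faster but less precise at the edges); a higher one follows the shape more closely at the cost of more cells.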

Strategy: BBox
• Implemented with 4 doubles & 1 boolean
• Ported from ESRI Open Source Geoportal
• Characteristics:
  • Indexes rectangles (only)
  • Single-valued field (no multi)
  • Query by rectangle (only)
     • Supports all relations: Intersects, Within, Contains, …
  • Distance sort from box center
     • Uses FieldCache (requires memory)
  • Area overlap sorting
     • Sort results by percentage overlap between query and indexed boxes
     • Uses FieldCache (requires memory)
  • Note: FieldCache needs are somewhat high
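
One way to picture area overlap sorting is the score below, which multiplies the share of the query box covered by the intersection with the share of the indexed box covered by it. This particular formula is an assumption chosen for illustration; the ported algorithm may weight the two ratios differently. Boxes are `(min_x, min_y, max_x, max_y)` tuples in a flat plane.

```python
def area(box):
    """Area of a (min_x, min_y, max_x, max_y) box."""
    min_x, min_y, max_x, max_y = box
    return max(0.0, max_x - min_x) * max(0.0, max_y - min_y)


def intersection_area(a, b):
    """Area shared by two boxes, or 0.0 if they do not overlap."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def overlap_score(query, indexed):
    """Score in [0, 1]; 1.0 only when the two boxes are identical."""
    inter = intersection_area(query, indexed)
    if inter == 0.0:
        return 0.0
    return (inter / area(query)) * (inter / area(indexed))


query = (0.0, 0.0, 10.0, 10.0)
indexed = {
    "exact": (0.0, 0.0, 10.0, 10.0),
    "half": (5.0, 0.0, 15.0, 10.0),
    "huge": (-100.0, -100.0, 100.0, 100.0),
    "elsewhere": (50.0, 50.0, 60.0, 60.0),
}
# Rank indexed boxes by how well they match the query box, best first.
ranked = sorted(indexed, key=lambda k: overlap_score(query, indexed[k]), reverse=True)
```

Note how the very large box ranks low even though it contains the whole query: the second ratio penalizes indexed boxes that are mostly outside the query.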
Strategy: JtsGeoStrategy
• Stores any JTS geometry in Lucene 4’s DocValues
  • Stores WKB (Well-Known Binary), the binary counterpart of WKT
     • Full vector geometry is retained for search
  • DocValues is mostly a better FieldCache
    • Faster loading into memory
    • Can be disk resident or memory
• Characteristics:
  • Indexes any shape
  • Single valued field but can be MultiPoint, MultiPolygon, etc.
  • Query by any shape
     • Uses DocValues (memory use optional)
     • Supports all relations: intersect, within, contains, …
  • No sorting
  • Experimental / immature status
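
To show what a WKB value looks like, here is a minimal sketch that encodes and decodes a single 2D point using only the standard library. In practice JTS would do this for arbitrary geometries; the function names here are hypothetical, and only the Point type is handled.

```python
import struct


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian WKB (21 bytes)."""
    # Byte-order flag 1 = little-endian, geometry type 1 = Point,
    # followed by x and y as 8-byte IEEE doubles.
    return struct.pack("<BIdd", 1, 1, x, y)


def point_from_wkb(data):
    """Decode a WKB point, honoring its byte-order flag."""
    fmt = "<" if data[0] == 1 else ">"
    geom_type, x, y = struct.unpack(fmt + "Idd", data[1:21])
    if geom_type != 1:
        raise ValueError("only WKB Point (type 1) is handled in this sketch")
    return x, y


wkb = point_to_wkb(-90.57341, 43.17614)
x, y = point_from_wkb(wkb)
```

The compact fixed layout is what makes WKB a better fit than WKT text for per-document binary storage.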

Solr Adapters
• Configuration:
<fieldType name="geo" class="solr.SpatialRecursivePrefixTreeFieldType"
      spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
      distErrPct="0.025" maxDistErr="0.000009" />
<field name="geo" type="geo" indexed="true" stored="true" multiValued="true" />

• Adding data:
<field name="geo">43.17614,-90.57341</field>
<field name="geo">POLYGON((-10 30, -40 40, -10 -20, 40 20, 0 0, -10 30))</field>

• Search Filter
fq=geo:"Intersects(Circle(54.729696,-98.525391 d=10))"

• Distance Sort
sort=query($sortsq) asc&sortsq={! score=distance v=$sq}&sq=geo:"Intersects(Circle(54.729696,-98.525391 d=10))"
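
When such a filter is sent from client code, the quotes, parentheses and spaces need URL encoding. A small sketch of building the request parameters is shown below; the helper name and the `q=*:*` main query are assumptions for the example, while the `Intersects(Circle(...))` syntax is copied from the filter above.

```python
from urllib.parse import urlencode


def circle_filter_params(field, lat, lon, d):
    """Build Solr request parameters for a circle filter on a spatial field."""
    shape = f"Intersects(Circle({lat},{lon} d={d}))"
    return {
        "q": "*:*",                      # match everything; the filter does the work
        "fq": f'{field}:"{shape}"',      # quoted so the shape is one query term
    }


params = circle_filter_params("geo", 54.729696, -98.525391, 10)
query_string = urlencode(params)  # safe to append after "/select?"
```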




Future Possibilities
• Solr:
  • Filter out points in multi-valued field from search results not matching
    filter
  • Heatmap/grid faceting spatial summarization
• Spatial-Temporal search
  • 3d (x,y,t) point shapes, and “track” shape queries
• Support any query shape for all Strategies
• PrefixTreeStrategy:
  • More efficient binary grid encoding; use Hilbert Curve order
  • Better multi-value point caches
  • Cache-less sort of top-N results
  • More query relations: Contains, Within
• Configurable DocValues vs. FieldCache choice
• Choose floats or configurable bits instead of forcing doubles
• CircleStrategy

Thank you!
• References
  • Lucene 4 spatial javadocs
    • https://builds.apache.org/job/Lucene-Artifacts-4.x/javadoc/spatial/
  • Spatial4j at GitHub
    • https://github.com/spatial4j/spatial4j (spatial4j.com redirects here)
    • http://spatial4j.16575.n6.nabble.com -- dev@lists.spatial4j.com
  • Solr
    • http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4

• Contact me:
  • David Smiley dsmiley@mitre.org dsmiley@apache.org





Contenu connexe

En vedette

Geometry
GeometryGeometry
Geometrykayenta
 
Planar Geometry Terms
Planar Geometry TermsPlanar Geometry Terms
Planar Geometry Termsguest2b18d
 
Stratio: Geospatial and bitemporal search in Cassandra with pluggable Lucene ...
Stratio: Geospatial and bitemporal search in Cassandra with pluggable Lucene ...Stratio: Geospatial and bitemporal search in Cassandra with pluggable Lucene ...
Stratio: Geospatial and bitemporal search in Cassandra with pluggable Lucene ...DataStax Academy
 
Vwag nachhaltigkeitsbericht online_e
Vwag nachhaltigkeitsbericht online_eVwag nachhaltigkeitsbericht online_e
Vwag nachhaltigkeitsbericht online_eAlina Wang
 
Rk 3 gsm network
Rk 3 gsm networkRk 3 gsm network
Rk 3 gsm networkAzri Randy
 
Nice photos in the nature
Nice photos in the natureNice photos in the nature
Nice photos in the natureRenny
 
Social Media Marketing in the American and French wine industry in 2012
Social Media Marketing in the American and French wine industry in 2012Social Media Marketing in the American and French wine industry in 2012
Social Media Marketing in the American and French wine industry in 2012pierrickbouquet
 
Integrated Mobility QA: A Strategic Business Enabler for Enhancing End-user E...
Integrated Mobility QA: A Strategic Business Enabler for Enhancing End-user E...Integrated Mobility QA: A Strategic Business Enabler for Enhancing End-user E...
Integrated Mobility QA: A Strategic Business Enabler for Enhancing End-user E...Cognizant
 
Cardales (milonga campera)-Enrique Widmann_Roberto Ávila (1956)
Cardales (milonga campera)-Enrique Widmann_Roberto Ávila (1956)Cardales (milonga campera)-Enrique Widmann_Roberto Ávila (1956)
Cardales (milonga campera)-Enrique Widmann_Roberto Ávila (1956)Carlos Cueto
 
eFactor: Why Customer Experience is the Next Big Thing in Sales
eFactor: Why Customer Experience is the Next Big Thing in SaleseFactor: Why Customer Experience is the Next Big Thing in Sales
eFactor: Why Customer Experience is the Next Big Thing in SalesBarbara Giamanco
 
Inno culture status quo tanyer sonmezer
Inno culture status quo tanyer sonmezerInno culture status quo tanyer sonmezer
Inno culture status quo tanyer sonmezerTanyer Sonmezer
 
Outside in done right- mindmeet-handout
Outside in done right- mindmeet-handoutOutside in done right- mindmeet-handout
Outside in done right- mindmeet-handoutMarion Debruyne
 
Awit ng paghilom
Awit ng paghilomAwit ng paghilom
Awit ng paghilomabad93
 
Antecedentes de la Admistracion
Antecedentes de la AdmistracionAntecedentes de la Admistracion
Antecedentes de la AdmistracionDiana Sastoque
 
origen del callao
origen del callaoorigen del callao
origen del callaostefano4016
 

En vedette (20)

Geometry
GeometryGeometry
Geometry
 
Planar Geometry Terms
Planar Geometry TermsPlanar Geometry Terms
Planar Geometry Terms
 
Stratio: Geospatial and bitemporal search in Cassandra with pluggable Lucene ...
Stratio: Geospatial and bitemporal search in Cassandra with pluggable Lucene ...Stratio: Geospatial and bitemporal search in Cassandra with pluggable Lucene ...
Stratio: Geospatial and bitemporal search in Cassandra with pluggable Lucene ...
 
Eops 2015 1_28
Eops 2015 1_28Eops 2015 1_28
Eops 2015 1_28
 
Vwag nachhaltigkeitsbericht online_e
Vwag nachhaltigkeitsbericht online_eVwag nachhaltigkeitsbericht online_e
Vwag nachhaltigkeitsbericht online_e
 
Rk 3 gsm network
Rk 3 gsm networkRk 3 gsm network
Rk 3 gsm network
 
Etapas de la industria de un deporte
Etapas de la industria de un deporteEtapas de la industria de un deporte
Etapas de la industria de un deporte
 
Nice photos in the nature
Nice photos in the natureNice photos in the nature
Nice photos in the nature
 
Social Media Marketing in the American and French wine industry in 2012
Social Media Marketing in the American and French wine industry in 2012Social Media Marketing in the American and French wine industry in 2012
Social Media Marketing in the American and French wine industry in 2012
 
Integrated Mobility QA: A Strategic Business Enabler for Enhancing End-user E...
Integrated Mobility QA: A Strategic Business Enabler for Enhancing End-user E...Integrated Mobility QA: A Strategic Business Enabler for Enhancing End-user E...
Integrated Mobility QA: A Strategic Business Enabler for Enhancing End-user E...
 
Cardales (milonga campera)-Enrique Widmann_Roberto Ávila (1956)
Cardales (milonga campera)-Enrique Widmann_Roberto Ávila (1956)Cardales (milonga campera)-Enrique Widmann_Roberto Ávila (1956)
Cardales (milonga campera)-Enrique Widmann_Roberto Ávila (1956)
 
eFactor: Why Customer Experience is the Next Big Thing in Sales
eFactor: Why Customer Experience is the Next Big Thing in SaleseFactor: Why Customer Experience is the Next Big Thing in Sales
eFactor: Why Customer Experience is the Next Big Thing in Sales
 
Inno culture status quo tanyer sonmezer
Inno culture status quo tanyer sonmezerInno culture status quo tanyer sonmezer
Inno culture status quo tanyer sonmezer
 
Outside in done right- mindmeet-handout
Outside in done right- mindmeet-handoutOutside in done right- mindmeet-handout
Outside in done right- mindmeet-handout
 
LIDERAZGO
LIDERAZGOLIDERAZGO
LIDERAZGO
 
Primer Portal Empleo RSE
Primer Portal Empleo RSEPrimer Portal Empleo RSE
Primer Portal Empleo RSE
 
Awit ng paghilom
Awit ng paghilomAwit ng paghilom
Awit ng paghilom
 
4 Principios de Email Marketing
4 Principios de Email Marketing4 Principios de Email Marketing
4 Principios de Email Marketing
 
Antecedentes de la Admistracion
Antecedentes de la AdmistracionAntecedentes de la Admistracion
Antecedentes de la Admistracion
 
origen del callao
origen del callaoorigen del callao
origen del callao
 

Similaire à Lucene 4 spatial

2014 11 lucene spatial temporal update
2014 11 lucene spatial temporal update2014 11 lucene spatial temporal update
2014 11 lucene spatial temporal updateDavid Smiley
 
The Latest in Spatial & Temporal Search: Presented by David Smiley
The Latest in Spatial & Temporal Search: Presented by David SmileyThe Latest in Spatial & Temporal Search: Presented by David Smiley
The Latest in Spatial & Temporal Search: Presented by David SmileyLucidworks
 
Spatial Data in SQL Server
Spatial Data in SQL ServerSpatial Data in SQL Server
Spatial Data in SQL ServerEduardo Castro
 
Apache Geode Meetup, London
Apache Geode Meetup, LondonApache Geode Meetup, London
Apache Geode Meetup, LondonApache Geode
 
Web analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.comWeb analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.comJungsu Heo
 
Keynote Yonik Seeley & Steve Rowe lucene solr roadmap
Keynote   Yonik Seeley & Steve Rowe lucene solr roadmapKeynote   Yonik Seeley & Steve Rowe lucene solr roadmap
Keynote Yonik Seeley & Steve Rowe lucene solr roadmaplucenerevolution
 
KEYNOTE: Lucene / Solr road map
KEYNOTE: Lucene / Solr road mapKEYNOTE: Lucene / Solr road map
KEYNOTE: Lucene / Solr road maplucenerevolution
 
MySQL 5.7 GIS
MySQL 5.7 GISMySQL 5.7 GIS
MySQL 5.7 GISMatt Lord
 
Graph Databases in the Microsoft Ecosystem
Graph Databases in the Microsoft EcosystemGraph Databases in the Microsoft Ecosystem
Graph Databases in the Microsoft EcosystemMarco Parenzan
 
Spatial Data in SQL Server
Spatial Data in SQL ServerSpatial Data in SQL Server
Spatial Data in SQL ServerEduardo Castro
 
Apache Geode Meetup, Cork, Ireland at CIT
Apache Geode Meetup, Cork, Ireland at CITApache Geode Meetup, Cork, Ireland at CIT
Apache Geode Meetup, Cork, Ireland at CITApache Geode
 
EDB's Migration Portal - Migrate from Oracle to Postgres
EDB's Migration Portal - Migrate from Oracle to PostgresEDB's Migration Portal - Migrate from Oracle to Postgres
EDB's Migration Portal - Migrate from Oracle to PostgresEDB
 
What's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.xWhat's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.xGrant Ingersoll
 
From Lucene to Solr 4 Trunk
From Lucene to Solr 4 TrunkFrom Lucene to Solr 4 Trunk
From Lucene to Solr 4 Trunktdthomassld
 
New Persistence Features in Spring Roo 1.1
New Persistence Features in Spring Roo 1.1New Persistence Features in Spring Roo 1.1
New Persistence Features in Spring Roo 1.1Stefan Schmidt
 
5 NoSQL Options - Toronto - May 2018
5 NoSQL Options - Toronto - May 20185 NoSQL Options - Toronto - May 2018
5 NoSQL Options - Toronto - May 2018Matthew Groves
 
How do Solr and Azure Search compare?
How do Solr and Azure Search compare?How do Solr and Azure Search compare?
How do Solr and Azure Search compare?SearchStax
 

Similaire à Lucene 4 spatial (20)

2014 11 lucene spatial temporal update
2014 11 lucene spatial temporal update2014 11 lucene spatial temporal update
2014 11 lucene spatial temporal update
 
The Latest in Spatial & Temporal Search: Presented by David Smiley
The Latest in Spatial & Temporal Search: Presented by David SmileyThe Latest in Spatial & Temporal Search: Presented by David Smiley
The Latest in Spatial & Temporal Search: Presented by David Smiley
 
Spatial Data in SQL Server
Spatial Data in SQL ServerSpatial Data in SQL Server
Spatial Data in SQL Server
 
MySQL 5.7 GIS
MySQL 5.7 GISMySQL 5.7 GIS
MySQL 5.7 GIS
 
Apache Geode Meetup, London
Apache Geode Meetup, LondonApache Geode Meetup, London
Apache Geode Meetup, London
 
Web analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.comWeb analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.com
 
Keynote Yonik Seeley & Steve Rowe lucene solr roadmap
Keynote   Yonik Seeley & Steve Rowe lucene solr roadmapKeynote   Yonik Seeley & Steve Rowe lucene solr roadmap
Keynote Yonik Seeley & Steve Rowe lucene solr roadmap
 
KEYNOTE: Lucene / Solr road map
KEYNOTE: Lucene / Solr road mapKEYNOTE: Lucene / Solr road map
KEYNOTE: Lucene / Solr road map
 
MySQL 5.7 GIS
MySQL 5.7 GISMySQL 5.7 GIS
MySQL 5.7 GIS
 
Graph Databases in the Microsoft Ecosystem
Graph Databases in the Microsoft EcosystemGraph Databases in the Microsoft Ecosystem
Graph Databases in the Microsoft Ecosystem
 
No(Geo)SQL
No(Geo)SQLNo(Geo)SQL
No(Geo)SQL
 
Spatial Data in SQL Server
Spatial Data in SQL ServerSpatial Data in SQL Server
Spatial Data in SQL Server
 
State of JTS 2017
State of JTS 2017State of JTS 2017
State of JTS 2017
 
Apache Geode Meetup, Cork, Ireland at CIT
Apache Geode Meetup, Cork, Ireland at CITApache Geode Meetup, Cork, Ireland at CIT
Apache Geode Meetup, Cork, Ireland at CIT
 
EDB's Migration Portal - Migrate from Oracle to Postgres
EDB's Migration Portal - Migrate from Oracle to PostgresEDB's Migration Portal - Migrate from Oracle to Postgres
EDB's Migration Portal - Migrate from Oracle to Postgres
 
What's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.xWhat's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.x
 
From Lucene to Solr 4 Trunk
From Lucene to Solr 4 TrunkFrom Lucene to Solr 4 Trunk
From Lucene to Solr 4 Trunk
 
New Persistence Features in Spring Roo 1.1
New Persistence Features in Spring Roo 1.1New Persistence Features in Spring Roo 1.1
New Persistence Features in Spring Roo 1.1
 
5 NoSQL Options - Toronto - May 2018
5 NoSQL Options - Toronto - May 20185 NoSQL Options - Toronto - May 2018
5 NoSQL Options - Toronto - May 2018
 
How do Solr and Azure Search compare?
How do Solr and Azure Search compare?How do Solr and Azure Search compare?
How do Solr and Azure Search compare?
 

Dernier

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 

Dernier (20)

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 

Lucene 4 spatial

  • 1. LUCENE 4 SPATIAL 2012 Basis Technology Open Source Search Conference Presented by David Smiley, MITRE © 2012 The MITRE Corporation. All rights reserved.
  • 2. About David Smiley • Working at MITRE, for 12 years • web development, Java, search • 3 Solr apps, 1 Endeca • Published 1st book on Solr; then 2nd edition (2009, 2011) • Apache Lucene / Solr committer (2012) • Specializing on spatial • Presented at Lucene Revolution (2010) & Basis O.S. Search Conference (2011) • Taught Solr classes at MITRE (2010, 2011, 2012) • Solr search consultant within MITRE and its sponsors, and privately via OpenSource Connections 2 © 2012 The MITRE Corporation. All rights reserved.
  • 3. What is Spatial Search? Primary features: • Spatial filter query • Spatial distance sorting • Spatial distance relevancy (i.e. spatial query score) NOT “geocoding” – resolving “Boston” to its latitude and longitude. Typical use-case: 1. Index a location for each Lucene document given a latitude & longitude 2. Then search for matching documents by a circle (point-radius) or bounding box 3. Then sort results by distance
  • 4. History of Spatial for Lucene & Solr • 2007: Local-Lucene • by Patrick O’Leary (AOL) • 2009-09: LL -> Lucene spatial contrib in Lucene 2.9.0 • Local-Lucene graduates to an official Lucene contrib module • 2009-12: Spatial Search Plugin (SSP) for Solr • by Chris Male (JTeam -> Orange11, ElasticSearch) • 2010-10: SOLR-2155 a geohash prefix tree filter • by David Smiley (MITRE) • 2011-01: Lucene Spatial Playground (LSP) • by Ryan McKinley (Voyager GIS), David, and Chris • 2011-03: Solr 3.1 new spatial features • by Grant Ingersoll and Yonik Seeley (LucidWorks) • 2012-03: LSP -> Lucene 4 spatial module + Spatial4j • replaces former Lucene spatial contrib module
  • 5. Lucene Spatial Committers • David Smiley, MITRE • Bedford, MA • Chris Male, Elastic Search • New Zealand • Ryan McKinley, Voyager GIS • Oakland, CA
  • 6. Breakdown of Spatial Components (pie chart) • Spatial4j: 43% • Lucene spatial: 35% • Misc: 16% • Solr adapters: 6% • Total: 4,781 Non-Comment Source Statements (without javadocs or tests)
  • 7. Spatial4j: It’s all about the shapes • Shapes • Types: Point, Rectangle, Circle, Polygon • Geospatial & Euclidean/2D implementations • Intersection: within, contains, intersects, disjoint • Distance and area math utilities • Input/Output serialization to Well Known Text (WKT) • Ex: POLYGON ((30 10, 10 20, 20 40, 40 40, 30 10)) • ASL licensed project independent of Apache on GitHub • Requires JTS (3rd party LGPL) for polygon & WKT support • Ported to .NET as Spatial4n and used by RavenDB • by Itamar Syn-Hershko
  • 8. Lucene 4 Spatial Module • There isn’t one best way to implement spatial indexing for all use-cases • Index just points, or other shapes too? Which? • Multiple shapes per field? • Query by Intersection? Contains? Within? Equals? Disjoint? … • Distance sorting? Query boost by distance? • Or more exotic shape relevancy like overlap percentage? • Trade off shape precision for speed? • Multiple SpatialStrategy implementations (names subject to change!): • RecursivePrefixTreeStrategy and TermQueryPrefixTreeStrategy • PointVectorStrategy • BBoxStrategy (currently in trunk, not 4x) • JtsGeoStrategy (in Spatial4j/LSP)
  • 9. Strategy: PointVector • Similar to Solr’s PointType / LatLonType • X & Y trie double fields; caching via FieldCache • Characteristics • Indexes points (only) • Single-valued field (no multi) • Query by rectangle or circle (only) • Circle uses FieldCache (requires memory) • Circle does bbox pre-filter for performance • Relations: Intersects, Within (only) • Exact precision for x & y coordinates and query shape • Distance sort • Uses FieldCache (requires memory)
  • 10. Strategy: RecursivePrefixTree (potential rename to GridFilterSpatialStrategy) • Grid / Tile / Trie / Prefix-Tree based • With recursive descent algorithm • Or TermQueryPrefixTree alternative • Choose Geohash (geo only) or Quad tree • The most mature strategy to date • The current evolution of SOLR-2155
  • 11. Strategy: RecursivePrefixTree • Characteristics: • Indexes all shapes • Variable precision of shape edges • Highly precise shapes other than point won’t scale • LineStrings possibly not precise enough for your needs • Multi-valued field support • Query by any shape • Variable precision for query shape • Highest precision usually scales • Relations: Intersects (only) • Distance sort (w/ multi-value support) • Warning: immature, won’t scale • Uses significant amounts of memory • Fast spatial filtering; no cache needed
  • 12. Strategy: BBox • Implemented with 4 doubles & 1 boolean • Ported from ESRI Open Source GeoPortal • Characteristics: • Indexes rectangles (only) • Single-valued field (no multi) • Query by rectangle (only) • Supports all relations: Intersects, Within, Contains, … • Distance sort from box center • Uses FieldCache (requires memory) • Area overlap sorting • Sort results by percentage overlap between query and indexed boxes • Uses FieldCache (requires memory) • Note: FieldCache needs are somewhat high
  • 13. Strategy: JtsGeoStrategy • Stores any JTS geometry in Lucene 4’s DocValues • Stores WKB -- WKT in binary format • Full vector geometry is retained for search • DocValues is mostly a better FieldCache • Faster loading into memory • Can be disk resident or memory • Characteristics: • Indexes any shape • Single valued field but can be MultiPoint, MultiPolygon, etc. • Query by any shape • Uses DocValues (memory use optional) • Supports all relations: intersect, within, contains, … • No sorting • Experimental / immature status
  • 14. Solr Adapters • Configuration: <fieldType name="geo" class="solr.SpatialRecursivePrefixTreeFieldType" spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory" distErrPct="0.025" maxDistErr="0.000009" /> <field name="geo" type="geo" indexed="true" stored="true" multiValued="true" /> • Adding data: <field name="geo">43.17614,-90.57341</field> <field name="geo">POLYGON((-10 30, -40 40, -10 -20, 40 20, 0 0, -10 30))</field> • Search Filter: fq=geo:"Intersects(Circle(54.729696,-98.525391 d=10))" • Distance Sort: sort=query($sortsq) asc&sortsq={! score=distance v=$sq}&sq=store:"Intersects(Circle(54.729696,-98.525391 d=10))"
  • 15. Future Possibilities • Solr: • Filter out points in multi-valued field from search results not matching filter • Heatmap/grid faceting spatial summarization • Spatial-Temporal search • 3d (x,y,t) point shapes, and “track” shape queries • Support any query shape for all Strategies • PrefixTreeStrategy: • More efficient binary grid encoding; use Hilbert Curve order • Better multi-value point caches • Cache-less sort of top-N results • More query relations: Contains, Within • Configurable DocValues vs. FieldCache choice • Choose floats or configurable bits instead of forcing doubles • CircleStrategy
  • 16. Thank you! • References • Lucene 4 spatial javadocs • https://builds.apache.org/job/Lucene-Artifacts-4.x/javadoc/spatial/ • Spatial4j at GitHub • https://github.com/spatial4j/spatial4j (spatial4j.com redirects here) • http://spatial4j.16575.n6.nabble.com -- dev@lists.spatial4j.com • Solr • http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4 • Contact me: • David Smiley dsmiley@mitre.org dsmiley@apache.org
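
The grid / prefix-tree idea behind the RecursivePrefixTree strategy (slides 10 and 11) can be illustrated with a small geohash encoder. This is a minimal sketch in Python and not the Lucene or Spatial4j implementation. The function names `geohash_encode` and `prefix_terms` are hypothetical and chosen only for illustration. It shows the property such strategies rely on: every coarser grid cell's code is a prefix of the finer cells inside it, so a point can be indexed as a set of terms at several precisions.

```python
# Standard geohash base-32 alphabet (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=9):
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def prefix_terms(geohash):
    """Return every prefix of a geohash, i.e. the enclosing cell at each grid level."""
    return [geohash[:i] for i in range(1, len(geohash) + 1)]


cell = geohash_encode(57.64911, 10.40744, precision=5)
print(cell)                # u4pru
print(prefix_terms(cell))  # ['u', 'u4', 'u4p', 'u4pr', 'u4pru']
```

Two nearby points share a long common prefix, which is why a query shape can be answered by looking up a small number of cell terms. Shorter prefixes mean larger cells, which is the precision versus speed tradeoff described on the slides.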

Editor's notes

  1. Distance sorting & relevancy wind up being one underlying technical requirement from the implementation's point of view.
  2. Misc is a demo web application and a Lucene spatial strategy called “JtsSpatialStrategy” that cannot be included in Lucene spatial due to licensing.
  3. Polygons support dateline wrap. Well tested. Key differentiators: ASL licensed, Geospatial support, Circles & Polygons.
  4. In time there will be additional unique capabilities of different implementations. TermQueryPrefixTreeStrategy too. SpatialStrategies can be combined, just as people index text different ways simultaneously. See SpatialExample.java for some code samples.
  5. This is a simple strategy. I’d like to see it extended to support choosing floats or other more compact means of holding the coordinates in memory for a desired precision level.
  6. Recommend pairing with TwoDoublesStrategy for single-value distance sort
  7. Would like to see customizable to floats or other compact
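
Notes 5 and 7 ask for floats or other compact coordinate storage at a chosen precision. The following is a rough Python sketch of what that tradeoff looks like. It is not part of Lucene spatial. The helper names (`to_float32`, `quantize`, `dequantize`) and the meters-per-degree constant are assumptions made for illustration. It measures how far a longitude drifts when stored as a 32-bit float, and when stored as a fixed-point integer with a configurable number of bits.

```python
import struct

# Approximate length of one degree at the equator, in meters (illustrative constant).
METERS_PER_DEGREE = 111_320.0


def to_float32(value):
    """Round-trip a 64-bit Python float through a 32-bit IEEE float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def quantize(value, lo, hi, bits):
    """Map a value in [lo, hi] to an integer that fits in `bits` bits."""
    cells = (1 << bits) - 1
    return round((value - lo) / (hi - lo) * cells)


def dequantize(q, lo, hi, bits):
    """Recover the approximate coordinate from its quantized integer form."""
    cells = (1 << bits) - 1
    return lo + q / cells * (hi - lo)


lon = -98.525391  # longitude taken from the Solr query example on slide 14

# Error introduced by storing the coordinate as a 32-bit float.
err_float32 = abs(to_float32(lon) - lon) * METERS_PER_DEGREE

# Error introduced by storing it as a 32-bit fixed-point integer over [-180, 180].
q = quantize(lon, -180.0, 180.0, 32)
err_fixed32 = abs(dequantize(q, -180.0, 180.0, 32) - lon) * METERS_PER_DEGREE

print(f"float32 error:       {err_float32:.4f} m")
print(f"32-bit fixed error:  {err_fixed32:.4f} m")
```

For a longitude of this magnitude a 32-bit float is accurate to within roughly half a meter, while 32-bit fixed point over the full longitude range is accurate to within about a centimeter. Both use half the memory of a double, so the choice depends on the precision an application needs.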