Search with Polygons


Another Approach to Solr Geospatial Search
Dr. Andrew L. Urquhart

May 10, 2012

                                           Copyright © 2012 Raytheon Company. All rights reserved.
                     Customer Success Is Our Mission is a registered trademark of Raytheon Company.
What is the “Burning Platform”?
§  Need to break dependency on expensive licenses for
    proprietary database
 –  Major cost driver
 –  Unsustainable in current economic environment
§  Solr identified as promising replacement candidate
 –    Excellent cost
 –    Excellent performance
 –    Excellent access to source code
 –    Major weakness in required Geospatial Search capability
 –    Is Geospatial Search weakness mitigation possible?
      §  Must index points for search by polygons
      §  Should index polygons for search by polygons




            Solr promising, Polygon Geospatial Search needed
                                                                5/16/12   2
What Has Been Produced?
§  A single add-in JAR file plus Schema enhancements
 –  Older variant requires a GPL library for point-in-polygon support
 –  Newer variant requires no external libraries
§  Internals use inherent three-dimensional mathematics
 –  LUCENE-3795/“Lucene Spatial Playground” geospatial search capability uses
    JTS library for polygon support
 –  JTS uses two-dimensional mathematics
 –  JTS has greater vulnerability to special points
    §  North and South Poles
    §  180° meridian
    §  Potential problem for customer applications
 –  JTS supports complex polygons
    §  Alternative approach only supports simple polygons at this time



             Single JAR file using 3-D internal mathematics
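A minimal sketch of why three-dimensional mathematics avoids special cases at the poles and the 180° meridian. This is not the code from the JAR: it assumes a convex polygon whose vertices are listed counter-clockwise as seen from outside the sphere, with great-circle edges, and all function names are hypothetical. Each vertex becomes a unit vector, and a point is inside when it lies on the inner side of every edge plane.

```python
import math

def to_unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3-D unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def in_convex_spherical_polygon(point, vertices):
    """point and vertices are (lat, lon) pairs in degrees.

    Assumption: the polygon is convex and its vertices run
    counter-clockwise as seen from outside the sphere.
    """
    p = to_unit_vector(*point)
    vs = [to_unit_vector(*v) for v in vertices]
    for i in range(len(vs)):
        # Normal of the great-circle plane through this edge.
        normal = cross(vs[i], vs[(i + 1) % len(vs)])
        if dot(normal, p) < 0.0:
            return False  # point is on the outer side of this edge
    return True

# A cap around the North Pole and a box straddling the 180° meridian.
polar_cap = [(80, 0), (80, 90), (80, 180), (80, -90)]
dateline_box = [(-10, 170), (-10, -170), (10, -170), (10, 170)]
print(in_convex_spherical_polygon((90, 0), polar_cap))      # pole itself
print(in_convex_spherical_polygon((0, 180), dateline_box))  # on the meridian
```

No longitude wrapping or pole handling appears anywhere in the test itself, which is the property the slide attributes to the 3-D approach.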
What is the Magic?
§  Variant geohash coding
 –  64-bit long integers instead of strings
 –  Three most significant bits for octants of Earth’s surface
    §  Dividing at equator, prime meridian, 90° E/W meridians, 180° meridian
 –  Followed by three-bit groups
    §  One stop/continue bit
    §  One north/south split bit
    §  One east/west split bit
 –  Allows precision down to 10 cm × 10 cm squares at equator
 –  Produces various-size “tiles” representing parts of Earth’s surface
§  Points indexed by the smallest tile which contains the point
   Bit layout, most significant first (C/S = continue/stop):
   [E/W N/S E/W] [C/S N/S E/W] [C/S N/S E/W] [C/S N/S E/W] [C/S N/S E/W] ...



          Indexing using 64-bit integers for trie-driven search
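The coding described on this slide can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: the bit polarity (1 = east, north, continue), the limit of 20 three-bit groups (chosen only so the code fits in 63 bits) and the names `encode_tile` and `tile_range` are all hypothetical, and the sketch does not try to reproduce the precision figure quoted on the slide.

```python
MAX_LEVELS = 20  # assumption: 3 octant bits + 20 groups * 3 bits = 63 bits

def encode_tile(lat, lon, levels=MAX_LEVELS):
    """Code of the level-`levels` tile containing (lat, lon) in degrees."""
    if not 0 <= levels <= MAX_LEVELS:
        raise ValueError("levels out of range")
    lon = (lon + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    # Octant bits: E/W hemisphere, N/S hemisphere, E/W half of hemisphere.
    east = lon >= 0.0
    north = lat >= 0.0
    lon_lo, lon_hi = (0.0, 180.0) if east else (-180.0, 0.0)
    lat_lo, lat_hi = (0.0, 90.0) if north else (-90.0, 0.0)
    lon_mid = (lon_lo + lon_hi) / 2.0
    east_half = lon >= lon_mid
    lon_lo, lon_hi = (lon_mid, lon_hi) if east_half else (lon_lo, lon_mid)
    code = (int(east) << 2) | (int(north) << 1) | int(east_half)
    for _ in range(levels):
        lat_mid = (lat_lo + lat_hi) / 2.0
        upper = lat >= lat_mid
        lat_lo, lat_hi = (lat_mid, lat_hi) if upper else (lat_lo, lat_mid)
        lon_mid = (lon_lo + lon_hi) / 2.0
        right = lon >= lon_mid
        lon_lo, lon_hi = (lon_mid, lon_hi) if right else (lon_lo, lon_mid)
        # One group: continue bit (1), N/S split bit, E/W split bit.
        code = (code << 3) | 0b100 | (int(upper) << 1) | int(right)
    # Unused groups stay zero, so the next group's bit reads "stop".
    return code << (3 * (MAX_LEVELS - levels))

def tile_range(code, levels):
    """Inclusive range of codes for a tile and everything nested inside it."""
    span = (1 << (3 * (MAX_LEVELS - levels))) - 1
    return code, code | span

coarse = encode_tile(10.0, 20.0, levels=2)
lo, hi = tile_range(coarse, 2)
print(lo <= encode_tile(10.0, 20.0) <= hi)  # the point falls in that range
```

Because a tile and all of its sub-tiles share a prefix, they occupy one contiguous numeric range, which is what makes a numeric range lookup a natural fit for the index.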
What About Polygons?
§  Polygons indexed as collection of tiles inside polygon
 –  Larger tiles completely contained in indexed polygon are not subdivided
 –  Smallest indexed tiles may extend outside indexed polygon




              Polygons indexed with series of hash codes
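The two rules above (stop subdividing a tile that lies wholly inside the polygon; accept overhang at the smallest tile size) can be sketched with a plain quadtree. Note the swapped-in technique: this sketch uses flat two-dimensional geometry on (x, y) pairs for brevity, not the three-dimensional mathematics the deck describes, and it ignores degenerate cases where a polygon edge exactly touches a tile corner. All names are hypothetical.

```python
def point_in_polygon(x, y, poly):
    """Even-odd ray casting; poly is a list of (x, y) vertices."""
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside

def _orient(a, b, c):
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def _cross_properly(p, q, r, s):
    """True when segments pq and rs cross at an interior point of both."""
    return (_orient(r, s, p) * _orient(r, s, q) < 0 and
            _orient(p, q, r) * _orient(p, q, s) < 0)

def classify(poly, tile):
    """Return 'inside', 'outside' or 'partial' for tile = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = tile
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    for i in range(len(poly)):
        a, b = poly[i], poly[(i + 1) % len(poly)]
        if x0 < a[0] < x1 and y0 < a[1] < y1:
            return "partial"  # a polygon vertex lies within the tile
        for j in range(4):
            if _cross_properly(a, b, corners[j], corners[(j + 1) % 4]):
                return "partial"  # the boundary passes through the tile
    # No boundary in the tile, so the centre decides for the whole tile.
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    return "inside" if point_in_polygon(cx, cy, poly) else "outside"

def cover(poly, tile, max_depth, depth=0):
    """Return (x0, y0, x1, y1, depth) tiles that together cover poly."""
    relation = classify(poly, tile)
    if relation == "outside":
        return []
    if relation == "inside" or depth == max_depth:
        return [tile + (depth,)]  # smallest tiles may overhang the polygon
    x0, y0, x1, y1 = tile
    xm, ym = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    tiles = []
    for child in ((x0, y0, xm, ym), (xm, y0, x1, ym),
                  (x0, ym, xm, y1), (xm, ym, x1, y1)):
        tiles += cover(poly, child, max_depth, depth + 1)
    return tiles

square = [(10.1, 10.1), (40.3, 10.1), (40.3, 40.3), (10.1, 40.3)]
for t in cover(square, (0.0, 0.0, 90.0, 90.0), max_depth=4):
    print(t)
```

In the output, tiles with a depth below `max_depth` are the larger ones that were kept whole because they sit entirely inside the polygon.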
What About Polygon Search?
§  Search polygon converted to tiles using indexing conversion
    process
 –  Possible to get too many tile indices to search
    §  Risks Lucene complaints about too many BooleanClauses
    §  Consolidate adjacent indices into ranges
    §  Reduce tiling precision
        –  Reduce number of ranges
        –  Produce acceptable number of BooleanClauses
§  Results filtered by original search polygon
 –  Requires storage of original geometry data in addition to index
 –  No filter query required
    §  Index always accessed with NumericRangeQuery
    §  Insert custom logic wrapping NumericRangeQuery




           Search similar to indexing with additional filtering
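The clause-reduction steps on this slide can be sketched as below. It is an assumption-laden illustration: `tile_ranges_at` is a hypothetical callback that returns the inclusive (low, high) code ranges for the search polygon at a given tiling depth, and the default limit of 1024 mirrors Lucene's default maximum clause count, which a deployment may have changed.

```python
def merge_ranges(ranges):
    """Merge overlapping or directly adjacent inclusive integer ranges."""
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], hi)  # extend previous range
        else:
            merged.append([lo, hi])
    return [tuple(r) for r in merged]

def ranges_within_limit(tile_ranges_at, max_depth, max_clauses=1024):
    """Lower the tiling depth until the merged ranges fit the clause limit.

    tile_ranges_at(depth) is a hypothetical callback supplied by the caller.
    """
    for depth in range(max_depth, -1, -1):
        ranges = merge_ranges(tile_ranges_at(depth))
        if len(ranges) <= max_clauses:
            return depth, ranges
    raise ValueError("even the coarsest tiling exceeds the clause limit")

print(merge_ranges([(10, 19), (0, 9), (30, 39), (20, 24)]))
```

Coarser tiling returns more false candidates, which is acceptable here because results are filtered against the original search polygon afterwards.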
How Is This Capability Used?
§  Indexing accessed using custom FieldTypes in schema
 –  Specific types for each supported geometry type
 –  A general type to allow polymorphic geometry types
    §  Trade-off is greater application coupling
 –  Specific type classes transform inputs and hand-off to general type class
 –  Indexing writes out two fields
    §  Geospatial tile index
    §  Original geometry storage
§  Search accessed using custom QParserPlugin
 –  Detects special suffixes on search field name to determine geometry type
 –  Converts input to geospatial tile index collection
 –  Builds Lucene query structure including custom and standard classes




           New schema FieldTypes and new QParserPlugin
What Geometries Are Supported?
§  Points
  –  Specified by latitude and longitude
§  Polygons
  –  Specified by latitude-longitude pairs
§  Latitude-Longitude Boxes
  –  Specified by two latitude-longitude pairs specifying opposite corners
  –  Internally converted to polygons
§  Point-Radii
  –  Specified by latitude and longitude of center plus radius in meters, kilometers,
     statute miles, or nautical miles
  –  Assumes spherical Earth NOT WGS-84 ellipsoid
     §  Errors accepted for search
  –  Internally converted to approximating polygons



          Latitude-Longitude Boxes and Point-Radii supported
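A sketch of how a Point-Radius could be turned into an approximating polygon on a spherical Earth, using the standard destination-point formula. The Earth radius value, the unit keys, the default vertex count and the function names are assumptions for illustration; the deck does not state which values the implementation uses.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean radius of a spherical Earth
UNIT_TO_METERS = {"m": 1.0, "km": 1000.0, "mi": 1609.344, "nmi": 1852.0}

def circle_to_polygon(lat_deg, lon_deg, radius, unit="m", vertices=32):
    """Return (lat, lon) vertices on the circle, clockwise from due north."""
    d = radius * UNIT_TO_METERS[unit] / EARTH_RADIUS_M  # angular radius
    lat1, lon1 = math.radians(lat_deg), math.radians(lon_deg)
    ring = []
    for k in range(vertices):
        bearing = 2.0 * math.pi * k / vertices
        s = (math.sin(lat1) * math.cos(d) +
             math.cos(lat1) * math.sin(d) * math.cos(bearing))
        lat2 = math.asin(max(-1.0, min(1.0, s)))  # clamp rounding error
        lon2 = lon1 + math.atan2(
            math.sin(bearing) * math.sin(d) * math.cos(lat1),
            math.cos(d) - math.sin(lat1) * math.sin(lat2))
        lon2_deg = (math.degrees(lon2) + 180.0) % 360.0 - 180.0
        ring.append((math.degrees(lat2), lon2_deg))
    return ring

def great_circle_distance_m(lat1, lon1, lat2, lon2):
    """Haversine distance in meters on the same assumed sphere."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2.0) ** 2 +
         math.cos(p1) * math.cos(p2) * math.sin(dlon / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

ring = circle_to_polygon(10.0, 179.5, 100, "km", vertices=16)
print(ring[0], ring[4])  # northernmost and easternmost vertices
```

Because every vertex lies on the circle, the polygon is inscribed and slightly smaller than the true circle; more vertices shrink that gap.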
How Can the Public Get This?
§  Currently working Intellectual Property issues
 –  Employer required provisional patent application submission before Lucene
    Revolution abstract could be submitted
    §  Could protect public use of license assuming public release
 –  Customer has Unrestricted Rights
    §  Customer can release to public open source community
    §  Customer may release to public open source community
        –  Customer dislikes proprietary solutions
§  Also need to work packaging issues such as a name




           Not yet available to public, but that may change
Summary
§  Solr is excellent choice for our replacement of expensive
    database
§  Geospatial Search with Polygons in Solr is possible and
    implemented
 –  Can be used with or without LUCENE-3795/“Lucene Spatial Playground”
    approach
 –  Inherent 3-dimensional mathematics not found in LUCENE-3795 polygon
    support
 –  Stores and uses both indices and original geometries
 –  No support for complex polygons at this time
§  Capabilities accessed with new FieldTypes and a new
    QParserPlugin
§  Not yet released to public




DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 

Search with Polygons: Another Approach to Solr Geospatial Search

  • 1. Search with Polygons: Another Approach to Solr Geospatial Search
    Dr. Andrew L. Urquhart, May 10, 2012
    Copyright © 2012 Raytheon Company. All rights reserved. Customer Success Is Our Mission is a registered trademark of Raytheon Company.
  • 2. What is the “Burning Platform”?
    §  Need to break dependency on expensive licenses for proprietary database
      –  Major cost driver
      –  Unsustainable in current economic environment
    §  Solr identified as promising replacement candidate
      –  Excellent cost
      –  Excellent performance
      –  Excellent access to source code
      –  Major weakness in required Geospatial Search capability
      –  Is Geospatial Search weakness mitigation possible?
        §  Must index points for search by polygons
        §  Should index polygons for search by polygons
    Solr promising, Polygon Geospatial Search needed
  • 3. What Has Been Produced?
    §  A single add-in JAR file plus Schema enhancements
      –  Older variant requires a GPL library for point-in-polygon support
      –  Newer variant requires no external libraries
    §  Internals use inherent three-dimensional mathematics
      –  LUCENE-3795/“Lucene Spatial Playground” geospatial search capability uses JTS library for polygon support
      –  JTS uses two-dimensional mathematics
      –  JTS has greater vulnerability to special points
        §  North and South Poles
        §  180° meridian
        §  Potential problem for customer applications
      –  JTS supports complex polygons
        §  Alternative approach only supports simple polygons at this time
    Single JAR file using 3-D internal mathematics
  • 4. What is the Magic?
    §  Variant geohash coding
      –  64-bit long integers instead of strings
      –  Three most significant bits for octants of Earth’s surface
        §  Dividing at equator, prime meridian, 90° E/W meridians, 180° meridian
      –  Followed by three-bit groups
        §  One stop/continue bit
        §  One north/south split bit
        §  One east/west split bit
      –  Allows precision down to 10 cm × 10 cm squares at equator
      –  Produces various-size “tiles” representing parts of Earth’s surface
        §  Points indexed by the smallest tile which contains the point
    [Figure: bit layout of the code. Three octant bits (E/W, N/S, E/W) followed by repeating three-bit groups (C/S, N/S, E/W).]
    Indexing using 64-bit integers for trie-driven search
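
The bit layout described on slide 4 can be sketched roughly as follows. This is only an illustration, not the presenter's implementation: the function names `encode_tile` and `descendant_range`, the choice of 1 for "continue", north and east, the order of bits inside a group, and the cap of 20 groups (3 + 60 = 63 bits) are all assumptions. The slides do not say how the stated precision maps onto bits, so the depth cap is a property of this sketch only.

```python
MAX_LEVELS = 20  # assumed cap: 3 octant bits + 20 three-bit groups = 63 bits


def encode_tile(lat, lon, levels):
    """Return the integer code of the tile at depth `levels` containing (lat, lon)."""
    if not 0 <= levels <= MAX_LEVELS:
        raise ValueError("levels out of range")
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinate out of range")

    # Octant bits (assumed order E/W, N/S, E/W): hemisphere split at the
    # prime/180 meridian, split at the equator, then split at the 90 meridian.
    ew_hemi = 1 if lon >= 0.0 else 0
    ns_hemi = 1 if lat >= 0.0 else 0
    lat_lo, lat_hi = (0.0, 90.0) if ns_hemi else (-90.0, 0.0)
    lon_lo, lon_hi = (0.0, 180.0) if ew_hemi else (-180.0, 0.0)
    lon_mid = (lon_lo + lon_hi) / 2.0
    ew_quarter = 1 if lon >= lon_mid else 0
    lon_lo, lon_hi = (lon_mid, lon_hi) if ew_quarter else (lon_lo, lon_mid)
    code = (ew_hemi << 2) | (ns_hemi << 1) | ew_quarter

    # Each further level appends one group: continue bit, N/S bit, E/W bit.
    for _ in range(levels):
        lat_mid = (lat_lo + lat_hi) / 2.0
        lon_mid = (lon_lo + lon_hi) / 2.0
        ns = 1 if lat >= lat_mid else 0
        ew = 1 if lon >= lon_mid else 0
        code = (code << 3) | 0b100 | (ns << 1) | ew
        lat_lo, lat_hi = (lat_mid, lat_hi) if ns else (lat_lo, lat_mid)
        lon_lo, lon_hi = (lon_mid, lon_hi) if ew else (lon_lo, lon_mid)

    # Unused groups stay zero, so their continue bit reads as "stop".
    return code << (3 * (MAX_LEVELS - levels))


def descendant_range(code, levels):
    """Inclusive code range covering a tile and every tile nested inside it."""
    free_bits = 3 * (MAX_LEVELS - levels)
    return code, code | ((1 << free_bits) - 1)
```

Under these assumptions a tile and everything nested inside it occupy one contiguous run of integers, which is the kind of property a numeric range search over the index could exploit. For example, `descendant_range(encode_tile(10.0, 20.0, 2), 2)` brackets `encode_tile(10.0, 20.0, 5)`.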
  • 5. What About Polygons?
    §  Polygons indexed as collection of tiles inside polygon
      –  Larger tiles completely contained in indexed polygon are not subdivided
      –  Smallest indexed tiles may extend outside indexed polygon
    Polygons indexed with series of hash codes
  • 6. What About Polygon Search?
    §  Search polygon converted to tiles using indexing conversion process
      –  Possible to get too many tile indices to search
        §  Risks Lucene complaints about too many BooleanClauses
        §  Consolidate adjacent indices into ranges
        §  Reduce tiling precision
          –  Reduce number of ranges
          –  Produce acceptable number of BooleanClauses
    §  Results filtered by original search polygon
      –  Requires storage of original geometry data in addition to index
      –  No filter query required
        §  Index always accessed with NumericRangeQuery
        §  Insert custom logic wrapping NumericRangeQuery
    Search similar to indexing with additional filtering
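
The "consolidate adjacent indices into ranges" step on slide 6 could look something like the sketch below. It is an assumption-laden illustration, not the presenter's code: the function name `consolidate` and the representation of each tile as an inclusive `(low, high)` pair of integer codes are hypothetical. Each surviving range would then correspond to one range clause in the query, so fewer ranges means fewer clauses.

```python
def consolidate(ranges):
    """Merge inclusive integer ranges that overlap or sit directly next to each other."""
    merged = []
    for low, high in sorted(ranges):
        if low > high:
            raise ValueError("range bounds are reversed")
        if merged and low <= merged[-1][1] + 1:
            # Touches or overlaps the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], high)
        else:
            merged.append([low, high])
    return [tuple(r) for r in merged]
```

Here `consolidate([(8, 15), (0, 7), (32, 39), (16, 23)])` collapses the three touching ranges into one and leaves the isolated one alone. If the result is still too long, the slide's remedy is to re-tile the search polygon at a coarser precision and consolidate again.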
  • 7. How Is This Capability Used?
    §  Indexing accessed using custom FieldTypes in schema
      –  Specific types for each supported geometry type
      –  A general type to allow polymorphic geometry types
        §  Trade-off is greater application coupling
      –  Specific type classes transform inputs and hand off to general type class
      –  Indexing writes out two fields
        §  Geospatial tile index
        §  Original geometry storage
    §  Search accessed using custom QParserPlugin
      –  Detects special suffixes on search field name to determine geometry type
      –  Converts input to geospatial tile index collection
      –  Builds Lucene query structure including custom and standard classes
    New schema FieldTypes and new QParserPlugin
  • 8. What Geometries Are Supported?
    §  Points
      –  Specified by latitude and longitude
    §  Polygons
      –  Specified by latitude-longitude pairs
    §  Latitude-Longitude Boxes
      –  Specified by two latitude-longitude pairs specifying opposite corners
      –  Internally converted to polygons
    §  Point-Radii
      –  Specified by latitude and longitude of center plus radius in meters, kilometers, statute miles, or nautical miles
      –  Assumes spherical Earth NOT WGS-84 ellipsoid
        §  Errors accepted for search
      –  Internally converted to approximating polygons
    Latitude-Longitude Boxes and Point-Radii supported
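
One way to turn a point-radius into an approximating polygon on a spherical Earth, as slide 8 describes, is the standard great-circle destination formula. The sketch below is an assumption, not the presenter's code: the function name `circle_to_polygon`, the unit keys, the default of 32 vertices and the mean Earth radius of 6,371,008.8 m are all choices made for illustration.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean radius of a spherical Earth
UNIT_TO_METERS = {"m": 1.0, "km": 1000.0, "mi": 1609.344, "nmi": 1852.0}


def circle_to_polygon(lat, lon, radius, unit="m", vertices=32):
    """Return (lat, lon) vertices in degrees lying on the circle around (lat, lon)."""
    if vertices < 3:
        raise ValueError("a polygon needs at least three vertices")
    # Angular distance from the center, in radians.
    d = radius * UNIT_TO_METERS[unit] / EARTH_RADIUS_M
    lat1, lon1 = math.radians(lat), math.radians(lon)
    ring = []
    for k in range(vertices):
        bearing = 2.0 * math.pi * k / vertices  # clockwise from north
        sin_lat2 = (math.sin(lat1) * math.cos(d)
                    + math.cos(lat1) * math.sin(d) * math.cos(bearing))
        sin_lat2 = max(-1.0, min(1.0, sin_lat2))  # guard against rounding
        lat2 = math.asin(sin_lat2)
        lon2 = lon1 + math.atan2(
            math.sin(bearing) * math.sin(d) * math.cos(lat1),
            math.cos(d) - math.sin(lat1) * sin_lat2,
        )
        # Wrap longitude into [-180, 180).
        lon_deg = (math.degrees(lon2) + 540.0) % 360.0 - 180.0
        ring.append((math.degrees(lat2), lon_deg))
    return ring
```

Because the vertices sit on the circle, the polygon lies slightly inside it; more vertices shrink that gap, in keeping with the slide's note that some error is accepted for search. A call such as `circle_to_polygon(38.9, -77.0, 25, unit="nmi")` returns 32 vertices around the given center.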
  • 9. How Can the Public Get This?
    §  Currently working Intellectual Property issues
      –  Employer required provisional patent application submission before Lucene Revolution abstract could be submitted
        §  Could protect public use of license assuming public release
      –  Customer has Unrestricted Rights
        §  Customer can release to public open source community
        §  Customer may release to public open source community
      –  Customer dislikes proprietary solutions
    §  Also need to work packaging issues such as a name
    Not yet available to public, but that may change
  • 10. Summary
    §  Solr is excellent choice for our replacement of expensive database
    §  Geospatial Search with Polygons in Solr is possible and implemented
      –  Can be used with or without LUCENE-3795/“Lucene Spatial Playground” approach
      –  Inherent 3-dimensional mathematics not found in LUCENE-3795 polygon support
      –  Stores and uses both indices and original geometries
      –  No support for complex polygons at this time
    §  Capabilities accessed with new FieldTypes and a new QParserPlugin
    §  Not yet released to public