4. Challenges of geo data in databases
Data is complex to store in SQL
Data is two-dimensional
Data is dense
Data is huge
Why
5. Multiple dimensions, but B-trees sort on only one
Query-dependent index sorting calculation
New data structures and algorithms to handle dimensions
A two-phase search: select, then filter
Origin (challenge)
6. Geographic Information Systems
handling of geometric objects
Geography entered information systems because administrations needed to
handle real-world data:
Geology / Geography
Roads, administrative areas for cadastral surveys
Census data
Infrastructure elements (water, electricity and communication networks)
Other needs emerged once the data became available, and they use the same
tools:
Geo marketing (market areas)
Origin (needs)
7. All you ever hated about SQL … and more !
Complex SQL additions
Full-size, complex, standardized API
Vendor dependent implementations
Not scalable
How
8. The Open Geospatial Consortium publishes a standard: OpenGIS
Oracle: quad-trees / R-trees; Oracle 4 side dev (1984), integrated in Oracle 7 (1992)
SQL Server: 4-level grid index; since the 2008 version (2007)
PostgreSQL: R-tree-over-GiST; since PostGIS 1.0 for 8.0 (Apr 2005)
Spatialite: R-trees; since 3.6.0 (Mar 2008)
MySQL since Feb 2005, DB2 Spatial Extender since July 2006, Ingres added
support very recently
Hibernate Spatial provides generic access to OpenGIS implementations
GIS software such as ESRI, MapInfo, GeoConcept and QuantumGIS uses this
standard to access data
Current Implementations (traditional DBMS)
9.
10. Do we need all this ?
Is geo only for geo-centric companies?
Puzzled ?
11. LBS changed everything !
Maps, geocoding & route planning available
Platforms handle millions of hits/day
Available through multiple APIs
Often for free
How
12. MAPS
Data is huge, with complex objects
Indexing is geo
Processing capabilities required
Provided
GEOCODING
Data is huge
Not a geo problem
Expertise extremely valued
Provided
ROUTE PLANNING
Data is huge
Not a geo problem
Not shardable
Provided
POI SEARCH
Data is less huge (your business size)
Indexing is geo
May shard
Less relevant
How
13. Location aware data
handling of data associated with a latitude/longitude tuple
Location became a search criterion :
Geo search
The map/the geography is the center of the search process
Proximity search
Location is one of many criteria used to refine a search
Origin (needs)
16. Why does geo fit a NoSQL approach?
Geo does not fit in a traditional ‘pure’ DBMS: storing several dimensions
in one column breaks first normal form (1NF)
(48,23) <?> (47,25)
Geo objects are hard to define strictly with SQL types: they are
fickle
Tim Anglade ‘No SQL for fun and profit’ : Geo/hierarchical is
one of seven forms of NoSQL to date
Geo as a NoSQL Technology
17. Extensions to SQL or NoSQL data stores
Quad-trees
R-trees
Geo as a NoSQL Technology
19. Search steps
1) Select
Compute level
Compute boxes ids
Fetch boxes
2) Filter
Compute distance
Select result set
Limits
High levels
How does it work ?
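The select and filter steps listed above can be sketched in Java as follows. This is only an illustration under stated assumptions, not the implementation of any particular product: the class `GridSearchSketch`, the `Poi` type, the `MAX_LEVEL` constant and the box id layout are hypothetical, the Earth is treated as a sphere, and searches that cross the antimeridian or touch the poles are not handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative select-then-filter search over a multi-level grid (hypothetical names). */
class GridSearchSketch {
    static final double EARTH_RADIUS_KM = 6371.0; // mean radius, spherical model
    static final int MAX_LEVEL = 12;              // assumption: deepest indexed level

    static final class Poi {
        final String name;
        final double lat;
        final double lon;
        Poi(String name, double lat, double lon) {
            this.name = name; this.lat = lat; this.lon = lon;
        }
    }

    // levels.get(n) maps a box id to the points stored in that box at level n
    private final List<Map<Long, List<Poi>>> levels = new ArrayList<Map<Long, List<Poi>>>();

    GridSearchSketch() {
        for (int n = 0; n <= MAX_LEVEL; n++) {
            levels.add(new HashMap<Long, List<Poi>>());
        }
    }

    /** Index (0 .. 2^level - 1) of the box containing value along one axis. */
    static long cell(double value, double min, double span, int level) {
        long cells = 1L << level;
        long c = (long) Math.floor((value - min) / span * cells);
        return Math.max(0L, Math.min(cells - 1, c));
    }

    static long boxId(long x, long y, int level) {
        return (y << level) | x;
    }

    /** Index time: store the point in its box at every level. */
    void add(Poi p) {
        for (int n = 0; n <= MAX_LEVEL; n++) {
            long id = boxId(cell(p.lon, -180, 360, n), cell(p.lat, -90, 180, n), n);
            levels.get(n).computeIfAbsent(id, k -> new ArrayList<Poi>()).add(p);
        }
    }

    /** Great-circle distance (haversine formula). */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    List<Poi> search(double lat, double lon, double radiusKm) {
        // Half-extent of the search circle's bounding box, in degrees
        double angular = radiusKm / EARTH_RADIUS_KM;
        double dLat = Math.toDegrees(angular);
        double ratio = Math.sin(angular) / Math.cos(Math.toRadians(lat));
        double dLon = ratio >= 1.0 ? 180.0 : Math.toDegrees(Math.asin(ratio));

        // 1) Select: compute the deepest level whose boxes are at least as large as the bounding box
        int level = 0;
        while (level < MAX_LEVEL
                && 180.0 / (1L << (level + 1)) >= 2 * dLat
                && 360.0 / (1L << (level + 1)) >= 2 * dLon) {
            level++;
        }
        // Compute the box ids covering the bounding box and fetch those boxes
        long x0 = cell(lon - dLon, -180, 360, level);
        long x1 = cell(lon + dLon, -180, 360, level);
        long y0 = cell(lat - dLat, -90, 180, level);
        long y1 = cell(lat + dLat, -90, 180, level);
        List<Poi> candidates = new ArrayList<Poi>();
        for (long y = y0; y <= y1; y++) {
            for (long x = x0; x <= x1; x++) {
                List<Poi> box = levels.get(level).get(boxId(x, y, level));
                if (box != null) {
                    candidates.addAll(box);
                }
            }
        }
        // 2) Filter: compute the exact distance and keep only the real hits
        List<Poi> result = new ArrayList<Poi>();
        for (Poi p : candidates) {
            if (distanceKm(lat, lon, p.lat, p.lon) <= radiusKm) {
                result.add(p);
            }
        }
        return result;
    }
}
```

Because the chosen boxes are never smaller than the bounding box, the select step touches at most four boxes per query in this sketch; the filter step then discards candidates that share a box with the search area but lie outside the radius.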
21. Spatial Lucene/Solr, Elastic Search
Quad tree labels in Lucene tokens
Tile indices or GeoHash labels
GeoCouch
R-tree in Erlang
Neo4J Spatial
R-tree & quad-tree
Objects can be stored as graph elements
Current Implementations (NoSQL databases)
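The GeoHash labels mentioned above can be illustrated with a short Java sketch of the publicly documented geohash encoding: longitude and latitude bits are interleaved by repeated halving of the ranges, and every 5 bits become one base-32 character. The class name `GeoHashSketch` is hypothetical, and this is not the code used by any of the products listed.

```java
/** Minimal geohash encoder, for illustration only (hypothetical class name). */
class GeoHashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    static String encode(double lat, double lon, int length) {
        double minLat = -90, maxLat = 90;
        double minLon = -180, maxLon = 180;
        StringBuilder hash = new StringBuilder();
        boolean lonTurn = true; // bits alternate, starting with longitude
        int bits = 0;
        int value = 0;
        while (hash.length() < length) {
            if (lonTurn) {
                double mid = (minLon + maxLon) / 2;
                if (lon >= mid) {
                    value = (value << 1) | 1; // upper half of the longitude range
                    minLon = mid;
                } else {
                    value <<= 1;              // lower half
                    maxLon = mid;
                }
            } else {
                double mid = (minLat + maxLat) / 2;
                if (lat >= mid) {
                    value = (value << 1) | 1; // upper half of the latitude range
                    minLat = mid;
                } else {
                    value <<= 1;              // lower half
                    maxLat = mid;
                }
            }
            lonTurn = !lonTurn;
            if (++bits == 5) {                // 5 bits make one base-32 character
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }
}
```

For example, `GeoHashSketch.encode(57.64911, 10.40744, 5)` yields `u4pru`. A shorter hash is always a prefix of a longer one for the same point, which is why such labels can be stored as plain tokens or B-tree keys: a prefix lookup selects every point that falls in the same box.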
22. MongoDB
Geo hashes into MongoDB B-trees
Shard support incoming
Spherical model since 1.7
Pincaster
In memory quad tree
Current Implementations (NoSQL databases)
24. Do it in pure SQL !!
Use a clustered (longitude, latitude) index:
o Select is done by the cluster on longitude (which is more selective than latitude!)
o Bounding box requests are handled at the index level, as latitude is included
o Filter with distance calculation can be done by a stored procedure on the database side or in application code
POI Search
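A minimal Java sketch of this pure SQL approach, with the distance filter done in application code, could look like the following. The table `poi`, its columns and the class `BoundingBoxSketch` are assumptions made for the example; the query is shown as a string only, the rows are passed in as plain arrays instead of coming from a real connection, and a spherical Earth away from the poles and the antimeridian is assumed.

```java
import java.util.ArrayList;
import java.util.List;

/** Bounding-box select plus distance filter, for illustration (hypothetical names). */
class BoundingBoxSketch {
    static final double EARTH_RADIUS_KM = 6371.0; // mean radius, spherical model

    // Assumed schema: poi(name, longitude, latitude), clustered index on (longitude, latitude).
    // Placeholders are bound in the order returned by boundingBox().
    static final String SELECT_SQL =
            "SELECT name, latitude, longitude FROM poi "
          + "WHERE longitude BETWEEN ? AND ? AND latitude BETWEEN ? AND ?";

    /** Returns {minLon, maxLon, minLat, maxLat} around the search centre. */
    static double[] boundingBox(double lat, double lon, double radiusKm) {
        double angular = radiusKm / EARTH_RADIUS_KM;
        double dLat = Math.toDegrees(angular);
        // A degree of longitude shrinks with latitude, so the box widens in degrees
        double ratio = Math.sin(angular) / Math.cos(Math.toRadians(lat));
        double dLon = ratio >= 1.0 ? 180.0 : Math.toDegrees(Math.asin(ratio));
        return new double[] { lon - dLon, lon + dLon, lat - dLat, lat + dLat };
    }

    /** Great-circle distance (haversine formula). */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Filter step: rows are {latitude, longitude} pairs returned by the select. */
    static List<double[]> filter(List<double[]> rows, double lat, double lon, double radiusKm) {
        List<double[]> kept = new ArrayList<double[]>();
        for (double[] row : rows) {
            if (distanceKm(lat, lon, row[0], row[1]) <= radiusKm) {
                kept.add(row);
            }
        }
        return kept;
    }
}
```

The filter is still needed after the select because the bounding box is larger than the search circle: a row near a corner of the box satisfies both `BETWEEN` clauses but lies farther away than the requested radius.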
25. Lucene via Hibernate Search
o Available in 4.2 beta 1
o Annotation based
o Easy to get started with
o Refine by usage
o DSL supported
POI Search
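26.
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.Latitude;
import org.hibernate.search.annotations.Longitude;
import org.hibernate.search.annotations.Spatial;

// Indexed entity whose coordinates feed the spatial index
@Indexed
@Spatial
public class Hotel {
    @Latitude
    Double latitude;

    @Longitude
    Double longitude;

    [...]
}
Sample indexing code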