Datomic R-trees

James Sofra
@sofra
https://github.com/jsofra/datomic-rtree

Summary
● Motivations
● Datomic overview
● Datomic R-tree implementation
● Hilbert Curves
● Bulk loading (via Hilbert Curves)
● Future plans

Motivations
● I have an interest in geospatial applications
  – e.g. Thunderstorm probability application (THESPA)
● Datomic is an interesting database that makes different trade-offs to other databases
  – Wonder how far we can take the ability to describe arbitrary structures in Datomic

Datomic Overview
● Immutable database
● Time-based facts (stored as entities)
● ACID transactions
● Expressive queries using Datalog
● Pluggable storage
● Flexible enough to act as row, column or graph database
● Schema that describes attributes that can be attached to entities
  – Attributes have a type: String, Long, Double, Inst, Ref etc.
● Database functions
  – Stored in the database; they see the in-transaction value of the database

Datomic Motivations
● Things that make Datomic appealing for spatial data
  – Time-based nature of Datomic is useful for time series data, which we often have
  – No need to add spatial operations (union, intersection, etc.) to the database; they can be handled by libraries in the peers
  – Spatial indexes can be stored as regular data, which allows a lot of freedom over the choice of index and makes it possible to keep multiple indexes over subsets of the data in space and time
  – Flexible entity structures are useful because spatial data frequently does not fit nicely in a table
  – Immutability is surprisingly useful in lots of different applications!

R-trees
● Efficient query of multi-dimensional data
● Groups nearby objects
● Balanced (all leaf nodes at same level)
● Aims for nodes to minimise empty space coverage and overlap
● Designed for storage on disk (as used in databases)
● "R-Trees: A Dynamic Index Structure for Spatial Searching"
  – Guttman, A (1984)

R-trees - Insertions
● Choose a leaf node to insert into
● Insert the entry into the leaf node and enlarge the node
● If the node has more than the max number of children, split the node and propagate enlargement and splits up the tree
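
The three steps above can be sketched as follows. This is an illustrative Python sketch, not the Clojure code from datomic-rtree. It assumes the leaf is chosen by least area enlargement (as in Guttman's paper) and uses a deliberately naive split (sort on min-x and cut in half) in place of Guttman's split heuristics. It does not enforce a minimum number of children, and all names are hypothetical.

```python
from dataclasses import dataclass, field

# Bounding boxes are (min_x, min_y, max_x, max_y) tuples.

def union(a, b):
    """Smallest box covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def area(b):
    return (b[2] - b[0]) * (b[3] - b[1])

def enlargement(box, entry):
    """Extra area box needs in order to also cover entry."""
    return area(union(box, entry)) - area(box)

@dataclass(eq=False)  # compare nodes by identity, not by contents
class Node:
    bbox: tuple
    is_leaf: bool
    children: list = field(default_factory=list)  # Nodes, or entry boxes in a leaf

def child_box(node, child):
    return child if node.is_leaf else child.bbox

def split(node):
    """Naive split (an assumption): sort children on min-x and cut in half."""
    kids = sorted(node.children, key=lambda c: child_box(node, c)[0])
    mid = len(kids) // 2
    halves = []
    for part in (kids[:mid], kids[mid:]):
        box = child_box(node, part[0])
        for c in part[1:]:
            box = union(box, child_box(node, c))
        halves.append(Node(box, node.is_leaf, part))
    return halves

def insert(node, entry, max_children):
    """Insert entry below node; return the one or two nodes that replace it."""
    node.bbox = union(node.bbox, entry)  # enlarge on the way down
    if node.is_leaf:
        node.children.append(entry)
    else:
        # Choose the child that needs the least enlargement.
        best = min(node.children, key=lambda c: enlargement(c.bbox, entry))
        i = node.children.index(best)
        node.children[i:i + 1] = insert(best, entry, max_children)
    return split(node) if len(node.children) > max_children else [node]

def insert_root(root, entry, max_children):
    """Insert at the root, adding a new root if the old one splits."""
    nodes = insert(root, entry, max_children)
    if len(nodes) == 1:
        return nodes[0]
    return Node(union(nodes[0].bbox, nodes[1].bbox), False, nodes)
```

Returning either one or two replacement nodes from `insert` is what lets a split propagate upwards: the parent splices the result into its own children and may then overflow itself.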

Datomic R-tree - Schema
:rtree/root            :db.type/ref
:rtree/max-children    :db.type/long
:rtree/min-children    :db.type/long
:node/children         :db.type/ref
:node/is-leaf?         :db.type/boolean
:node/entry            :db.type/ref
:bbox/min-x            :db.type/double
:bbox/min-y            :db.type/double
:bbox/max-x            :db.type/double
:bbox/max-y            :db.type/double

Datomic R-tree – regular transaction
● Transaction for adding new entry, calls database function
● Database function
● New entry with new ID
● Add new entry as child to leaf node

Datomic R-tree – split transaction
● New entry
● Remove root
● Create new leaf nodes
● Add new root

Bulk loading
● Issues with single insertion loading of R-tree
  – Becomes slow with many insertions
  – The resulting tree is not always as efficient as it could be
● Bulk loading builds a tree once from a number of entities
● Two basic approaches: top-down and bottom-up
● Bulk loading does not imply bulk insertion

Bulk loading – sort based loading
● Aims for better R-tree performance
● Bottom-up approach
● Sorts all entities in an order that aims to preserve locality
● Partitions the entities into clusters that are (hopefully) spatially collocated
● Recursively apply partitioning to build up the tree
● “Sort-based Query-adaptive Loading of R-trees”
  – D. Achakeev; B. Seeger; P. Widmayer (2012)
● “Sort-based parallel loading of R-trees”
  – D. Achakeev; M. Seidemann; M. Schmidt; B. Seeger (2012)
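
The bottom-up procedure above (sort, partition, repeat per level) can be sketched as follows. This is an illustrative Python sketch under simplifying assumptions: the caller supplies the sort key, partitioning is plain fixed-size chunking rather than the cost-based partitioning of the cited papers, the entry list is non-empty, and nodes are plain dicts. All names are hypothetical.

```python
def union_all(boxes):
    """Smallest (min_x, min_y, max_x, max_y) box covering every box."""
    xs0, ys0, xs1, ys1 = zip(*boxes)
    return (min(xs0), min(ys0), max(xs1), max(ys1))

def chunks(items, size):
    """Cut a list into consecutive runs of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def bulk_load(entries, sort_key, max_children):
    """Build a tree bottom-up from entry boxes and return the root node."""
    # Sort so that neighbours in the list tend to be neighbours in space.
    ordered = sorted(entries, key=sort_key)
    # Leaf level: pack runs of neighbouring entries into leaf nodes.
    level = [
        {"bbox": union_all(chunk), "is_leaf": True, "children": chunk}
        for chunk in chunks(ordered, max_children)
    ]
    # Upper levels: pack nodes into parents until a single root remains.
    while len(level) > 1:
        level = [
            {"bbox": union_all([n["bbox"] for n in group]),
             "is_leaf": False,
             "children": group}
            for group in chunks(level, max_children)
        ]
    return level[0]
```

Because every level is built from the one below it, all leaves end up at the same depth. Fixed-size chunking can leave the last node on a level with fewer than min-children entries, which is one reason to choose partition points more carefully.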

Hilbert Curves
● A continuous fractal space-filling curve
● First described by mathematician David Hilbert in 1891
● Useful because it enables mapping from 2D to 1D, preserving some notion of locality
● Other options are: Peano curve, Z-order curve (aka Morton curve)
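
To make the 2D to 1D mapping concrete, here is a small Python sketch of the widely used iterative conversion from a grid cell to its distance along the Hilbert curve. It is not taken from datomic-rtree. The helper `hilbert_of_point`, which snaps a real-valued point onto the grid, is a hypothetical addition, as is the choice of a square grid of side `2**order`.

```python
def hilbert_index(order, x, y):
    """Distance along the Hilbert curve of cell (x, y) on a 2**order grid."""
    n = 1 << order
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

def hilbert_of_point(x, y, extent, order):
    """Hilbert value of a real-valued point inside extent (min_x, min_y, max_x, max_y)."""
    n = 1 << order
    gx = min(n - 1, int((x - extent[0]) / (extent[2] - extent[0]) * n))
    gy = min(n - 1, int((y - extent[1]) / (extent[3] - extent[1]) * n))
    return hilbert_index(order, gx, gy)
```

For a bounding box, one plausible choice is to take the Hilbert value of its centre point. Cells that are consecutive along the curve are always adjacent in the grid, although cells adjacent in the grid are not always close along the curve.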

Bulk loading – Hilbert sort based
● Better Hilbert partitioning

Bulk loading via Hilbert curves
● Insert all entities into Datomic (or use existing entities)
● Entities include an indexed Hilbert value attribute
● Obtain a seq of the entities using the :avet index with the Hilbert value
● Perform partitioning

Bulk - hilbert-ents

Takes advantage of Datomic index API to get direct
access to the Hilbert index

Bulk - min-cost-index
● List of options for the next partition point
● Must be at least min-children in the partition
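
The idea of choosing the next partition point from a list of options can be sketched as below. The original listing is not preserved here, so this Python sketch is a guess at the shape of the technique, not the library's `min-cost-index`. The cost function (covered area per entry), the tie-break towards larger partitions, and the requirement that `min_children <= max_children // 2` are all assumptions.

```python
def bbox_area(boxes):
    """Area of the smallest box covering all (min_x, min_y, max_x, max_y) boxes."""
    xs0, ys0, xs1, ys1 = zip(*boxes)
    return (max(xs1) - min(xs0)) * (max(ys1) - min(ys0))

def min_cost_index(boxes, min_children, max_children):
    """Return how many leading boxes go into the next partition."""
    n = len(boxes)
    if n <= max_children:
        return n  # everything left fits in one node
    # Options for the cut: the partition holds min..max children, and the
    # remainder must still be able to fill a node of its own.
    options = [k for k in range(min_children, max_children + 1)
               if n - k >= min_children]
    # Assumed cost: covered area per entry; ties go to the larger partition.
    return min(options, key=lambda k: (bbox_area(boxes[:k]) / k, -k))

def partition(boxes, min_children, max_children):
    """Greedily cut an already sorted list of boxes into partitions."""
    parts = []
    while boxes:
        k = min_cost_index(boxes, min_children, max_children)
        parts.append(boxes[:k])
        boxes = boxes[k:]
    return parts
```

With entries sorted by Hilbert value, a cut chosen this way tends to fall where the curve jumps to a distant region, so a node does not stretch across the gap.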

Conclusions
● It works!

(install-single-insertions conn 50000 20 10)
  – "Elapsed time: 119114.342783 msecs"

(install-and-bulk-load conn 50000 20 10)
  –

(time (naive-intersecting all-entries search-box))
  –

(time (intersecting root search-box))
  –

* Note: these times should be regarded with suspicion since they only use the in-memory database
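
The difference between the two searches timed above can be illustrated with a short Python sketch. The function names and argument order mirror the calls on the slide, but the bodies are assumptions written for illustration, with nodes as plain dicts holding `bbox`, `is_leaf` and `children`.

```python
def intersects(a, b):
    """True if boxes a and b, given as (min_x, min_y, max_x, max_y), overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def naive_intersecting(all_entries, search_box):
    """Linear scan: test every entry against the search box."""
    return [e for e in all_entries if intersects(e, search_box)]

def intersecting(node, search_box):
    """Tree search: skip any subtree whose bounding box misses the search box."""
    if not intersects(node["bbox"], search_box):
        return []
    if node["is_leaf"]:
        return [e for e in node["children"] if intersects(e, search_box)]
    found = []
    for child in node["children"]:
        found.extend(intersecting(child, search_box))
    return found
```

The tree search visits only the subtrees whose boxes meet the search box, which is why low overlap between sibling nodes matters for query speed.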

Future plans
● Retractions and updates
● Bulk insertions
● More search and query support
● Schema for supporting Meridian Shapes and Features
● Investigate other R-trees: R* tree, R+ tree

Questions?

Thank you! Any questions?
James Sofra
@sofra

Other Interesting Resources
● “The R*-tree: an efficient and robust access method for points and rectangles”
● “OMT: Overlap Minimizing Top-down Bulk Loading Algorithm for R-tree”
  – T. Lee; S. Lee (2003)
● “The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree”
  – L. Arge; M. de Berg; K. Yi (2004)
● “Compact Hilbert Indices”
  – Hamilton. C (2006)
● “R-Trees: Theory and Applications”
  – Manolopoulos. Y; Nanopoulos. A; Papadopoulos. A. N; Theodoridis. Y (2006)
