R-Trees are an excellent data structure for managing geo-spatial data. They are commonly used by mapping applications and by other applications that use location to customize content. The Minimum Bounding Rectangle (MBR) is a central concept in R-Trees, which are a modified form of B-Trees.
2. Instructor
Prof. Amrinder Arora
amrinder@gwu.edu
Please copy TA on emails
Please feel free to call as well
TA
Iswarya Parupudi
iswarya2291@gwmail.gwu.edu
L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 2
3. CS 6213
Basics: Record / Struct / Arrays / LLs, Stacks / Queues, Graphs / Trees / BSTs, Heaps and PQs
Advanced: Trie, B-Tree, Splay Trees, R-Tree, Union Find
Applications: Databases, Spatial, String, In Memory
4. Antonin Guttman, U. C. Berkeley
K. A. Mohamed
6. Given a city map, “index” all university buildings in an efficient structure for quick topological search.
7. “Index” buildings in an efficient structure for quick search.
Spatial object:
Contour (outline) of the area around the building(s).
Minimum bounding region (MBR) of the object.
8. MBR of the city neighbourhoods.
MBR of the city defining the overall search region.
9. Mostly involves 2D regions.
Need to support 2D range queries.
Multiple return values desired: Answering a query region by reporting
all spatial objects that are fully-contained-in or overlapping the query
region (Spatial-Access Method – SAM).
In general:
Spatial data objects often cover areas in multidimensional spaces.
Spatial data objects are not well-represented by point-location.
An “index” based on an object’s spatial location is desirable.
Problem Summary: To retrieve data items quickly and efficiently
according to their spatial locations.
10. A B-Tree is an ordered, dynamic, multi-way structure of order m (i.e. each
node has at most m children).
The keys and the subtrees are arranged in the fashion of a search tree.
Each node may contain a large number of keys, and the number of subtrees
in each node, then, may also be large.
The B-Tree is designed (among other objectives):
to branch out in this large number of directions, and
to hold many keys in each node so that the height of the tree stays relatively short.
[Figure: example B-Tree with several letter keys per node.]
11. A height-balanced tree, similar to a B-Tree.
Index records in the leaf nodes contain pointers to the actual
spatial-objects (entries) they represent.
Each entry has a unique identifier that points to one spatial object,
and its MBR; i.e., entry = (MBR, pointer).
Spatial searching requires visiting only a small number of nodes.
The index is completely dynamic: inserts and deletes can be
intermixed with searches. (No periodic reorganization is
required.)
12. Let M be the maximum number of entries that will fit in one node.
Let m ≤ M/2 be a parameter specifying the minimum number of entries in one
node.
Then an R-Tree must satisfy the following properties:
1. Every leaf node contains between m and M index records, unless it is the
root.
2. For each index-record Entry (I, tuple-identifier) in a leaf node, I is the MBR
that spatially contains the n-dimensional data object represented by the
tuple-identifier.
3. Every non-leaf node has between m and M children, unless it is the root.
4. For each Entry (I, child-pointer) in a non-leaf node, I is the MBR that
spatially contains the regions in the child node.
5. The root has at least two children unless it is a leaf.
6. All leaves appear on the same level.
13. An entry E in a leaf node is defined as:
E = (I, tuple-identifier)
Where I refers to the smallest bounding n-dimensional region
(MBR) that encompasses the spatial data pointed to by its tuple-identifier.
I is a series of closed intervals, one for each dimension of
the bounding region.
Example. In 2D, I = (Ix, Iy),
where Ix = [xa, xb], and Iy = [ya, yb].
14. [Not limited to 2D: higher dimensions are certainly possible.]
In general, I = (I0, I1, …, In-1) for n dimensions, where Ik = [ka, kb].
If either ka or kb (or both) is equal to ∞, the
spatial object extends outward indefinitely along that dimension.
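The interval definition of I maps directly onto code. A minimal sketch, assuming a hypothetical `MBR` class (the names `MBR`, `overlaps`, `contains`, `union` and `area` are illustrative and not from the slides) that stores one closed interval per dimension:

```python
import math
from dataclasses import dataclass
from typing import Tuple

Interval = Tuple[float, float]  # closed interval [ka, kb]


@dataclass(frozen=True)
class MBR:
    """Hypothetical n-dimensional bounding region I = (I0, ..., In-1)."""
    intervals: Tuple[Interval, ...]

    def overlaps(self, other: "MBR") -> bool:
        # Regions overlap only if their intervals overlap in every dimension.
        return all(a_lo <= b_hi and b_lo <= a_hi
                   for (a_lo, a_hi), (b_lo, b_hi)
                   in zip(self.intervals, other.intervals))

    def contains(self, other: "MBR") -> bool:
        # True if `other` lies inside this region in every dimension.
        return all(a_lo <= b_lo and b_hi <= a_hi
                   for (a_lo, a_hi), (b_lo, b_hi)
                   in zip(self.intervals, other.intervals))

    def union(self, other: "MBR") -> "MBR":
        # Smallest region that encloses both operands.
        return MBR(tuple((min(a_lo, b_lo), max(a_hi, b_hi))
                         for (a_lo, a_hi), (b_lo, b_hi)
                         in zip(self.intervals, other.intervals)))

    def area(self) -> float:
        return math.prod(hi - lo for lo, hi in self.intervals)


building = MBR(((2.0, 4.0), (1.0, 3.0)))    # Ix = [2, 4], Iy = [1, 3]
query = MBR(((3.0, 6.0), (0.0, 2.0)))
strip = MBR(((5.0, math.inf), (0.0, 1.0)))  # extends indefinitely along x
```

Using `math.inf` as an endpoint is one way to model a region that extends outward indefinitely; the overlap and containment tests need no special case for it.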
15. An entry E in a non-leaf node is defined as: E = (I, child-pointer)
Where the child-pointer points to a child of this node, and I is
the MBR that encompasses all the regions in that child node’s
entries.
[Figure: a non-leaf node with entries I(A), I(B), …, I(M); the entry I(B) points to child node B, whose entries I(a), I(b), I(c), I(d) bound the regions a, b, c, d.]
17. [Figure: example R-Tree with leaf entries a to l and parent entries m, n, o, p.]
18. [Figure: regions a, b, c, d drawn inside MBR m.]
19. [Figure: regions e, f added inside MBR n.]
20. [Figure: regions g, h, i added, together with MBRs o, p.]
21. Typical query:
Find and report all university building sites that are within 5 km of the city centre.
Approach:
i. Build the R-Tree using rectangular regions a, b, …, i.
ii. Formulate the query range Q.
iii. Query the R-Tree and report all regions overlapping Q.
22. Let Q be the query region.
Let T be the root of the R-Tree.
Find all entry-records whose regions overlap Q.
Search sub-trees:
If T is not a leaf, then apply Search on every child-node entry E
whose I overlaps Q.
Search leaf nodes:
If T is a leaf, then check each entry E in the leaf and return E if E.I
overlaps Q.
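The Search routine can be sketched in a few lines. This is an illustrative version only: the `Node` class, the `(xmin, ymin, xmax, ymax)` tuple layout and the sample tree are assumptions, not part of the slides.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def overlaps(a: Rect, b: Rect) -> bool:
    # Closed rectangles overlap if they intersect on both axes.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class Node:
    def __init__(self, leaf: bool, entries: list):
        self.leaf = leaf
        # Leaf: (rect, object_id) pairs; non-leaf: (rect, child Node) pairs.
        self.entries = entries


def search(node: Node, q: Rect) -> List[str]:
    """Report every object whose MBR overlaps the query region q."""
    hits = []
    for rect, ref in node.entries:
        if overlaps(rect, q):
            if node.leaf:
                hits.append(ref)              # leaf entry: report it
            else:
                hits.extend(search(ref, q))   # descend into this subtree
    return hits


left = Node(True, [((0, 0, 2, 2), "a"), ((1, 3, 3, 5), "b")])
right = Node(True, [((6, 0, 8, 2), "c"), ((7, 4, 9, 6), "d")])
root = Node(False, [((0, 0, 3, 5), left), ((6, 0, 9, 6), right)])

print(search(root, (2.5, 1, 6.5, 3.5)))
```

In this sample the query region crosses both child MBRs, so both subtrees are visited. That is the behaviour that separates an R-Tree search from a B-Tree search, which follows a single path.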
23. [Figure: search example on an R-Tree with root node @r6 (entries r2, r5, r8), second-level nodes @r2, @r5, @r8 (entries r0, r1, r7, r3, r4), and leaf nodes @r0, @r1, @r7, @r3, @r4 holding the regions a to i.]
R-Tree settings:
M =
m =
24. The search algorithm descends the tree from the root in a manner
similar to a B-Tree.
More than one subtree under a visited node may need to be
searched.
Good worst-case performance therefore cannot be guaranteed.
This is countered by the insertion, deletion, and update algorithms,
which maintain the tree in a form that lets the search algorithm
eliminate irrelevant regions of the indexed space,
so that only data near the search area needs to be examined.
Emphasis is on the optimal placement of spatial objects with respect
to the spatial location of other objects in the structure.
25. A Node-Overflow happens when a new Entry is added to a fully
packed node, causing the resulting number of entries in the node
to exceed the upper-bound M.
The “overflow” node must be split, and all its current entries, as
well as the new one, consolidated into a locally optimal arrangement.
A Node-Underflow happens when one or more Entries are
removed from a node, causing the remaining number of entries in
that node to fall below the lower-bound m.
The underflow node must be condensed, and its entries
dispersed into a globally optimal arrangement.
26. New index entry-records are added to the leaves.
Nodes that overflow are split, and splits propagate up the tree.
A split-propagation may cause the tree to grow in height.
The main Insert routine
Let E = (I, tuple-identifier) be the new entry to be inserted.
Let T be the root of the R-Tree.
[Ins_1] Locate a leaf L starting from T to insert E.
[Ins_2] Add E to L. If L is already full (overflow), split L into L and L’.
[Ins_3] Propagate MBR changes (enlarged or reduced) upwards.
[Ins_4] Grow tree taller if node split propagation causes T to split.
27. Similar to insertion into a B+-tree, but an entry may be inserted into any leaf; a leaf
splits when its capacity is exceeded.
Which leaf to insert into? (Choose Leaf)
How to split a node? (Node Split)
28. [Figure: the regions m, n, o, p of the running example.]
29. [Ins_1] Locate a leaf L starting from T to insert E = (I, tuple-identifier).
Notion (i): Select the path that would require the least enlargement to include E.I.
Notion (ii): Resolve ties by choosing the child-node with the smallest MBR.
Invoke: L = ChooseLeaf (E, T).
[Figure: node @rN with entries A, B, C; the regions A, B, C and the new region E.I are drawn inside rN.]
30. Algorithm: ChooseLeaf (E, N)
Inputs: (i) Entry E = (I, tuple-identifier), (ii) A valid R-Tree node N.
Output: The leaf L where E should be inserted.
If N is leaf Then Return N
Let FS be the set of current entries in the node N
Let F = (I, child-pointer) ∈ FS, so that F.I satisfies the Insertion-Notions
Return ChooseLeaf (E, F.child-pointer)
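The two Insertion-Notions (least enlargement, then smallest MBR on ties) amount to a single comparison key. A small sketch of that selection step on one node's entries, assuming 2D rectangles stored as `(xmin, ymin, xmax, ymax)` tuples; `pick_entry` and `enlargement` are hypothetical helper names:

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(r: Rect, new: Rect) -> float:
    # Extra area r would need in order to also cover `new`.
    return area(union(r, new)) - area(r)


def pick_entry(rects: List[Rect], new: Rect) -> int:
    """Index of the entry F chosen by Notion (i), with ties broken by Notion (ii)."""
    return min(range(len(rects)),
               key=lambda i: (enlargement(rects[i], new), area(rects[i])))


A, B, C = (0, 0, 4, 4), (5, 0, 7, 2), (5, 3, 9, 9)
E = (6, 1, 7, 3)
print(pick_entry([A, B, C], E))
```

ChooseLeaf applies this choice at every non-leaf node on the way down, then stops at the first leaf it reaches.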
31. [Ins_2] Add E to L.
Notion (i): If L has room for another entry, install E.
Notion (ii): Otherwise split L to obtain L and L’, which between
them will contain all previous entries in L and the new E
(consolidated for local optima).
[Ins_3] Propagate MBR changes upwards by invoking
AdjustTree (L, L’).
Notion (i): Ascend from leaf L to the root T while adjusting the
covering rectangles (MBRs).
Notion (ii): If L’ exists, propagate node splits as necessary; i.e.
attempt to install a new entry in the parent of L to point to L’.
32. Example. Found L = @Y to insert new E = e. R-Tree settings: M = 3, m = 1.
[Figure: node @G holds entry K; node @K holds entries X, Y, Z; leaf @Y holds entries a, b, c.]
33. Algorithm: AdjustTree (N, N’)
Inputs: (i) A node N that has had its contents modified, (ii) The
resultant split node N’, if not NULL, that accompanies N.
Outputs: (i) N as above, (ii) N’ as above.
If N is the root Then Return {(i) N, (ii) N’}
Let PN be the parent node of N.
Let EN = (I_N, child-pointer_N) in PN, where child-pointer_N points
to N.
Adjust I_N so that it tightly encloses all entry regions in N.
34. If N’ is Not NULL Then
    Create a new Entry EN’ = (I_N’, child-pointer_N’)
    If number of entries in PN < M Then
        Install EN’ in PN
        Return AdjustTree (PN, NULL)
    Else
        Set {PN, PN’} = SplitNode (PN, EN’)
        Return AdjustTree (PN, PN’)
    End If
Else
    Return AdjustTree (PN, NULL)
End If
35. [Ins_4] Grow Tree taller.
Notion: If the recursive node split propagation causes the root to
split, then create a new root whose children are the two resulting
nodes.
[Figure: root @T holds entries A, B, C; node @C holds E, F and the node @C’ produced by the split holds G, H.]
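Steps [Ins_1] to [Ins_4] can be combined into one compact sketch. Everything here is an assumption made for illustration: the `Node` class, the tuple layout of rectangles, `M = 3`, and above all `split_node`, which simply sorts entries by x-centre and cuts the list in half. The slides do not specify SplitNode, and this stand-in is not Guttman's split heuristic.

```python
M = 3  # assumed maximum number of entries per node


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


class Node:
    def __init__(self, leaf, entries=()):
        self.leaf, self.parent, self.entries = leaf, None, []
        for rect, ref in entries:
            self.add(rect, ref)

    def add(self, rect, ref):
        # ref is an object id in a leaf, or a child Node otherwise.
        self.entries.append([rect, ref])
        if not self.leaf:
            ref.parent = self

    def mbr(self):
        rect = self.entries[0][0]
        for other, _ in self.entries[1:]:
            rect = union(rect, other)
        return rect


def split_node(node):
    # Stand-in for SplitNode: sort by x-centre and cut the list in half.
    items = sorted(node.entries, key=lambda e: e[0][0] + e[0][2])
    half = len(items) // 2
    node.entries = items[:half]
    return Node(node.leaf, items[half:])


def choose_leaf(node, rect):
    # [Ins_1] least enlargement, ties broken by smallest area.
    while not node.leaf:
        node = min(node.entries, key=lambda e: (
            area(union(e[0], rect)) - area(e[0]), area(e[0])))[1]
    return node


def adjust_tree(node, sibling, root):
    # [Ins_3] ascend to the root, tightening MBRs and installing split siblings.
    while node is not root:
        parent = node.parent
        for entry in parent.entries:
            if entry[1] is node:
                entry[0] = node.mbr()
        if sibling is not None:
            parent.add(sibling.mbr(), sibling)
            sibling = split_node(parent) if len(parent.entries) > M else None
        node = parent
    return sibling  # not None means the root itself was split


def insert(root, rect, obj_id):
    leaf = choose_leaf(root, rect)
    leaf.add(rect, obj_id)                                   # [Ins_2]
    sibling = split_node(leaf) if len(leaf.entries) > M else None
    root_sibling = adjust_tree(leaf, sibling, root)
    if root_sibling is not None:                             # [Ins_4] grow taller
        root = Node(False, [(root.mbr(), root),
                            (root_sibling.mbr(), root_sibling)])
    return root


tree = Node(True)
for i in range(20):
    x, y = (i * 7) % 10, (i * 3) % 10
    tree = insert(tree, (x, y, x + 1, y + 1), i)
```

The sketch adds the new entry first and splits afterwards when the count exceeds M. This has the same effect as checking for room before installing, and it keeps the code short.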
36. The height of an R-Tree containing n entry-records is at most
⌈log_m n⌉ - 1, because the branching factor of each node is at
least m.
The maximum number of nodes is: ⌈n/m⌉ + ⌈n/m^2⌉ + … + 1.
Worst-case space utilisation for all nodes except the root is: m/M.
Nodes will tend to have more than m entries, and this will: decrease tree height and improve space utilisation.
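A quick numeric check makes the height bound concrete. The sketch below uses integer arithmetic to evaluate ⌈log_m n⌉ - 1, and also evaluates the node-count bound ⌈n/m⌉ + ⌈n/m^2⌉ + … + 1 as given in Guttman's original paper. The function names are hypothetical, and m ≥ 2 is assumed.

```python
def height_bound(n: int, m: int) -> int:
    """ceil(log_m n) - 1, computed with integers to avoid floating-point error."""
    levels, reach = 0, 1
    while reach < n:
        reach *= m
        levels += 1
    return max(levels - 1, 0)


def max_nodes(n: int, m: int) -> int:
    """ceil(n/m) + ceil(n/m^2) + ... + 1 (node-count bound from Guttman's paper)."""
    total, level = 0, n
    while level > 1:
        level = -(-level // m)  # ceiling division
        total += level
    return total


print(height_bound(1_000_000, 50), max_nodes(1_000_000, 50))
```

With one million entry-records and m = 50, the bound on the height is 3, so even in the worst case a search path has only a handful of levels to cross.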
37. Current index entry-records are removed from the leaves.
Nodes that underflow are condensed, and their contents redistributed
appropriately throughout the tree.
A condense propagation may cause the tree to shorten in height.
The main Delete routine
Let E = (I, tuple-identifier) be a current entry to be removed.
Let T be the root of the R-Tree.
[Del_1] Find the leaf L starting from T that contains E.
[Del_2] Remove E from L, and condense “underflow” nodes.
[Del_3] Propagate MBR changes upwards.
[Del_4] Shorten tree if T contains only 1 entry after condense propagation.
38. [Del_1] Find the leaf L starting from T that contains E.
Algorithm: FindLeaf (E, N)
Inputs: (i) Entry E = (I, tuple-identifier), (ii) A valid R-Tree node N.
Output: The leaf L containing E.
If N is leaf Then
    If N contains E Then Return N
    Else Return NULL
Else
    Let FS be the set of current entries in N.
    For each F = (I, child-pointer) ∈ FS where F.I overlaps E.I Do
        Set L = FindLeaf (E, F.child-pointer)
        If L is not NULL Then Return L
    Next F
    Return NULL
End If
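FindLeaf translates almost line for line. A minimal sketch, again assuming a hypothetical `Node` class and `(xmin, ymin, xmax, ymax)` rectangles; an entry is identified here by its rectangle together with an object id:

```python
from typing import Optional, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def overlaps(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class Node:
    def __init__(self, leaf: bool, entries: list):
        self.leaf = leaf
        self.entries = entries  # (rect, object_id) or (rect, child Node) pairs


def find_leaf(node: Node, rect: Rect, obj_id: str) -> Optional[Node]:
    """Return the leaf that holds entry (rect, obj_id), or None if absent."""
    if node.leaf:
        if any(r == rect and ref == obj_id for r, ref in node.entries):
            return node
        return None
    for r, child in node.entries:
        if overlaps(r, rect):                 # only subtrees that could hold E
            found = find_leaf(child, rect, obj_id)
            if found is not None:
                return found
    return None


left = Node(True, [((0, 0, 2, 2), "a"), ((1, 3, 3, 5), "b")])
right = Node(True, [((6, 0, 8, 2), "c"), ((7, 4, 9, 6), "d")])
root = Node(False, [((0, 0, 3, 5), left), ((6, 0, 9, 6), right)])
```

Unlike Search, the recursion stops at the first leaf that contains the entry. Several subtrees may still be tried before that leaf is found, because sibling MBRs are allowed to overlap.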
39. [Del_2] Remove E from L, and condense “underflow” nodes.
[Del_3] Propagate MBR changes upwards.
Notion (i): Ascend from leaf L to root T while adjusting the covering
rectangles (MBRs).
Notion (ii): If, after removing the entry E from L, the number of
entries in L becomes fewer than m, then the node L has to be
eliminated and its remaining contents relocated.
40. Propagate these notions upwards by invoking CondenseTree (N,
QS), where N is an R-Tree node whose entries have been modified,
and QS is the set of eliminated nodes.
Start the propagation by setting N = L, and QS = ∅.
Re-insert the entries from the eliminated nodes in QS back into the
tree.
Entries from eliminated leaf nodes are re-inserted as new entries
using the Insert routine discussed earlier.
Entries from higher-level nodes must be placed higher in the tree so
that leaves of their dependent subtrees will be on the same level as
the leaves on the main tree.
41. Example: Delete the index entry-record b. R-Tree settings: M = 4,
m = 2.
Spatial constraint: a.I will form smallest MBR with r4.
[Figure: root @r7 holds r2, r6; node @r2 holds r0, r1; node @r6 holds r3, r4, r5; leaves: @r0 holds a, b; @r1 holds c, d, e; @r3 holds f, g, h; @r4 holds i, j; @r5 holds k, l, m, n.]
42. Algorithm: CondenseTree (N, QS)
Inputs: (i) A node N whose entries have been modified, (ii) A set of
eliminated nodes QS.
If N is NOT the root Then
    Let PN be the parent node of N.
    Let EN = (I_N, child-pointer_N) in PN.
    If N.entries < m Then
        Delete EN from PN
        Add N to QS
    Else
        Adjust I_N so that it tightly encloses all entry regions in N.
    End If
    CondenseTree (PN, QS)
43. Else If N is root AND QS is NOT ∅ Then
    For each Q ∈ QS Do
        For each E ∈ Q Do
            If Q is leaf Then Insert (E)
            Else Insert (E) as a node entry at the same node level as Q
            End If
        Next E
    Next Q
End If
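The upward pass of CondenseTree can be sketched on a small tree. This is a partial illustration under stated assumptions: the `Node` class, the helpers `leaf` and `inner`, the coordinates and the reduced tree shape are all hypothetical, m = 2 is assumed, and the re-insertion of the orphaned entries in QS is deliberately left out.

```python
M_MIN = 2  # the lower bound m (assumed value)


class Node:
    def __init__(self, name, is_leaf, entries):
        self.name, self.leaf, self.parent = name, is_leaf, None
        self.entries = [list(e) for e in entries]  # [rect, child Node or object id]
        if not is_leaf:
            for _, child in self.entries:
                child.parent = self

    def mbr(self):
        x0, y0, x1, y1 = zip(*(rect for rect, _ in self.entries))
        return (min(x0), min(y0), max(x1), max(y1))


def condense_tree(node, eliminated):
    """Upward pass only: drop underfull nodes into QS, tighten the other MBRs."""
    while node.parent is not None:
        parent = node.parent
        if len(node.entries) < M_MIN:
            parent.entries = [e for e in parent.entries if e[1] is not node]
            eliminated.append(node)
        else:
            for e in parent.entries:
                if e[1] is node:
                    e[0] = node.mbr()
        node = parent
    return node  # the root


def leaf(name, ids, x0):
    # Hypothetical helper: one unit square per object, laid out along x.
    return Node(name, True,
                [((x0 + i, 0, x0 + i + 1, 1), obj) for i, obj in enumerate(ids)])


def inner(name, children):
    return Node(name, False, [(c.mbr(), c) for c in children])


r0, r1 = leaf("r0", "ab", 0), leaf("r1", "cde", 2)
r3, r4 = leaf("r3", "fgh", 5), leaf("r4", "ij", 8)
r2, r6 = inner("r2", [r0, r1]), inner("r6", [r3, r4])
root = inner("r7", [r2, r6])

# Delete b from leaf @r0, then condense upwards.
r0.entries = [e for e in r0.entries if e[1] != "b"]
qs = []
condense_tree(r0, qs)
print([n.name for n in qs], [child.name for _, child in root.entries])
```

After the pass, QS holds the eliminated nodes, whose entries still have to be re-inserted at their original levels. The root is left with a single entry, which is the situation that triggers the tree-shortening step.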
44. Why “re-insert” orphaned entries?
Alternatively, like the delete routine in B-Tree (Rosenberg & Snyder, 1981),
an “underflow” node can be merged with whichever adjacent sibling
will have its area increased the least, or its entries can be re-distributed among sibling
nodes.
Both methods can cause the nodes to split.
Eventually all changes need to be propagated upwards, anyway.
Re-insertion accomplishes the same thing, and:
It is simpler to implement (and at comparable efficiency).
It incrementally refines the spatial structure of the tree.
It prevents the gradual deterioration that would occur if each entry stayed permanently under
the same parent node.
45. A high value of m, nearer to M, is useful when the underlying
database represented by the R-Tree is mostly used for search
inquiries with very few updates.
The height of the tree will be kept to a minimum.
High search performance is maintained.
However, the risk of overflow and underflow is also high.
A small value of m is good when frequent updates and
modifications of the underlying database are required.
The nodes are less dense.
Maintenance is less costly.
46. R+-Tree: avoids multiple paths during searching.
Objects may be stored in multiple nodes
MBRs of nodes at same tree level do not overlap
On insertion/deletion the tree may change downward or upward in
order to maintain the structure
R-Tree Variants
48. Hilbert R-Tree: similar to other R-Trees, except that the Hilbert
value of each rectangle’s centroid is calculated.
That key is used to guide the insertion.
On an overflow, entries are divided evenly between two nodes.
Experiments have shown that this scheme
significantly improves performance and decreases
insertion complexity.
Hilbert R-tree achieves up to 28% saving in the
number of pages touched compared to R*-tree.
49. The Hilbert value of an object is found by interleaving the bits of
its x and y coordinates, and then chopping the binary string into 2-bit strings.
Then, for every 2-bit string, if the value is 0, we replace every 1 in
the original string with a 3, and vice-versa.
If the value of the 2-bit string is 3, we replace all 2’s and 0’s in a
similar fashion.
After this is done, you put all the 2-bit strings back together and
compute the decimal value of the binary string;
this is the Hilbert value of the object.
http://www-users.cs.umn.edu/research/shashi-group/CS8715/exercise_ans.doc
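For experimentation, a Hilbert value can also be computed with the widely used iterative coordinate-to-index conversion. Note that this is a different procedure from the bit-replacement recipe above, and the curve orientation it produces may differ. The names `xy2d` and `hilbert_key`, the 16-by-16 grid and the integer centroid are assumptions made for illustration.

```python
def xy2d(n: int, x: int, y: int) -> int:
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its Hilbert index."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the next sub-quadrant is seen in standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_key(rect, n: int = 16) -> int:
    """Hilbert value of a rectangle's centroid (integer grid coordinates assumed)."""
    xmin, ymin, xmax, ymax = rect
    return xy2d(n, (xmin + xmax) // 2, (ymin + ymax) // 2)


rects = [(12, 12, 14, 14), (0, 0, 2, 2), (0, 12, 2, 14)]
ordered = sorted(rects, key=hilbert_key)  # order in which the curve visits the centroids
print(ordered)
```

Sorting rectangles by this key tends to place spatially close rectangles next to each other, which is why a Hilbert R-Tree can use the key to guide insertion.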
50. R*-Tree: proposed by Norbert Beckmann, Hans-Peter Kriegel, Ralf
Schneider, and Bernhard Seeger in 1990.
Same algorithm as the regular R-tree for query and delete
operations.
When inserting, the R*-tree uses a combined strategy.
For leaf nodes, overlap is minimized
For inner nodes, enlargement and area are minimized.
When splitting, the R*-tree uses a topological split that chooses a
split axis based on perimeter, then minimizes overlap.
In addition to an improved split strategy, the R*-tree also tries to
avoid splits by reinserting objects and subtrees into the tree,
inspired by the concept of balancing a B-tree.
51. MBR: Minimum Bounding Rectangle
R-Trees are an extremely compelling data structure for spatial
data.
Largely based on the B-Tree (can be considered a generalization of
the B-Tree)
Can support more than two dimensions
Support the same basic operations (deletion, searching, insertion,
update, etc.)
Many variants of R-Trees are available