Disk Storage Management
1
Fall 2001 Database Systems 1
Indexing
• Indexing is a combination of methods for speeding up the
access to the data in a database
• The speed is determined by two factors
– where the data is stored on disk
– available access paths
• The primary access method to any table is a table scan,
reading a table by finding all tuples one by one.
• Indexing creates multiple access paths to the data, each of
which is called a secondary access method.
– indexing speeds up access to a table for a specific set of
attributes
Indexing
• Example: items(itemid, description, price, city)
– to find all items in “Dallas”, read all tuples from secondary
storage and check if city=“Dallas” for each one (scan)
• may read many extra tuples that are not part of the answer
– suppose instead we create index itemcity for the city attribute of
the items relation
itemcity, Dallas:{t1,t5,t10}, Boston:{t2,t3,t15}, …
– to find all items for “Dallas”, we can now find the index entry for
“Dallas” and get the ids of just the tuples we want
• many fewer tuples are read from secondary storage
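The difference between a scan and an index lookup can be sketched in a few lines of Python. This is only an illustration under assumptions: the sample rows, the in-memory list standing in for secondary storage, and the function names scan and lookup are hypothetical and not part of the slides.

```python
from collections import defaultdict

# Hypothetical sample of the items(itemid, description, price, city) relation.
items = [
    (1, "lamp", 20.0, "Dallas"),
    (2, "desk", 150.0, "Boston"),
    (3, "chair", 45.0, "Boston"),
    (4, "rug", 80.0, "Austin"),
    (5, "sofa", 300.0, "Dallas"),
]

def scan(city):
    """Table scan: examine every tuple and keep the matches."""
    return [t for t in items if t[3] == city]

# Index itemcity: city value -> positions of the matching tuples.
itemcity = defaultdict(list)
for pos, t in enumerate(items):
    itemcity[t[3]].append(pos)

def lookup(city):
    """Index lookup: touch only the tuples listed under the city."""
    return [items[pos] for pos in itemcity.get(city, [])]
```

Both functions return the same tuples; the index version simply avoids reading tuples that are not part of the answer.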
Disk terminology
Figure: parts of a disk drive
• Rotating platters (2 platters = 4 read/write surfaces)
• Read/write heads, one for each surface
• Disk arm and controller
• Track: can read four tracks from four surfaces at the same time (all
tracks at the same radius are called a cylinder)
Disk terminology
• Reading data from a disk involves:
– seek time -- move heads to the correct track/cylinder
– rotational latency -- wait until the required data spins under the
read/write heads (average is about half the rotation time)
– transfer time -- transfer the required data (read/write data to/from
a buffer)
• Tracks contain the same amount of information even though they
have different circumferences
Figure: a track divided into blocks/pages
• A block or a page is the smallest unit of data transfer for a disk
(read a block / write a block)
Disk Space
• Disk is much cheaper than memory (1GB: $15 - $20)
• A fast disk today:
– 10 - 40 GB per disk
– 4.9 ms average read seek time
– 2.99 ms average latency
– 10,000 rpm
– 280 - 452 Mbits/sec transfer rate
– 7.04 bits per square inch density
– 3 - 12 heads, 2 - 6 disk platters
• Disk is non-volatile storage that
survives power failures
Reading from disk
• Reading data from disk is extremely slow (compared to reading from
memory)
• To read a page from disk
– find the page on disk (seek+latency times)
– transfer the data to memory/buffer (total # bits / transfer rate)
• Assume the average page size is 4KB. To retrieve a single row/tuple, we
need to load the page that contains it
– assume 300 Mbits/sec transfer rate, to read a page:
4KB = 32,768 bits ≈ 0.03 Mbits, hence the transfer takes about 1/10,000 of a second
– 4.9 ms (seek) + 2.99 ms (latency) + 0.1 ms (transfer time) = 7.99 ms
(seek and latency times dominate!)
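The arithmetic above can be checked with a short Python sketch. The constants come from the slide; the function name page_read_ms and the convention that 1 Mbit equals 10^6 bits are assumptions made for the illustration.

```python
PAGE_BYTES = 4 * 1024          # 4KB page
TRANSFER_MBITS_PER_SEC = 300   # assumed: 1 Mbit = 10**6 bits
SEEK_MS = 4.9
LATENCY_MS = 2.99

def transfer_ms(page_bytes=PAGE_BYTES, rate=TRANSFER_MBITS_PER_SEC):
    """Time to move one page across the interface: bits / rate."""
    bits = page_bytes * 8
    return bits / (rate * 1_000_000) * 1000

def page_read_ms():
    """Total time to read one page: seek + rotational latency + transfer."""
    return SEEK_MS + LATENCY_MS + transfer_ms()
```

With these numbers the transfer accounts for roughly 0.1 ms out of about 8 ms, which is why reducing seeks matters far more than raising the transfer rate.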
Reading from disk
• Assume the database saves a number of memory slots (each
holding exactly one page), which are called buffers
• To read / modify / write tuple t
– DISK: (read it from disk, write it to buffer)
– DB: (read it from buffer, modify)
– DISK: (write it to disk, free the buffer space)
Figure: buffer slots. This buffer can hold 4 pages at any time
Tablespaces
• Age-old wisdom: if you store a set of pages in contiguous blocks on
disk, then the access time will improve greatly (reduced seek and
latency times)
• A tablespace is an allocation of space in secondary storage
– when creating a tablespace, a DBMS requests contiguous
blocks of disk space from the OS
– the tablespace appears as a single file to the OS
– the management of physical addresses in a tablespace is
performed by the DBMS
– a DBMS can have many tablespaces
– when a table is created, it is placed in a tablespace, or
partitioned between multiple tablespaces
Tablespaces
CREATE TABLESPACE tspace1
DATAFILE 'diska:file1.dat' SIZE 20M,
'diska:file2.dat' SIZE 40M REUSE;
CREATE TABLE temp1 (
…
)
TABLESPACE tspace1
STORAGE (INITIAL 6144 NEXT 6144
MINEXTENTS 1 MAXEXTENTS 5);
CREATE TABLE temp2 (
…
)
TABLESPACE tspace1
STORAGE (INITIAL 12144 NEXT 6144
MINEXTENTS 1 MAXEXTENTS 5);
Figure: tablespace tspace1 is made up of the datafiles file1 and file2,
which hold the actual data of tables temp1 and temp2
Tablespaces
• Create table -- assign the tuples in the table to a file in a tablespace
– when a table is created, a chunk of space (an extent) is assigned to this
table; the size of this chunk is given by the INITIAL storage parameter
– when the initial extent becomes full, a new chunk is allocated;
the size of each subsequent chunk is given by the NEXT storage parameter
– can also specify
• maxextents, minextents
• pctincrease (increase the size of extents at each step)
• pctfree (how much of each block must be left free for later updates)
Data Storage on pages
• Layout of a single disk page (assume fixed size rows)
• To find a specific row in a page, must know
– page number (or block number) BBBBBBBB
– offset (slot number of record within the page) SSSS
– file name (which datafile/tablespace) FFFF
– ROWID is then a unique number
BBBBBBBB.SSSS.FFFF for a row
• B,S,F are hexadecimal numbers
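As an illustration of the BBBBBBBB.SSSS.FFFF layout, the following Python sketch builds and splits such a string. The function names format_rowid and parse_rowid are hypothetical; a real DBMS encodes rowids internally and may use a different external format.

```python
def format_rowid(block, slot, file):
    """Render block, slot and file numbers as BBBBBBBB.SSSS.FFFF (hex)."""
    return f"{block:08X}.{slot:04X}.{file:04X}"

def parse_rowid(rowid):
    """Split a BBBBBBBB.SSSS.FFFF string back into three integers."""
    block, slot, file = rowid.split(".")
    return int(block, 16), int(slot, 16), int(file, 16)
```

For example, format_rowid(0x1A2B, 3, 1) gives the string 00001A2B.0003.0001, and parse_rowid reverses it.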
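Figure: page layout. The header info and row directory (slots 1, 2, ..., N)
are at the start of the page, the data rows (Row N, Row N-1, ..., Row 1) are
at the end, and the free space lies between them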
Pseudocolumns
• Since each tuple has a unique rowid, we can refer to the
tuples with their rowid field
• However, rowid may change if the tuple is stored at a different
location (the value of its primary key is a better identifier)
Indexing Concepts
• Indexing speeds up access to data residing on disk
– disk access is much slower than main memory access,
by orders of magnitude
– goal – minimize the number of disk accesses
• Primary access methods rely on the physical location of
data as stored in a relation
– finding “all” tuples with value “x” requires reading the
entire relation
• Secondary access methods use a directory to enable
tuples to be found more quickly based on the value of one
or more attributes (keys) in a tuple
Secondary Index
• To create a simple index on column A of table T, make a
list of pairs of the form
(attribute A value, tuple rowid)
for each tuple in T
– example: secondary index for the SSN attribute
SSN ROWIDs (RID)
111-11-1111 AAAAqYAABAAAEPvAAH
222-22-2222 AAAAqYAABAAAEPvAAD
333-33-3333 AAAAqYAABAAAEPvAAG
. . . .
• This index is large and stored on the disk
Secondary Index
• Suppose a disk page can contain 200 index entries
from a secondary index
• To store a secondary index for a relation with 1
million tuples assuming no duplicate values requires:
1,000,000 / 200 = 5,000 disk pages
• To find a particular Person tuple in the SSN index
given his or her SSN, you must on average scan half
of the index (5,000 / 2 = 2500 disk accesses)
• If 20 tuples of the Person relation fit on a page, then
sequential scan of the relation itself needs to read on
average half the relation (50,000 / 2 = 25,000 disk
accesses)
• In this case, the secondary index helps a lot
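The page counts above follow from two divisions, reproduced here as a small Python sketch. The variable and function names are chosen for the illustration only.

```python
import math

TUPLES = 1_000_000
INDEX_ENTRIES_PER_PAGE = 200
TUPLES_PER_PAGE = 20

def pages_needed(count, per_page):
    """Number of disk pages needed to hold count entries."""
    return math.ceil(count / per_page)

index_pages = pages_needed(TUPLES, INDEX_ENTRIES_PER_PAGE)
table_pages = pages_needed(TUPLES, TUPLES_PER_PAGE)

# A linear search reads half of the pages on average.
avg_index_scan = index_pages // 2
avg_table_scan = table_pages // 2
```

Scanning the unsorted index costs about a tenth of scanning the relation, but 2,500 accesses per lookup is still far too many, which motivates the tree and hash organizations that follow.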
Efficiency
• Need to organize the index information in a
way that makes it efficient to access and
search
– scanning the index from the beginning is
not good enough
• Sorting the secondary index helps, but is not
sufficient
• Solution 1: build a tree index
• Solution 2: hash the index
Tree Indices
• Want to minimize number of disk accesses.
– each tree node requires a disk access
– therefore, trees that are broad and shallow
are preferred over trees that are narrow
and deep
• Balanced binary search tree, AVL tree, etc.
that are useful in main memory are too narrow
and deep for secondary storage.
• Need an m-way tree where m is large.
– also need a tree that is balanced
B+-Tree
• A B+ -Tree of order d is a tree in which:
– each node has between d and 2d key values
– the key values within a node are ordered
– each key in a node has a pointer immediately before
and after it
• leaf nodes: pointer following a key is pointer to
record with that key
• interior nodes: pointers point to other nodes in the
tree
– the length of the path from root to leaf is the same for
every leaf (balanced)
– the root may have fewer keys and pointers
Example B+ -Tree
B+-Tree of order 2 (each node can hold up to four keys)
root: [53]
  left child: [11 30] with leaves [2 7] [11 15 22] [30 41]
  right child: [66 78] with leaves [53 54 63] [66 69 71 76] [78 84 93]
Searching in B+-Trees
Search(T, K) /* searching for tuple with key value K in tree T */
{
    if T is a non-leaf node
        search for leftmost key K’ in node T such that K < K’
        if such a K’ exists
            ptr = pointer in T immediately before K’
        else
            ptr = rightmost pointer in node T
        return the result of Search(ptr, K)
    else /* T is a leaf node */
        search for K in T
        if found, return the pointer following K
        else return NULL /* K not in tree T */
}
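The search procedure can be sketched in Python as follows. This is a minimal in-memory illustration, not a disk-based implementation: the Node class, the leaf helper and the rid-style rowid strings are assumptions, and the tree built at the end is the order 2 example tree with root 53.

```python
from bisect import bisect_right

class Node:
    def __init__(self, keys, pointers, is_leaf):
        self.keys = keys          # ordered key values
        self.pointers = pointers  # child nodes (interior) or rowids (leaf)
        self.is_leaf = is_leaf

def search(node, key):
    """Return the rowid stored for key, or None if key is not in the tree."""
    while not node.is_leaf:
        # bisect_right gives the index of the leftmost key K' with key < K',
        # which is also the index of the pointer immediately before K'.
        # If no such key exists it gives the index of the rightmost pointer.
        node = node.pointers[bisect_right(node.keys, key)]
    for k, rowid in zip(node.keys, node.pointers):
        if k == key:
            return rowid
    return None

def leaf(*keys):
    """Hypothetical helper: a leaf whose rowids are just labelled strings."""
    return Node(list(keys), [f"rid{k}" for k in keys], True)

left = Node([11, 30], [leaf(2, 7), leaf(11, 15, 22), leaf(30, 41)], False)
right = Node([66, 78],
             [leaf(53, 54, 63), leaf(66, 69, 71, 76), leaf(78, 84, 93)],
             False)
root = Node([53], [left, right], False)
```

Every lookup visits exactly one node per level, so the number of node accesses equals the height of the tree.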
Insert Algorithm
• To insert a new tuple with key K and address rowid:
– use a modified Search algorithm to look for the leaf
node into which key K should be inserted
– insert key K followed by address rowid into the
proper place in this leaf node to maintain order and
rebalance the tree if necessary
• Rebalancing the tree
– if the leaf node has room for K and rowid, then no
rebalancing is needed
– if the leaf node has no room for K and rowid, then it
is necessary to create a new node and rebalance
the tree
Rebalancing Algorithm
• Assume that K and rowid are to be inserted into leaf node L,
but L has no more room.
– create a new empty node
– put K and rowid in their proper place among the entries in
L to maintain the key sequence order -- there are 2d+1
keys in this sequence
– leave the first d keys with their rowids in node L and move
the final d+1 keys with their rowids to the new node
– copy the middle key K’ from the original sequence into the
parent node followed by a pointer to the new node
• put them immediately after the pointer to node L in the
parent node
– apply this algorithm recursively up the tree as needed
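The leaf split at the heart of the rebalancing step can be sketched in Python. The function name split_leaf and the representation of a leaf as a list of (key, rowid) pairs are assumptions for the illustration; the recursive propagation up the tree is left out.

```python
from bisect import insort

def split_leaf(entries, key, rowid, d):
    """Split a full leaf (2d entries) that must also take (key, rowid).

    Returns (left, right, middle_key): the first d entries stay in the
    original leaf, the final d+1 entries go to the new node, and
    middle_key is the key to copy into the parent.
    """
    sequence = sorted(entries + [(key, rowid)])  # 2d+1 entries in key order
    left, right = sequence[:d], sequence[d:]
    return left, right, right[0][0]

# Inserting key 65 into the full leaf [53 54 57 63] of an order 2 tree.
full_leaf = [(53, "r53"), (54, "r54"), (57, "r57"), (63, "r63")]
left, right, middle = split_leaf(full_leaf, 65, "r65", d=2)

parent_keys = [66, 78]
insort(parent_keys, middle)  # the middle key is copied into the parent
```

In this run the original leaf keeps 53 and 54, the new node receives 57, 63 and 65, and the parent's keys become 57, 66, 78.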
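Insert Example
Insert record with key 57 into a B+-Tree of order 2
Before:
root: [53]
  left child: [11 30] with leaves [2 7] [11 15 22] [30 41]
  right child: [66 78] with leaves [53 54 63] [66 69 71 76] [78 84]
After (the leaf [53 54 63] has room for 57, so no rebalancing is needed):
root: [53]
  left child: [11 30] with leaves [2 7] [11 15 22] [30 41]
  right child: [66 78] with leaves [53 54 57 63] [66 69 71 76] [78 84]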
Another Insert Example
Insert record with key 65 into a B+-Tree of order 2
Before:
root: [53]
  left child: [11 30] with leaves [2 7] [11 15 22] [30 41]
  right child: [66 78] with leaves [53 54 57 63] [66 69 71 76] [78 84]
Step 1: the leaf [53 54 57 63] is full, so it is split: [53 54] stays and
[57 63 65] moves to a new node
Step 2: the middle key 57 is copied into the parent node
After:
root: [53]
  left child: [11 30] with leaves [2 7] [11 15 22] [30 41]
  right child: [57 66 78] with leaves [53 54] [57 63 65] [66 69 71 76] [78 84]
Insertion Algorithm
Insert (T, K, rowid, child)
/* insert new tuple with key K and address rowid into tree T */
/* child is NULL initially; it carries the result of a split back to the caller */
{
    /* handle an interior node of the B+-Tree */
    if T is a non-leaf node
        find j such that Kj ≤ K < Kj+1 for keys K1, …, Kn in T
        ptr = pointer between Kj and Kj+1
        Insert (ptr, K, rowid, child)
        if child is NULL then return
        /* must insert key and child pointer into T */
        if T has space for another key and pointer
            put child.key and child.ptr into T at proper place
            child = NULL
            return
        else /* must split node T */
            construct sequence of keys and pointers from T with
                child.key and child.ptr inserted at proper place
            first d keys and d+1 pointers from sequence stay in T
            last d keys and d+1 pointers from sequence move to
                a new node N
            child.key = middle key from sequence
            child.ptr = pointer to N
            if T is root
                create new node containing pointer to T,
                    child.key, and child.ptr
                make this node the new root node of the B+-Tree
            return
    /* handle leaf node of the B+-Tree */
    if T is a leaf node
        if T has space for another key and rowid
            put K and rowid into T at proper place
            return
        else /* must split leaf node T */
            construct sequence of keys and rowids from T with
                K and rowid inserted at proper place
            first d keys and their rowids from sequence stay in T
            last d+1 keys and their rowids from sequence move to
                a new node N
            child.key = first key in new node N
            child.ptr = pointer to N
            if T is root
                create new node containing pointer to T,
                    child.key, and child.ptr
                make this node the new root node of the B+-Tree
            return
}
Deletion
• Assume that a tuple with key K and address rowid is
to be deleted from leaf node L. There is a problem if
after removing K and rowid from L it has fewer than d
keys remaining. To fix this:
– if a neighbor node has at least d+1 keys, then
evenly redistribute the keys and rowids with the
neighbor node and adjust the separator key in the
parent node
– otherwise, combine node L with a neighbor node
and discard the empty node
• the parent node now needs one less key and
node pointer, so recursively apply this algorithm
up the tree until all nodes have enough keys
and pointers
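The choice between redistributing and combining at the leaf level can be sketched in Python. The function name fix_leaf_underflow and the use of plain key lists (rowids omitted) are assumptions for the illustration; the recursive repair of the parent is not shown.

```python
def fix_leaf_underflow(left, right, d):
    """Repair two adjacent leaves after a deletion left one with d-1 keys.

    left and right are the sorted key lists of the neighboring leaves.
    Returns (new_left, new_right, separator). When the leaves are
    combined, new_right and separator are None and the parent loses
    one key and one pointer.
    """
    combined = left + right
    if len(combined) >= 2 * d:
        # The neighbor had at least d+1 keys: redistribute evenly and
        # use the first key of the right leaf as the new separator.
        half = len(combined) // 2
        new_left, new_right = combined[:half], combined[half:]
        return new_left, new_right, new_right[0]
    # The neighbor had only d keys: combine and discard the empty node.
    return combined, None, None
```

For an order 2 tree, deleting 30 from [30 41] next to [11 15 22] redistributes into [11 15] and [22 41] with separator 22, while deleting 7 from [2 7] next to [11 15] combines the two leaves into [2 11 15].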
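Deletion Example
Delete key 30 from a B+-Tree of order 2
Before:
root: [53]
  left child: [11 30] with leaves [2 7] [11 15 22] [30 41]
  right child: [66 78] with leaves [53 54 63] [66 69 71 76] [78 84 93]
Removing 30 leaves [41] with too few keys. Redistribute between
the second and third leaf nodes; the separator key 30 becomes 22
After:
root: [53]
  left child: [11 22] with leaves [2 7] [11 15] [22 41]
  right child: [66 78] with leaves [53 54 63] [66 69 71 76] [78 84 93]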
Another Deletion Example
Delete 7 from a B+-Tree of order 2
Before:
root: [53]
  left child: [11 30] with leaves [2 7] [11 15] [30 41]
  right child: [66 78] with leaves [53 54 63] [66 69 71 76] [78 84 93]
Cannot redistribute, so combine the left two leaf nodes into [2 11 15]
Another Example Continued
Delete 7 from the B+-Tree of order 2
After combining the two leaves:
root: [53]
  left child: [30] with leaves [2 11 15] [30 41]
  right child: [66 78] with leaves [53 54 63] [66 69 71 76] [78 84 93]
The node [30] is not valid (too few pointers). Cannot redistribute,
so combine with sibling node, bringing the key 53 down from the parent
After:
root: [30 53 66 78] with leaves [2 11 15] [30 41] [53 54 63] [66 69 71 76] [78 84 93]
Deletion Algorithm
Delete (Parent, T, K, oldchild)
/* delete key K from Tree T */
/* Parent is parent node for T, initially NULL */
/* oldchild is discarded child node, initially NULL */
{
    /* handle an interior node of the B+-Tree */
    if T is a non-leaf node
        find j such that Kj ≤ K < Kj+1 for keys K1, …, Kn in T
        ptr = pointer between Kj and Kj+1
        Delete (T, ptr, K, oldchild)
        if oldchild is NULL then return
        /* must handle discarded child node of T */
        remove oldchild and adjacent key from T
        if T still has enough keys and pointers
            oldchild = NULL
            return
        /* must fix node T */
        get a sibling node S of T using Parent
        if S has extra keys /* redistribute S and T */
            redistribute keys and adjacent pointers evenly
                between S and T
            K’ = middle unused key from the redistribution
            replace the key in Parent between the pointers
                to S and T with K’
            oldchild = NULL
            return
        else /* merge S and T */
            R = S or T, whichever is to the right of the other
            oldchild = R
            copy key from Parent node that is immediately
                before R to the end of the node on the left
            move all keys and adjacent pointers from R
                to the node on the left
            discard node R
            return
    /* handle leaf node of the B+-Tree */
    if T is a leaf node
        remove key K and its rowid from T
        if T still has at least d keys, or T is the root
            oldchild = NULL
            return
        /* must fix node T */
        get a sibling node S of T using Parent
        if S has extra keys /* redistribute S and T */
            redistribute keys and adjacent rowids evenly
                between S and T
            K’ = first key from node S or T, whichever is to
                the right of the other
            replace the key in Parent between the pointers
                to S and T with K’
            oldchild = NULL
            return
        else /* merge S and T */
            R = S or T, whichever is to the right of the other
            oldchild = R
            move all keys and adjacent rowids from R
                to the node on the left
            discard node R
            return
}
Analysis of B+-Trees
• Every access to a node is an access to disk and hence is
expensive.
• Analysis of Find:
– if there are n tuples in the tree, the height of the tree, h,
is bounded by h ≤ ceil(log_d(n))
– example: d = 50, tree contains 1 million records, then h ≤ 4
• Analysis of Insert and Delete
– finding the relevant node requires h accesses
– rebalancing requires O(h) accesses
– therefore, the total is O(log_d(n)) accesses
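The height bound can be evaluated without floating-point logarithms. The sketch below uses a hypothetical helper height_bound that finds the smallest h with d^h ≥ n, which equals ceil(log_d(n)) for n ≥ 1.

```python
def height_bound(n, d):
    """Smallest h such that d**h >= n, i.e. ceil(log_d(n)) for n >= 1."""
    h, reach = 0, 1
    while reach < n:
        reach *= d
        h += 1
    return h
```

For d = 50 and one million records the loop passes through 50, 2,500, 125,000 and 6,250,000, so the bound is 4.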
B+-tree
• The create index command creates a B+-tree index
CREATE INDEX age_idx ON people(age)
TABLESPACE tspace1
PCTFREE 30
• PCTFREE defines how much of each node is left free when the index is built
• Optimal operation is usually with nodes about 70% full (hence PCTFREE 30)
• To reduce disk accesses for sequential processing,
pointers are added to the leaf nodes that point to the
previous and next leaf nodes
A B+-Tree Example
• Givens:
– disk page has capacity of 4K bytes
– each rowid takes 6 bytes and each key value takes 2
bytes
– each node is 70% full
– need to store 1 million tuples
• Leaf node capacity
– each (key value, rowid) pair takes 8 bytes
– disk page capacity is 4K, so (4*1024)/8 = 512 (key value,
rowid) pairs per leaf page
• in reality there are extra headers and pointers that we
will ignore
• Hence, the order d of the tree is about 256 (a full node holds 2d = 512 entries)
Example Continued
• If all pages are 70% full, each page has about
512*0.7 ≈ 359 entries
• To store 1 million tuples, requires
1,000,000 / 359 = 2786 pages at the leaf
level
2786 / 359 = 8 pages at next level up
1 root page pointing to those 8 pages
• Hence, we have a B+-tree with 3 levels, and a
total of 2786+8+1 = 2795 disk pages
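The sizing calculation can be reproduced with a short Python sketch. The constants are the givens of the example; rounding 358.4 up to 359 entries per page follows the slide, and the variable names are chosen for the illustration.

```python
import math

PAGE_BYTES = 4 * 1024
ENTRY_BYTES = 6 + 2      # rowid + key value
FILL = 0.70
TUPLES = 1_000_000

capacity = PAGE_BYTES // ENTRY_BYTES          # entries in a full page
entries_per_page = math.ceil(capacity * FILL)  # entries at 70% full

# Count pages level by level, from the leaves up to the single root.
levels = []
pages = math.ceil(TUPLES / entries_per_page)
while True:
    levels.append(pages)
    if pages == 1:
        break
    pages = math.ceil(pages / entries_per_page)

total_pages = sum(levels)
```

The list levels holds the page count of each level from the leaves upward, so its length is the number of levels in the tree.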
Duplicate Key Values
• Duplicate key values in a B+-tree can be handled.
– (key, rowid) pairs for same key value can span
multiple index nodes
• Search algorithm needs to be changed
– find leftmost entry at the leaf level for the searched
item, then scan the index from left to right following
leaf level pointers
• The insertion and deletion algorithms also require small
changes
– they are more costly and hence not always
implemented in practice
Bitmap Index
• For some attribute x with possible values A,B and C:
– create a list of all tuples in the relation and store their
rowids at some known location
– build a bitmap for each value, for example for value A
• the bitmap contains a “1” at location k if tuple k has
value “A” for this attribute
• otherwise it contains a “0”
– indices with a lot of “0”s are called sparse and can be
compressed
Bitmap Example
Tuple list (attribute A is the second field of each tuple):
  1: s1 10 6 s7 . . .
  2: s2 15 3 s9 . . .
  3: s3 15 4 s10 . . .
  4: s4 10 3 s6 . . .
  5: s5 15 2 s8 . . .
Bitmap for A=10: 1 0 0 1 0 . . .
Bitmap for A=15: 0 1 1 0 1 . . .
Querying with Bitmap Index
• Suppose we have bitmap indices on attributes x and y
– Find if x=“A” or x=“B”, take the bitmaps for both
values and do a logical or
– Find if x=“A” and y<>“B”, compute the logical inverse
of bitmap for y=“B” and then do a logical and with
bitmap for x=“A”
• Bitmaps depend on the actual row ids of tuples
• If a tuple is deleted, its location can be removed or
swapped by another tuple (costly if the index is
compressed)
• Too many updates or attributes with too many values
lead to bitmaps that are not cost effective
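The logical operations on bitmaps can be sketched in Python with lists of 0s and 1s. The values of attribute A are those of the five example tuples; using the third field of those tuples as a second indexed attribute, and all function names, are assumptions for the illustration.

```python
def build_bitmaps(values):
    """One bitmap per distinct value: bit k is 1 if tuple k has that value."""
    bitmaps = {}
    for k, v in enumerate(values):
        bitmaps.setdefault(v, [0] * len(values))[k] = 1
    return bitmaps

def bit_or(x, y):
    return [a | b for a, b in zip(x, y)]

def bit_and(x, y):
    return [a & b for a, b in zip(x, y)]

def bit_not(x):
    return [1 - a for a in x]

a_bitmaps = build_bitmaps([10, 15, 15, 10, 15])  # attribute A of s1..s5
y_bitmaps = build_bitmaps([6, 3, 4, 3, 2])       # assumed second attribute

# A = 10 or A = 15: logical or of the two bitmaps.
either = bit_or(a_bitmaps[10], a_bitmaps[15])

# A = 10 and y <> 3: invert the bitmap for y = 3, then logical and.
a10_not_y3 = bit_and(a_bitmaps[10], bit_not(y_bitmaps[3]))
```

Each result is again a bitmap over the tuple list, so the positions holding a 1 identify the qualifying tuples.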
Primary access methods
• Heap: tuples are placed in the order they are inserted
• Cluster: tuples with the same values for attributes A1,…,Ak are placed
close to each other on disk
• Hash: tuples with the same hash value are placed close to each other on disk
Secondary access methods
• The primary access method can be anything. Additional indexes are
created with entries that point to actual tuples
Figure: a B+-tree index on attributes A1,…,Ak pointing into data pages, each
with a row directory (Tuple 1, Tuple 2, … , Tuple 10 and Tuple 11, Tuple 12, … , Tuple 20)
Clusters
• A cluster is a primary access method: it changes
the placement of tuples on disk
CREATE CLUSTER personnel
(department_number integer)
SIZE 512
STORAGE (INITIAL 100K NEXT 50K)
• In ORACLE, a cluster can be generated for many
tables containing the same set of attributes
• All tuples in different tables from the same cluster
will be placed close to each other on disk (i.e. on
the same page or on consecutive pages)
Adding tables to a cluster
Clusters
• Each table may belong to at most one cluster.
• Suppose we retrieve an employee tuple with deptno=10.
We find a page with this employee and read it into memory.
• If there are 20 employees in the department 10, then
chances are that all these employees are on the same
page.
• To find all employees in departments 10 through 20, we can
simply read the necessary pages.
• A cluster is not an index, but we can also create a B+-tree
index on a cluster:
CREATE INDEX idx_personnel ON CLUSTER personnel;
Hashing
• Hashing is another index method that changes the way tuples are
placed on disk
• A hash index on attribute A allocates an initial set of pages to store
the relation:
Figure: pages 1, 2, 3, ..., n. A new tuple T with key A is stored on page
h(T.A), where the hash function h ranges between 1 and n
• If multiple tuples map to the same location/page, this is called a
collision. When the page is already full, these tuples are placed
in an overflow page.
Hashing
• The number of key values is given by HASHKEYS
• Hashing is useful for finding a tuple with a given key value
• Hashing is not as useful for ranges of key values or for
sequential processing of tuples in key order
• In the best case, a tuple is found with one disk access
• In the average case, expect 1.2 disk accesses or more
(because of overflow pages)
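The effect of overflow pages on lookup cost can be sketched in Python. This is an in-memory illustration under assumptions: the class HashFile, the page capacity, integer keys hashed with key mod n, and pages numbered from 0 rather than 1 are all choices made for the example.

```python
class HashFile:
    """Static hash file: n buckets, each a chain of fixed-capacity pages."""

    def __init__(self, n_pages, capacity):
        self.capacity = capacity
        # Each bucket starts with one primary page; overflow pages follow.
        self.buckets = [[[]] for _ in range(n_pages)]

    def _bucket(self, key):
        return self.buckets[key % len(self.buckets)]

    def insert(self, key, rowid):
        chain = self._bucket(key)
        if len(chain[-1]) == self.capacity:
            chain.append([])  # last page is full: add an overflow page
        chain[-1].append((key, rowid))

    def lookup(self, key):
        """Return (rowid, pages_read); rowid is None if key is absent."""
        chain = self._bucket(key)
        for pages_read, page in enumerate(chain, start=1):
            for k, rowid in page:
                if k == key:
                    return rowid, pages_read
        return None, len(chain)

hash_file = HashFile(n_pages=4, capacity=2)
for key in (1, 5, 9):  # all three keys map to the same bucket
    hash_file.insert(key, f"r{key}")
```

Keys 1 and 5 fit on the primary page and are found with one page read, while key 9 lands on an overflow page and needs two, which is how the average rises above one access.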
Extensible Hashing
• Assume that we originally allocate 2^n pages for the hash (here n = 1)
• Distribute tuples according to hash function mod 2
– hash the key to produce a bit string and then use the least
significant bit
• If a disk page becomes full, double the directory size instead of
creating overflow buckets
Figure: hash directory with entries 0 and 1 pointing to Page 0 and Page 1,
which hold the tuples
Extensible Hashing
• Insert into a full Page 0: double the directory size
• The full page is split into two. Its contents are rehashed between
the original page and the new page, using one additional bit from the
string produced by the hash function
• Doubling also creates a new directory entry for Page 3. Since Page 1
is not full, this directory entry points to Page 1. When Page 1 becomes
full, its contents will be rehashed between Page 1 and Page 3 without
doubling the directory again
Figure: hash directory with entries 00, 01, 10, 11. Entry 00 points to
Page 0, entry 10 to the new Page 2, and entries 01 and 11 both to Page 1
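Directory doubling and page splitting can be sketched in Python. This is a simplified in-memory illustration, not a definitive implementation: the class names, the page capacity, and the assumption that keys are distinct non-negative integers whose binary form serves as the hash bit string are all choices made for the example.

```python
class Page:
    def __init__(self, depth):
        self.depth = depth  # number of low-order hash bits this page covers
        self.items = {}

class ExtensibleHash:
    def __init__(self, capacity):
        self.capacity = capacity
        self.global_depth = 1
        self.directory = [Page(1), Page(1)]  # entries 0 and 1

    def _slot(self, key):
        # Use the least significant global_depth bits of the key.
        return key & ((1 << self.global_depth) - 1)

    def lookup(self, key):
        return self.directory[self._slot(key)].items.get(key)

    def insert(self, key, rowid):
        while True:
            page = self.directory[self._slot(key)]
            if key in page.items or len(page.items) < self.capacity:
                page.items[key] = rowid
                return
            self._split(page)

    def _split(self, page):
        if page.depth == self.global_depth:
            # Double the directory: each new entry shares the page of the
            # old entry with the same low-order bits.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        extra_bit = 1 << page.depth  # the one additional hash bit
        page.depth += 1
        new_page = Page(page.depth)
        for slot, p in enumerate(self.directory):
            if p is page and slot & extra_bit:
                self.directory[slot] = new_page
        # Rehash the contents between the original page and the new page.
        old_items, page.items = page.items, {}
        for k, v in old_items.items():
            self.directory[self._slot(k)].items[k] = v

table = ExtensibleHash(capacity=2)
for key in (0, 2, 1, 4):  # 0 and 2 fill Page 0; inserting 4 forces a split
    table.insert(key, f"r{key}")
```

After these inserts the directory has four entries, and entries 01 and 11 still share one page, matching the situation described above.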
Extensible Hashing
• As the hash directory size grows it must be stored on
the disk
• At most two disk accesses are needed to retrieve any
tuple
– this is a better upper bound than for B+-Trees
• However, extensible hashing is not as good as B+-Trees
for range queries and sequential processing
where you want to process all the tuples of a relation
• Consequently, B+-Trees are used more frequently
than Extensible Hashing

More Related Content

What's hot

Implementation of page table
Implementation of page tableImplementation of page table
Implementation of page table
guestff64339
 

What's hot (20)

Csc4320 chapter 8 2
Csc4320 chapter 8 2Csc4320 chapter 8 2
Csc4320 chapter 8 2
 
Dynamic multi level indexing Using B-Trees And B+ Trees
Dynamic multi level indexing Using B-Trees And B+ TreesDynamic multi level indexing Using B-Trees And B+ Trees
Dynamic multi level indexing Using B-Trees And B+ Trees
 
File organisation
File organisationFile organisation
File organisation
 
Implementation of page table
Implementation of page tableImplementation of page table
Implementation of page table
 
Structure of the page table
Structure of the page tableStructure of the page table
Structure of the page table
 
File organisation
File organisationFile organisation
File organisation
 
DBMS
DBMSDBMS
DBMS
 
Hardware implementation of page table
Hardware implementation of page table Hardware implementation of page table
Hardware implementation of page table
 
Lecture storage-buffer
Lecture storage-bufferLecture storage-buffer
Lecture storage-buffer
 
File organization
File organizationFile organization
File organization
 
Operation System
Operation SystemOperation System
Operation System
 
Control dataset partitioning and cache to optimize performances in Spark
Control dataset partitioning and cache to optimize performances in SparkControl dataset partitioning and cache to optimize performances in Spark
Control dataset partitioning and cache to optimize performances in Spark
 
Memory management
Memory managementMemory management
Memory management
 
Physical architecture of sql server
Physical architecture of sql serverPhysical architecture of sql server
Physical architecture of sql server
 
11.file system implementation
11.file system implementation11.file system implementation
11.file system implementation
 
File Types in Data Structure
File Types in Data StructureFile Types in Data Structure
File Types in Data Structure
 
Concept of computer files
Concept of computer filesConcept of computer files
Concept of computer files
 
File structures
File structuresFile structures
File structures
 
Plam15 slides.potx
Plam15 slides.potxPlam15 slides.potx
Plam15 slides.potx
 
Mass Storage Structure
Mass Storage StructureMass Storage Structure
Mass Storage Structure
 

Viewers also liked (8)

[Www.pkbulk.blogspot.com]dbms04
[Www.pkbulk.blogspot.com]dbms04[Www.pkbulk.blogspot.com]dbms04
[Www.pkbulk.blogspot.com]dbms04
 
Relational Algebra,Types of join
Relational Algebra,Types of joinRelational Algebra,Types of join
Relational Algebra,Types of join
 
[Www.pkbulk.blogspot.com]file and indexing
[Www.pkbulk.blogspot.com]file and indexing[Www.pkbulk.blogspot.com]file and indexing
[Www.pkbulk.blogspot.com]file and indexing
 
Relational Algebra
Relational AlgebraRelational Algebra
Relational Algebra
 
Relational algebra
Relational algebraRelational algebra
Relational algebra
 
Relational Algebra-Database Systems
Relational Algebra-Database SystemsRelational Algebra-Database Systems
Relational Algebra-Database Systems
 
Relational algebra in dbms
Relational algebra in dbmsRelational algebra in dbms
Relational algebra in dbms
 
Entity relationship diagram (erd)
Entity relationship diagram (erd)Entity relationship diagram (erd)
Entity relationship diagram (erd)
 

Similar to [Www.pkbulk.blogspot.com]dbms12

Main MeMory Data Base
Main MeMory Data BaseMain MeMory Data Base
Main MeMory Data Base
Siva Rushi
 

Similar to [Www.pkbulk.blogspot.com]dbms12 (20)

Storage struct
Storage structStorage struct
Storage struct
 
Main MeMory Data Base
Main MeMory Data BaseMain MeMory Data Base
Main MeMory Data Base
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Data Warehousing in the Era of Big Data
Data Warehousing in the Era of Big DataData Warehousing in the Era of Big Data
Data Warehousing in the Era of Big Data
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Amazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech TalksAmazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech Talks
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Unit 08 dbms
Unit 08 dbmsUnit 08 dbms
Unit 08 dbms
 
File organization 1
File organization 1File organization 1
File organization 1
 
files,indexing,hashing,linear and non linear hashing
files,indexing,hashing,linear and non linear hashingfiles,indexing,hashing,linear and non linear hashing
files,indexing,hashing,linear and non linear hashing
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
[Www.pkbulk.blogspot.com]dbms13
[Www.pkbulk.blogspot.com]dbms13[Www.pkbulk.blogspot.com]dbms13
[Www.pkbulk.blogspot.com]dbms13
 
Database (IT) Lecture Slide
Database (IT) Lecture SlideDatabase (IT) Lecture Slide
Database (IT) Lecture Slide
 
Data storage and indexing
Data storage and indexingData storage and indexing
Data storage and indexing
 
SRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon RedshiftSRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon Redshift
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
 
3620121datastructures.ppt
3620121datastructures.ppt3620121datastructures.ppt
3620121datastructures.ppt
 
OS Unit5.pptx
OS Unit5.pptxOS Unit5.pptx
OS Unit5.pptx
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 

More from AnusAhmad

More from AnusAhmad (18)

[Www.pkbulk.blogspot.com]dbms11
[Www.pkbulk.blogspot.com]dbms11[Www.pkbulk.blogspot.com]dbms11
[Www.pkbulk.blogspot.com]dbms11
 
[Www.pkbulk.blogspot.com]dbms10
[Www.pkbulk.blogspot.com]dbms10[Www.pkbulk.blogspot.com]dbms10
[Www.pkbulk.blogspot.com]dbms10
 
[Www.pkbulk.blogspot.com]dbms09
[Www.pkbulk.blogspot.com]dbms09[Www.pkbulk.blogspot.com]dbms09
[Www.pkbulk.blogspot.com]dbms09
 
[Www.pkbulk.blogspot.com]dbms07
[Www.pkbulk.blogspot.com]dbms07[Www.pkbulk.blogspot.com]dbms07
[Www.pkbulk.blogspot.com]dbms07
 
[Www.pkbulk.blogspot.com]dbms06
[Www.pkbulk.blogspot.com]dbms06[Www.pkbulk.blogspot.com]dbms06
[Www.pkbulk.blogspot.com]dbms06
 
[Www.pkbulk.blogspot.com]dbms05
[Www.pkbulk.blogspot.com]dbms05[Www.pkbulk.blogspot.com]dbms05
[Www.pkbulk.blogspot.com]dbms05
 
[Www.pkbulk.blogspot.com]dbms03
[Www.pkbulk.blogspot.com]dbms03[Www.pkbulk.blogspot.com]dbms03
[Www.pkbulk.blogspot.com]dbms03
 
[Www.pkbulk.blogspot.com]dbms02
[Www.pkbulk.blogspot.com]dbms02[Www.pkbulk.blogspot.com]dbms02
[Www.pkbulk.blogspot.com]dbms02
 
[Www.pkbulk.blogspot.com]dbms01
[Www.pkbulk.blogspot.com]dbms01[Www.pkbulk.blogspot.com]dbms01
[Www.pkbulk.blogspot.com]dbms01
 
9. java server faces
9. java server faces9. java server faces
9. java server faces
 
8. java script
8. java script8. java script
8. java script
 
7. struts
7. struts7. struts
7. struts
 
5. servlets
5. servlets5. servlets
5. servlets
 
4. jsp
4. jsp4. jsp
4. jsp
 
3. applets
3. applets3. applets
3. applets
 
2. http, html
2. http, html2. http, html
2. http, html
 
1. intro
1. intro1. intro
1. intro
 
6. hibernate
6. hibernate6. hibernate
6. hibernate
 


[Www.pkbulk.blogspot.com]dbms12

  • 1. Disk Storage Management 1 Fall 2001 Database Systems 1 Indexing • Indexing is a combination of methods for speeding up the access to the data in a database • The speed is determined by two factors – where the data is stored on disk – available access paths • The primary access method to any table is a table scan, reading a table by finding all tuples one by one. • Indexing creates multiple access paths to the data, each of which is called a secondary access method. – indexing speeds up access to a table for a specific set of attributes Fall 2001 Database Systems 2 Indexing • Example: items(itemid, description, price, city) – to find all items in “Dallas”, read all tuples from secondary storage and check if city=“Dallas” for each one (scan) • may read many extra tuples that are not part of the answer – suppose instead we create index itemcity for the city attribute of the items relation itemcity, Dallas:{t1,t5,t10}, Boston:{t2,t3,t15}, … – to find all items for “Dallas”, we can now find the index entry for “Dallas” and get the ids of just the tuples we want • many fewer tuples are read from secondary storage
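The difference between a table scan and an index lookup on the `items` relation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows, the in-memory list standing in for secondary storage, and the function names `scan_for_city` and `build_city_index` are all hypothetical and not part of the slides.

```python
from collections import defaultdict

# Hypothetical sample of the items(itemid, description, price, city) relation.
items = [
    ("t1", "lamp", 20.0, "Dallas"),
    ("t2", "desk", 150.0, "Boston"),
    ("t3", "chair", 45.0, "Boston"),
    ("t5", "rug", 80.0, "Dallas"),
]

def scan_for_city(rows, city):
    """Table scan: inspect every tuple and keep the ids of the matches."""
    return [row[0] for row in rows if row[3] == city]

def build_city_index(rows):
    """Secondary access path: maps each city value to the ids of its tuples."""
    index = defaultdict(list)
    for row in rows:
        index[row[3]].append(row[0])
    return index

# The index plays the role of "itemcity" from the slide.
itemcity = build_city_index(items)
dallas_ids = itemcity["Dallas"]  # only the matching tuple ids are touched
```

The scan reads every row regardless of the answer size, while the index lookup goes straight to the ids for the requested city.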
  • 2. Disk Storage Management 2 Fall 2001 Database Systems 3 Disk terminology rotation Platters (2 platters = 4 read/write surfaces) Read/write heads one for each surface Disk arm controller Track Can read four tracks from four surfaces at the same time (all tracks at the same radius are called a cylinder) Fall 2001 Database Systems 4 Disk terminology • Reading data from a disk involves: – seek time -- move heads to the correct track/cylinder – rotational latency -- wait until the required data spins under the read/write heads (average is about half the rotation time) – transfer time -- transfer the required data (read/write data to/from a buffer) • Tracks contain the same amount of information even though they have different circumferences track Block/page A block or a page is the smallest unit of data transfer for a disk. Read a block / write a block
  • 3. Disk Storage Management 3 Fall 2001 Database Systems 5 Disk Space • Disk is much cheaper than memory (1GB: $15 - $20) • A fast disk today: – 40 - 10 GB per disk – 4.9 ms average read seek time – 2.99 ms average latency – 10,000 rpm – 280 - 452 Mbits/sec transfer rate – 7.04 bits per square inch density – 12 - 3 heads, 6 - 2 disk platters • Disk is non-volatile storage that survives power failures Fall 2001 Database Systems 6 Reading from disk • Reading data from disk is extremely slow (compared to reading from memory) • To read a page from disk – find the page on disk (seek+latency times) – transfer the data to memory/buffer (total # bytes / transfer rate) • Assume the average page size is 4KB. To retrieve a single row/tuple, we need to load the page that contains it – assume 300 Mbits/sec transfer rate, to read a page: 4KB = 32,768 bits ≈ 0.03 Mbits, hence the transfer takes about 1/10,000 of a second – 4.9 ms (seek) + 2.99 ms (latency) + 0.1 ms (transfer time) = 7.99 ms (seek and latency times dominate!)
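The page-read arithmetic from the slide can be checked with a short calculation. This is a rough sketch that assumes 1 Mbit = 1,000,000 bits and a 4 KB page of 4096 bytes; the function name `page_read_time_ms` is made up for illustration.

```python
def page_read_time_ms(page_bytes, transfer_mbits_per_s, seek_ms, latency_ms):
    """Return (total, transfer) time in milliseconds for reading one page."""
    bits = page_bytes * 8
    # Assumption: 1 Mbit = 1,000,000 bits.
    transfer_ms = bits / (transfer_mbits_per_s * 1_000_000) * 1000
    return seek_ms + latency_ms + transfer_ms, transfer_ms

# Figures from the slide: 4 KB page, 300 Mbits/sec, 4.9 ms seek, 2.99 ms latency.
total_ms, transfer_ms = page_read_time_ms(4096, 300, 4.9, 2.99)
```

With these inputs the transfer takes roughly 0.1 ms out of a total of about 8 ms, so seek and rotational latency account for nearly all of the cost.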
  • 4. Disk Storage Management 4 Fall 2001 Database Systems 7 Reading from disk • Assume the database saves a number of memory slots (each holding exactly one page), which are called buffers • To read / modify / write tuple t – DISK: (read it from disk, write it to buffer) – DB: (read it from buffer, modify) – DISK: (write it to disk, free the buffer space) Buffer slots This buffer can hold 4 pages at any time Fall 2001 Database Systems 8 Tablespaces • Age old wisdom: if you store a set of pages in contiguous pages / blocks on disk, then the transfer time will improve greatly (reduce seek and latency times) • A tablespace is an allocation of space in secondary storage – when creating a tablespace, a DBMS requests contiguous blocks of disk space from the OS – the tablespace appears as a single file to the OS – the management of physical addresses in a tablespace is performed by the DBMS – a DBMS can have many tablespaces – when a table is created, it is placed in a tablespace, or partitioned between multiple tablespaces
  • 5. Disk Storage Management 5 Fall 2001 Database Systems 9 Tablespaces CREATE TABLESPACE tspace1 DATAFILE ‘diska:file1.dat’ SIZE 20M, DATAFILE ‘diska:file2.dat’ SIZE 40M REUSE; CREATE TABLE temp1 ( … TABLESPACE file1 STORAGE (initial 6144, next 6144, minextents 1, maxextents 5) ) ; CREATE TABLE temp2 ( … TABLESPACE file2 STORAGE (initial 12144, next 6144, minextents 1, maxextents 5) ) ; tspace1 file1 file2 temp1 temp2 Actual data Fall 2001 Database Systems 10 Tablespaces • Create table -- assign the tuples in the table to a file in a tablespace – when a table is created, a chunk of space is assigned to this table, the size of this chunk is given by the “INITIALEXTENT” – when the initial extent becomes full, a new chunk is allocated, the size of all next chunks is given by the “NEXTEXTENT” – can also specify • maxextents, minextents • pctincrease (increase the size of extents at each step) • pctfree (how much of the extent must be left free
  • 6. Disk Storage Management 6 Fall 2001 Database Systems 11 Data Storage on pages • Layout of a single disk page (assume fixed size rows) • To find a specific row in a page, must know – page number (or block number) BBBBBBBB – offset (slot number of record within the page) SSSS – file name (which datafile/tablespace) FFFF – ROWID is then a unique number BBBBBBBB.SSSS.FFFF for a row • B,S,F are hexadecimal numbers Header info row directory 1 2 N... Free space Data rows Row N Row N-1 Row 1... Fall 2001 Database Systems 12 Pseudocolumns • Since each tuple has a unique rowid, we can refer to the tuples with their rowid field • However, rowid may change if the tuple is stored at a different location (the value of its primary key is a better identifier)
  • 7. Disk Storage Management 7 Fall 2001 Database Systems 13 Indexing Concepts • Indexing speeds up access to data residing on disk – disk access is much slower than main memory access, by orders of magnitude – goal – minimize the number of disk accesses • Primary access methods rely on the physical location of data as stored in a relation – finding “all” tuples with value “x” requires reading the entire relation • Secondary access methods use a directory to enable tuples to be found more quickly based on the value of one or more attributes (keys) in a tuple Fall 2001 Database Systems 14 Secondary Index • To create a simple index on column A of table T, make a list of pairs of the form (attribute A value, tuple rowid) for each tuple in T – example: secondary index for the SSN attribute SSN ROWIDs (RID) 111-11-1111 AAAAqYAABAAAEPvAAH 222-22-2222 AAAAqYAABAAAEPvAAD 333-33-3333 AAAAqYAABAAAEPvAAG . . . . • This index is large and stored on the disk
  • 8. Disk Storage Management 8 Fall 2001 Database Systems 15 Secondary Index • Suppose a disk page can contain 200 index entries from a secondary index • To store a secondary index for a relation with 1 million tuples assuming no duplicate values requires: 1,000,000 / 200 = 5,000 disk pages • To find a particular Person tuple in the SSN index given his or her SSN, you must on average scan half of the index (5,000 / 2 = 2500 disk accesses) • If 20 tuples of the Person relation fit on a page, then sequential scan of the relation itself needs to read on average half the relation (50,000 / 2 = 25,000 disk accesses) • In this case, the secondary index helps a lot Fall 2001 Database Systems 16 Efficiency • Need to organize the index information in a way that makes it efficient to access and search – scanning the index from the beginning is not good enough • Sorting the secondary index helps, but is not sufficient • Solution 1: build a tree index • Solution 2: hash the index
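The page counts on the Secondary Index slide follow from simple division. The snippet below reproduces them; the variable names are chosen for illustration only.

```python
from math import ceil

tuples = 1_000_000
index_entries_per_page = 200   # from the slide
tuples_per_page = 20           # from the slide

index_pages = ceil(tuples / index_entries_per_page)
relation_pages = ceil(tuples / tuples_per_page)

# An unsorted linear search reads, on average, half of the pages.
avg_index_accesses = index_pages // 2
avg_relation_accesses = relation_pages // 2
```

Even a linearly scanned index is about ten times cheaper than scanning the relation here, which is why the following slides focus on organizing the index so that it does not have to be scanned at all.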
  • 9. Disk Storage Management 9 Fall 2001 Database Systems 17 Tree Indices • Want to minimize number of disk accesses. – each tree node requires a disk access – therefore, trees that are broad and shallow are preferred over trees that are narrow and deep • Balanced binary search tree, AVL tree, etc. that are useful in main memory are too narrow and deep for secondary storage. • Need an m-way tree where m is large. – also need a tree that is balanced Fall 2001 Database Systems 18 B+-Tree • A B+ -Tree of order d is a tree in which: – each node has between d and 2d key values – the keys values within a node are ordered – each key in a node has a pointer immediately before and after it • leaf nodes: pointer following a key is pointer to record with that key • interior nodes: pointers point to other nodes in the tree – the length of the path from root to leaf is the same for every leaf (balanced) – the root may have fewer keys and pointers
  • 10. Disk Storage Management 10 Fall 2001 Database Systems 19 Example B+ -Tree 66 69 71 762 7 11 15 22 30 41 53 54 63 78 84 93 53 11 30 66 78 B+-Tree of order 2 each node can hold up to four keys Fall 2001 Database Systems 20 Searching in B+-Trees Search(T, K) /* searching for tuple with key value K in tree T */ { if T is a non-leaf node search for leftmost key K’ in node T such that K < K’ if such a K’ exists ptr = pointer in T immediately before K’ return the result of Search(ptr, K) if no such K’ exists ptr = rightmost pointer in node T return the result of Search(ptr, K) else if T is a leaf node search for K in T if found, return the pointer following K else return NULL /* K not in tree T */ }
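The Search pseudocode above translates almost directly into Python. The following is a minimal in-memory sketch, not a disk-based implementation: the `Node` class, the `leaf` helper, and the `rid…` strings standing in for rowids are assumptions made for illustration. The sample tree is modeled on the order-2 example from the slide.

```python
from bisect import bisect_right

class Node:
    """Interior nodes carry children; leaf nodes carry rowids."""
    def __init__(self, keys, children=None, rowids=None):
        self.keys = keys
        self.children = children
        self.rowids = rowids

    @property
    def is_leaf(self):
        return self.children is None

def search(node, key):
    """Return the rowid stored for key, or None if key is not in the tree."""
    while not node.is_leaf:
        # bisect_right gives the position of the leftmost key K' with key < K'.
        # children[i] is the pointer immediately before K'; when no such K'
        # exists, i == len(keys) and this selects the rightmost pointer.
        node = node.children[bisect_right(node.keys, key)]
    for k, rid in zip(node.keys, node.rowids):
        if k == key:
            return rid
    return None

def leaf(*keys):
    # Hypothetical rowids: "rid<key>" stands in for a real tuple address.
    return Node(list(keys), rowids=[f"rid{k}" for k in keys])

left = Node([11, 30], children=[leaf(2, 7), leaf(11, 15, 22), leaf(30, 41)])
right = Node([66, 78],
             children=[leaf(53, 54, 63), leaf(66, 69, 71, 76), leaf(78, 84, 93)])
root = Node([53], children=[left, right])
```

Each loop iteration corresponds to one node access, so a lookup in this three-level tree touches exactly three nodes.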
  • 11. Disk Storage Management 11 Fall 2001 Database Systems 21 Insert Algorithm • To insert a new tuple with key K and address rowid: – use a modified Search algorithm to look for the leaf node into which key K should be inserted – insert key K followed by address rowid into the proper place in this leaf node to maintain order and rebalance the tree if necessary • Rebalancing the tree – if the leaf node has room for K and rowid, then no rebalancing is needed – if the leaf node has no room for K and rowid, then it is necessary to create a new node and rebalance the tree Fall 2001 Database Systems 22 Rebalancing Algorithm • Assume that K and rowid are to be inserted into leaf node L, but L has no more room. – create a new empty node – put K and rowid in their proper place among the entries in L to maintain the key sequence order -- there are 2d+1 keys in this sequence – leave the first d keys with their rowids in node L and move the final d+1 keys with their rowids to the new node – copy the middle key K’ from the original sequence into the parent node followed by a pointer to the new node • put them immediately after the pointer to node L in the parent node – apply this algorithm recursively up the tree as needed
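The leaf-splitting step of the rebalancing algorithm can be sketched as follows. This covers only the split of a single full leaf, under the assumption that entries are `(key, rowid)` pairs with distinct keys; the function name `split_leaf` and the `r…` rowid strings are hypothetical.

```python
from bisect import insort

def split_leaf(entries, key, rowid, d):
    """Split a full leaf of order d while inserting (key, rowid).

    Returns (entries staying in L, entries for the new node, separator key
    to copy into the parent).
    """
    if len(entries) != 2 * d:
        raise ValueError("leaf must be full (2d entries) before a split")
    sequence = list(entries)
    insort(sequence, (key, rowid))      # now 2d+1 entries in key order
    stay, move = sequence[:d], sequence[d:]
    # The middle key of the 2d+1 sequence is the first key of the new node.
    return stay, move, move[0][0]

full_leaf = [(53, "r53"), (54, "r54"), (57, "r57"), (63, "r63")]
stay, move, separator = split_leaf(full_leaf, 65, "r65", d=2)
```

Inserting key 65 into a full leaf holding 53, 54, 57, 63 leaves 53 and 54 in place, moves 57, 63, 65 to the new node, and copies 57 up to the parent. Propagating the separator into a full parent would repeat a similar split one level up.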
  • 12. Disk Storage Management 12 Fall 2001 Database Systems 23 Insert Example Insert record with key 57 B+–Tree of order 2 66 69 71 762 7 11 15 22 30 41 53 54 63 78 84 51 11 30 66 78 57 66 69 71 762 7 11 15 22 30 41 53 54 63 78 84 51 11 30 66 78 66 69 71 762 7 11 15 22 30 41 53 54 57 63 78 84 53 11 30 66 78 Fall 2001 Database Systems 24 Another Insert Example 66 69 71 762 7 11 15 22 30 41 53 54 57 63 78 84 51 11 30 66 78 Insert record with key 65 B+–Tree of order 2 66 69 71 762 7 11 15 22 30 41 53 54 57 63 78 84 51 11 30 66 78 65 66 69 71 762 7 11 15 22 30 41 53 54 57 63 78 84 51 11 30 66 78 66 69 71 762 7 11 15 22 30 41 53 54 78 84 51 11 30 66 78 57 63 65 57 66 69 71 762 7 11 15 22 30 41 53 54 78 84 53 11 30 57 66 78 57 63 65
  • 13. Disk Storage Management 13 Fall 2001 Database Systems 25 Insertion Algorithm (1) Insert (T, K, rowid, child) /* insert new tuple with key K and address rowid into tree T */ /* child is NULL initially */ { /* handle an interior node of the B+-Tree */ if T is a non-leaf node find j such that Kj ≤ K ≤ Kj+1 for keys K1, …, Kn in T ptr = pointer between Kj and Kj+1 Insert (ptr, K, rowid, child) if child is NULL then return /* must insert key and child pointer into T */ if T has space for another key and pointer put child.key and child.ptr into T at proper place child = NULL return Fall 2001 Database Systems 26 Insert Algorithm (2) else /* must split node T */ construct sequence of keys and pointers from T with child.key and child.ptr inserted at proper place first d keys and d+1 pointers from sequence stay in T last d keys and d+1 pointers from sequence move to a new node N child.key = middle key from sequence child.ptr = pointer to N if T is root create new node containing pointer to T, child.key, and child.ptr make this node the new root node of the B+-Tree return /* handle leaf node of the B+-Tree */ If T is a leaf node if T has space for another key and rowid put K and rowid into T at proper place return
  • 14. Disk Storage Management 14 Fall 2001 Database Systems 27 Insert Algorithm (3) else /* must split leaf node T */ construct sequence of keys and pointers from T with K and rowid inserted at proper place first d keys and d+1 pointers from sequence stay in T last d+1 keys and d+2 pointers from sequence move to a new node N child.key = first key in new node N child.ptr = pointer to N if T was root create new node containing pointer to T, child.key, and child.ptr make this node the new root node of the B+-Tree return } Fall 2001 Database Systems 28 Deletion • Assume that a tuple with key K and address rowid is to be deleted from leaf node L. There is a problem if after removing K and rowid from L it has fewer than d keys remaining. To fix this: – if a neighbor node has at least d+1 keys, then evenly redistribute the keys and rowids with the neighbor node and adjust the separator key in the parent node – otherwise, combine node L with a neighbor node and discard the empty node • the parent node now needs one less key and node pointer, so recursively apply this algorithm up the tree until all nodes have enough keys and pointers
  • 15. Disk Storage Management 15 Fall 2001 Database Systems 29 Deletion Example Redistribute between the second and third leaf nodes. 66 69 71 762 7 53 54 63 78 84 93 53 11 30 66 78 B+-Tree of order 2 30 Delete key 30 30 4111 15 22 4122 4111 15 22 Fall 2001 Database Systems 30 Another Deletion Example Cannot redistribute, so combine the left two leaf nodes 66 69 71 762 7 11 15 30 41 53 54 63 78 84 93 53 11 30 66 78 B+-Tree of order 2 Delete 7 from the B-Tree 2 11 15
  • 16. Disk Storage Management 16 Fall 2001 Database Systems 31 Another Example Continued B+-Tree of order 2 Delete 7 from the B+-Tree 66 69 71 7612 15 30 41 53 54 63 78 84 93 51 30 66 78 2 11 15 Node not valid, too few pointers Cannot redistribute, so combine with sibling node 66 69 71 7612 15 30 41 53 54 63 78 84 93 51 30 66 78 2 11 15 66 69 71 7612 15 30 41 53 54 63 78 84 93 30 51 66 78 2 11 15 66 69 71 7612 15 30 41 53 54 63 78 84 93 30 53 66 78 2 11 15 Fall 2001 Database Systems 32 Deletion Algorithm (1) Delete (Parent, T, K, oldchild) /* delete key K from Tree T */ /* Parent is parent node for T, initially NULL */ /* oldchild is discarded child node, initially NULL */ { /* handle an interior node of the B+-Tree */ if T is a non-leaf node find j such that Kj ≤ K ≤ Kj+1 for keys K1, …, Kn in T ptr = pointer between Kj and Kj+1 Delete (T, ptr, K, oldchild) if oldchild is NULL then return /* must handle discarded child node of T */ remove oldchild and adjacent key from T if T still has enough keys and pointers oldchild = NULL return
  • 17. Disk Storage Management 17 Fall 2001 Database Systems 33 Deletion Algorithm (2) /* must fix node T */ get a sibling node S of T using Parent if S has extra keys /* redistribute S and T */ redistribute keys and adjacent pointers evenly between S and T K’ = middle unused key from the redistribution replace the key in Parent between the pointers to S and T with K’ oldchild = NULL return else /* merge S and T */ R = S or T, whichever is to the right of the other oldchild = R copy key from Parent node that is immediately before R to the end of the node on the left move all keys and adjacent pointers from R to the node on the left discard node R return Fall 2001 Database Systems 34 Deletion Algorithm (3) /* handle leaf node of the B+-Tree */ if T is a leaf node if T has extra keys remove key K from T oldchild = NULL return /* must fix node T */ get a sibling node S of T using Parent if S has extra keys /* redistribute S and T */ redistribute keys and adjacent pointers evenly between S and T K’ = first key from node S or T, whichever is to the right of the other replace the key in Parent between the pointers to S and T with K’ oldchild = NULL return
  • 18. Disk Storage Management 18 Fall 2001 Database Systems 35 Deletion Algorithm (4) else /* merge S and T */ R = S or T, whichever is to the right of the other oldchild = R move all keys and adjacent rowids from R to the node on the left discard node R return } Fall 2001 Database Systems 36 Analysis of B+-Trees • Every access to a node is an access to disk and hence is expensive. • Analysis of Find: – if there are n tuples in the tree, the height of the tree, h, is bounded by h ≤ ceil(log_d(n)) – example: d = 50, tree contains 1 million records, then h ≤ 4 • Analysis of Insert and Delete – finding the relevant node requires h accesses – rebalancing requires O(h) accesses – therefore, the total is O(log_d n) accesses
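The height bound can be evaluated for a few values of d. The helper below is a small sketch (the name `height_bound` is made up) that computes ceil(log_d(n)) with integer arithmetic to avoid floating-point rounding at exact powers.

```python
def height_bound(n, d):
    """Smallest h with d**h >= n, which equals ceil(log_d(n)) for n >= 1."""
    h, capacity = 0, 1
    while capacity < n:
        capacity *= d
        h += 1
    return h

h_order_50 = height_bound(1_000_000, 50)   # the slide's example
h_binary = height_bound(1_000_000, 2)      # same data in a binary tree
```

For one million records the bound is 4 with d = 50 but 20 with d = 2, which illustrates why broad, shallow trees are preferred when every node access is a disk access.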
  • 19. Disk Storage Management 19 Fall 2001 Database Systems 37 B+-tree • The create index command creates a B+-tree index CREATE INDEX age_idx ON people(age) TABLESPACE file1 PCTFREE 70 • PCTFREE defines how full each node should be • Optimal operation is usually with nodes about 70% full • To reduce disk accesses for sequential processing, pointers are added to the leaf nodes that point to the previous and next leaf nodes Fall 2001 Database Systems 38 A B+-Tree Example • Givens: – disk page has capacity of 4K bytes – each rowid takes 6 bytes and each key value takes 2 bytes – each node is 70% full – need to store 1 million tuples • Leaf node capacity – each (key value, rowid) pair takes 8 bytes – disk page capacity is 4K, so (4*1024)/8 = 512 (key value, rowid) pairs per leaf page • in reality there are extra headers and pointers that we will ignore • Hence, the degree for the tree is about 256
  • 20. Disk Storage Management 20 Fall 2001 Database Systems 39 Example Continued • If all pages are 70% full, each page has about 512*0.7 = 359 entries • To store 1 million tuples, requires 1,000,000 / 359 = 2786 pages at the leaf level 2786 / 359 = 8 pages at next level up 1 root page pointing to those 8 pages • Hence, we have a B+-tree with 3 levels, and a total of 2786+8+1 = 2795 disk pages Fall 2001 Database Systems 40 Duplicate Key Values • Duplicate key values in a B+-tree can be handled. – (key, rowid) pairs for same key value can span multiple index nodes • Search algorithm needs to be changed – find leftmost entry at the leaf level for the searched item, then scan the index from left to right following leaf level pointers • The insertion and deletion algorithms also require small changes – they are more costly and hence not always implemented in practice
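The level-by-level page count in this example can be reproduced with a short loop. This is a sketch under the slide's simplifications (headers and extra pointers ignored, interior entries the same size as leaf entries); rounding 512 * 0.7 up to 359 is an assumption made to match the slide's figure, and `level_pages` is a hypothetical helper name.

```python
from math import ceil

def level_pages(n_tuples, entries_per_page):
    """Pages needed at each level, from the leaf level up to the root."""
    levels = []
    count = n_tuples
    while True:
        count = ceil(count / entries_per_page)
        levels.append(count)
        if count <= 1:
            break
    return levels

page_bytes, entry_bytes, fill = 4 * 1024, 8, 0.7
# Rounded up to match the slide's "about 359 entries" per page.
entries_per_page = ceil(page_bytes // entry_bytes * fill)
levels = level_pages(1_000_000, entries_per_page)
total_pages = sum(levels)
```

The result is three levels with 2786, 8, and 1 pages, for 2795 pages in total.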
  • 21. Disk Storage Management 21 Fall 2001 Database Systems 41 Bitmap Index • For some attribute x with possible values A,B and C: – create a list of all tuples in the relation and store their rowids at some known location – build an index for each value, for example for value A • the bitmap contains a “1” at location k if tuple k has value “A” for this attribute • otherwise it contains a “0” – indices with a lot of “0”s are called sparse and can be compressed Fall 2001 Database Systems 42 s2 15 3 s9 . . . A s5 15 2 s8 . . . A Bitmap Example s1 10 6 s7 . . . A s4 10 3 s6 . . . A s3 15 4 s10 . . . A Tuples . . . . . .Tuple List 1 0 0 1 0 . . . 0 1 1 0 1 . . . Bitmap for A=10 Bitmap for A=15
  • 22. Disk Storage Management 22 Fall 2001 Database Systems 43 Querying with Bitmap Index • Suppose have bitmap indices on attributes x and y – Find if x=“A” or x=“B”, take the bitmaps for both values and do a logical or – Find if x=“A” and y<>“B”, compute the logical inverse of bitmap for y=“B” and then do a logical and with bitmap for x=“A” • Bitmaps depend on the actual row ids of tuples • If a tuple is deleted, its location can be removed or swapped by another tuple (costly if the index is compressed) • Too many updates or attributes with too many values lead to bitmaps that are not cost effective Fall 2001 Database Systems 44 Row directory Tuple 1, Tuple 2, … , Tuple 10 B+-tree index on attributes A1,…,Ak Primary access methods Heap: tuples are placed in the order they are inserted Cluster: tuples with the same values for attributes A1,…,Ak are placed close to each other on disk Hash: tuples with the same hash value are placed close to each other on disk Secondary access methods The primary access method can be anything. Additional indexes are created with entries that point to actual tuples Row directory Tuple 11, Tuple 12, … , Tuple 20
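The two bitmap queries described above reduce to bitwise operations over per-value bitmaps. The sketch below stores each bitmap as a plain list of 0/1 values for readability; the column contents for x and y and all function names are hypothetical. The `a_column` data reuses the values from the Bitmap Example slide.

```python
def build_bitmaps(column):
    """One bitmap per distinct value; position k is 1 if tuple k has it."""
    bitmaps = {}
    for k, value in enumerate(column):
        bitmaps.setdefault(value, [0] * len(column))[k] = 1
    return bitmaps

def bit_or(a, b):
    return [p | q for p, q in zip(a, b)]

def bit_and(a, b):
    return [p & q for p, q in zip(a, b)]

def bit_not(a):
    return [1 - p for p in a]

# Values of attribute A for the five tuples on the Bitmap Example slide.
a_column = [10, 15, 15, 10, 15]
a_bitmaps = build_bitmaps(a_column)

# Hypothetical attributes x and y over five tuples.
x = ["A", "B", "C", "A", "B"]
y = ["B", "A", "B", "C", "A"]
bx, by = build_bitmaps(x), build_bitmaps(y)

x_is_a_or_b = bit_or(bx["A"], bx["B"])
x_is_a_and_y_not_b = bit_and(bx["A"], bit_not(by["B"]))
```

Position k in each result refers to tuple k in the stored tuple list, which is why the bitmaps depend on tuple positions staying stable.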
  • 23. Disk Storage Management 23 Fall 2001 Database Systems 45 Clusters • A cluster is a primary access method, it changes the placement of tuples on disk CREATE CLUSTER personnel (department_number integer) SIZE 512 STORAGE (INITIAL 100K NEXT 50K) • In ORACLE, a cluster can be generated for many tables containing the same set of attributes • All tuples in different tables from the same cluster will be placed close to each other on disk (i.e. on the same page and on consecutive pages) Fall 2001 Database Systems 46 Adding tables to a cluster
  • 24. Disk Storage Management 24 Fall 2001 Database Systems 47 Clusters • Each table may belong to at most one cluster. • Suppose we retrieve an employee tuple with deptno=10. We find a page with this employee and read it into memory. • If there are 20 employees in the department 10, then chances are that all these employees are on the same page. • To find all employees in department 10 through 20, we can simply read the necessary pages. • A cluster is not an index, but we can also create a B+-tree index on a cluster: CREATE INDEX idx_personnel ON CLUSTER personnel; Fall 2001 Database Systems 48 Hashing • Hashing is another index method that changes the way tuples are placed on disk • A hash index on attribute A allocates an initial set of pages to store the relation: pages 1, 2, 3, …, n • A new tuple T with key A is placed on page h(T.A), where the hash function h ranges between 1 and n • If multiple tuples map to the same location/page, this is called a collision. These tuples are placed in an overflow page.
  • 25. Disk Storage Management 25 Fall 2001 Database Systems 49 Hashing • The number of key values is given by HASHKEYS • Hashing is useful for finding a tuple with a given key value • Hashing is not as useful for ranges of key values or for sequential processing of tuples in key order • In the best case, a tuple is found with one disk access • In the average case, expect 1.2 disk accesses or more (because of overflow pages) Fall 2001 Database Systems 50 Extensible Hashing • Assume that we originally allocate 2n pages for the hash • Distribute tuples according to hash function mod 2 – hash the key to produce a bit string and then use the least significant bit • If a disk page becomes full, double the directory size instead of creating overflow buckets Page 0 Page 1 Hash directory tuples 0 1
  • 26. Disk Storage Management 26 Fall 2001 Database Systems 51 Extensible Hashing • Insert into a full Page 1 – double the directory size The full page is split into two. Its contents are rehashed between the original page and the new page, using one additional bit from the string produced by the hash function Just created a new directory entry for Page 3. Since Page 1 is not full, this directory entry points to Page 1. The contents of Page 1 will be rehashed with Page 3 when Page 1 becomes full rather than creating a new page. Page 0 Page 1 Page 2 Page 3 Hash directory 00 01 10 11 New Page Fall 2001 Database Systems 52 Extensible Hashing • As the hash directory size grows it must be stored on the disk • At most two disk accesses are needed to retrieve any tuple – this is a better upper bound than for B+-Trees • However, extensible hashing is not as good as B+- Trees for range queries and sequential processing where you want to process all the tuples of a relation • Consequently, B+-Trees are used more frequently than Extensible Hashing
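The directory-doubling behavior of extensible hashing can be sketched in memory as follows. This is a simplified illustration, not the slides' exact procedure: pages are Python objects rather than disk blocks, the class and method names are hypothetical, Python's built-in `hash` stands in for the hash function, and the sketch assumes distinct keys whose hash values differ (many keys with identical hashes would make it split indefinitely).

```python
class Bucket:
    """One page; depth is the number of low-order hash bits it covers."""
    def __init__(self, depth):
        self.depth = depth
        self.keys = []

class ExtensibleHash:
    def __init__(self, page_capacity=2):
        self.capacity = page_capacity
        self.global_depth = 1                      # directory uses 1 bit at first
        self.directory = [Bucket(1), Bucket(1)]    # entries 0 and 1

    def _slot(self, key):
        # Use the least significant global_depth bits of the hash.
        return hash(key) & ((1 << self.global_depth) - 1)

    def lookup(self, key):
        return key in self.directory[self._slot(key)].keys

    def insert(self, key):
        bucket = self.directory[self._slot(key)]
        while len(bucket.keys) >= self.capacity:
            self._split(bucket)
            bucket = self.directory[self._slot(key)]
        bucket.keys.append(key)

    def _split(self, bucket):
        if bucket.depth == self.global_depth:
            # Double the directory; each new entry shares its page with the
            # entry that has the same low-order bits.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.depth += 1
        new_bucket = Bucket(bucket.depth)
        high_bit = 1 << (bucket.depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = new_bucket
        # Rehash the full page's contents using one additional bit.
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            self.directory[self._slot(k)].keys.append(k)

table = ExtensibleHash(page_capacity=2)
for k in (1, 3, 5):
    table.insert(k)
```

After inserting 1, 3, and 5 the directory has four entries; entries 00 and 10 still share a single page because that page has not filled up, while the page for odd keys has been split in two.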