These slides afford in shallow depth the index management question. There are some example on how your choice can change your relation in terms of I/O accesses
Disk manager
The disk manager provides the following commands for a page
•
•
•
•
allocate a page
deallocate a page
read a page
write a page
The size of a page is chosen to be the size of a disk block and pages are stored as disk
blocks so that reading or writing a page can be done in one disk I/O
Andrea Giuliano @bit_shark
Buffer manager
is the software layer responsible for bringing pages from disk to main memory
as needed and manages the available main memory by partitioning it into a
collection of pages
the buffer pool
Andrea Giuliano @bit_shark
Dense index
An index is dense if every value of the search key that appears in the data
file appears also in at least one data entry of the index
Andrea Giuliano @bit_shark
Sparse index
An index is sparse if every value of the search key that appears in the data
entry points to a page of the data record
Andrea Giuliano @bit_shark
Search in a b-tree
fan-out
n leaves
Cost: logF n
Andrea Giuliano @bit_shark
Insert and delete
Insert and delete
operations must keep the
tree balanced towards
split, redistribution and
coalesce techniques.
..too deep
Andrea Giuliano @bit_shark
How can I compute I/O accesses?
Andrea Giuliano @bit_shark
Book case
BOOK
code
author
cost
publisher
•
•
•
•
•
2.000.000 records (tuples)
200.000 pages
10 data record in a page
200 records with the same value of the attribute cost (on
average)
dense non-clustering B+-tree index with search key cost
Query:
ask for code, author, publisher af all books with a given cost
how many page accesses do we
need to answer to the query?
Andrea Giuliano @bit_shark
Book case
BOOK
code
author
cost
publisher
•
•
•
•
•
2.000.000 records (tuples)
200.000 pages
10 data record in a page
200 records with the same value of the attribute cost (on
average)
dense non-clustering B+-tree index with search key cost
Let’s build the index structure
!
•
•
•
we know that each tuple has 4 field so in each page there are 40 fields
we can infer that 20 data entries fit in one leaf page of the index
so we have a fan-out of 20
Andrea Giuliano @bit_shark
Book case
fan-out: 20
…
We know there is an occupancy factor of 67% we have 13 data
entries in the leaves (each of which can contain 20 data entries)
How many leaves are there in the tree?
2.000.000/13 = 153.846 leaves
Andrea Giuliano @bit_shark
Book case
fan-out: 20
…
There are 153.846 leaves
In order to go to the leaves we need
log20 (153.846) = 4 I/O page accesses
Andrea Giuliano @bit_shark
Book case
fan-out: 20
…
Remember, we have on average 200 records with the same value of the
attribute cost therefore 200/13 = 15 pages (on average)
We need to visit these leaves because the index is dense and for each tuple
we have to access the 200 data record in order to obtain the other attributes
Andrea Giuliano @bit_shark
Book case
BOOK
code
author
cost
publisher
•
•
•
•
•
2.000.000 records (tuples)
200.000 pages
10 data record in a page
200 records with the same value of the attribute cost (on
average)
sparse clustering B+-tree index with search key cost
Query:
ask for code, author, publisher af all books with a given cost
how many page accesses do we
need to answer to the query?
Andrea Giuliano @bit_shark
Book case
BOOK
code
author
cost
publisher
•
•
•
•
•
2.000.000 records (tuples)
200.000 pages
10 data record in a page
200 records with the same value of the attribute cost (on
average)
sparse clustering B+-tree index with search key cost
Let’s build the index structure
!
•
•
•
we know that each tuple has 4 field so in each page there are 40 fields
we can infer that 20 data entries fit in one leaf page of the index
so we have a fan-out of 20
Andrea Giuliano @bit_shark
Book case
fan-out: 20
…
We know there is an occupancy factor of 67% we have 13 data
entries in the leaves (each of which can contain 20 data entries)
BUT each data entry points to a data record page (and not to a tuple)
How many data record pages do we have?
2.000.000/10 = 200.000 data record pages
Andrea Giuliano @bit_shark
Book case
fan-out: 20
…
We know there is an occupancy factor of 67% we have 13 data
entries in the leaves (each of which can contain 20 data entries)
BUT each data entry points to a data record (and not to a tuple)
How many leaves there are in the tree?
200.000/13 = 15.384 leaves
Andrea Giuliano @bit_shark
Book case
fan-out: 20
…
There are 15.384 leaves
In order to go to the leaves we need
log20 (15.384) = 3 I/O pages accesses
Remember, we have on average 200 data records with the same value of
the attribute cost therefore 200/10 = 20 data record pages to visit
Andrea Giuliano @bit_shark