CS215 - Lec 9: Indexing and reclaiming space in files
Maintain Indexes.
Adding a data record with Indexing.
Deleting a data record with Indexing.
Reclaiming space.
Multilevel Index.
Dr. Hussien M. Sharaf
Structure of Indexes
An index must be sorted in ascending or descending
order on one or more key fields.
CompanyName    offset
Google         211
IBM            0
ITE            643
Microsoft      462
Apple Mac      985    (new record)

Data file: Record1, Record2, Record3, Record4, New record
(each record is followed by a delimiter, shown as n).
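As a minimal sketch of why the sort order matters, the index can be held in memory as a sorted list of (key, offset) pairs and searched with binary search. The entries come from the table above; the function name find_offset and the use of Python's bisect module are assumptions for illustration, not part of the lecture.

```python
import bisect

# Index entries (key, offset) from the table above, kept sorted by key.
index = sorted([("Google", 211), ("IBM", 0), ("ITE", 643), ("Microsoft", 462)])

def find_offset(index, key):
    """Binary-search the sorted index; return the record offset or None."""
    # (key,) sorts just before any (key, offset) pair with the same key.
    i = bisect.bisect_left(index, (key,))
    if i < len(index) and index[i][0] == key:
        return index[i][1]
    return None

# Inserting the entry for the new record keeps the list sorted.
bisect.insort(index, ("Apple Mac", 985))
```

With the list sorted, a lookup needs only a logarithmic number of comparisons, and the new entry lands at its correct position rather than at the end.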
Operations needed for an Index:
1. Create an index in memory by
looping over all records in the
original data file.
2. If there is an index file, load it
into memory before using it.
3. Write the index to a file when
the program closes.
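The three operations above can be sketched as follows. The record layout is an assumption made for the example: each record is one newline-terminated line whose first "|"-separated field is the key. The names build_index, save_index and load_or_build_index are hypothetical.

```python
import os

def build_index(data_path):
    """Operation 1: scan the data file once and return sorted (key, offset) pairs."""
    index = []
    with open(data_path, "rb") as f:
        while True:
            offset = f.tell()            # byte offset where this record starts
            line = f.readline()
            if not line:                 # end of file
                break
            key = line.rstrip(b"\n").split(b"|", 1)[0].decode()
            index.append((key, offset))
    index.sort()
    return index

def save_index(index, index_path):
    """Operation 3: write the in-memory index to a file, one 'key|offset' per line."""
    with open(index_path, "w") as f:
        for key, offset in index:
            f.write(f"{key}|{offset}\n")

def load_or_build_index(data_path, index_path):
    """Operation 2: load the index file if it exists, otherwise build it."""
    if not os.path.exists(index_path):
        return build_index(data_path)
    index = []
    with open(index_path) as f:
        for line in f:
            key, offset = line.rstrip("\n").rsplit("|", 1)
            index.append((key, int(offset)))
    return index
```

The data file is opened in binary mode so that tell() returns plain byte offsets that can later be passed to seek().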
Once the index is loaded in memory, the following
operations are needed:
1. Add: append the data record to the data file and
insert an index record at the correct position.
2. Delete: mark the record in the data file as
deleted and delete the related record from
the index.
3. Deleting and updating data records requires
updating the offsets of all index records. Is
the same true for adding a data record?
Adding a data record with Indexing:
1. Go to the end of the data file and get the current offset.
2. Append the data record to the end of the data
file.
3. Build an index entry from the offset and key
of the new data record: (offset, key).
4. Insert the new index entry at its
correct position in the sorted index list.
5. At the end of the program, save the index list
to disk.
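Steps 1 to 4 above can be sketched in a few lines. As before, the "|"-separated, newline-terminated record layout is an assumption, and add_record is a hypothetical name. Step 5 (saving the index) is left to the program's shutdown code.

```python
import bisect
import os

def add_record(data_path, index, key, fields):
    """Append a record and insert its (key, offset) entry into the sorted index."""
    record = "|".join([key, *fields]).encode() + b"\n"
    with open(data_path, "ab") as f:
        f.seek(0, os.SEEK_END)   # step 1: go to the end of the data file
        offset = f.tell()        #         and get the current offset
        f.write(record)          # step 2: append the data record
    # Steps 3 and 4: build the index entry and insert it in sorted position.
    bisect.insort(index, (key, offset))
    return offset
```

Because the record is appended, no existing record moves, so the offsets already stored in the index stay valid.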
Deleting a data record with Indexing:
1. Search for the index entry by comparing the target
value with the key field value.
2. Mark the index entry as deleted.
3. Get the offset of the target data record.
4. Seek to the target offset and mark the data
record as deleted.
NOTE: The data record is not physically removed
immediately. A space-reclaiming function must be
run later.
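A minimal sketch of the deletion steps follows. Two assumptions are made for the example: a record is marked as deleted by overwriting its first byte with "*", and the index entry is simply removed from the in-memory list instead of being flagged. The name delete_record is hypothetical.

```python
import bisect

DELETED = b"*"  # assumed marker byte; the lecture does not specify one

def delete_record(data_path, index, key):
    """Mark the record for `key` as deleted; return False if the key is absent."""
    # Step 1: search the sorted index for the key.
    i = bisect.bisect_left(index, (key,))
    if i == len(index) or index[i][0] != key:
        return False
    # Steps 2 and 3: drop the index entry and keep its offset.
    _, offset = index.pop(i)
    # Step 4: seek to the record and overwrite its first byte with the marker.
    with open(data_path, "r+b") as f:
        f.seek(offset)
        f.write(DELETED)
    return True
```

The file keeps its size after the call: only one byte changes, and the space is recovered later by the reclaiming pass.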
Reclaiming space:
A. Create a new file stream.
B. While not at the end of the records:
   1. Read a batch of records into a buffer.
   2. For each record in the buffer:
      If the record is marked as deleted, skip to the next record.
      Else copy the record to the new file stream.
C. End While
D. Rebuild all indexes based on the new data
file.
NOTE: Buffering is used while copying data to the
new stream.
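The procedure above might look like this in code. It assumes newline-terminated records whose first byte is "*" when deleted and whose first "|"-separated field is the key; reclaim_space and the ".tmp" suffix are hypothetical choices. Python's file objects are buffered by default, which stands in for the explicit buffer in step B.

```python
import os

DELETED = b"*"  # assumed marker byte for deleted records

def reclaim_space(data_path):
    """Copy live records to a new file, swap it in, and return the rebuilt index."""
    tmp_path = data_path + ".tmp"
    index = []
    # Step A: create a new file stream next to the original.
    with open(data_path, "rb") as src, open(tmp_path, "wb") as dst:
        for line in src:                      # step B: buffered read, record by record
            if line.startswith(DELETED):      # deleted: go to the next record
                continue
            key = line.rstrip(b"\n").split(b"|", 1)[0].decode()
            index.append((key, dst.tell()))   # step D: offset in the NEW file
            dst.write(line)                   # live: copy to the new stream
    os.replace(tmp_path, data_path)           # the new file takes the old name
    index.sort()
    return index
```

Collecting the new offsets during the copy is one way to rebuild the index; scanning the new file afterwards would give the same result.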
Multilevel Index
When an index gets very big, it cannot
be stored in RAM.
It should be stored in a file, so another
level of index that can be loaded into
memory is required.
Hence we need multiple levels of indexing.
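A two-level index can be sketched as follows. The large first-level index is split into fixed-size blocks (standing in here for blocks stored in a file), and a small top-level index kept in memory holds the first key of each block. The block size of 2 and the names build_levels and lookup are assumptions for illustration.

```python
import bisect

BLOCK_SIZE = 2  # assumed number of index entries per block

def build_levels(sorted_entries, block_size=BLOCK_SIZE):
    """Split a sorted (key, offset) list into blocks plus a small top-level index."""
    blocks = [sorted_entries[i:i + block_size]
              for i in range(0, len(sorted_entries), block_size)]
    top = [block[0][0] for block in blocks]   # first key of each block
    return top, blocks

def lookup(top, blocks, key):
    """Use the top level to pick one block, then search only that block."""
    b = bisect.bisect_right(top, key) - 1     # last block whose first key <= key
    if b < 0:
        return None
    for k, offset in blocks[b]:               # in practice: read this block from disk
        if k == key:
            return offset
    return None

entries = [("Apple Mac", 985), ("Google", 211), ("IBM", 0),
           ("ITE", 643), ("Microsoft", 462)]
top, blocks = build_levels(entries)
```

Only the top level has to fit in memory; each lookup then touches a single block of the lower level.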