File Structure Concepts

MINIMIZE ACCESS TIME
MAXIMIZE SPACE USAGE

So when we have a huge amount of data,
we can use file structures to
access it quickly, which makes
this approach more efficient than
relying on in-memory data structures alone.

Data Management in Files

USN : 1MS10ISO34
Name : Dileep Kodira
College : MSRIT
Place : Bangalore

THE FOUR MOST COMMON WAYS OF STRUCTURING FIELDS ARE
FIXED LENGTH FIELDS - force the fields into a predictable (fixed) length
LENGTH INDICATOR FIELDS - begin each field with a length indicator
DELIMITED FIELDS - place a delimiter at the end of each field to separate the fields
SELF-DESCRIBING FIELDS - use a “keyword=value” expression to identify each field and its contents

Fixing the Length of Fields
• This method relies on creating fields of
predictable fixed size.
• E.g. one may define the following class:
class Person {
public:
    // one fixed-size character array per field
    char last[11];
    char first[11];
    char address[16];
    char city[16];
    char state[3];
    char zip[10];
};

Fixing the Length of Fields
• Disadvantages:
• a lot of wasted space due to “padding” of fields
with “blanks”
• data values may not fit into the field sizes:
» e.g. Michalopoulos is too long to fit in the array
char last[11]

• Thus the fixed-size field approach is
inappropriate for data that inherently
contains a large amount of variability in the
length of fields such as names or addresses.
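
Both disadvantages can be seen in a small sketch. The helper name storeFixed is hypothetical, and the sketch assumes the last byte of each array is reserved for a terminating '\0', which the slides do not state:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Copy `value` into a fixed-size field of `size` bytes.
// Assumption: the last byte holds '\0', so size - 1 characters are usable.
// Short values are padded with blanks; long values are cut off.
// Returns true if the value fit without being truncated.
bool storeFixed(char* field, std::size_t size, const std::string& value) {
    assert(size > 0);
    const std::size_t capacity = size - 1;
    const std::size_t n = value.size() < capacity ? value.size() : capacity;
    std::memset(field, ' ', capacity);      // blank padding (the wasted space)
    std::memcpy(field, value.data(), n);    // as much of the value as fits
    field[capacity] = '\0';
    return value.size() <= capacity;
}
```

With char last[11], storing "Ames" leaves six padding blanks, while "Michalopoulos" is cut to "Michalopou" and the function returns false.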

Beginning Each Field with a
Length Indicator
• This method requires that each field be
preceded by an indicator of its length
(in bytes).
E.g.
04Ames04Mary09123 Maple10StillWater02OK0574075
• One disadvantage of this method is that it
is more complex, since numbers and strings
must be extracted from a single string
representing a record.
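
The extraction step could look roughly like the sketch below. It assumes every length indicator is exactly two decimal digits, as in the example record; the function name splitLengthPrefixed is hypothetical:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Split a record in which each field is preceded by a two-digit length.
// Parsing stops early if the record is malformed.
std::vector<std::string> splitLengthPrefixed(const std::string& record) {
    std::vector<std::string> fields;
    std::size_t pos = 0;
    while (pos + 2 <= record.size()) {
        const unsigned char tens = static_cast<unsigned char>(record[pos]);
        const unsigned char ones = static_cast<unsigned char>(record[pos + 1]);
        if (!std::isdigit(tens) || !std::isdigit(ones)) break;
        const std::size_t len = (tens - '0') * 10 + (ones - '0');
        pos += 2;                                   // skip the indicator
        if (pos + len > record.size()) break;       // field would run past the end
        fields.push_back(record.substr(pos, len));
        pos += len;                                 // move to the next indicator
    }
    assert(pos <= record.size());
    return fields;
}
```

Applied to the example record, this yields six fields, the third being "123 Maple" and the last "74075".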

Separating Fields with
Delimiters
• Separating fields with a delimiter is a
widely used method. However, choosing the
right delimiter is very important: it must
not occur inside any field.

• In many cases white-space characters
(blanks) are excellent delimiters because
they provide a clean separation between
fields when we list them on the console.
They fail, however, when a field can itself
contain a blank, as the address “123 Maple” does.
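
A minimal sketch of splitting a record on a delimiter follows. The function name splitDelimited is hypothetical, and the delimiter is passed in so different choices can be compared:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split `record` into fields, each terminated by `delim`.
std::vector<std::string> splitDelimited(const std::string& record, char delim) {
    std::vector<std::string> fields;
    std::istringstream in(record);
    std::string field;
    while (std::getline(in, field, delim)) {   // reads up to the next delimiter
        fields.push_back(field);
    }
    return fields;
}
```

Splitting "Ames|Mary|123 Maple|StillWater|OK|74075|" on '|' gives six fields. Splitting the same data written with blanks as delimiters gives seven, because the blank inside "123 Maple" is taken as a field boundary.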

Using a “keyword = value” expression
• This method requires that each field be
preceded by its field identifier (keyword).
E.g.
last=Amesfirst=Maryaddress=123 Maplecity=StillWaterstate=OKzip=74075
• It can be combined with the delimiter method to
mark the field ends:
last=Ames|first=Mary|address=123 Maple|city=StillWater|state=OK|zip=74075
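
Reading such a record back could be sketched as follows, assuming '|' ends each field and the first '=' separates keyword from value. The function name parseKeywordRecord is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Parse "keyword=value" items separated by `delim` into a keyword -> value map.
std::map<std::string, std::string> parseKeywordRecord(const std::string& record,
                                                      char delim = '|') {
    std::map<std::string, std::string> fields;
    std::istringstream in(record);
    std::string item;
    while (std::getline(in, item, delim)) {
        const std::size_t eq = item.find('=');
        if (eq == std::string::npos) continue;        // no keyword: skip the item
        fields[item.substr(0, eq)] = item.substr(eq + 1);
    }
    return fields;
}
```

Because every field names itself, a record that omits a field (for example one with only last and zip) still parses cleanly; the missing keywords are simply absent from the map.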

Using a “keyword = value” expression

• Advantages:
• each field provides information about itself
• good format for dealing with missing fields

• Disadvantages:
• In some applications a lot of space may be wasted on
field keywords (up to 50%).

Record Structures
• Files may be viewed as collections of records
which are sets of fields
• Some of the most often used methods for
organizing the records of a file are:
– require that the records be a predictable (fixed)
number of bytes in length
– require that the records be a predictable
number of fields in length

Organizing the Records of a File
– begin each record with a length indicator
(a count of the bytes in the record)
– use a second file to keep track of the beginning
byte address for each record
– place a delimiter at the end of each record to
separate it from the next record

Fixed-Length Records
• This method is the counterpart of the
analogous method for organizing files with
fixed-length fields.
• Fixing the sizes of fields in a record will
produce a fixed-size record.

• E.g.
class Person {
public:
    char last[11];
    char first[11];
    char address[16];
    char city[16];
    char state[3];
    char zip[10];
};
This produces a fixed-size record of 67 bytes (11 + 11 + 16 + 16 + 3 + 10).
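
One practical consequence is that the position of any record can be computed directly. The sketch below restates the class as a struct and adds a hypothetical helper, recordOffset, taking a 0-based record number (the name rrn is an assumption, not from the slides):

```cpp
#include <cassert>
#include <cstddef>

struct Person {
    char last[11];
    char first[11];
    char address[16];
    char city[16];
    char state[3];
    char zip[10];
};

// All members are char arrays, so compilers normally add no padding.
// The check fails at compile time if that assumption does not hold.
static_assert(sizeof(Person) == 67, "expected a 67-byte record");

// Byte offset of record number `rrn` (0-based) in a file of Person records.
constexpr std::size_t recordOffset(std::size_t rrn) {
    return rrn * sizeof(Person);
}
```

For instance, recordOffset(3) is 201, so the fourth record can be reached with a single seek instead of reading the three records before it.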

• The fixed-length record structure, however,
does NOT imply a fixed-length field
structure.
• Fixed-length records are frequently used as
“containers” to hold variable numbers of
variable-length fields.
• Fixed-length record structures are among
the most commonly used methods for
organizing files.

Records with a Predictable
Number of Fields
• The method specifies the number of fields
in each record.
• Regardless of the method used for storing fields,
this approach makes it relatively easy
to calculate record boundaries.
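
As an illustration, with delimited fields a record boundary falls after every N-th field. The sketch below assumes '|'-style delimited storage; the function name groupByFieldCount is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Group a stream of delimited fields into records of `fieldsPerRecord` fields.
// A trailing group with fewer fields is treated as incomplete and dropped.
std::vector<std::vector<std::string>> groupByFieldCount(const std::string& data,
                                                        char delim,
                                                        std::size_t fieldsPerRecord) {
    assert(fieldsPerRecord > 0);
    std::vector<std::vector<std::string>> records;
    std::vector<std::string> current;
    std::istringstream in(data);
    std::string field;
    while (std::getline(in, field, delim)) {
        current.push_back(field);
        if (current.size() == fieldsPerRecord) {   // record boundary reached
            records.push_back(current);
            current.clear();
        }
    }
    return records;
}
```

With six fields per record, as in the Person example, the seventh field read starts the second record.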

Records with a Length Indicator

• This method requires that each record
begin with a length indicator.

• This method is commonly used for handling
variable-length records.

Index File to Keep Track of
Record Addresses

• This method uses an index file (or an index
block) to keep a byte offset for each record
in the original data file. The byte offsets
(record addresses) allow us to find the
beginning of each successive record and
compute the length of each record.
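
The length computation can be sketched as below, assuming the index holds ascending byte offsets and the total size of the data file is known. The function name recordLengths is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Given the byte offset at which each record begins (ascending order) and the
// total size of the data file, compute the length of every record.
std::vector<std::size_t> recordLengths(const std::vector<std::size_t>& offsets,
                                       std::size_t fileSize) {
    std::vector<std::size_t> lengths;
    for (std::size_t i = 0; i < offsets.size(); ++i) {
        // A record ends where the next one begins; the last ends at end of file.
        const std::size_t end = (i + 1 < offsets.size()) ? offsets[i + 1] : fileSize;
        assert(offsets[i] <= end);   // offsets must be ascending and inside the file
        lengths.push_back(end - offsets[i]);
    }
    return lengths;
}
```

For offsets 0, 40 and 76 in a 100-byte file, the record lengths are 40, 36 and 24 bytes.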

Records Separated with
Delimiters
• This method is analogous to the use of
delimiters to separate fields.
• As with fields the delimiter must be well
chosen and it cannot be a part of data.
• A common delimiter is the end-of-line
character ‘\n’, since such records can be
listed directly on the console.

A Record Structure that Uses a
Length Indicator
• Use a memory buffer to store the data that
is going to be written to the disk.
• Write down the size of the record at the
beginning of it.
• Write down the buffer contents after
writing the size.
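
The three steps above can be sketched as follows. The two-byte, low-byte-first size and the '|' field delimiter are assumptions made for this illustration, and the function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>   // std::stringstream is handy for trying this in memory
#include <string>
#include <vector>

// Step 1: pack the fields into a memory buffer, ending each with '|'.
std::string packFields(const std::vector<std::string>& fields) {
    std::string buffer;
    for (const std::string& f : fields) { buffer += f; buffer += '|'; }
    return buffer;
}

// Steps 2 and 3: write the record size (two bytes), then the buffer contents.
void writeRecord(std::ostream& out, const std::string& buffer) {
    assert(buffer.size() <= 0xFFFF);                 // must fit in two bytes
    const std::uint16_t n = static_cast<std::uint16_t>(buffer.size());
    out.put(static_cast<char>(n & 0xFF));
    out.put(static_cast<char>(n >> 8));
    out.write(buffer.data(), static_cast<std::streamsize>(buffer.size()));
}

// Read one record back; returns false at end of file or on a truncated record.
bool readRecord(std::istream& in, std::string& buffer) {
    char lo, hi;
    if (!in.get(lo) || !in.get(hi)) return false;
    const std::size_t n = static_cast<unsigned char>(lo) |
                          (static_cast<unsigned char>(hi) << 8);
    buffer.resize(n);
    in.read(&buffer[0], static_cast<std::streamsize>(n));
    return static_cast<std::size_t>(in.gcount()) == n;
}
```

Writing the size in binary keeps the indicator at a fixed two bytes; a text-based length would work too, at the cost of parsing digits when reading.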

FIELDS AND RECORDS
[Figure: the sample student data (USN, Name, College, Place) shown as individual fields and as complete records]

UNPACKING
[Figure: the sample record separated back into its fields]

RUN LENGTH ENCODING
– Represents data using value and run length
– Run length defined as number of consecutive equal
values

RLE example
1110011111 → 13 02 15
In each pair the first digit is the value and the second is the run length:
three 1s, two 0s, five 1s.

RUN LENGTH ENCODING
Applications
• Useful for compressing data that contains
repeated values
– e.g. the output of a filter, where many consecutive values are
0.
• Very simple compared with other compression
techniques
• Reversible (Lossless) compression
– decompression is just as easy
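
A minimal sketch of both directions is given below. It stores each run as a (value, run length) pair; the type alias Run and the function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Run = std::pair<char, std::size_t>;   // (value, run length)

// Encode: extend the current run while the value repeats, else start a new run.
std::vector<Run> rleEncode(const std::string& data) {
    std::vector<Run> runs;
    for (char c : data) {
        if (!runs.empty() && runs.back().first == c) {
            ++runs.back().second;
        } else {
            runs.emplace_back(c, 1);
        }
    }
    return runs;
}

// Decode: write each value back out `run length` times.
std::string rleDecode(const std::vector<Run>& runs) {
    std::string data;
    for (const Run& r : runs) data.append(r.second, r.first);
    return data;
}
```

Encoding "1110011111" gives the runs ('1', 3), ('0', 2), ('1', 5), and decoding those runs returns the original string, which is what makes the scheme lossless.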

MORSE CODING

REPRESENTS ANY ALPHANUMERIC
CHARACTER USING TWO SYMBOLS
AND A VARIETY OF SPACES BETWEEN THEM

HUFFMAN CODING
• Suppose we have a message consisting of 5 symbols, e.g.
[ ]
• How can we code this message using 0/1 so the coded
message will have minimum length (for transmission or
saving!)

• 5 symbols → at least 3 bits per symbol
• For a simple fixed-length encoding, the
length of the coded message is 10*3 = 30 bits

HUFFMAN CODING
• Intuition: symbols that occur more frequently should get
shorter codes. Since the codes then differ in length, there
must be a way of telling where each code ends (no code may be
a prefix of another).

• For the Huffman code, the
length of the encoded message
will be
3*2 + 3*2 + 2*2 + 3 + 3 = 22 bits
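
The total can be checked with a short sketch that repeatedly merges the two least frequent nodes, as Huffman's algorithm does. The function name huffmanTotalBits is hypothetical, and the symbol counts 3, 3, 2, 1, 1 are read off the terms of the sum above, since the message itself is not reproduced here:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Total length in bits of a Huffman-coded message, given each symbol's count.
// Each merge adds one bit to the code of every symbol beneath the merged node,
// so the total is the sum of the weights of all merged nodes.
std::size_t huffmanTotalBits(const std::vector<std::size_t>& counts) {
    std::priority_queue<std::size_t, std::vector<std::size_t>,
                        std::greater<std::size_t>>
        heap(counts.begin(), counts.end());          // min-heap of node weights
    std::size_t total = 0;
    while (heap.size() > 1) {
        const std::size_t a = heap.top(); heap.pop();
        const std::size_t b = heap.top(); heap.pop();
        total += a + b;
        heap.push(a + b);
    }
    return total;
}
```

The function only computes code lengths, not the codes themselves; building the actual bit patterns would require keeping the tree. With a single distinct symbol it returns 0, a corner case that real encoders handle separately.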

Thank you

Dileep Kodira
