FILE ORGANISATION

File organisation refers to the way in which data is stored in a file and the methods by which it can be accessed. A database is stored as a collection of files. Each file is a sequence of records, and each record is a sequence of fields. Data is usually stored in the form of records.

Records usually describe entities and their attributes. For example, an employee record represents an employee entity, and each field value in the record specifies an attribute of that employee, such as Name, Birth_date or Salary. How a file is organised depends on what kind of file it is.
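
As a rough illustration of a record made up of fields, the sketch below packs an employee record into a fixed-length sequence of bytes. The field widths (20 bytes for the name, 10 for the birth date, 8 for the salary) and the names Employee, RECORD_FORMAT and RECORD_SIZE are assumptions made for this example only.

```python
import struct
from dataclasses import dataclass

# Assumed layout: 20-byte name, 10-byte birth date, 8-byte float salary.
# "<" means little-endian with no padding between fields.
RECORD_FORMAT = "<20s10sd"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 20 + 10 + 8 bytes


@dataclass
class Employee:
    name: str
    birth_date: str
    salary: float

    def pack(self) -> bytes:
        """Turn the fields into one fixed-length record."""
        return struct.pack(
            RECORD_FORMAT,
            self.name.encode("ascii"),
            self.birth_date.encode("ascii"),
            self.salary,
        )

    @classmethod
    def unpack(cls, raw: bytes) -> "Employee":
        """Rebuild the fields from a fixed-length record."""
        name, birth_date, salary = struct.unpack(RECORD_FORMAT, raw)
        # Short text fields are padded with null bytes; strip them off.
        return cls(
            name.rstrip(b"\x00").decode("ascii"),
            birth_date.rstrip(b"\x00").decode("ascii"),
            salary,
        )
```

A file of such records is then simply one packed record after another, so record number n starts at byte n * RECORD_SIZE.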

In its simplest form, a file can be a text file, i.e. a file composed of ASCII (American Standard Code for Information Interchange) text. Files can also be created as binary or executable types (containing elements other than plain text).

Files are keyed with attributes which help determine their use by the host operating system.

Types/Techniques of file organisation

There are three types/techniques of file organisation.

Heap (unordered)
Sorted
Hashed or Direct

Heap file organisation

This is the simplest file organisation, where records are inserted at the end of the file as they arrive. It is also called an unordered file. A heap file can be created or destroyed and opened or closed, and records can be inserted into it or deleted from it.

Inserting a record into a heap file is very fast because the incoming record is written at the end of the last page of the file.

Searching for or updating a record is very slow because a linear search has to be performed. Deleting a record from a heap file is also very slow because the record to be deleted is first searched for, then marked as deleted, and the page is written back to disk. The space occupied by deleted records is not reused.

Heap files are one of the best organisations for bulk loading data into a table.
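
The behaviour described above can be sketched in a few lines. This is a minimal in-memory stand-in, assuming a Python list plays the role of the pages on disk; the class name HeapFile and its methods are hypothetical.

```python
class HeapFile:
    """In-memory stand-in for a heap (unordered) file."""

    def __init__(self):
        # Each slot holds [record, deleted_flag], in arrival order.
        self.slots = []

    def insert(self, record):
        """Fast: the record is simply appended at the end."""
        self.slots.append([record, False])
        return len(self.slots) - 1

    def search(self, field, value):
        """Slow: linear search through every slot from the start."""
        for pos, (record, deleted) in enumerate(self.slots):
            if not deleted and record.get(field) == value:
                return pos
        return None

    def delete(self, field, value):
        """Search first, then mark the slot as deleted.

        The slot is not removed or reused, so its space is wasted.
        """
        pos = self.search(field, value)
        if pos is None:
            return False
        self.slots[pos][1] = True
        return True
```

Note that after a delete the file is no shorter: the next insert still goes to the end rather than into the freed slot.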

Sorted (ordered) file organisation

There are four methods of sorted file organisation.

Sequential Access Method (SAM)
Line-Sequential (LSAM)
Inverted List Organisation
Indexed Sequential Access Method (ISAM)

Sequential Access Method (SAM) Organisation

This contains records organised in the order in which they were entered. The order of the records is fixed, and a record can only be accessed by reading all the previous records, i.e. records are read or written sequentially. Once a record is stored in a sequential file, it cannot be made shorter or longer, or deleted.
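
A minimal sketch of sequential access follows, assuming fixed-length records of 8 bytes and an in-memory byte stream standing in for the file. The names SEQ_RECORD_SIZE and read_nth_record are hypothetical.

```python
import io

SEQ_RECORD_SIZE = 8  # assumed fixed record length in bytes


def read_nth_record(stream, n):
    """Return record n (0-based) by reading every record before it."""
    stream.seek(0)  # always start from the beginning of the file
    record = b""
    for _ in range(n + 1):
        record = stream.read(SEQ_RECORD_SIZE)
        if len(record) < SEQ_RECORD_SIZE:
            return None  # ran off the end of the file
    return record


# Usage: three records written one after another.
seq_file = io.BytesIO(b"REC00001REC00002REC00003")
third = read_nth_record(seq_file, 2)  # reads records 0 and 1 first
```

The cost of reaching a record grows with its position in the file, which is the main drawback of purely sequential access.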

Line-sequential (LSAM)

This is a file organisation in which records contain only characters as data. It is similar to the sequential file organisation. Line-sequential files are maintained by the native byte stream files of the operating system.
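
As an illustration, a line-sequential file can be pictured as plain text with one record per line. The sketch below assumes a newline ends each record and a comma separates the fields; both choices, and the function names, are assumptions made for the example.

```python
import io


def write_line_records(stream, records):
    """Write each record as one line of comma-separated character fields."""
    for fields in records:
        stream.write(",".join(fields) + "\n")


def read_line_records(stream):
    """Read the records back in the order they were written."""
    stream.seek(0)
    return [line.rstrip("\n").split(",") for line in stream]


# Usage with an in-memory text stream standing in for the file.
text_file = io.StringIO()
write_line_records(text_file, [["1", "Ada"], ["2", "Chi"]])
rows = read_line_records(text_file)
```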

Indexed Sequential Access Method (ISAM) Organisation

This is the file organisation that enables one to access records randomly or dynamically by primary and alternate key values. The records are sorted by primary key. For each primary key, an index value is generated and mapped to the record.

There are three areas in the disc storage.

Primary Area: This contains file records stored by key or ID numbers.
Overflow Area: This contains records that cannot be placed in the primary area.
Index Area: This contains the keys of records and their locations on the disc.

This method of file organisation gives the flexibility of using any column as the key field, with an index generated on that column. It also supports range and partial retrieval of records, but maintaining the index carries an extra cost.
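
The three areas can be sketched as follows. This is a simplified in-memory model, not a real ISAM implementation: it assumes the primary area is split into small blocks, the index holds the lowest key of each block, and every record added later goes to the overflow area. The class name IsamFile, its methods and the block size are hypothetical.

```python
import bisect


class IsamFile:
    def __init__(self, records, block_size=2):
        # Primary area: (key, value) records sorted by key, in blocks.
        ordered = sorted(records, key=lambda r: r[0])
        self.blocks = [
            ordered[i:i + block_size]
            for i in range(0, len(ordered), block_size)
        ]
        # Index area: lowest key of each block, in block order.
        self.index = [block[0][0] for block in self.blocks]
        # Overflow area: records that arrive after the build.
        self.overflow = []

    def insert(self, key, value):
        self.overflow.append((key, value))

    def lookup(self, key):
        # Use the index to pick the only block that could hold the key.
        pos = bisect.bisect_right(self.index, key) - 1
        if pos >= 0:
            for k, v in self.blocks[pos]:
                if k == key:
                    return v
        # Fall back to a linear scan of the overflow area.
        for k, v in self.overflow:
            if k == key:
                return v
        return None

    def range_scan(self, low, high):
        # Start at the block that may contain `low`, stop past `high`.
        start = max(bisect.bisect_right(self.index, low) - 1, 0)
        hits = []
        for block in self.blocks[start:]:
            if block[0][0] > high:
                break
            hits.extend((k, v) for k, v in block if low <= k <= high)
        hits.extend((k, v) for k, v in self.overflow if low <= k <= high)
        return sorted(hits, key=lambda r: r[0])
```

Because the primary area is sorted, a range retrieval only has to read the blocks between the two bounds, plus the overflow area.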

Inverted List

Like the indexed sequential storage method, the inverted list organisation maintains an index. However, the indexed sequential method has multiple indexes for a given key, whereas the inverted list method has a single index for each key.

In an inverted list, records are not necessarily stored in a particular sequence. They are placed in the data storage area, and the indexes are updated with the record keys and locations. Searching is fast, but updating is much slower.
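
A minimal sketch of the idea, assuming records are dictionaries kept in arrival order and each indexed field has one index mapping a field value to the positions of the matching records. The class name InvertedListFile and the sample field names are hypothetical.

```python
from collections import defaultdict


class InvertedListFile:
    def __init__(self, indexed_fields):
        # Data storage area: records in no particular sequence.
        self.records = []
        # One index per key field: value -> list of record positions.
        self.indexes = {field: defaultdict(list) for field in indexed_fields}

    def insert(self, record):
        """Store the record, then update every index (the slow part)."""
        pos = len(self.records)
        self.records.append(record)
        for field, index in self.indexes.items():
            index[record[field]].append(pos)
        return pos

    def find(self, field, value):
        """Fast: go straight from the index to the matching records."""
        positions = self.indexes[field].get(value, [])
        return [self.records[pos] for pos in positions]
```

Each insert touches every index, which is why updates cost more as the number of indexed fields grows.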

Hash File Organisation

In this file organisation, instead of storing records sequentially, a hash function is used to calculate the address of the page in which the record is to be stored.

The hash function can be any simple or complex mathematical function. The field on which the hash function is calculated is called the hash field, and if that field acts as the key of the relation, it is called the hash key. A hash file is also called a random or direct file because the records are randomly distributed in the file.
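
To make the idea concrete, the sketch below assumes an integer hash key and the simple hash function "key modulo number of pages". The class name HashFile, the page count and the choice of hash function are assumptions for illustration; a real system may use a very different function.

```python
class HashFile:
    def __init__(self, num_pages=4):
        # Each page is a list of (hash_key, record) pairs.
        self.pages = [[] for _ in range(num_pages)]

    def page_address(self, hash_key):
        """Assumed hash function: integer key modulo the page count."""
        return hash_key % len(self.pages)

    def insert(self, hash_key, record):
        """Store the record in the page chosen by the hash function."""
        self.pages[self.page_address(hash_key)].append((hash_key, record))

    def lookup(self, hash_key):
        """Go directly to one page; only that page is searched."""
        for key, record in self.pages[self.page_address(hash_key)]:
            if key == hash_key:
                return record
        return None
```

With four pages, keys 10 and 14 both hash to page 2, so a page may hold several records and still has to be searched for the exact key.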

Advantages of file organisation

It allows rapid access to a record or to a number of records which are related to each other.
The adding, modification or deletion of records is made easy.
There is easy storage and retrieval of records.
It reduces redundancy, which helps to ensure data integrity.