A parallel database improves processing and input/output speeds by using multiple CPUs and disks in parallel. A parallel database system seeks to improve performance by parallelizing operations such as loading data, building indexes, and evaluating queries. In parallel processing, many operations are performed simultaneously, as opposed to serial processing, in which the computational steps are performed sequentially.
Organizations of every size benefit from databases because they improve the management of information. A database has a server, a specialized program that handles all user requests. Organizations use the parallel database approach when they have a large user base and millions of records to process. Parallel databases are fast, flexible, and reliable.
Architecture For Parallel Databases
There are three main architectures for building a parallel DBMS:
(i) Shared Memory System (ii) Shared Disk System (iii) Shared Nothing System
1. Shared Memory System: Multiple processors are attached to an interconnection network and access a common region of memory.
Shared Memory Advantages:
- It is closer to a conventional machine and easy to program.
- Overhead is low.
- OS services are leveraged to utilize the additional CPUs.
Shared Memory Disadvantages:
- It leads to a bottleneck problem: as processors are added, contention for the shared memory and the interconnect grows.
- It is expensive to build.
- It is less sensitive to partitioning.
2. Shared Disk System: Each processor has its own main memory and direct access to all disks through an interconnection network.
Shared Disk Advantages: The same as for shared memory.
Shared Disk Disadvantages:
- More interference between processors.
- Increased demand on network bandwidth.
- It is less sensitive to partitioning.
3. Shared Nothing System: Each processor has its own local main memory and disk space. No two processors can access the same storage area, and all communication between processors goes through a network connection.
Shared Nothing Advantages:
- It provides linear scale up and linear speed up.
- Shared nothing benefits from “good” partitioning.
- Cheap to build.
Shared Nothing Disadvantages:
- It is hard to program.
- Adding new nodes requires reorganizing the data.
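The terms "linear speed up" and "linear scale up" used above can be made concrete with a small calculation. The following is a minimal sketch using the usual definitions: speed-up compares the same workload on a small and a larger system, and scale-up compares a small problem on a small system with an N-times larger problem on an N-times larger system. The function names and the timings are hypothetical, invented only for illustration.

```python
def speed_up(time_small_system: float, time_large_system: float) -> float:
    """Same workload, run on a small system and on a larger one."""
    return time_small_system / time_large_system


def scale_up(time_small_problem: float, time_large_problem: float) -> float:
    """Small problem on a small system vs. an N-times larger problem
    on an N-times larger system."""
    return time_small_problem / time_large_problem


# Hypothetical timings in seconds, invented purely for illustration.
n_nodes = 4
s = speed_up(time_small_system=120.0, time_large_system=30.0)
u = scale_up(time_small_problem=120.0, time_large_problem=120.0)

is_linear_speed_up = s == n_nodes  # linear speed-up: N nodes give an N-fold speed-up
is_linear_scale_up = u == 1.0      # linear scale-up: elapsed time stays constant
```

With these made-up numbers, going from one node to four cuts the elapsed time by a factor of four, and a workload four times larger finishes in the same time on four nodes. Real systems usually fall short of both ideals because of start-up cost, interference, and skew.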
Parallel Query Evaluation
A relational query execution plan is a graph (or tree) of relational algebra operators, and the operators in that graph can be executed in parallel. If one operator consumes the output of a second operator as it is produced, we have pipelined parallelism.
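Pipelined parallelism can be illustrated with two operators running in separate threads and connected by a queue. This is only a sketch of the structure, not how any particular DBMS implements it: the operator names `scan` and `select`, the `DONE` sentinel, and the sample rows are all assumptions made for the example, and Python threads illustrate the overlap between operators rather than true multi-CPU execution.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the tuple stream


def scan(rows, out_q):
    """Producer operator: emits tuples one at a time."""
    for row in rows:
        out_q.put(row)
    out_q.put(DONE)


def select(in_q, out_q, predicate):
    """Consumer operator: filters tuples as soon as they arrive."""
    while True:
        row = in_q.get()
        if row is DONE:
            out_q.put(DONE)
            break
        if predicate(row):
            out_q.put(row)


def run_pipeline(rows, predicate):
    # A small bounded buffer means scan cannot run far ahead of select,
    # so the two operators work on the stream at the same time.
    scan_to_select = queue.Queue(maxsize=2)
    results_q = queue.Queue()
    workers = [
        threading.Thread(target=scan, args=(rows, scan_to_select)),
        threading.Thread(target=select, args=(scan_to_select, results_q, predicate)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    results = []
    while True:
        row = results_q.get()
        if row is DONE:
            break
        results.append(row)
    return results


# Hypothetical (name, salary) tuples.
employees = [("ann", 52000), ("bob", 38000), ("cy", 61000)]
high_paid = run_pipeline(employees, lambda row: row[1] > 50000)
```

The selection starts filtering before the scan has finished, which is the defining property of a pipeline: the consumer does not wait for the producer's complete output.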
For parallel evaluation, a large database is partitioned horizontally across several disks. This lets us exploit the I/O bandwidth of the disks by reading and writing them in parallel. Partitioning can be done in the following ways:
- Round Robin Partitioning: If there are n processors, the i-th tuple is assigned to processor i mod n. Round-robin partitioning is suitable for efficiently evaluating queries that access the entire relation. If only a subset of the tuples is required, hash partitioning and range partitioning are better than round-robin partitioning.
- Hash Partitioning: A hash function is applied to (selected fields of) a tuple to determine its processor. Hash partitioning has the additional virtue that it keeps data evenly distributed even if the data grows and shrinks over time.
- Range Partitioning: Tuples are sorted and ranges are chosen for the sort key values so that each range contains roughly the same number of tuples; tuples in range i are assigned to processor i. Range partitioning can lead to data skew.
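The three partitioning schemes can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not a production design: the function names, the use of `zlib.crc32` as the hash function, the hand-picked range boundaries, and the sample `(id, age)` tuples are all choices made for the example. Tuples are numbered from 0 here.

```python
import zlib
from bisect import bisect_right


def round_robin_partition(tuples, n):
    """Assign the i-th tuple to processor i mod n."""
    parts = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        parts[i % n].append(t)
    return parts


def hash_partition(tuples, n, key):
    """Assign each tuple to processor hash(key field) mod n."""
    parts = [[] for _ in range(n)]
    for t in tuples:
        # crc32 is used only because it is deterministic across runs.
        h = zlib.crc32(str(key(t)).encode("utf-8"))
        parts[h % n].append(t)
    return parts


def range_partition(tuples, boundaries, key):
    """Assign each tuple to the range its key falls in.

    boundaries must be sorted; range i holds keys k with
    boundaries[i-1] <= k < boundaries[i].
    """
    parts = [[] for _ in range(len(boundaries) + 1)]
    for t in tuples:
        parts[bisect_right(boundaries, key(t))].append(t)
    return parts


# Hypothetical (id, age) tuples.
rows = [(0, 23), (1, 35), (2, 41), (3, 19), (4, 58), (5, 30)]

rr = round_robin_partition(rows, 3)
hp = hash_partition(rows, 3, key=lambda t: t[0])
rp = range_partition(rows, boundaries=[30, 45], key=lambda t: t[1])
```

A query such as "age between 30 and 44" only needs to visit the middle range partition in `rp`, whereas under round-robin partitioning every processor would have to scan its share. In practice the range boundaries would be chosen from the sorted data so that each range holds roughly the same number of tuples; fixed boundaries like these are exactly what can produce skew.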
The Advantages of Parallel Databases
A parallel database runs on many computers at the same time.
- High performance
The Disadvantages of Parallel Databases
- Implementation is highly expensive.
- Managing a parallel database, with many operations running simultaneously, is difficult and complex.
- A lot of resources are needed to support and maintain the database.
- Define a parallel database.
- Enumerate the three architectures for parallel databases.
- State three methods by which data can be partitioned.
- What are the advantages and disadvantages of parallel databases?