A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.
A distributed database management system (DDBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users. A distributed database system permits physical data storage across several sites, where each site (node) is managed by a DBMS that is capable of running independently of the other sites. In such a database, the storage devices are not all attached to a common processing unit such as the CPU, and the whole is controlled by a distributed database management system. The data may be stored on multiple computers in the same physical location, or it may be dispersed over a network of interconnected computers. System administrators can distribute collections of data (e.g. in a database) across multiple physical locations. A distributed database can reside on network servers on the internet, on corporate intranets, or on other company networks.
Two processes ensure that the distributed databases remain up to date and current:
- Replication: This involves using specialized software that looks for changes in the distributed databases. Once the changes have been identified, the replication process makes all the databases look the same. The replication process can be complex and time-consuming, depending on the size and number of the distributed databases.
- Duplication: This process is less complex. It identifies one database as a master and then duplicates that database. The duplication process is normally done at a set time after hours, to ensure that each distributed location has the same data. In the duplication process, users may change only the master database, which ensures that local data will not be overwritten.
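The difference between the two update processes can be seen in a small sketch. Each database copy is modelled as a plain Python dict, and this is an illustrative assumption rather than how any particular DBMS implements replication. The function `replicate` compares every copy with the last synchronized state and pushes the changes it detects to all copies. The function `duplicate` simply overwrites every copy with the master.

```python
import copy


def replicate(databases: list[dict], last_synced: dict) -> None:
    """Replication: detect changes made at any copy and apply them to all copies.

    `last_synced` is the state all copies agreed on at the previous sync.
    Deletions are ignored, and if two copies changed the same key, the copy
    that appears later in the list wins (a naive conflict rule).
    """
    changes = {}
    for db in databases:
        for key, value in db.items():
            if last_synced.get(key) != value:
                changes[key] = value
    for db in databases:
        db.update(changes)
    last_synced.update(changes)


def duplicate(master: dict, copies: list[dict]) -> None:
    """Duplication: overwrite every copy with the master (e.g. a nightly job)."""
    for db in copies:
        db.clear()
        db.update(copy.deepcopy(master))
```

Replication lets every site accept local changes, which is why it needs change detection and conflict handling. Duplication avoids both because only the master may change.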
A distributed database management system may also be designed for heterogeneous database platforms, in which the sites run different database management systems. The following properties are considered desirable:
Distributed Data Independence: Users should be able to ask queries without specifying where the referenced relations, or copies or fragments of the relations, are located.
Distributed Transaction Atomicity: Users should be able to write transactions that access and update data at several sites just as they would write transactions over purely local data.
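The text does not say how atomicity across sites is achieved. The usual technique is the two-phase commit (2PC) protocol, and a minimal sketch of it follows. The `Participant` class and its methods are hypothetical in-memory stand-ins for DBMS sites. Logging, timeouts and crash recovery are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical stand-in for one site's DBMS."""
    name: str
    data: dict = field(default_factory=dict)
    can_commit: bool = True              # set False to simulate a site voting "no"
    pending: dict = field(default_factory=dict)

    def prepare(self, updates: dict) -> bool:
        """Phase 1: stage the updates locally and vote yes or no."""
        if not self.can_commit:
            return False
        self.pending = dict(updates)
        return True

    def commit(self) -> None:
        """Phase 2, success: apply the staged updates."""
        self.data.update(self.pending)
        self.pending = {}

    def abort(self) -> None:
        """Phase 2, failure: discard the staged updates."""
        self.pending = {}


def two_phase_commit(work: list[tuple[Participant, dict]]) -> bool:
    """Apply each site's updates at every site or at none."""
    # Phase 1: the coordinator asks every site to prepare and collects votes.
    votes = [site.prepare(updates) for site, updates in work]
    # Phase 2: commit everywhere only if every vote was yes; otherwise abort.
    if all(votes):
        for site, _ in work:
            site.commit()
        return True
    for site, _ in work:
        site.abort()
    return False
```

For example, if site `b` has `can_commit = False`, a transaction touching sites `a` and `b` leaves both unchanged, so no site ever holds half of a transaction.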
Types of Distributed Database
There are two major types of distributed database systems:
- Homogeneous distributed database
- Heterogeneous distributed database
Homogeneous distributed database
The following conditions must be satisfied for a homogeneous distributed database:
The operating system used at each location must be the same or compatible.
The data structures and the database application (DBMS) used at each location must be the same or compatible.
Heterogeneous distributed database
A heterogeneous distributed database has the following characteristics:
Different sites may use different schemas and software.
Different nodes may have different hardware, software and data structures.
Architectures of Distributed Database Systems
The three major distributed DBMS architectures are:
(i) Client-Server, (ii) Collaborating Server, (iii) Middleware
- Client-Server Architecture: In this architecture, the client (front end) handles data presentation or processing, while the server (back end) handles storage, security and the major data processing. Clients are responsible for user-interface issues, and servers manage data and execute transactions. A client-server system has one or more client processes and one or more server processes, and a client process can send a query to any one server process. Thus, a client process could run on a personal computer and send queries to a server running on a mainframe.
Characteristics of a client:
- Always initiates requests to servers.
- Waits for replies.
- Receives replies.
- Usually connects to a small number of servers at one time.
Characteristics of a server:
- Always waits for a request from one of the clients.
- Serves client requests, then replies to the clients with the requested data.
- May communicate with other servers in order to serve a client request.
- May request additional information (for example, credentials) from a client before processing its request.
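The request and reply pattern listed above can be sketched with Python threads and queues, which stand in for processes and the network. The `server` and `client` functions and the `Employees` table are hypothetical illustrations, not a real client-server protocol.

```python
import queue
import threading


def server(requests: queue.Queue, data: dict) -> None:
    """Wait for requests and reply with the requested rows until told to stop."""
    while True:
        msg = requests.get()            # blocks until some client sends a request
        if msg is None:                 # shutdown signal
            break
        table, reply_to = msg
        reply_to.put(data.get(table, []))


def client(requests: queue.Queue, table: str) -> list:
    """Initiate a request, then wait for and receive the server's reply."""
    reply_to: queue.Queue = queue.Queue()
    requests.put((table, reply_to))
    return reply_to.get(timeout=5)


if __name__ == "__main__":
    requests: queue.Queue = queue.Queue()
    data = {"Employees": [(1, "Ann"), (2, "Bob")]}
    worker = threading.Thread(target=server, args=(requests, data), daemon=True)
    worker.start()
    print(client(requests, "Employees"))
    requests.put(None)
    worker.join()
```

The client always starts the exchange and blocks on its reply queue. The server never contacts a client on its own, which matches the division of roles described above.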
Advantages of Client-Server Architecture
- Very easy to implement because of its clear separation of functionality and its centralized server.
- Allows users to run a graphical user interface.
- It enables the roles and responsibilities of a computing system to be distributed among several independent computers known to each other only through a network. It also provides greater ease of maintenance.
- Servers provide better control of access and resources, guaranteeing that only clients with the appropriate permissions may access and change data.
- Since data storage is centralized, updates to that data are much easier for administrators to manage.
- Many advanced client-server technologies are designed to ensure security, user-friendly interfaces and ease of use.
- It works with multiple clients of different specifications.
Disadvantages of Client-Server Architecture
- The client-server architecture does not permit a single query to span multiple servers.
- Sometimes it becomes hard to separate and distinguish between the client and server roles.
- The responsibilities of the client process and the server process can overlap.
- Network traffic congestion is one of the problems related to the client-server model.
- Collaborating Server System: This is a collection of database servers, each capable of running transactions against local data, which cooperatively execute transactions spanning multiple servers. This overcomes the client-server limitation that a single query cannot span multiple servers.
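- Middleware Architecture: A middleware layer coordinates the execution of queries and transactions across one or more independent database servers. In a web setting, all transactions take place on the servers: the web server is responsible for communicating with the browser, while the database server is responsible for storing the required information.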
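A query that spans several servers, which a plain client-server system cannot run, can be sketched as follows. A coordinator, playing the role of the collaborating server or the middleware layer, sends each server a subquery over its local data and then combines the partial results. Two in-memory SQLite databases stand in for the servers. The `Emp` and `Dept` tables and the "fetch both sides, join at the coordinator" strategy are assumptions chosen for illustration, not a prescribed algorithm.

```python
import sqlite3


def make_server(ddl: str, rows_sql: str) -> sqlite3.Connection:
    """Create an independent in-memory database standing in for one server."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(ddl + rows_sql)
    return conn


def employees_with_department(emp_srv: sqlite3.Connection,
                              dept_srv: sqlite3.Connection) -> list[tuple[str, str]]:
    """Evaluate SELECT e.name, d.dname FROM Emp e JOIN Dept d ON e.did = d.did
    when Emp is stored on one server and Dept on another."""
    # Each server answers a subquery against its own local data only.
    emps = emp_srv.execute("SELECT name, did FROM Emp").fetchall()
    depts = dict(dept_srv.execute("SELECT did, dname FROM Dept").fetchall())
    # The coordinator joins the partial results.
    return sorted((name, depts[did]) for name, did in emps if did in depts)


if __name__ == "__main__":
    emp_srv = make_server("CREATE TABLE Emp(name TEXT, did INTEGER);",
                          "INSERT INTO Emp VALUES ('Ann', 1), ('Bob', 2), ('Cy', 3);")
    dept_srv = make_server("CREATE TABLE Dept(did INTEGER, dname TEXT);",
                           "INSERT INTO Dept VALUES (1, 'Sales'), (2, 'R&D');")
    print(employees_with_department(emp_srv, dept_srv))
```

Shipping whole tables to the coordinator is the simplest strategy. A real system would weigh it against alternatives such as sending only the join keys.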
Advantages of distributed databases
- Data is stored at a number of sites, also referred to as nodes.
- The processors at nodes are interconnected by a computer network rather than a multiprocessor configuration.
- The distributed database is indeed a true database, not a collection of files that can be stored individually at each node.
- The overall system has the full functionality of a database management system.
- More reliable transactions due to replication of the database.
- Hardware, operating system, network, fragmentation, DBMS, replication and location independence.
- Continuous operation, even if some nodes go offline.
- Distributed query processing can improve performance.
- Easier expansion.
- Local autonomy or site autonomy: a department can control the data about itself.
- Protection of valuable data: because the data is distributed across multiple sites, a fire at one site does not destroy all of it.
- Modularity: systems can be modified, added to and removed from the distributed database without affecting other systems or modules.
- It can be economical.
Disadvantages of distributed databases
- Data integrity is difficult to maintain.
- Distributed data is very complex in nature. For example, extra work must be done to maintain multiple disparate systems instead of one big one.
- It is not always economical, because a more extensive infrastructure implies extra labour costs.
- Lack of standards.
- Additional software is needed.
- Complexity in database design.
- The operating system should support a distributed environment.
Storing Data in DDBS
Data storage in a distributed database involves two concepts:
- Fragmentation: This is the process of splitting a relation into smaller relations, or fragments, and storing the fragments, possibly at different sites. In horizontal fragmentation, each fragment consists of a subset of the rows of the original relation, while in vertical fragmentation, each fragment consists of a subset of the columns of the original relation.
- Replication: This means that several copies of a relation or relation fragment can be stored. An entire relation can be replicated at one or more sites. Similarly, one or more fragments of a relation can be replicated at other sites. For example, if a relation R is fragmented into R1, R2 and R3, there might be just one copy of R1, whereas R2 is replicated at two other sites and R3 is replicated at all sites.
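Both storage concepts can be sketched on a tiny relation represented as a list of dicts. The `Emp` rows, the column names and the site names are hypothetical. The placement table is one reading of the R1, R2, R3 example, in which R2 lives at its own site plus two others.

```python
def horizontal_fragments(relation, predicates):
    """Horizontal fragmentation: each fragment is the subset of rows that
    satisfy one predicate. If the predicates partition the rows, the original
    relation is the union of the fragments."""
    return [[row for row in relation if p(row)] for p in predicates]


def vertical_fragments(relation, key, column_groups):
    """Vertical fragmentation: each fragment keeps the key plus a subset of the
    columns, so the original rows can be rebuilt by a join on the key."""
    return [[{c: row[c] for c in (key, *cols)} for row in relation]
            for cols in column_groups]


def rebuild_from_vertical(fragments, key):
    """Rejoin vertical fragments on the key column."""
    merged = {}
    for fragment in fragments:
        for row in fragment:
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


# Replication: which (hypothetical) sites hold a copy of each fragment of R.
SITES = ["site1", "site2", "site3", "site4"]
PLACEMENT = {
    "R1": ["site1"],                       # just one copy
    "R2": ["site2", "site3", "site4"],     # home site plus two other sites
    "R3": list(SITES),                     # replicated at all sites
}


if __name__ == "__main__":
    emp = [{"id": 1, "name": "Ann", "city": "Lagos"},
           {"id": 2, "name": "Bob", "city": "Abuja"}]
    print(horizontal_fragments(emp, [lambda r: r["city"] == "Lagos",
                                     lambda r: r["city"] != "Lagos"]))
    print(vertical_fragments(emp, "id", [("name",), ("city",)]))
```

Repeating the key in every vertical fragment is what makes the rejoin possible. Without it, the columns stored at different sites could not be matched back to their rows.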
Parallel DBMS versus Distributed DBMS
- Parallel Database System: This seeks to improve performance through the parallelization of various operations, such as data loading, index building and query evaluation.
- Distributed Database System: Data is physically stored across several sites, and each site is typically managed by a DBMS capable of running independently of the other sites. The distribution of data is governed by factors such as local ownership and increased availability.
A distributed DBMS consists of many geographically distributed, autonomous sites connected by low-bandwidth links, while a parallel DBMS consists of tightly coupled, non-autonomous nodes connected by high-bandwidth links.
Sites in a distributed DBMS can work independently to handle local transactions or work together to handle global transactions, while nodes in a parallel DBMS can only work together to handle global transactions.
A distributed DBMS is aimed at data sharing, local autonomy and high availability, while a parallel DBMS is aimed at high performance and high availability.
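Parallel query evaluation in a parallel DBMS can be sketched as partition, compute locally, then combine. The example evaluates the equivalent of `SELECT SUM(amount)` over a relation that is partitioned round-robin across nodes. Threads stand in for the tightly coupled nodes. In CPython they show the structure of the computation, not a real speedup. The function names and the round-robin partitioning choice are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n):
    """Round-robin partitioning of rows across n nodes."""
    return [rows[i::n] for i in range(n)]


def partial_sum(part):
    """Work done by one node: aggregate only its own partition."""
    return sum(amount for _, amount in part)


def parallel_sum(rows, nodes=4):
    """Run partial_sum on every partition concurrently, then combine."""
    parts = partition(rows, nodes)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        return sum(pool.map(partial_sum, parts))


if __name__ == "__main__":
    sales = [(i, i * 10) for i in range(1, 11)]   # (id, amount) rows
    print(parallel_sum(sales, nodes=3))
```

Unlike the distributed case, no node here can answer a query on its own: every partition must contribute before the combined result exists.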