Traditionally, architectures for parallel DBMS were categorised by the way in which processors share hardware resources such as disk devices and memory. This categorisation first appeared in [Stonebraker, 1986] and was mainly intended to frame a discussion about the most appropriate parallel hardware architecture. Many researchers participated in this discussion in the following years and based their arguments on it; see e.g. [Bhide and Stonebraker, 1988], [DeWitt and Gray, 1990], [Hua et al., 1991], [DeWitt and Gray, 1992], [Bergsten et al., 1993], [Valduriez, 1993b], [Baru et al., 1995], [Gray, 1995] and many others. In this section, we briefly describe the architectural categories and summarise the conclusions that have been drawn. We note that many of the arguments that were brought forward are of a historical nature: they reflect the state of the technology in the late 1980s and early 1990s and do not consider recent developments. The categories are:
Shared-memory (SM), for which some authors, like [Hua et al., 1991] or [Bergsten et al., 1993], prefer the equivalent term shared-everything, means that all disks and all memory modules are shared by the processors, as shown in figure 8.3. All disks are thus equally accessible by all processors, and there is a global address space for main memory. The latter can be implemented as a physically distributed memory in which each processor has a local memory that forms part of the global memory. Two forms of access to such a distributed shared memory are distinguished in [Tannenbaum, 1994].
The following arguments have been raised when discussing SM-architectures: It is said that SM is simple to program, essentially because of the global address space in main memory. Load balancing can be arranged relatively easily because each processor has equal access to all disks. Communication among the processors is fast and incurs low overhead, as they can cooperate via main memory. However, system costs are high for large configurations because the bus becomes a bottleneck and various hardware mechanisms have to be employed to tackle this problem. Conflicting accesses to main memory can decrease performance. It is also argued that access to main memory is the reason why SM-architectures do not scale up very well: [Bhide and Stonebraker, 1988] showed that beyond a certain number of processors, access to main memory becomes a bottleneck that limits the system's processing speed. SM systems are therefore limited to a small number of processors ([Valduriez, 1993a] mentions 20; [Baru et al., 1995] argues that the limit is around 10 RISC System/6000 processors).
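To make the cooperation-via-main-memory argument and the contention argument more concrete, here is a minimal C++ sketch of an SM-style parallel selection: worker threads scan disjoint ranges of an in-memory table and combine their results through a shared atomic counter in the global address space. The sketch is purely illustrative; the table layout, the predicate and all identifiers are invented for this example and are not taken from any actual DBMS.

```cpp
// Illustrative sketch of shared-memory (SM) parallelism: worker threads scan
// disjoint ranges of one in-memory table and cooperate through shared state.
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <thread>
#include <vector>

struct Row { std::uint32_t key; std::uint32_t value; };   // hypothetical tuple layout

int main() {
    std::vector<Row> table(1'000'000);
    for (std::uint32_t i = 0; i < table.size(); ++i)
        table[i] = Row{i, i % 100};

    const unsigned workers = std::max(1u, std::thread::hardware_concurrency());
    std::atomic<std::uint64_t> matches{0};   // shared state in the global address space

    std::vector<std::thread> pool;
    const std::size_t chunk = table.size() / workers;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t lo = w * chunk;
        const std::size_t hi = (w + 1 == workers) ? table.size() : lo + chunk;
        pool.emplace_back([&, lo, hi] {
            std::uint64_t local = 0;
            for (std::size_t i = lo; i < hi; ++i)
                if (table[i].value == 42) ++local;   // selection predicate
            // Cooperation via main memory: one atomic update per worker keeps
            // contention low; updating the shared counter per row instead would
            // expose exactly the memory-access bottleneck the SM critics cite.
            matches += local;
        });
    }
    for (auto& t : pool) t.join();
    std::cout << "matching rows: " << matches.load() << '\n';
}
```

The equal visibility of the table to every thread is what makes load balancing straightforward here: partition boundaries can be chosen freely without moving any data.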
In a shared-disk (SD) system, each processor has its private memory; access to the disks, however, is shared by all processors. Figure 8.4 shows this architecture. In fact, it also shows the way in which SM is frequently implemented, namely as a distributed shared memory with each processor holding one part of the global memory. For that reason, SD and SM can nowadays be considered synonymous.
In the following, we summarise the arguments that have been brought up in favour of or against SD systems: It is argued that the costs of an SD system are relatively low, as the interconnect can be a bus system based on standard technology. Load balancing is also relatively easy, for the same reason as in the SM case, and the availability of data is higher than in an SN system (see below), since a crash of one processor does not make the data on a particular disk unavailable. Software from uniprocessor systems can be migrated easily since the data on disk need not be reorganised [Valduriez, 1993a]. Much of the downside of SD systems is said to relate to an increase in complexity, e.g. caused by the cache coherency control mechanisms that are necessary to keep the disk pages in the processors' individual caches consistent. A centralised lock management is also required. All this limits the scalability of an SD system. Finally, access to the shared disks can become a bottleneck if the interconnect capacity is limited.
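To make the complexity argument more tangible, the following C++ sketch shows what a centralised lock manager for shared disk pages might look like in its simplest form. It is a single-process sketch with invented identifiers; in a real SD system this component would be a networked service and would additionally have to drive the invalidation of cached page copies on other nodes (cache coherency control).

```cpp
// Illustrative sketch of the centralised lock management an SD system needs:
// before a node may cache and modify a disk page, it must acquire a global
// lock on that page id. All identifiers are hypothetical.
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <unordered_set>

using PageId = std::uint64_t;

class GlobalLockManager {
public:
    // Block until the page is free, then mark it as held by the caller.
    void lock_page(PageId page) {
        std::unique_lock<std::mutex> guard(mutex_);
        cv_.wait(guard, [&] { return held_.count(page) == 0; });
        held_.insert(page);
    }

    // Release the page and wake up waiters; in a real SD system this is also
    // the point where other nodes' cached copies of the page would have to be
    // invalidated or refreshed.
    void unlock_page(PageId page) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            held_.erase(page);
        }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;                    // single point of synchronisation
    std::condition_variable cv_;
    std::unordered_set<PageId> held_;     // pages currently locked
};
```

Every page access funnels through this one component, which is precisely why it is seen as a limit on the scalability of SD systems.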
In a shared-nothing (SN) system, each processor has its private memory and at least one disk connected to it; the processor acts as a server for the data on this disk. Figure 8.5 shows this type of architecture.
The arguments around the SN-architecture are as follows: It is said that the costs of an SN system are low because it can essentially be constructed from commodity components. Theoretically, SN can scale up and speed up linearly; in practice it is argued that the interconnect becomes saturated beyond a certain volume of communication. Availability is also often considered to be a serious problem. Load balancing is another problem: it is argued that data skew can cause serious imbalances. Furthermore, load imbalance can arise because the placement of data on the disks largely predetermines where operations are executed, since shipping large volumes of data across the network to other processors has to be avoided.
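The connection between data placement and load imbalance can be illustrated with a small C++ sketch that hash-partitions tuples over a few hypothetical nodes. Under a skewed key distribution, one node receives, and therefore has to process, most of the tuples unless large amounts of data are shipped across the interconnect. The node count, the key distribution and all identifiers are made up for this illustration.

```cpp
// Illustrative sketch of data placement in an SN system: tuples are
// hash-partitioned on their key, so the node that stores a tuple is also the
// node that must process it; a skewed key distribution becomes load imbalance.
#include <cstddef>
#include <cstdint>
#include <functional>
#include <iostream>
#include <vector>

int main() {
    const std::size_t nodes = 4;                        // hypothetical SN nodes
    std::vector<std::size_t> tuples_per_node(nodes, 0);

    const std::size_t total = 100000;
    for (std::size_t i = 0; i < total; ++i) {
        // Skewed workload: 80% of the tuples carry one "hot" key value.
        const std::uint64_t key = (i % 10 < 8) ? 7 : i;
        const std::size_t node = std::hash<std::uint64_t>{}(key) % nodes;
        ++tuples_per_node[node];                        // tuple is placed (and later processed) here
    }

    for (std::size_t n = 0; n < nodes; ++n)
        std::cout << "node " << n << ": " << tuples_per_node[n] << " tuples\n";
    // One node ends up with the bulk of the data, and hence of the work,
    // unless tuples are re-shipped to other processors over the network.
}
```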
For a long time, SN has been considered the best parallel architecture, mainly because of its allegedly unlimited scalability. There are two reasons why this advantage has not manifested itself in practice: