
Introduction

  Nowadays, high-performance database management systems (DBMS) run on a variety of hardware platforms. These fall into two categories:

1. uniprocessor systems, and
2. multiprocessor systems.

Machines of the first category usually employ a single, but very powerful, processor. Although there are still many DBMS installations running on uniprocessors, the use of multiprocessor systems is vital for performance whenever the database size or the workload causes the CPU of a uniprocessor system to become the performance bottleneck. Multiprocessor servers combine the raw computing power of many (commodity) processors in order to achieve high performance. However, parallelism is not restricted to the CPU but extends also to I/O and main memory access. There are many ways in which processors, disks, memory modules, buses etc. can be combined in order to build a parallel database server, and this section will discuss some of the resulting architectural categories.

At the end of the 1980s and the beginning of the 1990s there was a wide and controversial discussion about the question ``Which is the most suitable parallel architecture to support parallel database systems?'' It was expected that one could draw conclusions about a system's performance by analysing its underlying architecture. For a while, there was confusion about what the term ``architecture'' actually comprised: only the system's hardware, or also its software? Therefore, many researchers mixed hardware and software aspects within this discussion. Initially, this was not a problem, as the first parallel database system prototypes used to have matching hardware and software architectures. However, things changed when parallel DBMS technology started to be commercially exploited.

In the last few years, many vendors have tried to make their parallel DBMS products independent of specific parallel hardware platforms in order to achieve wider acceptance in the market for high-end DBMS products. As a result, a DBMS's software architecture does not necessarily match the underlying hardware architecture. Similarly, vendors of parallel hardware moved to general-purpose architectures that can run software of any type, but with certain software architectures being more favourable than others.

This development made it even more difficult to predict a system's performance from an analysis of the underlying architectures. Alternatives were proposed, such as the 5-layer model by Norman and Thanisch. They suggest basing a performance analysis on five layers, with each layer representing a system's hardware and software components (see figure 8.2). Lines between the components describe dataflows. By describing a system in terms of this model, one can see in which way workloads are balanced between the components within the system [Norman and Thanisch, 1995].


  
Figure: The 5 layers of the generic model. Source: [Norman and Thanisch, 1995].

A further issue that makes performance modelling a difficult task is the following: practical experience shows that a system's performance is quite often also a result of tuning, i.e. the proper configuration of the hardware with respect to the software and the workload, and the configuration of the software with respect to the hardware and the workload. Tuning is an important issue, as practical evidence shows that there is a huge difference in performance between a well-tuned and a poorly tuned system [Witkowski, 1993].

From the above it becomes clear that determining a system's performance is a difficult and complex task; there is a large number of factors to be considered. However, in this thesis we are not concerned with the overall performance of a DBMS under a certain workload but with the performance of one particular operation. Consequently, we do not need to make assumptions about the system's software architecture; we can concentrate solely on its hardware. This might stand in contrast to what we said above, but it is justified purely by the fact that we concentrate on one single operation rather than an entire DBMS.

This allows us to make some simplifying assumptions: we perceive the system environment as a set of hardware resources that are available for processing the temporal join. These resources are characterised by parameters, e.g. the (current) amount of free memory, the (current) communication bandwidth, the number of processing nodes that are available for processing the join at that particular moment, etc. We assume these parameters to be dynamic, i.e. they describe the current potential of the system, rather than static, i.e. they are not assumed to be constant over time. In that way, we incorporate the system's load without making any assumptions about it. A high workload, for example, might imply a small amount of free memory and low communication and I/O bandwidths, whereas a low workload implies more beneficial parameter values. What remains to be defined is how these components interact, i.e. the hardware architecture.
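The parameter set described above can be sketched as a simple data structure. This is a hypothetical illustration only; the field names are our own and are not taken from the thesis or from any particular system:

```python
from dataclasses import dataclass

@dataclass
class SystemState:
    """Dynamic parameters describing the system's current potential."""
    free_memory_mb: float        # (current) amount of free memory
    comm_bandwidth_mb_s: float   # (current) communication bandwidth
    io_bandwidth_mb_s: float     # (current) I/O bandwidth
    available_nodes: int         # nodes available for the join right now

# A heavily loaded system typically exposes less beneficial values ...
high_load = SystemState(free_memory_mb=256.0, comm_bandwidth_mb_s=20.0,
                        io_bandwidth_mb_s=15.0, available_nodes=2)
# ... than a lightly loaded one.
low_load = SystemState(free_memory_mb=4096.0, comm_bandwidth_mb_s=100.0,
                       io_bandwidth_mb_s=80.0, available_nodes=16)
```

Because the parameters are sampled at the moment the join is scheduled, the same machine can present very different `SystemState` values at different times; this is precisely how the model absorbs workload effects without modelling the workload itself.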

In section 8.2.2, we summarise the architectural discussion that was mentioned earlier. It provides an overview of the various basic architectural types that can be considered. The arguments for and against these basic types explain the convergence towards hybrid architectures, which are presented in section 8.2.3. The latter incorporate concepts of various basic types. We pick one of these architectures to be the one on which our performance model is based. As outlined in section 8.1, it is parameterised by two variables. These parameters provide us with the flexibility to set up either a single-processor architecture, a parallel shared-memory (SMP) architecture, a parallel shared-nothing architecture or a hybrid, two-level architecture incorporating advantages of the shared-memory and the shared-nothing approaches.
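To illustrate how two such parameters can span all four configurations, consider the following sketch. The parameter names `n` (number of shared-nothing nodes) and `p` (number of shared-memory processors per node) are hypothetical stand-ins; section 8.1 defines the actual parameters:

```python
def classify_architecture(n: int, p: int) -> str:
    """Classify a two-level architecture built from n shared-nothing
    nodes, each holding p shared-memory processors (illustrative only)."""
    if n < 1 or p < 1:
        raise ValueError("n and p must be positive")
    if n == 1 and p == 1:
        return "single-processor"
    if n == 1:
        return "shared-memory (SMP)"      # all parallelism within one node
    if p == 1:
        return "shared-nothing"           # one processor per node
    return "hybrid (shared-nothing over SMP nodes)"
```

The point of such a parameterisation is that the four architectures need not be modelled separately: they arise as boundary cases of one two-level model.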



Thomas Zurek