A series of four experiments was conducted to explore the contribution of the various cost components to the overall costs under the assumption of a uniform workload. For this purpose, we assumed a parallel architecture whose processors and performance parameters are outlined in table 8.9. We used average values as provided by manufacturers of parallel hardware and commodity products, e.g. as in [Compaq Computer Corp., 1997], [Tandem Computers GmbH, 1997] or [Seagate Technology, 1997]. Each experiment varied a single workload parameter; all other parameters were held at fixed default values.
In the first experiment, the dependency of the costs on the hardware configuration, i.e. the distribution of the 16 processors between the nodes, was investigated. Four different node configurations were compared. The results for the various cost components are shown in table 8.10 and in figure 8.15. As it became obvious that the repartitioning costs would only play a minor role in the overall costs, we omit the cost component values for the repartitioning stage (see tables 8.1 - 8.4). The only difference between the four configurations lies in the memory access costs: the configurations with the fewest nodes suffer from many processors accessing the same physical memory. This underlines the viability of our performance model as, in practice, the bus frequently becomes the bottleneck in such configurations. The problem gradually disappears as the number of nodes increases and memory is therefore more widely distributed.
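To make the bus-contention effect concrete, the following sketch (a hypothetical, heavily simplified model, not the cost formulas of this chapter) estimates the time the busiest memory bus needs to serve a fixed amount of memory traffic when the 16 processors are spread over a varying number of nodes; the 400 MB/sec bus bandwidth is taken from table 8.9:

```python
# Simplified, hypothetical model: a fixed total amount of memory traffic
# is split evenly across the nodes, and each node's bus serves only the
# processors placed on that node.
def mem_access_time(total_traffic_mb, num_nodes, bus_bw_mb_per_s=400):
    """Seconds until the busiest bus has served its share of the traffic.

    With a single node, all 16 processors contend for one 400 MB/sec bus;
    with more nodes, each bus carries total_traffic_mb / num_nodes.
    """
    traffic_per_node = total_traffic_mb / num_nodes
    return traffic_per_node / bus_bw_mb_per_s

# Example: 4 GB of memory traffic on 1, 2, 4 and 8 nodes.
for n in (1, 2, 4, 8):
    print(f"{n} node(s): {mem_access_time(4096, n):.1f} s on the busiest bus")
```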
The second experiment explored the dependency of the cost components on the number m of partial computations into which the join was divided. The results are shown in table 8.10 and figure 8.16. A general trend is that the total costs decrease for increasing values of m. However, the best performance results are achieved if m is a multiple of the number of processors. Such configurations achieve an optimal match between the number of processors and the number of computations (see figure 8.10). A further significant result is that the CPU and memory access times for the joins (i.e. C2(b),cpu and C2(b),mem) decrease for an increasing m, while C2(a),cpu, C2(a),mem, C2(c),cpu and C2(c),mem remain almost constant and increasingly dominate the total costs. This is a significant conclusion as it underlines the necessity and the potential for a minimisation (or at least reduction) of overlapping intervals by choosing an adequate partition. The load-balancing effect behind the best values of m is illustrated in the sketch following this paragraph.
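The sketch below is illustrative only (the round-based scheduling model is an assumption, not taken from the cost model itself): with m equally sized partial computations on p processors, the computations run in ceil(m/p) rounds, and processor utilisation peaks exactly when m is a multiple of p.

```python
import math

def rounds(m, p=16):
    """Rounds needed to run m equally sized partial computations on p
    processors, assuming each computation occupies one processor."""
    return math.ceil(m / p)

def utilisation(m, p=16):
    """Fraction of processor-rounds that do useful work."""
    return m / (rounds(m, p) * p)

for m in (16, 20, 24, 32, 33, 48):
    print(f"m={m:2d}: rounds={rounds(m)}, utilisation={utilisation(m):.1%}")
```

For m = 16, 32 and 48 utilisation is 100%, whereas e.g. m = 33 forces a third round in which only a single processor is busy.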
In the third experiment, the dependency of the costs on the sizes of the participating relations was analysed. |R| and |Q| were increased by 20000 tuples in each step, starting from an initial size of 20000 tuples. The results are shown in table 8.10 and figure 8.17. As for conventional joins, most components show quadratic behaviour under linearly increasing relation sizes. This reflects the quadratic time complexity of the nested-loops join algorithm.
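The quadratic growth follows directly from the structure of the nested-loops join, sketched below in simplified in-memory form (the match predicate and the data representation are assumptions for illustration):

```python
def nested_loops_join(r, q, match):
    """Naive nested-loops join: every tuple of r is compared with every
    tuple of q, i.e. |r| * |q| predicate evaluations in total."""
    result = []
    for x in r:        # |R| outer iterations
        for y in q:    # |Q| inner comparisons per outer tuple
            if match(x, y):
                result.append((x, y))
    return result

# Doubling both relation sizes quadruples the comparison count:
for n in (20000, 40000, 60000):
    print(f"|R| = |Q| = {n}: {n * n:.1e} comparisons")
```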
Finally, in the fourth experiment, we looked at the dependency of the costs on the (average) interval length. The results are shown in table 8.10 and figure 8.18. As one would expect, the joins themselves are not affected; all components of C2(b) therefore remain constant when the interval length is varied. All other cost components increase linearly with a linearly increasing interval length.
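One plausible reading of this linear increase, sketched below under the assumption that the timeline is range-partitioned into fixed-width fragments (an assumption made purely for illustration), is that a longer interval intersects proportionally more partitions and therefore causes proportionally more replication, I/O and communication work:

```python
def partitions_overlapped(start, length, width):
    """Number of fixed-width timeline partitions that the interval
    [start, start + length) intersects; grows roughly linearly with
    the interval length (about length / width + 1)."""
    first = start // width
    last = (start + length - 1) // width
    return last - first + 1

for length in (10, 100, 1000):
    n = partitions_overlapped(start=5, length=length, width=100)
    print(f"interval length {length:4d}: intersects {n} partition(s)")
```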
| Parameter | Description | Value |
|---|---|---|
| | total number of processors | 16 |
| | processor speed | 200 MIPS |
| | free main memory per node | 32 MB |
| | disk I/O bandwidth per node | 20 MB/sec |
| | communication bandwidth | 40 MB/sec |
| | memory bandwidth per node | 400 MB/sec |
| | number of CPU instructions for processing a tuple in each step | 1000 |
| | number of CPU instructions for hashing a tuple | 1000 |
| | number of CPU instructions for initiating a data transfer | 500 |
| | number of CPU instructions for initiating a disk I/O | 500 |
| b | page size | 4 kB |
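For illustration, the parameters of table 8.9 can be turned into simple time estimates. The sketch below is back-of-the-envelope arithmetic with the listed values, not the actual cost formulas of this chapter:

```python
# Values from table 8.9.
MIPS = 200          # processor speed, million instructions per second
W_TUPLE = 1000      # CPU instructions to process one tuple in one step
PAGE_SIZE_KB = 4    # page size b
DISK_BW_MB_S = 20   # disk I/O bandwidth per node

def cpu_time_s(num_tuples, instr_per_tuple=W_TUPLE, mips=MIPS):
    """CPU seconds to process num_tuples tuples: instructions / speed."""
    return num_tuples * instr_per_tuple / (mips * 1e6)

def io_time_s(num_pages):
    """Seconds to transfer num_pages pages at the node's disk bandwidth."""
    return num_pages * PAGE_SIZE_KB / 1024 / DISK_BW_MB_S

print(f"{cpu_time_s(20000):.2f} s CPU for 20000 tuples")   # 0.10 s
print(f"{io_time_s(1000):.2f} s I/O for 1000 pages")       # ~0.20 s
```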