next up previous contents index
Next: Conclusions Up: Evaluation of Characteristics Previous: Uniform Workloads

Experiments

 

A series of four experiments was conducted in order to explore the contribution of the various cost components to the overall costs under the assumption of a uniform workload. For this purpose, we assumed a parallel architecture comprising processors and performance parameters as outlined in table 8.9. We used average values as provided by manufacturers of parallel hardware and commodity products, e.g. as in [Compaq Computer Corp., 1997], [Tandem Computers GmbH, 1997] or [Seagate Technology, 1997]. In each of the experiments a certain parameter of the workload was varied. The following parameter values were used unless the respective parameter was varied in the experiment:

In the first experiment, the dependency of the hardware configuration, i.e. the distribution of the 16 processors between nodes, was investigated. We used nodes. The results for the various cost components are shown in table 8.10 and in figure 8.15. As it became obvious that the repartitioning costs, , would only play a minor role in the overall costs we omitted to present the cost component values for the repartitioning stage (see tables 8.18.4). The only difference between the four configurations is in the memory access costs: the and node configuration suffer from all processors accessing the same physical memory. This underlines the viability of our performance model as, in practice, the bus becomes frequently the bottleneck in such configurations. This problem gradually goes away with an increasing number of nodes and therefore with memory being more widely distributed.

The second experiment explored the dependency of the cost components on the number m of partial computation into which the join $R
\Join_{\scriptscriptstyle C}Q$ was divided. The results are shown in table 8.10 and figure 8.16. A general trend is that decreases for increasing values of m. However, the best performance results are achieved if m is a multiple of . Such configurations achieve an optimal match between the number of processors and the number of computations (see figure 8.10). A further significant result is that the CPU and memory access times for the joins (i.e. C2(b),cpu and C2(b),mem) decrease for an increasing m while C2(a),cpu, C2(a),mem, C2(c),cpu and C2(c),mem remain almost constant and become increasingly dominant for the total costs during this process. This is a significant conclusion as it underlines the necessity and the potential for a minimisation (or at least reduction) of overlapping intervals by choosing an adequate partition.

In the third experiment, the dependency of the costs on the size of the participating relation was analysed. |R| and |Q| were increased by 20000 tuples in each step with an initial number of 20000 tuples. The results are shown in table 8.10 and figure 8.17. As for conventional joins, most components show a quadratic behaviour with linearly increasing relation sizes. This reflects the quadratic time complexity of the nested-loops join algorithm.

Finally, in the fourth experiment, we looked at the dependency of costs with respect to the (average) interval length . The results are shown in table 8.10 and figure 8.18. As one can expect, the joins are not affected; therefore all components of C2(b) remain constant when varying . All other cost components show a linear increase with a linearly increasing value of .


 
Table: The parameters describing the parallel architecture that is used in the experiments.
Parameter Description Value
total number of processors 16
processor speed in MIPS 200 MIPS
free main memory per node 32 MB
disk I/O bandwidth per node 20 MB/sec
communication bandwidth 40 MB/sec
memory bandwidth per node 400 MB/sec
number of CPU instructions for processing a tuple in each step 1000
number of CPU instructions for hashing a tuple 1000
number of CPU instructions for initiating a data transfer 500
number of CPU instructions for initiating a disk I/O 500
b page size 4 kB


next up previous contents index
Next: Conclusions Up: Evaluation of Characteristics Previous: Uniform Workloads

Thomas Zurek