A series of four experiments was conducted to explore the contribution of the various cost components to the overall costs under the assumption of a uniform workload. For this purpose, we assumed a parallel architecture whose processors and performance parameters are outlined in table 8.9. We used average values as provided by manufacturers of parallel hardware and commodity products, e.g. as in [Compaq Computer Corp., 1997], [Tandem Computers GmbH, 1997] or [Seagate Technology, 1997]. Each experiment varied a single workload parameter; all other parameters were held at fixed default values.
In the first experiment, the dependency of the costs on the hardware configuration, i.e. the distribution of the 16 processors between the nodes, was investigated. Four different node configurations were compared. The results for the various cost components are shown in table 8.10 and in figure 8.15. As it became obvious that the repartitioning costs would only play a minor role in the overall costs, we omit the cost component values for the repartitioning stage (see tables 8.1 - 8.4). The only difference between the four configurations lies in the memory access costs: the configurations with the fewest nodes suffer from many processors accessing the same physical memory. This underlines the viability of our performance model as, in practice, the bus frequently becomes the bottleneck in such configurations. The problem gradually disappears as the number of nodes increases and memory is therefore more widely distributed.
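To make the bus-contention effect concrete, the following sketch (a hypothetical, heavily simplified model, not the cost formulas of this chapter) estimates the time the busiest memory bus needs to serve a fixed amount of memory traffic when the 16 processors are spread over a varying number of nodes; the 400 MB/sec bus bandwidth is taken from table 8.9:

```python
# Simplified, hypothetical model: a fixed total amount of memory traffic
# is split evenly across the nodes, and each node's bus serves only the
# processors placed on that node.
def mem_access_time(total_traffic_mb, num_nodes, bus_bw_mb_per_s=400):
    """Seconds until the busiest bus has served its share of the traffic.

    With a single node, all 16 processors contend for one 400 MB/sec bus;
    with more nodes, each bus carries total_traffic_mb / num_nodes.
    """
    traffic_per_node = total_traffic_mb / num_nodes
    return traffic_per_node / bus_bw_mb_per_s

# Example: 4 GB of memory traffic on 1, 2, 4 and 8 nodes.
for n in (1, 2, 4, 8):
    print(f"{n} node(s): {mem_access_time(4096, n):.1f} s on the busiest bus")
```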
The second experiment explored the dependency of the cost components on the number m of partial computations into which the join was divided. The results are shown in table 8.10 and figure 8.16. A general trend is that the total costs decrease for increasing values of m. However, the best performance results are achieved if m is a multiple of the number of processors. Such configurations achieve an optimal match between the number of processors and the number of computations (see figure 8.10). A further significant result is that the CPU and memory access times for the joins (i.e. C2(b),cpu and C2(b),mem) decrease for an increasing m, while C2(a),cpu, C2(a),mem, C2(c),cpu and C2(c),mem remain almost constant and increasingly dominate the total costs. This is a significant conclusion as it underlines the necessity and the potential for a minimisation (or at least reduction) of overlapping intervals by choosing an adequate partition. The load-balancing effect behind the best values of m is illustrated in the sketch following this paragraph.
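The sketch below is illustrative only (the round-based scheduling model is an assumption, not taken from the cost model itself): with m equally sized partial computations on p processors, the computations run in ceil(m/p) rounds, and processor utilisation peaks exactly when m is a multiple of p.

```python
import math

def rounds(m, p=16):
    """Rounds needed to run m equally sized partial computations on p
    processors, assuming each computation occupies one processor."""
    return math.ceil(m / p)

def utilisation(m, p=16):
    """Fraction of processor-rounds that do useful work."""
    return m / (rounds(m, p) * p)

for m in (16, 20, 24, 32, 33, 48):
    print(f"m={m:2d}: rounds={rounds(m)}, utilisation={utilisation(m):.1%}")
```

For m = 16, 32 and 48 utilisation is 100%, whereas e.g. m = 33 forces a third round in which only a single processor is busy.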
In the third experiment, the dependency of the costs on the sizes of the participating relations was analysed. |R| and |Q| were increased by 20000 tuples in each step, starting from an initial size of 20000 tuples. The results are shown in table 8.10 and figure 8.17. As for conventional joins, most components show quadratic behaviour under linearly increasing relation sizes. This reflects the quadratic time complexity of the nested-loops join algorithm.
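The quadratic growth follows directly from the structure of the nested-loops join, sketched below in simplified in-memory form (the match predicate and the data representation are assumptions for illustration):

```python
def nested_loops_join(r, q, match):
    """Naive nested-loops join: every tuple of r is compared with every
    tuple of q, i.e. |r| * |q| predicate evaluations in total."""
    result = []
    for x in r:        # |R| outer iterations
        for y in q:    # |Q| inner comparisons per outer tuple
            if match(x, y):
                result.append((x, y))
    return result

# Doubling both relation sizes quadruples the comparison count:
for n in (20000, 40000, 60000):
    print(f"|R| = |Q| = {n}: {n * n:.1e} comparisons")
```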
Finally, in the fourth experiment, we looked at the dependency of the costs on the (average) interval length. The results are shown in table 8.10 and figure 8.18. As one would expect, the joins themselves are not affected; all components of C2(b) therefore remain constant when the interval length is varied. All other cost components increase linearly with a linearly increasing interval length.
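One plausible reading of this linear increase, sketched below under the assumption that the timeline is range-partitioned into fixed-width fragments (an assumption made purely for illustration), is that a longer interval intersects proportionally more partitions and therefore causes proportionally more replication, I/O and communication work:

```python
def partitions_overlapped(start, length, width):
    """Number of fixed-width timeline partitions that the interval
    [start, start + length) intersects; grows roughly linearly with
    the interval length (about length / width + 1)."""
    first = start // width
    last = (start + length - 1) // width
    return last - first + 1

for length in (10, 100, 1000):
    n = partitions_overlapped(start=5, length=length, width=100)
    print(f"interval length {length:4d}: intersects {n} partition(s)")
```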
| Parameter | Description | Value |
|---|---|---|
| | total number of processors | 16 |
| | processor speed | 200 MIPS |
| | free main memory per node | 32 MB |
| | disk I/O bandwidth per node | 20 MB/sec |
| | communication bandwidth | 40 MB/sec |
| | memory bandwidth per node | 400 MB/sec |
| | number of CPU instructions for processing a tuple in each step | 1000 |
| | number of CPU instructions for hashing a tuple | 1000 |
| | number of CPU instructions for initiating a data transfer | 500 |
| | number of CPU instructions for initiating a disk I/O | 500 |
| b | page size | 4 kB |
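For illustration, the parameters of table 8.9 can be turned into simple time estimates. The sketch below is back-of-the-envelope arithmetic with the listed values, not the actual cost formulas of this chapter:

```python
# Values from table 8.9.
MIPS = 200          # processor speed, million instructions per second
W_TUPLE = 1000      # CPU instructions to process one tuple in one step
PAGE_SIZE_KB = 4    # page size b
DISK_BW_MB_S = 20   # disk I/O bandwidth per node

def cpu_time_s(num_tuples, instr_per_tuple=W_TUPLE, mips=MIPS):
    """CPU seconds to process num_tuples tuples: instructions / speed."""
    return num_tuples * instr_per_tuple / (mips * 1e6)

def io_time_s(num_pages):
    """Seconds to transfer num_pages pages at the node's disk bandwidth."""
    return num_pages * PAGE_SIZE_KB / 1024 / DISK_BW_MB_S

print(f"{cpu_time_s(20000):.2f} s CPU for 20000 tuples")   # 0.10 s
print(f"{io_time_s(1000):.2f} s I/O for 1000 pages")       # ~0.20 s
```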