next up previous contents index
Next: The Architectural Influence Up: Experimental Evaluation Previous: Dependency on

Dependency on |R| and |Q|

     

In this section, we want to find out whether the sizes of the participating relations can make a difference to the optimisation decision on the most suitable partitioning strategy. The experiments were set up as in the previous section but this time remained constant at its original value of 300. Here, we varied the number of tuples within the participating relations, i.e. |R| and |Q|. The relations were created by randomly picking tuples from the original sets which have 121728 tuples each. In this section's experiments the sample sizes are 20000, 40000, 60000, 80000 and 100000.

Table 10.13 shows the performance results for the three joins on the parallel architecture; table 10.14 shows the corresponding figures for the single-processor machine. There are no big surprises within these numbers. Figures 10.32 and 10.33 therefore show the averages for the three joins respectively. The quadratic time complexity of the underlying join algorithm can be easily recognised on both machines. In figures 10.34 and 10.35 the normalised results are shown. Here, we do not see any surprising effects either. In total, we can conclude that the relation sizes have no impact on the choice of the best partitioning strategy.

As a ``side-result'' of these experiments we note that the numbers for the skewed data  (table 10.13) are quite different from those that were obtained under the assumption of uniformity (see table 8.10 on page [*]): for example, for |R|,|Q| = 100000 we got 386.5 seconds with uniform data and 538, 753 and 545 seconds with the primary underflow strategy with the skewed data. This difference underlines how important it is to consider data skew and to look at techniques that can cope with this effect. Especially the results in table 10.13 clearly show how badly the uniform strategy performs as it does not consider any characteristics of the underlying data apart from the lifespan (or range or startpoints span).


  




  
Figure: Performance averages for the three joins on the parallel architecture for varying |R| and |Q|.




  
Figure: Performance averages for the three joins on the single-processor architecture for varying |R| and |Q|.




  
Figure: Comparison between the performances of the three strategies on a parallel architecture for varying |R| and |Q|.




  
Figure: Comparison between the performances of the three strategies on a single-processor architecture for varying |R| and |Q|.


next up previous contents index
Next: The Architectural Influence Up: Experimental Evaluation Previous: Dependency on

Thomas Zurek