next up previous contents index
Next: Dependency on XR and XQ Up: Experimental Evaluation Previous: A General Comparison between

Dependency on m

   

In the experiment of previous section, the number m of fragments / partial joins was fixed to a value of 16 which matched the number of processors in the parallel architecture. Here, we want to find out whether this value was a good choice or whether higher values of m lead to better performances. In other words: we want to find out whether it is better to have a small number of relatively big partial joins or a large number of relatively small ones. There are no indications neither from the performance model (chapter 8) nor from the strategies themselves (chapter 9) whether one way or the other is better. Therefore, we have to rely on the observations made in the experiments. In order to reduce the complexity of the experiments and to concentrate on the main issues, we chose four of the eleven strategies on offer, namely the uniform lifespan, basic underflow, primary underflow and the primary minimum-overlaps strategies. The choice was led by the goal to select at least one member of each family and also those that are the most promising with respect to good performances.

Six experiments were conducted to test the dependency on m of these strategies, i.e. one per combination of join and hardware. The performance results are given in tables 10.4, 10.5 and 10.6 and visualised in figures 10.10 to 10.15.

First, we want to analyse the behaviour of the strategies on the parallel architecture. In all cases, the primary underflow strategy is a narrow winner before the primary minimum-overlaps strategy. Both strategies show relatively high costs on low values of m and perform better for higher values, with almost constant costs in the second half of the chart. The costs of the uniform lifespan strategy are between 2 and 3 times (join ), between 1.3 and 2 times (join $R
\Join_{\scriptscriptstyle C}Q$) and between 2.2 and 3.5 times (join )higher than those of the primary underflow strategy. In general, it also performs better for higher values of m. However, there are some exceptions to this rule such as a sharp rise from m=896 to m=1024 for the join or some significant shaking in the area between m=384 and m=1024 for the join $R
\Join_{\scriptscriptstyle C}Q$. A much more irregular behaviour is shown by the basic underflow strategy. The exact shape of its cost function can only be explained by looking into details of the data. However, it seems that it performs best for lower values of m, which stands in contrast to the other three strategies. In general, it performs worse than the primary underflow and minimum-overlaps strategies.

Now, we turn our attention to the results for the single-processor hardware architecture. Here, the scene looks different: for lower values of m the uniform lifespan strategy has between 10% and 30% higher costs in comparison to the primary underflow strategy, whereas for high values of m it performs between 1% and 4% better. This is a surprising result. It means in other words that the uniform lifespan strategy outperforms all other strategies if simply the value of m is chosen to be high enough. However, the advantage in comparison to other strategies is only very minor (see joins and ) or does not exist (see join $R
\Join_{\scriptscriptstyle C}Q$). A further interesting result is that the primary minimum-overlaps strategy is better (joins and $R
\Join_{\scriptscriptstyle C}Q$) or at least as good (join ) as the primary underflow strategy. This is due to the facts that (a) a good load balance  between the partial joins is far less important on a single-processor machine in comparison to a parallel one and that (b) the advantage of having a reduced total number of overlapping intervals materialises in the single-processor case.

From the results in tables 10.4, 10.5 and 10.6 we can also derive some information about the speed-up . It is defined as the ratio between the processing times that are required for the same problem on a machine with n processors and on one with 1 processor. Ideally, the speed-up is n in this case. In realistic situations, however, such a speed-up is never achieved because of communication and synchronisation overheads. If we compare the respective lowest costs on the parallel and single-processor machines for the three joins, then we get speed-ups of 9.5 for , 4.6 for $R
\Join_{\scriptscriptstyle C}Q$ and 10.6 for .


 
Table: Performance results (in sec.) for the join depending on m.
  4c!width 2pt, 4c|,                        
m 1c|Uniform 1c|Primary 1c|Primary 1c!width 2ptBasic 1c|Uniform 1c|Primary 1c|Primary 1c|Basic            
  1c|Lifespan 1c|Underflow 1c|Min.-O. 1c!width 2ptUnderflow 1c|Lifespan 1c|Underflow 1c|Min.-O. 1c|Underflow            
16 3189 1077 1988 1490 12783 11370 11286 11807            
32 2111 891 1359 1326 10580 9589 9464 9852            
48 1930 843 1141 1439 9675 8839 8750 9650            
64 1837 797 1178 1384 9117 8458 8394 9437            
80 1743 788 1011 1529 8783 8228 8177 9299            
96 1699 777 967 1749 8559 8009 8029 9389            
112 1668 776 986 1713 8395 7952 7929 9328            
128 1700 763 963 1791 8241 7847 7855 9133            
256 1613 757 926 2884 7194 7633 7618 9209            
384 1587 752 910 2808 7258 7580 7569 8980            
512 1630 770 916 2465 7310 7578 7568 8773            
640 1629 788 903 2143 7351 7586 7399 8708            
768 1577 788 898 1612 7379 7586 7493 8633            
896 1554 788 923 1657 7416 7586 7543 8649            
1024 2501 788 935 1582 7093 7586 7594 8666            


 
Table: Performance results (in sec.) for the join $R
\Join_{\scriptscriptstyle C}Q$ depending on m.
  4c!width 2pt, 4c|,                        
m 1c|Uniform 1c|Primary 1c|Primary 1c!width 2ptBasic 1c|Uniform 1c|Primary 1c|Primary 1c|Basic            
  1c|Lifespan 1c|Underflow 1c|Min.-O. 1c!width 2ptUnderflow 1c|Lifespan 1c|Underflow 1c|Min.-O. 1c|Underflow            
16 3779 1901 2458 1884 10322 9248 9029 8629            
32 2771 1454 1747 1647 7793 6966 6995 7229            
48 2431 1339 1556 1531 6942 6395 6451 6693            
64 2238 1295 1380 1468 6522 6111 6109 6320            
80 2160 1253 1424 1476 6258 5930 5931 6235            
96 2089 1224 1351 1575 6096 5814 5814 5612            
112 2040 1209 1389 1697 5974 5740 5736 5616            
128 1914 1202 1396 1613 5874 5679 5685 5625            
256 1795 1169 1324 2398 5333 5501 5336 5688            
384 1763 1147 1301 2274 5335 5376 5358 5704            
512 1523 1170 1239 2544 5353 5402 5390 5760            
640 1551 1174 1247 3251 5375 5436 5425 5827            
768 1758 1171 1260 3432 5393 5472 5462 5863            
896 1598 1171 1263 3051 5420 5503 5474 5886            
1024 1699 1171 1263 2417 5462 5503 5474 5939            


 
Table: Performance results (in sec.) for the join depending on m.
  4c!width 2pt, 4c|,                        
m 1c|Uniform 1c|Primary 1c|Primary 1c!width 2ptBasic 1c|Uniform 1c|Primary 1c|Primary 1c|Basic            
  1c|Lifespan 1c|Underflow 1c|Min.-O. 1c!width 2ptUnderflow 1c|Lifespan 1c|Underflow 1c|Min.-O. 1c|Underflow            
16 4327 1243 1753 2122 19029 14551 14621 14739            
32 3274 1058 1277 2165 14506 12153 12150 12878            
48 2940 1001 1217 2281 12954 11389 11394 12330            
64 2800 975 1111 2354 12189 10989 10986 12126            
80 2683 964 1102 2590 11724 10773 10769 12076            
96 2622 949 1060 2875 11413 10619 10616 12074            
112 2575 936 1053 3111 11191 10514 10519 12030            
128 2559 941 1047 4434 11010 10442 10441 11995            
256 2451 938 1021 5912 10449 10214 10213 11973            
384 2426 948 1013 6503 9964 10180 10179 12145            
512 2447 963 1015 6522 9973 10195 10194 12062            
640 2413 968 1025 6371 9992 10229 10228 12080            
768 2430 997 1031 6117 10009 10272 10269 12095            
896 2476 1009 1032 5626 10033 10323 10322 12001            
1024 2252 1009 1048 2122 10072 10374 10376 12013            




  
Figure: Dependency on m of the performance results for the join on a parallel architecture.




  
Figure: Dependency on m of the performance results for the join on a single-processor architecture.




  
Figure: Dependency on m of the performance results for the join $R
\Join_{\scriptscriptstyle C}Q$ on a parallel architecture.




  
Figure: Dependency on m of the performance results for the join $R
\Join_{\scriptscriptstyle C}Q$ on a single-processor architecture.




  
Figure: Dependency on m of the performance results for the join on a parallel architecture.




  
Figure: Dependency on m of the performance results for the join on a single-processor architecture.


next up previous contents index
Next: Dependency on XR and XQ Up: Experimental Evaluation Previous: A General Comparison between

Thomas Zurek