next up previous contents index
Next: Dependency on Up: Experimental Evaluation Previous: Dependency on m

Dependency on XR and XQ

          

In the previous section, we conducted experiments in which m was varied. Whereas m is an input parameter for the uniform strategies, it is an output parameter for the other strategies: it is implied by the value that we choose for X  (in the case of the basic underflow and the basic minimum-overlaps strategies) or for XR and XQ (in the case of the primary underflow and the primary minimum-overlaps strategies). In section 10.3, we found out that high values for m usually result in a good performance when using the primary underflow or primary minimum-overlaps strategies. For practical purposes, this is a little bit vague and does not give a clear indication how the values for XR and XQ, i.e. the respective maximum sizes for the primary fragments R'k and Q'k, should be chosen. In this section, we want to overcome this deficit. We will try to empirically determine a rule that provides a good choice for XR and XQ. This is also an important piece of knowledge that we require for the experiments in the following sections.

XR and XQ are certainly parameters that depend on the numbers of tuples in R and Q, i.e. |R| and |Q|. We set up a series of experiments in which we used a value Z  to control XR and XQ using the following equations:

Put in a different way: Z is the ratio (in percent) between the maximum number of tuples allowed per primary fragment and the number of tuples in the relation. For example Z=5 means that XR and XQ are 5% of the value of |R| and |Q| respectively.

Several experiments were conducted, using 10, 9, ..., 1, 0.75, 0.5, 0.25, 0.2 as values for Z. Table 10.7 shows the performances for the three joins using the primary underflow strategy on the parallel architecture. The experiment was run for R and Q and then again for samples of 40000 tuples of these two relations. From this variation we hope to see whether a good choice of a value for Z is really independent of |R| and |Q|. Figures 10.16 and 10.17 visualise the data of table 10.7. In all cases, a value of Z=0.75 provides the best performance.

We then moved on to see whether a similarly clear result can be obtained for the primary minimum-overlaps strategy. Table 10.8 gives the performances and figures 10.18 and 10.19 show them graphically. Here, we find that Z-values between 0.2 and 0.75 provide best performances. Z=0.2 is the best choice if near-optimal performances are to be achieved for all cases.

Next, we looked at the performances that are achieved on a single-processor machine. First, the primary underflow strategy was tested. The results are shown in table 10.9 and visualised by the graphs in figures 10.20 and 10.21. Here, we find the best performances for Z-values in the range 0.2 to 1. If we consider all performances for these Z-values then Z=0.5 is best because it near-optimal performances in all cases.

Finally, the primary minimum-overlaps strategy was investigated. The results are shown in table 10.10 and visualised by the graphs in figures 10.22 and 10.23. Similar to the primary underflow case, it is the Z-values between 0.2 and 1 that perform best with Z=0.5 achieving near-optimal performances in all cases.

In summary, we conclude that values for Z=0.75 on the parallel architecture and Z=0.5 on the single-processor machine seem to be good choices in general. We will use these values to determine XR and XQ for the primary underflow and primary minimum-overlaps strategies in the remaining experiments.


 
Table: Performance results (in sec.) depending on Z for the three joins and the primary underflow strategy on the parallel architecture.
  3c!width 2pt|R| = |Q| = 121728 3c||R| = |Q| = 40000                    
Z $R
\Join_{\scriptscriptstyle C}Q$ $R
\Join_{\scriptscriptstyle C}Q$            
  1c|(Join 1) 1c|(Join 2) 1c!width 2pt(Join 3) 1c|(Join 1) 1c|(Join 2) 1c|(Join 3)            
10 1771 1823 2140 193 198 231            
9 1681 1675 1876 183 181 202            
8 1433 1601 1606 156 173 172            
7 1203 1495 1356 132 161 146            
6 1249 1657 1377 137 179 149            
5 1455 1477 1583 160 160 170            
4 1146 1435 1367 126 156 147            
3 1051 1178 1044 115 129 113            
2 836 1260 964 92 138 105            
1 753 1174 930 85 131 103            
0.75 708 1137 896 80 127 100            
0.50 775 1147 949 88 131 108            
0.25 766 1158 942 94 141 113            
0.20 782 1165 956 96 144 118            


 
Table: Performance results (in sec.) depending on Z for the three joins and the primary minimum-overlaps strategy on the parallel architecture.
  3c!width 2pt|R| = |Q| = 121728 3c||R| = |Q| = 40000                    
Z $R
\Join_{\scriptscriptstyle C}Q$ $R
\Join_{\scriptscriptstyle C}Q$            
  1c|(Join 1) 1c|(Join 2) 1c!width 2pt(Join 3) 1c|(Join 1) 1c|(Join 2) 1c|(Join 3)            
10 2523 2576 3164 274 276 336            
9 2367 2251 2780 263 255 295            
8 2523 2124 2343 275 207 250            
7 2082 2433 1948 229 261 208            
6 2150 2302 1964 236 248 212            
5 1871 1773 1970 205 192 211            
4 1759 1640 1694 192 198 182            
3 1454 1545 1327 159 167 140            
2 1389 1491 1201 153 157 141            
1 1190 1373 1097 133 150 121            
0.75 1148 1337 1114 126 149 116            
0.50 985 1324 1058 114 146 120            
0.25 922 1256 1014 111 151 121            
0.20 912 1225 1025 114 148 125            


 
Table: Performance results (in sec.) depending on Z for the three joins and the primary underflow strategy on the single-processor architecture.
  3c!width 2pt|R| = |Q| = 121728 3c||R| = |Q| = 40000                    
Z $R
\Join_{\scriptscriptstyle C}Q$ $R
\Join_{\scriptscriptstyle C}Q$            
  1c|(Join 1) 1c|(Join 2) 1c!width 2pt(Join 3) 1c|(Join 1) 1c|(Join 2) 1c|(Join 3)            
10 13352 9211 17218 1452 999 1848            
9 14308 8704 16337 1553 941 1749            
8 12865 8023 15571 1402 868 1666            
7 12471 7845 14892 1356 849 1597            
6 11371 7427 14163 1239 804 1514            
5 10631 7084 13471 1155 768 1443            
4 10148 6691 12744 1110 724 1364            
3 9406 6291 12004 1022 682 1285            
2 8753 5947 11284 956 647 1208            
1 8014 5604 10578 880 591 1138            
0.75 7858 5533 10410 867 593 1123            
0.50 7665 5362 10262 854 602 1114            
0.25 7580 5444 10182 861 635 1128            
0.20 7578 5496 10199 871 653 1142            


 
Table: Performance results (in sec.) depending on Z for the three joins and the primary minimum-overlaps strategy on the single-processor architecture.
  3c!width 2pt|R| = |Q| = 121728 3c||R| = |Q| = 40000                    
Z $R
\Join_{\scriptscriptstyle C}Q$ $R
\Join_{\scriptscriptstyle C}Q$            
  1c|(Join 1) 1c|(Join 2) 1c!width 2pt(Join 3) 1c|(Join 1) 1c|(Join 2) 1c|(Join 3)            
10 12833 9216 17043 1394 995 1829            
9 12482 8702 16242 1358 949 1736            
8 12310 8444 15462 1336 895 1656            
7 11497 7782 14840 1250 843 1589            
6 11084 7484 14095 1203 807 1506            
5 10381 7073 13426 1119 765 1435            
4 9781 6769 12691 1058 729 1360            
3 9296 6357 11995 1014 689 1284            
2 8611 5967 11281 941 652 1209            
1 7990 5626 10578 778 587 1138            
0.75 7826 5335 10410 864 589 1123            
0.50 7668 5342 10260 802 597 1114            
0.25 7567 5420 10180 834 628 1128            
0.20 7569 5467 10195 848 645 1141            




  
Figure: Dependency on Z of the performance results for |R|=|Q|=121728 and the primary underflow strategy on the parallel architecture.




  
Figure: Dependency on Z of the performance results for |R|=|Q|=40000 and the primary underflow strategy on the parallel architecture.




  
Figure: Dependency on Z of the performance results for |R|=|Q|=121728 and the primary minimum-overlaps strategy on the parallel architecture.




  
Figure: Dependency on Z of the performance results for |R|=|Q|=40000 and the primary minimum-overlaps strategy on the parallel architecture.




  
Figure: Dependency on Z of the performance results for |R|=|Q|=121728 and the primary underflow strategy on the single-processor machine.




  
Figure: Dependency on Z of the performance results for |R|=|Q|=40000 and the primary underflow strategy on the single-processor machine.




  
Figure: Dependency on Z of the performance results for |R|=|Q|=121728 and the primary minimum-overlaps strategy on the single-processor machine.




  
Figure: Dependency on Z of the performance results for |R|=|Q|=40000 and the primary minimum-overlaps strategy on the single-processor machine.


next up previous contents index
Next: Dependency on Up: Experimental Evaluation Previous: Dependency on m

Thomas Zurek