next up previous contents index
Next: Impact of Black-Out Preprocessing Up: Experimental Evaluation Previous: The Architectural Influence

Influence of the Condensation Factor a

       

In section 7.3.3 we proposed to `condense' IP-tables in order to reduce their size. The idea was to collapse a of the original entries into one new entry. The new IP-table would have only of the entries and therefore only of the size of the original IP-table. A special form of condensation is the endpoint IP-table in which only those entries appear which correspond to interval endpoints. Theoretically, condensation can be expected to have two effects:

We set up experiments that test the impact of condensation on the primary underflow and the minimum-overlaps strategies (both with , ) when applied to the three joins. As usual, the experiments were conducted for a parallel architecture and on a single-processor machine. We used the set as values for a; `end' is thereby used as a symbolic value meaning that the respective IP-table was condensed by using endpoints only. The performance results are listed in tables 10.20 and 10.21 and visualised in figures 10.44, 10.45, 10.46 and 10.47. Figures 10.44 and 10.45 contain a lot of noise which makes it difficult to observe a certain trend. We therefore converted them to the graphs of figures 10.48 and 10.49 in which a value for a certain join and a certain a was computed by taking the average of the four preceding a values and a itself (`moving average'). This process smoothes the graphs and makes the general trends more visible. Finally, there is figure 10.50 that shows the times that were spent on the optimisation process only. Times are only given for as the times were almost constant for a>30.

First, we analyse the graph of figure 10.48 which shows the performances for primary underflow partitioning on a parallel architecture. As expected, the performances become worse for an increasing a, but only for the joins and .For the join $R
\Join_{\scriptscriptstyle C}Q$ there is a rather unexpected effect: for most values of a, the performance is slightly better in comparison to a=1 or . This is rather surprising and cannot be explained from an algorithmic point of view. Its origin lies either in the data itself or in the fact that the chosen XR and XQ values (see above) were not optimal for $R
\Join_{\scriptscriptstyle C}Q$ and that the summarising effect of condensation corrected this a little bit as outlined above. Also, when processing , $R
\Join_{\scriptscriptstyle C}Q$ and , the strategies use the timepoint sets

respectively. However, the set obviously contains much more timepoints to choose from than V'(R,a) or V'(Q,a). Because of condensation there are hardly coinciding timepoints which leads to

for . This advantage translates into a much better resistance against the negative effects of condensation for the join $R
\Join_{\scriptscriptstyle C}Q$in comparison to the joins and . Actually, we can observe this advantage in all the other charts in figures 10.49, 10.46 and 10.47 too.

Now we turn to figure 10.49 which shows the averaged performances on the parallel architecture for the primary minimum-overlaps strategy depending on a. We note that there is a rather positive effect of condensation in the first part of the chart. The best performances are achieved around a=16 (for ), a=22 (for $R
\Join_{\scriptscriptstyle C}Q$) and a=19 (for ). Previously, we observed that there is a relatively severe penalty caused by the concessions that the strategy makes with respect to the load balance in order to achieve a minimum number of overlaps. It seems that the strategy can make less concessions (a) if there are less timepoints from which it can choose and (b) if the values $s_{\scriptscriptstyle R}(t)$ and [*] are larger. Naturally, if the choice becomes too restricted then there is a negative impact on the performance. Therefore, there is a tradeoff between restricting the choice a little bit but not too much. This is exactly what can be seen in the graphs of figure 10.49.

The scenery changes when it comes to the single-processor architecture (figures 10.46 and 10.47). For the joins (join 1) and (join 3), the impact of an increasing condensation of the IP-tables on the performance is generally negative[*]. Again, processing of the join $R
\Join_{\scriptscriptstyle C}Q$(join 2) resists much better against the negative effects of condensation because of the reasons mentioned above. We can even observe a slight performance benefit which is probably caused by the same reasons as mentioned earlier.

Finally, we turn our attention to the effect that condensed IP-tables have on the performance of the optimisation itself. Figure 10.50 shows the times that the optimisation process took when being run on a two-processor Sun SS20 computing server. As expected, we see that the primary minimum-overlaps partitioning benefits most of it due to its time complexity of O(N2). It is interesting that - in terms of elapsed time - it is roughly as fast as primary underflow partitioning for .

In summary, it is fair to say that experiments show that condensation does not only have positive effects on the performance the optimisation process itself but also that the anticipated negative effects are not that severe at all. The primary minimum-overlaps strategy in particular can draw performance benefits when using condensed IP-tables. Generally, it is possible to say that a condensation factor a between approx. 10 and 20 can improve the performances in many cases or at least does not severely penalise the performances in the other cases. This is an interesting result with respect to section 7.3 in which we addressed the problem of IP-table size. It means that the sizes of IP-tables which were already comparable or lower than those of data samples could be further decreased to between to of the original size without paying a severe performance penalty and sometimes even improving the join performances.


 
 
Table: Performance results (in sec.) on the parallel architecture depending on a varying condensation factor a for the IP-tables.
  3c!width 2ptPrimary Underflow 3c|Primary Min.-Overlaps                    
a $R
\Join_{\scriptscriptstyle C}Q$ $R
\Join_{\scriptscriptstyle C}Q$            
  1c|(Join 1) 1c|(Join 2) 1c!width 2pt(Join 3) 1c|(Join 1) 1c|(Join 2) 1c|(Join 3)            
end 716 1141 896 1111 1336 1114            
1 716 1141 896 1111 1336 1114            
2 807 1131 976 1085 1327 1109            
3 795 1178 962 1026 1346 1063            
4 801 1140 957 1007 1302 1044            
5 777 1105 947 979 1234 1014            
6 758 1144 921 968 1237 948            
7 819 1083 925 981 1229 1021            
8 780 1107 889 932 1299 1063            
9 787 1138 1026 920 1302 1028            
10 823 1063 950 935 1139 926            
11 825 1091 942 986 1282 975            
12 791 1096 1004 931 1318 1045            
13 814 1095 1020 908 1205 1117            
14 749 1089 1067 848 1180 1058            
15 802 1160 1041 853 1217 966            
16 771 1038 1006 936 1176 970            
17 793 1050 913 952 1207 1039            
18 792 1179 933 964 1122 960            
19 820 1117 966 985 1132 1019            
20 788 1242 1002 959 1128 1055            
21 755 1090 1041 1049 1159 1101            
22 804 1055 1027 1024 1111 1144            
23 774 1016 1061 1011 1301 1100            
24 864 1036 1047 1079 1274 1141            
25 859 999 1027 1065 1172 1166            
26 762 1030 1089 1076 1188 1112            
27 856 1061 1098 1097 1175 1154            
28 918 1102 1179 1117 1147 1173            
29 1000 1009 1092 1087 1206 1096            
30 959 1073 1114 1054 1067 1152            
32 871 1103 1236 969 1069 1227            
36 1015 1057 1249 1134 1071 1238            
40 971 1064 1217 1161 1196 1196            
44 1070 1016 1327 1331 1283 1283            
48 1058 1116 1260 1334 1154 1265            
52 1139 1043 1352 1096 1277 1299            
56 1140 1165 1420 1410 1201 1420            
60 1237 1235 1288 1643 1212 1350            


 
 
Table: Performance results (in sec.) on the single-processor machine depending on a varying condensation factor a for the IP-tables.
  3c!width 2ptPrimary Underflow 3c|Primary Min.-Overlaps                    
a $R
\Join_{\scriptscriptstyle C}Q$ $R
\Join_{\scriptscriptstyle C}Q$            
  1c|(Join 1) 1c|(Join 2) 1c!width 2pt(Join 3) 1c|(Join 1) 1c|(Join 2) 1c|(Join 3)            
a Join 1 Join 2 Join 3 Join 1 Join 2 Join 3            
end 7898 5560 10410 7877 5362 10410            
1 7898 5560 10410 7877 5361 10410            
2 7897 5555 10406 7880 5572 10406            
3 7934 5550 10402 7882 5564 10400            
4 7922 5554 10394 7888 5567 10394            
5 7976 5550 10394 7888 5564 10389            
6 7970 5541 10390 7900 5371 10383            
7 7961 5531 10381 7898 5371 10379            
8 7985 5532 10378 7894 5562 10366            
9 7993 5377 10370 7896 5373 10379            
10 7957 5386 10366 7898 5370 10361            
11 7991 5378 10365 7914 5546 10345            
12 8005 5377 10372 7913 5380 10355            
13 8013 5363 10368 7916 5376 10365            
14 8009 5381 10354 7922 5389 10359            
15 8037 5380 10344 7915 5388 10341            
16 8034 5386 10331 7556 5390 10320            
17 8065 5388 10331 7922 5389 10311            
18 8088 5396 10332 7928 5402 10311            
19 8098 5382 10333 7952 5389 10314            
20 8107 5360 10330 7977 5371 10311            
21 8144 5377 10332 7999 5400 10313            
22 8177 5367 10344 8023 5397 10322            
23 8191 5368 10353 8047 5404 10331            
24 8204 5377 10363 8075 5419 10342            
25 8214 5368 10369 8087 5414 10353            
26 8245 5367 10382 8115 5415 10364            
27 8282 5374 10393 8140 5421 10377            
28 8314 5354 10405 8163 5404 10392            
29 8321 5350 10416 8188 5376 10403            
30 8335 5354 10429 8215 5404 10418            
32 8395 5358 10457 8259 5390 10441            
36 8490 5362 10511 8370 5380 10499            
40 8583 5335 10567 8468 5344 10554            
44 8687 5306 10627 8574 5313 10617            
48 8785 5321 10685 8669 5316 10677            
52 8859 5306 10744 8770 5302 10735            
56 8961 5325 10807 8863 5314 10796            
60 9044 5293 10871 8952 5281 10863            


 

 

 

 

 

 

 


next up previous contents index
Next: Impact of Black-Out Preprocessing Up: Experimental Evaluation Previous: The Architectural Influence

Thomas Zurek