In the experiment of previous section, the number m of fragments / partial joins was fixed to a value of 16 which matched the number of processors in the parallel architecture. Here, we want to find out whether this value was a good choice or whether higher values of m lead to better performances. In other words: we want to find out whether it is better to have a small number of relatively big partial joins or a large number of relatively small ones. There are no indications neither from the performance model (chapter 8) nor from the strategies themselves (chapter 9) whether one way or the other is better. Therefore, we have to rely on the observations made in the experiments. In order to reduce the complexity of the experiments and to concentrate on the main issues, we chose four of the eleven strategies on offer, namely the uniform lifespan, basic underflow, primary underflow and the primary minimum-overlaps strategies. The choice was led by the goal to select at least one member of each family and also those that are the most promising with respect to good performances.
Six experiments were conducted to test the dependency on m of these strategies, i.e. one per combination of join and hardware. The performance results are given in tables 10.4, 10.5 and 10.6 and visualised in figures 10.10 to 10.15.
First, we want to analyse the behaviour of the strategies on the
parallel architecture. In all cases, the primary underflow strategy is
a narrow winner before the primary minimum-overlaps strategy. Both
strategies show relatively high costs on low values of m and perform
better for higher values, with almost constant costs in the second
half of the chart. The costs of the uniform lifespan strategy are
between 2 and 3 times (join ), between 1.3 and 2 times
(join ) and between 2.2 and 3.5 times (join )higher than those of the primary underflow strategy. In general, it
also performs better for higher values of m. However, there are some
exceptions to this rule such as a sharp rise from m=896 to m=1024
for the join or some significant shaking in the area
between m=384 and m=1024 for the join
. A much more
irregular behaviour is shown by the basic underflow strategy. The
exact shape of its cost function can only be explained by looking into
details of the data. However, it seems that it performs best for lower
values of m, which stands in contrast to the other three strategies.
In general, it performs worse than the primary underflow and
minimum-overlaps strategies.
Now, we turn our attention to the results for the single-processor
hardware architecture. Here, the scene looks different: for lower
values of m the uniform lifespan strategy has between 10% and 30%
higher costs in comparison to the primary underflow strategy, whereas
for high values of m it performs between 1% and 4% better. This is
a surprising result. It means in other words that the uniform lifespan
strategy outperforms all other strategies if simply the value of m
is chosen to be high enough. However, the advantage in comparison to
other strategies is only very minor (see joins and ) or does not exist (see join ). A further
interesting result is that the primary minimum-overlaps strategy is
better (joins and
) or at least as good (join
) as the primary underflow strategy. This is due to the
facts that (a) a good load balance between the
partial joins is far less important on a single-processor machine in
comparison to a parallel one and that (b) the advantage of having a
reduced total number of overlapping intervals materialises in the
single-processor case.
From the results in tables 10.4, 10.5
and 10.6 we can also derive some information about the
speed-up . It is defined as the ratio between the
processing times that are required for the same problem on a machine
with n processors and on one with 1 processor. Ideally, the speed-up
is n in this case. In realistic situations, however, such a speed-up
is never achieved because of communication and synchronisation
overheads. If we compare the respective lowest costs on the parallel
and single-processor machines for the three joins, then we get
speed-ups of 9.5 for , 4.6 for and 10.6 for
.
4c!width 2pt, | 4c|, | |||||||||||||
m | 1c|Uniform | 1c|Primary | 1c|Primary | 1c!width 2ptBasic | 1c|Uniform | 1c|Primary | 1c|Primary | 1c|Basic | ||||||
1c|Lifespan | 1c|Underflow | 1c|Min.-O. | 1c!width 2ptUnderflow | 1c|Lifespan | 1c|Underflow | 1c|Min.-O. | 1c|Underflow | |||||||
16 | 3189 | 1077 | 1988 | 1490 | 12783 | 11370 | 11286 | 11807 | ||||||
32 | 2111 | 891 | 1359 | 1326 | 10580 | 9589 | 9464 | 9852 | ||||||
48 | 1930 | 843 | 1141 | 1439 | 9675 | 8839 | 8750 | 9650 | ||||||
64 | 1837 | 797 | 1178 | 1384 | 9117 | 8458 | 8394 | 9437 | ||||||
80 | 1743 | 788 | 1011 | 1529 | 8783 | 8228 | 8177 | 9299 | ||||||
96 | 1699 | 777 | 967 | 1749 | 8559 | 8009 | 8029 | 9389 | ||||||
112 | 1668 | 776 | 986 | 1713 | 8395 | 7952 | 7929 | 9328 | ||||||
128 | 1700 | 763 | 963 | 1791 | 8241 | 7847 | 7855 | 9133 | ||||||
256 | 1613 | 757 | 926 | 2884 | 7194 | 7633 | 7618 | 9209 | ||||||
384 | 1587 | 752 | 910 | 2808 | 7258 | 7580 | 7569 | 8980 | ||||||
512 | 1630 | 770 | 916 | 2465 | 7310 | 7578 | 7568 | 8773 | ||||||
640 | 1629 | 788 | 903 | 2143 | 7351 | 7586 | 7399 | 8708 | ||||||
768 | 1577 | 788 | 898 | 1612 | 7379 | 7586 | 7493 | 8633 | ||||||
896 | 1554 | 788 | 923 | 1657 | 7416 | 7586 | 7543 | 8649 | ||||||
1024 | 2501 | 788 | 935 | 1582 | 7093 | 7586 | 7594 | 8666 |
4c!width 2pt, | 4c|, | |||||||||||||
m | 1c|Uniform | 1c|Primary | 1c|Primary | 1c!width 2ptBasic | 1c|Uniform | 1c|Primary | 1c|Primary | 1c|Basic | ||||||
1c|Lifespan | 1c|Underflow | 1c|Min.-O. | 1c!width 2ptUnderflow | 1c|Lifespan | 1c|Underflow | 1c|Min.-O. | 1c|Underflow | |||||||
16 | 3779 | 1901 | 2458 | 1884 | 10322 | 9248 | 9029 | 8629 | ||||||
32 | 2771 | 1454 | 1747 | 1647 | 7793 | 6966 | 6995 | 7229 | ||||||
48 | 2431 | 1339 | 1556 | 1531 | 6942 | 6395 | 6451 | 6693 | ||||||
64 | 2238 | 1295 | 1380 | 1468 | 6522 | 6111 | 6109 | 6320 | ||||||
80 | 2160 | 1253 | 1424 | 1476 | 6258 | 5930 | 5931 | 6235 | ||||||
96 | 2089 | 1224 | 1351 | 1575 | 6096 | 5814 | 5814 | 5612 | ||||||
112 | 2040 | 1209 | 1389 | 1697 | 5974 | 5740 | 5736 | 5616 | ||||||
128 | 1914 | 1202 | 1396 | 1613 | 5874 | 5679 | 5685 | 5625 | ||||||
256 | 1795 | 1169 | 1324 | 2398 | 5333 | 5501 | 5336 | 5688 | ||||||
384 | 1763 | 1147 | 1301 | 2274 | 5335 | 5376 | 5358 | 5704 | ||||||
512 | 1523 | 1170 | 1239 | 2544 | 5353 | 5402 | 5390 | 5760 | ||||||
640 | 1551 | 1174 | 1247 | 3251 | 5375 | 5436 | 5425 | 5827 | ||||||
768 | 1758 | 1171 | 1260 | 3432 | 5393 | 5472 | 5462 | 5863 | ||||||
896 | 1598 | 1171 | 1263 | 3051 | 5420 | 5503 | 5474 | 5886 | ||||||
1024 | 1699 | 1171 | 1263 | 2417 | 5462 | 5503 | 5474 | 5939 |
4c!width 2pt, | 4c|, | |||||||||||||
m | 1c|Uniform | 1c|Primary | 1c|Primary | 1c!width 2ptBasic | 1c|Uniform | 1c|Primary | 1c|Primary | 1c|Basic | ||||||
1c|Lifespan | 1c|Underflow | 1c|Min.-O. | 1c!width 2ptUnderflow | 1c|Lifespan | 1c|Underflow | 1c|Min.-O. | 1c|Underflow | |||||||
16 | 4327 | 1243 | 1753 | 2122 | 19029 | 14551 | 14621 | 14739 | ||||||
32 | 3274 | 1058 | 1277 | 2165 | 14506 | 12153 | 12150 | 12878 | ||||||
48 | 2940 | 1001 | 1217 | 2281 | 12954 | 11389 | 11394 | 12330 | ||||||
64 | 2800 | 975 | 1111 | 2354 | 12189 | 10989 | 10986 | 12126 | ||||||
80 | 2683 | 964 | 1102 | 2590 | 11724 | 10773 | 10769 | 12076 | ||||||
96 | 2622 | 949 | 1060 | 2875 | 11413 | 10619 | 10616 | 12074 | ||||||
112 | 2575 | 936 | 1053 | 3111 | 11191 | 10514 | 10519 | 12030 | ||||||
128 | 2559 | 941 | 1047 | 4434 | 11010 | 10442 | 10441 | 11995 | ||||||
256 | 2451 | 938 | 1021 | 5912 | 10449 | 10214 | 10213 | 11973 | ||||||
384 | 2426 | 948 | 1013 | 6503 | 9964 | 10180 | 10179 | 12145 | ||||||
512 | 2447 | 963 | 1015 | 6522 | 9973 | 10195 | 10194 | 12062 | ||||||
640 | 2413 | 968 | 1025 | 6371 | 9992 | 10229 | 10228 | 12080 | ||||||
768 | 2430 | 997 | 1031 | 6117 | 10009 | 10272 | 10269 | 12095 | ||||||
896 | 2476 | 1009 | 1032 | 5626 | 10033 | 10323 | 10322 | 12001 | ||||||
1024 | 2252 | 1009 | 1048 | 2122 | 10072 | 10374 | 10376 | 12013 |