Next: Temporal-Specific Join Optimisation Issues Up: Temporal Join Processing Previous: Spatially Partitioned Temporal Join

A Short Summary

Having discussed several temporal join algorithms and techniques in the last two sections, we want to compare them in order to summarise their characteristics. We confirm our conclusions by reviewing the situations that were illustrated in figures 4.2, 4.4, 4.6, 4.8, 4.11, 4.15 and 4.16. Certainly, this considers only one specific example and therefore does not represent a thorough analysis. Nevertheless, it will give some insights into the characteristics of the algorithms.

First, we look at the sequential techniques. One significant performance characteristic then is the total number of tuple comparisons that are performed by the matching stage of the respective algorithm in order to get the result tuples. The nested-loop join is still the worst because it performs an exhaustive search. The difference to other methods is not that big as in the case of a comparable equi-join due to the higher selectivity factor of the temporal intersection in comparison to the equality predicate. The join index is still the clear winner as the join is virtually precomputed; the algorithm only composes the result. This leads to a minimal number of tuple comparisons. Although the example suggests the sort-merge, the improved range partitioning and the spatial partitioning technique to be at equal levels, there is a significant difference which lies in the partitioning method by which they achieve their performance:

If the relations are not sorted then the sort-merge might require a possibly expensive sorting stage prior to the matching stage.
Range partitioning needs to replicate tuples and imposes therefore an additional overhead apart from the partitioning effort itself. When processed sequentially, the physical replication of tuples can be replaced by a logical one which might still cause replicated tuple comparisons.
The spatial partitioning approach does not require replication. However, it creates fragments that have to be joined not only with one but several fragments from other relations that participate in the join. This problem is very severe if there are more than two relations involved.

Hence, the relatively low number of tuple comparisons in the matching stages of these algorithms are achieved by moving processing effort to a partitioning stage that precedes the matching. The partitioning differs widely and has its drawbacks:

sorting - if necessary - is generally more expensive than range or spatial partitioning,
range partitioning needs to replicate tuples and imposes an overhead on the matching stage,
spatial partitioning requires the matching stage to perform a large number of partial joins with tuples of one relation being accessed more than once.

Now, we turn to the parallel techniques. Although the fragment-and-replicate method performs equally with the others in terms of maximum tuple comparisons per partial join, there is the replication effort that is significantly higher in comparison to the other techniques. The partial joins of the fragment-and-replicate joins are better balanced than those based on symmetric partitioning. This is no coincidence as there are no constraints on partitioning. Therefore, it is easy to balance the fragments. The spatially partitioned join performs relatively well for the example. But remember that the example was particularly nice (see matrix in figure 4.14(b)). And again, in the case of three or more relations participating in the join, there might be a huge number of fragment combinations that have to be joined in order to compute the global result. This makes it also much more difficult to combine several partial joins to a kind of `super-partial join', as in figure 4.16 where two of the original partial joins were respectively combined.

The extent of the replication overhead of the fragment-and-replicate approach and the problems of spatially partitioned n-way joins for $n \ge 3$ makes the range partitioning approach a possible compromise. In fact, one can expect its performance to be among the best. Furthermore it is fairly robust, e.g. it can be easily adapted to process n-way joins for $n \ge 3$ . Finally, it is a straightforward adaption of the equi-join range (hash) partitioning method. Therefore many equi-join optimisation techniques can be applied, too. However, tuple replication has to be of concern, as a poor choice of ranges can increase the replication rate. Optimisation issues for range partitioned temporal joins are discussed in the following section.

Next: Temporal-Specific Join Optimisation Issues Up: Temporal Join Processing Previous: Spatially Partitioned Temporal Join

Thomas Zurek