Next: The Interval Partitioning Problem Up: Temporal Join Processing Previous: A Short Summary

Temporal-Specific Join Optimisation Issues

The discussions of temporal join techniques in sections 4.3 and 4.4 have shown that many performance related issues are not specific to temporal joins but similar to the case of the equi-join which is well investigated. Hence, many well-known optimisation techniques can be applied when processing temporal joins.

However, the necessary replication of tuples in the case of range partitioned joins is a problem which is specific to temporal joins. As outlined in section 4.4.1, it produces overheads of various types. Therefore, controlling the extent of replication has to be a major task of the optimisation process for these joins.

There are two problems that have to be solved:

1.: The choice of the ranges, i.e. the partition points on the time line, is a very delicate one. Consider, for example, figure 4.17 which shows the scenario for our join example when an alternative set of partitioning ranges is used: more tuples have to be replicated, there are more mis-hits and the load balance is slightly worse in comparison to the situation in figure 4.11.
The main part of this thesis will look at this problem. In chapter 5 we analyse the problem of finding partitioning ranges that minimise the total number of replicated tuples while preserving a certain maximum number of intervals per fragment. Afterwards, in chapters 6-10, the problem is tackled from a practical point of view: we develop and describe an optimisation process for choosing a `good' partition for processing a temporal join and show how this process can be efficiently implemented.
2.: Query optimisers usually estimate the sizes of join results . This information is used for taking optimisation decisions and sometimes also to provide the user with an estimate of the result size. This might help him/her to check if his/her query delivers the desired result. If the optimiser's estimate predicts a result size of 1 million rows, for example, while the user expects only a handful then there is a good reason for the user to assume that the query might have to be rewritten.
The estimation of final and intermediate join results is already a challenging task because of the nonequi-character of temporal join conditions. It becomes even more difficult when tuples are replicated: the sizes of the join fragments and the partial join results cannot be computed by incorporating `static' facts - i.e. facts that are known in advance from metadata information such as relation size, number of different attribute values, distribution of these values (e.g. in form of histograms) etc. - but also on query evaluation parameters such as the replication of tuples imposed by the partition ranges.
In chapter 11 we will show a method to estimate temporal join selectivities. It is based on the same concepts as the optimisation of the partitioning process that was motivated in 1.

**Figure:** Search strategy for the improved (range) partitioned temporal join with different partitioning ranges (compare with figure 4.11).
$\begin{figure} \epsfxsize=0.9\textwidth \centerline{ \epsffile{/home/tz/work/thesis/fig/tj-imp-2.ps}} \centerline{ } \end{figure}$

Next: The Interval Partitioning Problem Up: Temporal Join Processing Previous: A Short Summary

Thomas Zurek