In this chapter, we build a bridge between
First, we want to focus on the main conclusions that we can draw from what has been discussed so far. In chapter 2, the importance and significance of temporal databases for many applications has been motivated. One obstacle to the incorporation of temporal features into commercial products is the poor performance of operations involving temporal data. One performance critical operator is the temporal join. In chapter 3, we looked at algorithms that are traditionally used for the joins involving an equality join condition. This is the most frequent situation in conventional join processing and most algorithmic techniques have been tuned to perform well in these cases. In this context, explicit partitioning of the data has frequently proved to give the best performance results. In chapter 4, we analysed if and how the techniques that are used for processing equi-joins can be applied to processing temporal joins. In most cases, this transfer was straightforward. However, techniques that are based on explicit partitioning prove to be tricky: although they can still be expected to be amongst the most efficient, they impose a significant overhead as tuples have to be replicated between the relation fragments. The rate of tuple replication depends (i) on the characteristics of the temporal data and (ii) on the choice of the partition that is used for creating the fragments. While we cannot do anything about (i) we have seen that the choice (ii) of an appropriate partition is a delicate one. In chapter 5, we looked at this choice in more detail and analysed the complexity of the problem of finding an optimal partition. Optimal means that the partition should minimise the total number of tuple replications while creating fragments that do not exceed a certain maximum size. It was shown that this problem has a polynomial solution: there is an algorithm IP-opt with a run-time complexity of O(N2) where N is the number of different start- and endpoints occurring in the relation(s) that are to be partitioned. In practical terms, IP-opt is likely to be too inefficient as N is probably huge. From this evolves the need to have heuristic partitioning strategies which are more efficient with respect to the expense of creating only semi-optimal, rather than optimal, partitions.
Such heuristic partitioning strategies form part of a wider optimisation approach. The idea is that a query optimiser can choose the cheapest partition among those produced by various partitioning strategies. In order to determine which partition is the cheapest, we require a cost model of the respective temporal join processing technique. This cost model has to consider
A further possibility is to get some meta-information on the data which might be stored in the database catalog. We follow this approach and define IP-tables for this purpose. Chapter 7 discusses them in more detail.
A crucial part in this stage is the performance model for the respective partitioned temporal join processing technique. In reality, there will be a lot of such techniques and, consequently, the same number of cost models. Frequently, these techniques will be adapted to a target (sequential or parallel) hardware platform. Consequently, there is no single and generally usable cost model for partitioned temporal join processing but several. In chapter 8, we will model the performance of a sequential and a parallel technique and try to be as general as possible with respect to assumptions about the underlying hardware. These two cost models will also be used for evaluation purposes in chapter 10.
When looking at the dataflow in figure 6.1 we note that the optimisation process itself is highly parallel: each partitioning strategy initiates an independent thread. Only the final stage is the point of synchronisation when the results of each thread are analysed and the optimisation decision is taken.
In the following chapters, we will use the following simplified version of figure 6.1 to guide you through the optimisation approach by highlighting the stage that is respectively dealt with.