Optimisation of Partitioned Temporal Joins

PhD Thesis, November 1997, Department of Computer Science, University of Edinburgh

Abstract

Joins are the most expensive and performance-critical operations in relational database systems. In this thesis, we investigate processing techniques for joins that are based on a temporal intersection condition. Intuitively, such joins are used whenever one wants to match data from two or more relations that is valid at the same time. This is a scenario which is likely to appear in data warehouses.
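To make the temporal intersection condition concrete, the following is a minimal sketch of a naive nested-loop temporal join. It assumes closed integer validity intervals; the `Row` type, the `intersects` and `temporal_join` helpers and the sample data are hypothetical illustrations and are not taken from the thesis.

```python
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    """Hypothetical temporal tuple: a key plus a closed validity interval."""
    key: str
    start: int  # first time point at which the row is valid (inclusive)
    end: int    # last time point at which the row is valid (inclusive)


def intersects(a: Row, b: Row) -> bool:
    # Two closed intervals share at least one time point exactly when
    # each one starts no later than the other one ends.
    return a.start <= b.end and b.start <= a.end


def temporal_join(r: List[Row], s: List[Row]) -> List[Tuple[Row, Row]]:
    # Naive nested-loop join: compare every pair of rows.
    return [(a, b) for a in r for b in s if intersects(a, b)]


employees = [Row("alice", 1, 5), Row("bob", 4, 9)]
projects = [Row("p1", 6, 8), Row("p2", 2, 3)]
pairs = [(a.key, b.key) for a, b in temporal_join(employees, projects)]
print(pairs)
```

The nested loop compares all pairs of rows, which is the kind of cost that partitioned processing aims to reduce.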

This work is divided into two parts. First, we analyse techniques that have been proposed for equi-joins. Some of them have already been adapted for temporal join processing by other authors. However, hash-based and parallel techniques -- which are usually the most efficient ones in the context of equi-joins -- have attracted little attention and leave several issues specific to temporal joins unresolved. Hash-based and parallel techniques are based on explicit symmetric partitioning. In the case of an equi-join condition, partitioning can guarantee that the relations are split into disjoint fragments; in the case of a temporal intersection condition, partitioning usually results in non-disjoint fragments, with a large number of tuples replicated between fragments. This replication causes considerable overhead for partitioned temporal join processing. The problem is an instance of the 'min-max dilemma': minimising the number of replicated tuples means minimising the number of fragments and thus the degree of parallelism, whereas increasing the number of fragments, and with it the degree of parallelism, also increases the number of tuple replications. We analyse this problem and show that there is a polynomial-time algorithm that computes an optimal solution for the interval partitioning problem (IP). This result concludes the analytical part.
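The replication effect can be illustrated with a small sketch. It assumes that the timeline is range-partitioned at a sorted list of breakpoints and that a tuple is copied into every fragment its closed interval overlaps. The function names and sample intervals are hypothetical; this is not the thesis's IP algorithm, only a way of counting the copies that a given partition would cause.

```python
from bisect import bisect_right
from typing import List, Tuple

Interval = Tuple[int, int]  # closed interval [start, end]


def fragments_for(interval: Interval, breakpoints: List[int]) -> range:
    """Indices of the fragments that an interval overlaps.

    Assumption: breakpoints are sorted ascending, and fragment i covers
    the time points t with breakpoints[i-1] <= t < breakpoints[i].
    """
    start, end = interval
    first = bisect_right(breakpoints, start)
    last = bisect_right(breakpoints, end)
    return range(first, last + 1)


def replication_overhead(intervals: List[Interval], breakpoints: List[int]) -> int:
    """Number of extra tuple copies created by the partition."""
    copies = sum(len(fragments_for(iv, breakpoints)) for iv in intervals)
    return copies - len(intervals)


intervals = [(1, 4), (3, 12), (8, 25), (15, 18)]
for bps in ([], [10], [10, 20], [5, 10, 15, 20]):
    print(len(bps) + 1, "fragments:", replication_overhead(intervals, bps), "extra copies")
```

On this sample data, adding breakpoints increases the number of fragments available for parallel processing but also increases the number of extra copies, which is the trade-off described above.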

In the second, synthetic part of this work, we focus on the conclusions that can be drawn from the results of the first part. We propose an optimisation process that

1. analyses the temporal relations that participate in a temporal join,
2. proposes several possible partitions for these relations,
3. analyses these partitions and predicts their performance implications on the basis of a parameterised cost model, and
4. chooses the cheapest partition to process the temporal join.
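The steps above can be sketched as a simple selection loop. Everything in this sketch is an assumption made for illustration: the cost formula (work of the largest fragment pair plus a penalty per replicated tuple), the `join_weight` and `copy_weight` parameters, and the candidate breakpoint lists are hypothetical and do not reproduce the cost model or the IP-table of the thesis.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # closed interval [start, end]


def fragment_sizes(intervals: Sequence[Interval], breakpoints: List[int]) -> List[int]:
    # Count how many tuples land in each fragment; a tuple is copied
    # into every fragment that its interval overlaps.
    sizes = [0] * (len(breakpoints) + 1)
    for start, end in intervals:
        first = bisect_right(breakpoints, start)
        last = bisect_right(breakpoints, end)
        for i in range(first, last + 1):
            sizes[i] += 1
    return sizes


def estimated_cost(r, s, breakpoints, join_weight=1.0, copy_weight=0.5):
    # Hypothetical cost model: fragments are joined in parallel, so the
    # largest fragment pair dominates; each replicated tuple adds overhead.
    r_sizes = fragment_sizes(r, breakpoints)
    s_sizes = fragment_sizes(s, breakpoints)
    join_cost = max(a * b for a, b in zip(r_sizes, s_sizes))
    copies = sum(r_sizes) + sum(s_sizes) - len(r) - len(s)
    return join_weight * join_cost + copy_weight * copies


def choose_partition(r, s, candidates, **weights):
    # Pick the candidate partition with the lowest estimated cost.
    return min(candidates, key=lambda bps: estimated_cost(r, s, bps, **weights))


r = [(1, 4), (3, 12), (8, 25), (15, 18)]
s = [(2, 6), (9, 11), (14, 22), (19, 30)]
candidates = [[], [10], [10, 20], [5, 10, 15, 20]]
print(choose_partition(r, s, candidates))
print(choose_partition(r, s, candidates, copy_weight=2.0))
```

Raising `copy_weight` makes replication more expensive relative to join work, so a coarser partition is chosen. This shows how the parameters of a cost model can change which partition is cheapest.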

We also show how this process can be implemented efficiently using a new index structure, called the IP-table. The thesis concludes with a thorough experimental evaluation of the optimisation process and a chapter that shows the suitability of IP-tables in the wider context of temporal query optimisation, namely for estimating the selectivities of temporal join conditions.

Keywords: temporal join, parallel join, hash join, interval partitioning, temporal databases, parallel databases, data warehousing


Thomas Zurek, mail to <tz@dcs.ed.ac.uk>, last change 26.11.97