Optimisation of Partitioned Temporal Joins

Thomas Zurek

PhD Thesis, November 1997, Department of Computer Science, University of Edinburgh

Synopsis

This thesis comprises eleven chapters apart from this introduction. They can be divided into two major parts:

the analytical part in chapters 2 to 6, in which we motivate the research problem review the literature and establish the relevance of this research. The results of this analysis lead to the creation of an optimisation process which is the starting point of
the synthetical part in chapters 7 to 11, in which the optimisation process is elaborated and evaluated.

We now give an overview over these two parts.

Analytical Part

In chapter 2, we give an introduction to the area of temporal databases. To this end, we explain basic concepts and notations that will be used throughout this thesis. Temporal databases are contrasted with conventional databases in order to elaborate the temporal-specific research issues. One of these is lack of efficiency of performance-critical temporal operators, such as the temporal join. The latter will be the focus of this work. Finally, we look at the relevance of temporal databases in the commercial world.

In chapter 3, the huge variety of general join techniques is described. The join operation has been intensively analysed by a large number of researchers. However, most of the algorithms that have been proposed are tuned to perform well with equi-joins, as empirical research suggests that equality conditions appear in over 90% of joins in conventional database processing. Usually, temporal join conditions are based on nonequi-join conditions. Therefore, one needs to investigate how well equi-join techniques can cope with nonequi-joins.

This problem is the focus of chapter 4. Here, we review the literature and describe the many adaptions that have been proposed for temporal join processing. It emerges that parallel and hash join techniques -- usually the most efficient techniques -- behave in a significantly different way when applied to temporal joins. The basic problem is that most temporal join conditions require tuples to be replicated between relation fragments. This is not the case for equi-join conditions. Tuple replications, however, cause an overhead in join processing. It is a major task of this work to investigate and to tackle this overhead.

To this end, we analyse the problem of partitioning a collection of intervals in such a way that the resulting fragments have a limited size while the number of necessary tuple replications is minimised. This is called the interval partitioning (IP) problem. The analysis in chapter 5 provides a variety of interesting results that are used at various stages of this thesis.

The analytical part is concluded in chapter 6 which summarises the main results. These lead to the creation of a process that optimises partitioned temporal join processing by choosing the most appropriate partition for the data. The structure of the remainder of the thesis is mainly based on the structure of this process.

Synthetical Part

In chapter 7, we look at the first stage of the optimisation process, namely the stage of data analysis. Along with the information that describe the characteristics of the timestamp intervals, we propose to maintain a new metadata-structure called the IP-table. We give a formal definition of an IP-table, compare it to alternative approaches, such as using data samples, show how the size of an IP-table can be decreased by condensing timepoints, provide algorithms for the maintenance of IP-tables and finally describe how two or more IP-tables can be merged. All these operations are required in the context of the optimisation process.

In chapter 8, we develop a detailed performance model for partitioned temporal join processing. This is done in three stages: firstly, we discuss issues concerning the underlying hardware architecture and create an architectural model; secondly, we describe how a partitioned temporal join is processed on the architectural model -- this is referred to as the temporal join processing model; finally, we derive a detailed cost model for temporal join processing. The chapter is concluded by a simple experimental evaluation that provides some insight into the characteristics of the performance model. This information helps us to identify certain performance-critical issues.

Chapter 9 discusses several families of partitioning strategies. Every strategy can provides a partition candidate for the optimisation process. Obviously, there is a huge variety of such strategies as each one has a certain optimisation goal as its target.

In chapter 10, we conduct a thorough experimental evaluation of the techniques and methods that have been developed in preceding chapters. This gives some insight into the characteristics of the partitioning strategies and their performance impact.

In chapter 11, we show that IP-tables can be used not only for partitioning purposes, but also to estimate the selectivity of temporal joins. In conventional database management systems, selectivity estimation is an important tool for a query optimiser to distribute the load and balance query processing.

Finally, the thesis is concluded in chapter 12 which summarises the work and its major contributions and holds an outlook to future work.

Back to the PhD thesis index page

Thomas Zurek, mail to <tz@dcs.ed.ac.uk>, last change 21.11.97