Next: Motivation Up: T. Zurek: Optimisation of Previous: Integration into a Query

IP-Tables

In this chapter, we look at stage 1 of the optimisation process as it has been outlined in the previous chapter: the stage of data analysis.

In general, there are two possibilities for representing the characteristics of the temporal data:

empirically, e.g. by a data sample , or
analytically, e.g. by functions $s_{\scriptscriptstyle R}$ , $e_{\scriptscriptstyle R}$ , $i_{\scriptscriptstyle R}$ , $o_{\scriptscriptstyle R}$ , etc.

We opt for the analytical approach because it avoids the fact that implicit information has to be made explicit at one point, thus imposing additional computational effort on later stages of the optimisation process. If the information about the temporal data is to be held explicitly then it has to be computed before it is actually required. Computing figures, such as the $s_{\scriptscriptstyle R}(t)$ , $e_{\scriptscriptstyle R}(t)$ , ..., from the temporal relation R at optimisation time would be very inefficient. This means that we need a data structure that can store this information efficiently in some convenient place within the database environment, e.g. in the database catalog.

In the following sections, we will motivate and define a data structure that serves for this purpose (sections 7.1 and 7.2). It is called an IP-table with IP referring to interval partitioning. Then we address the question of the size of IP-tables and how it can be reduced if this becomes necessary. This results in two new types of IP-tables (section 7.3). IP-tables, once they are created, can be updated whenever new tuples are inserted into or removed from the corresponding temporal relation. This avoids the necessity to recompute IP-tables after such updates. Section 7.4 looks at this issue. Finally, we present a way in which IP-tables of individual temporal relations can be merged to derive an IP-table that characterises the collection of intervals that arises from the union of the participating relations (section 7.5). IP-tables can be regarded as a form of histogram for interval data. In section 7.6, we will look at this relationship and try to identify similarities.

Next: Motivation Up: T. Zurek: Optimisation of Previous: Integration into a Query

Thomas Zurek