next up previous contents index
Next: Simple Temporal Hash Join Up: Explicit-Partitioning Join Algorithms Previous: Explicit-Partitioning Join Algorithms

Overview

  In this section, we consider join techniques that include an explicit partitioning stage as an integral part of the join algorithm. They are shown in grey boxes in figure 4.7. In the case of the equi-join these techniques have proved to be very efficient and very versatile, especially allowing the parallelisation of the join operation. We might expect them to be similarly successful in the case of temporal joins.




  
Figure: Explicit partitioning joins.
\begin{figure}
 \epsfxsize=0.9\textwidth
 \centerline{
 \epsffile{/home/tz/work/thesis/fig/explicit.ps}}
 \centerline{
 } \end{figure}

We will merely concentrate on the symmetric partitioning  approach, i.e. both participating relations are partitioned; the fragment-and-replicate strategy  that was presented in section 3.5.1 is not affected by the actual join condition and therefore works in the same way for temporal joins as it does for any other type of join.

The basic temporal join strategy employing symmetric partitioning is built upon equation (3.6). It will be discussed in section 4.4.2. This will reveal the problem that the fragments Rk ($k=1,\dots,m$) cannot be disjoint because of the temporal intersection condition. This leads to certain negative implications:

1.
Replication Overhead:  During the partitioning process, a lot of tuples have to be (logically) replicated to be placed into several fragments. Please note that this logical replication   does not necessarily translate into a physical replication : when working on a shared-memory machine tuple fragments might be represented as an index, i.e. a set of pointers that refer to the actual locations in memory or on disk where the tuples are stored. In this case, the logical replication causes an additional effort when building these indices. When working on a shared-nothing architecture, however, logical replication is likely to be translated into a physical replication. In both cases, logical replication causes an overhead.  
2.
Processing Overhead:  Because of tuple replication, the individual fragments become larger. Hence, processing the partial joins $R_k \Join_{\scriptscriptstyle C}Q_k$ requires more effort. This causes a processing overhead.  
3.
Duplicates Overhead:  Tuple replication can also produce duplicates in the result which either cause a further overhead in subsequent stages of a query evaluation or which have to be removed which itself is a potentially expensive process.  
We will present algorithms that partially tackle these effects: As mentioned above, we will concentrate on the temporal intersection join. Many of its subtypes allow optimisations such as restricting the replication of tuples to one relation. During or contain joins are examples for that  . Leung and Muntz defined an assymmetry property  in order to identify join conditions which lead into such situations [Leung and Muntz, 1992]. Here, however, we assume the situation of the intersection join  in which both (or, more generally, all) participating relations require replication.


next up previous contents index
Next: Simple Temporal Hash Join Up: Explicit-Partitioning Join Algorithms Previous: Explicit-Partitioning Join Algorithms

Thomas Zurek