next up previous contents index
Next: Summary Up: Size and Selectivity Calculations Previous: Composite Joins

Parallel and Other Partitioned Joins

  

In this thesis, we have been focusing on temporal joins that are processed by symmetrically partitioning the participating relations in order to create a number of smaller and independent joins. This is based on the algebraic expression  
 \begin{displaymath}
R \Join_{\scriptscriptstyle C}Q \;=\;
R_1 \Join_{\scriptscri...
 ..._1 \,\cup\, \cdots \,\cup\, R_m \Join_{\scriptscriptstyle C}Q_m\end{displaymath} (67)
where equally indexed fragments Rk and Qk are created in such a way that they hold tuples whose timestamps  can possibly join with each other ($k=1,\dots,m$). In the case of the temporal intersection join , for example, this means that an Rk holds those tuples whose timestamps intersect with a certain range of the timeline; the corresponding Qk is created in the same way.

With respect to join selectivity, this creates a new problem: a fragment Rk represents a number of tuples of R that have been selected according to certain rules. Therefore, an Rk does not necessarily have the same statistical properties as R. Thus, if we want to estimate the join selectivity of a partial join $R_k \Join_{\scriptscriptstyle C}Q_k$ we cannot use statistical approaches such as in [Segev et al., 1993] because they assume the same statistical properties for all fragments. To see why this is not necessarily the case, consider again the phone calls example mentioned in sections 2.1, 7.3.1 and 9.1.4. A fragment Rk can, for example, hold phone calls made during Christmas time or during a period with a promotional offer of cheap rates or during any other type of period which causes a different consumer behaviour. This means that Rk is likely to have widely different properties than many other Rj with or R itself. However, one has to keep in mind that the partial joins in (11.6) are processed in parallel. Therefore it is the most expensive one among these that determines the overall performance. Thus, if there is at least one partial join whose result size ``gets out of hand'' for the reasons mentioned above then this translates into an immediate performance penalty. In order to avoid such a situation one requires a method that can cope with statistical properties that vary over time.

Our analytical approach can tackle this problem. Consider, for example, a partial temporal intersection join where Rk and Qk hold tuples from R and Q, respectively, whose timestamps  intersect with a time period that runs from time x to time y. Let

i.e. jx and jy are the indices of the start- or endpoints within the set that are closest to x and y, respectively, but inside the range between x and y, i.e. .

At the beginning, at time x, we have those tuples whose timestamps  start before time x. There are and of these in Rk and Qk respectively. All their timestamps intersect. Thus all their combinations are in the result of . Furthermore and similar to the derivation of equation (11.5), there are those tuples in Rk whose timestamps start between x and y. These join with those tuples in Qk that intersect the respective startpoint. Therefore the result size of the partial intersection join is given by


next up previous contents index
Next: Summary Up: Size and Selectivity Calculations Previous: Composite Joins

Thomas Zurek