Composite Joins

Next: Parallel and Other Partitioned Up: Size and Selectivity Calculations Previous: Elementary Joins

Composite Joins

We now turn to the calculation of result sizes of composite temporal joins. Actually, one can concentrate on showing how the intersections and unions of two (elementary, and later also composite) join results translate into formulas for calculating the respective result sizes. With composite joins we will have to rely on estimations rather than exact values because we cannot assume to the existence of IP-tables for the join results. Unfortunately, they cannot be calculated from the initial IP-tables and would have to be created by scanning the join result which is too expensive.

First, we discuss the intersection of two joins, as in the case of an equal join . As stated in table 11.2, it can be considered as an intersection of a start- and a finish join. We take the view that a selectivity factor of a join $R \Join_{\scriptscriptstyle C}Q$ gives the probability with which a tuple combination $r \circ q$ satisfies C. If $r \circ q$ satisfies the start join condition with a probability of and the finish join condition with a probability of then it qualifies with a probability of

(65)

for the equal join according the multiplication rule for independent probabilities [Bronstein and Semendjajew, 1987]. In order to emphasise the fact that this step is an approximation rather than an exact analytical result we use the symbol in (11.2). In fact, (11.2) requires some careful consideration: and are independent if and only if a tuple combination $r \circ q$ satisfies the start join condition irrespective from the question whether it satisfies the finish join condition. In other words, the interval startpoints must follow a probability distribution as well as the intervals' lengths and therefore the endpoints. If this is the case, then (11.2) means that the result size of the equal join is

This is only an instance of the more general formula

(66)

if the join selectivities of the two joins are independent from each other as outlined above.

Now we look at the union of two joins as in the case of an overlap join . As shown in table 11.2, it can be regarded as the union of a left-overlap join and a right-overlap join . This case can be treated by the rule from set theory for calculating the size of the union of two sets and :

This translates into

An example for applying (11.4) is the overlap join. However, we cannot just add up numbers because some tuples might appear in both joins, in this case these are those combinations $r \circ q$ in which r's timestamp is contained inside q's timestamp. So we have to deduct the number of tuples falling in the intersection of the elementary joins. Thus the result size is

which reflects (11.4).

Using (11.3) and (11.4), we can now break down complex temporal join conditions into smaller ones until we have only a set of elementary ones which can be computed according to the formulas given in section 11.3.1. This can be, for example, applied to the contain , during and intersection joins .

The latter one, however, can alternatively be treated as one of the elementary ones as the IP-tables provide sufficient information for calculating its size exactly: consider a timepoint t with $s_{\scriptscriptstyle R}(t)$ tuple timestamps in R starting at t. Then these intervals intersect with exactly tuple timestamps of Q. Alternatively, one can start with tuples of Q: timestamps start at t and intersect with $i_{\scriptscriptstyle R}(t)$ tuple timestamps in R. Further alternatives involve the consideration of intervals' endpoints. For reasons mentioned above, one can concentrate on the t_j. All in all, we get the following equations for computing the result size of an intersection join:

Next: Parallel and Other Partitioned Up: Size and Selectivity Calculations Previous: Elementary Joins

Thomas Zurek