We now turn to the calculation of result sizes of composite temporal joins. Actually, one can concentrate on showing how the intersections and unions of two (elementary, and later also composite) join results translate into formulas for calculating the respective result sizes. With composite joins we will have to rely on estimations rather than exact values because we cannot assume to the existence of IP-tables for the join results. Unfortunately, they cannot be calculated from the initial IP-tables and would have to be created by scanning the join result which is too expensive.
First, we discuss the intersection of two joins, as in the case of an equal join . As stated in table 11.2, it can be
considered as an intersection of a start- and a finish join. We take
the view that a selectivity factor of a join gives the
probability with which a tuple combination
satisfies C.
If
satisfies the start join condition with a probability
of and the finish join condition with a probability
of then it qualifies with a probability of
(65) |
This is only an instance of the more general formula
(66) |
Now we look at the union of two joins as in the case of an overlap join . As shown in table 11.2, it can be regarded as the union of a left-overlap join and a right-overlap join . This case can be treated by the rule from set theory for calculating the size of the union of two sets and :
This translates into
in which r's timestamp is contained inside
q's timestamp. So we have to deduct the number of tuples falling in
the intersection of the elementary joins. Thus the result size is
Using (11.3) and (11.4), we can now break down complex temporal join conditions into smaller ones until we have only a set of elementary ones which can be computed according to the formulas given in section 11.3.1. This can be, for example, applied to the contain , during and intersection joins .
The latter one, however, can alternatively be treated as one of the
elementary ones as the IP-tables provide sufficient information for
calculating its size exactly: consider a timepoint t with tuple timestamps in R starting at t. Then these
intervals intersect with exactly tuple timestamps of Q.
Alternatively, one can start with tuples of Q: timestamps
start at t and intersect with
tuple timestamps in R.
Further alternatives involve the consideration of intervals'
endpoints. For reasons mentioned above, one can concentrate on the
tj. All in all, we get the following equations for computing the
result size of an intersection join: