In this thesis, we have been focusing on temporal joins that are processed by symmetrically partitioning the participating relations in order to create a number of smaller and independent joins. This is based on the algebraic expression
![]() |
(67) |
With respect to join selectivity, this creates a new problem: a
fragment Rk represents a number of tuples of R that have been
selected according to certain rules. Therefore, an Rk does
not necessarily have the same statistical properties as R. Thus, if
we want to estimate the join selectivity of a partial join we cannot use statistical approaches such as in [Segev et al., 1993]
because they assume the same statistical properties for all
fragments. To see why this is not necessarily the case, consider again
the phone calls example mentioned in sections 2.1,
7.3.1 and 9.1.4. A fragment
Rk can, for example, hold phone calls made during Christmas time or
during a period with a promotional offer of cheap rates or during any
other type of period which causes a different consumer behaviour. This
means that Rk is likely to have widely different properties than
many other Rj with or R itself. However, one has to
keep in mind that the partial joins in (11.6) are processed
in parallel. Therefore it is the most expensive one among these that
determines the overall performance. Thus, if there is at least one
partial join whose result size ``gets out of hand'' for the reasons
mentioned above then this translates into an immediate performance
penalty. In order to avoid such a situation one requires a method that
can cope with statistical properties that vary over time.
Our analytical approach can tackle this problem. Consider, for example, a partial temporal intersection join where Rk and Qk hold tuples from R and Q, respectively, whose timestamps intersect with a time period that runs from time x to time y. Let
i.e. jx and jy are the indices of the start- or endpoints within the set that are closest to x and y, respectively, but inside the range between x and y, i.e. .
At the beginning, at time x, we have those tuples whose timestamps start before time x. There are and of these in Rk and Qk respectively. All their timestamps intersect. Thus all their combinations are in the result of . Furthermore and similar to the derivation of equation (11.5), there are those tuples in Rk whose timestamps start between x and y. These join with those tuples in Qk that intersect the respective startpoint. Therefore the result size of the partial intersection join is given by