When looking for an optimal partition in chapter 5, we found out that an optimal partition can always be found within the set E(R) of interval endpoints of a temporal relation R (theorem 1). The proof for this was essentially based on lemma 3. It showed the benefits of using interval endpoints as breakpoints of a partition because this can possibly reduce the number of overlapping intervals. This advantage does not only apply when we look for an optimal partition but shows that interval endpoints are probably good choices for breakpoints of a partition in any case: choosing the breakpoints from the intervals' endpoints should reduce the number of overlapping intervals. Therefore we can reduce an IP-table I(R) to entries concerning endpoints and call this an endpoint IP-table I''(R) .
Creating an endpoint IP-table is similar to condensing an IP-table as described in the previous section. The only difference is that we collapse those of the original timepoints that are in between two with into . See figure 7.8 for an example of this process. Formally, the creation of an endpoint IP-table can be described like this:
It also implies that - logically - all intervals that start at one of the original timepoints are now considered to start at t''j. To reflect this fact, we define a new function with values
(32) |
Theorem 29479
Let be a partition for R with pk-1 < pk for , and let and pm=t''N''. Then the following holds
(33) |
Now we can show the following
Corollary 29495
Let be a partition for R with pk-1 < pk for and let p0=t''1 - 1 and pm=t''N''. Then the following holds
(34) |
Figure 7.9 shows the endpoint IP-table I''(R)
for the example of figure 7.2. In contrast to
condensed IP-tables, we do not require the definition of additional
functions to make the formulas (5.1), (5.2) and
(5.3) work when using rather than . The
reason behind this is that exactly intervals end within
the range of t''j: by definition of an endpoint IP-table there can
be no interval ending at a timepoint between t''j-1 and t''j.
Therefore the accumulated number of intervals ending within the
range of t''j is . For the same reason, can
be computed by the number of overlaps occurring at t''j-1 plus the
number of intervals starting within (t''j-1,t''j]. Thus
(5.2) applies when replacing
by .Please note that - as in the case of condensed IP-tables - it is
We created the respective endpoint IP-table for the dataset examples that have been described in section 7.3.2. Table 7.3 shows the figures. Having more distributed temporal data, i.e. a longer underlying lifespan of the temporal relation (with a constant number of tuples), causes a greater difference between the sizes of the endpoint IP-table and a complete IP-table. This means that more space can be saved in these cases. An obvious example can be seen by comparing the figures in the Month-Lifespan columns of tables 7.2 and 7.3.
2|c||Day-Lifepan | 2|c||Week-Lifespan | 2|c|Month-Lifespan | |||||
Dataset R | Size |R| | |V''(R)| | |V''(R)|:|R| | |V''(R)| | |V''(R)|:|R| | |V''(R)| | |V''(R)|:|R| |
EPCC | 125185 | 1306 | 0.01 | 7804 | 0.06 | 26370 | 0.21 |
DEPT | 27206 | 1313 | 0.05 | 7010 | 0.26 | 16754 | 0.62 |
STUD | 27431 | 1218 | 0.04 | 6363 | 0.23 | 15731 | 0.57 |
FRANKFURT | 1995 | 210 | 0.11 |