Endpoint IP-Tables

Next: Maintaining IP-Tables Up: Size Considerations Previous: Condensation of IP-Tables

Subsections

Endpoint IP-Tables

When looking for an optimal partition in chapter 5, we found out that an optimal partition can always be found within the set E(R) of interval endpoints of a temporal relation R (theorem 1). The proof for this was essentially based on lemma 3. It showed the benefits of using interval endpoints as breakpoints of a partition because this can possibly reduce the number of overlapping intervals. This advantage does not only apply when we look for an optimal partition but shows that interval endpoints are probably good choices for breakpoints of a partition in any case: choosing the breakpoints from the intervals' endpoints should reduce the number of overlapping intervals. Therefore we can reduce an IP-table I(R) to entries concerning endpoints and call this an endpoint IP-table I''(R) .

Creating an endpoint IP-table is similar to condensing an IP-table as described in the previous section. The only difference is that we collapse those of the original timepoints that are in between two with into . See figure 7.8 for an example of this process. Formally, the creation of an endpoint IP-table can be described like this:

Let
with t''_j < t''_j+1 for .
We note that it is always . As a consequence we get t_N = t''_N''.
Let f be the function that maps the index j of a to the index h for a such that t''_j = t_h = t_f(j), i.e. f(j) = h. To simplify following formulas, we define f(0)=0.

We collapse the original timepoints into the new timepoint t''_j for . As in section 7.3.3, this implies

for all .

**Figure:** Collapsing timepoints into interval endpoints for the example of figure 7.2.

It also implies that - logically - all intervals that start at one of the original timepoints are now considered to start at t''_j. To reflect this fact, we define a new function with values

(32)

for and for all other values it is

We still get a correct result when Using instead of $s_{\scriptscriptstyle R}$ in the formula (7.1). This is due to the fact that if I''(R) is used for partitioning rather than I(R) then the resulting partition is a subset of V''(R), thus for all . We formally prove this in the following

Theorem 29479

Let be a partition for R with p_k-1 < p_k for , and let and p_m=t''_N''. Then the following holds

(33)

for all $k=1,\dots,m$ .

Proof:

For a let p_k-1 = t''_x and p_k = t''_y with .We define . Then it is

Now we can show the following

Corollary 29495

Let be a partition for R with p_k-1 < p_k for and let p₀=t''₁ - 1 and p_m=t''_N''. Then the following holds

(34)

for all $k=1,\dots,m$ .

Proof:

Trivial because of theorem 6.

Figure 7.9 shows the endpoint IP-table I''(R) for the example of figure 7.2. In contrast to condensed IP-tables, we do not require the definition of additional functions to make the formulas (5.1), (5.2) and (5.3) work when using rather than $s_{\scriptscriptstyle R}$ . The reason behind this is that exactly intervals end within the range of t''_j: by definition of an endpoint IP-table there can be no interval ending at a timepoint between t''_j-1 and t''_j. Therefore the accumulated number of intervals ending within the range of t''_j is . For the same reason, can be computed by the number of overlaps occurring at t''_j-1 plus the number of intervals starting within (t''_j-1,t''_j]. Thus (5.2) applies when replacing $s_{\scriptscriptstyle R}$ by .Please note that - as in the case of condensed IP-tables - it is

for . (5.1) is not affected as it does not involve $s_{\scriptscriptstyle R}$ .

**Figure:** The IP-table I''(R) (in bold typeface) for the intervals in figure 7.2 plus the values and .

We created the respective endpoint IP-table for the dataset examples that have been described in section 7.3.2. Table 7.3 shows the figures. Having more distributed temporal data, i.e. a longer underlying lifespan of the temporal relation (with a constant number of tuples), causes a greater difference between the sizes of the endpoint IP-table and a complete IP-table. This means that more space can be saved in these cases. An obvious example can be seen by comparing the figures in the Month-Lifespan columns of tables 7.2 and 7.3.

**Table:** Endpoint IP-table characteristics of some real-world temporal relations.
		2\|c\|\|Day-Lifepan	2\|c\|\|Week-Lifespan	2\|c\|Month-Lifespan
Dataset R	Size \|R\|	\|V''(R)\|	\|V''(R)\|:\|R\|	\|V''(R)\|	\|V''(R)\|:\|R\|	\|V''(R)\|	\|V''(R)\|:\|R\|
EPCC	125185	1306	0.01	7804	0.06	26370	0.21
DEPT	27206	1313	0.05	7010	0.26	16754	0.62
STUD	27431	1218	0.04	6363	0.23	15731	0.57
FRANKFURT	1995	210	0.11

Next: Maintaining IP-Tables Up: Size Considerations Previous: Condensation of IP-Tables

Thomas Zurek