next up previous contents index
Next: Maintaining IP-Tables Up: Size Considerations Previous: Condensation of IP-Tables

Subsections

Endpoint IP-Tables

  

When looking for an optimal partition  in chapter 5, we found out that an optimal partition can always be found within the set E(R) of interval endpoints of a temporal relation R (theorem 1). The proof for this was essentially based on lemma 3. It showed the benefits of using interval endpoints as breakpoints of a partition because this can possibly reduce the number of overlapping intervals. This advantage does not only apply when we look for an optimal partition but shows that interval endpoints are probably good choices for breakpoints of a partition in any case: choosing the breakpoints from the intervals' endpoints should reduce the number of overlapping intervals. Therefore we can reduce an IP-table I(R) to entries concerning endpoints and call this an endpoint IP-table I''(R) .

Creating an endpoint IP-table is similar to condensing an IP-table as described in the previous section. The only difference is that we collapse those of the original timepoints that are in between two with    into . See figure 7.8 for an example of this process. Formally, the creation of an endpoint IP-table can be described like this:

We collapse the original timepoints into the new timepoint t''j for . As in section 7.3.3, this implies

for all .




  
Figure: Collapsing timepoints into interval endpoints for the example of figure 7.2.

It also implies that - logically - all intervals that start at one of the original timepoints are now considered to start at t''j. To reflect this fact, we define a new function with values  
  (32)
for and for all other values it is

We still get a correct result when Using instead of $s_{\scriptscriptstyle R}$ in the formula (7.1). This is due to the fact that if I''(R) is used for partitioning rather than I(R) then the resulting partition is a subset of V''(R), thus for all . We formally prove this in the following

Theorem 29479

Let be a partition for R with pk-1 < pk for , and let and pm=t''N''. Then the following holds  
  (33)
for all $k=1,\dots,m$. 

Proof:

For a let pk-1 = t''x and pk = t''y with .We define . Then it is

  

Now we can show the following

Corollary 29495

Let be a partition for R with pk-1 < pk for and let p0=t''1 - 1 and pm=t''N''. Then the following holds  
  (34)
for all $k=1,\dots,m$. 

Proof:

Trivial because of theorem 6.  

Figure 7.9 shows the endpoint IP-table I''(R) for the example of figure 7.2. In contrast to condensed IP-tables, we do not require the definition of additional functions to make the formulas (5.1), (5.2) and (5.3) work when using rather than $s_{\scriptscriptstyle R}$. The reason behind this is that exactly intervals end within the range of t''j: by definition of an endpoint IP-table there can be no interval ending at a timepoint between t''j-1 and t''j. Therefore the accumulated number of intervals ending within the range of t''j is . For the same reason, can be computed by the number of overlaps occurring at t''j-1 plus the number of intervals starting within (t''j-1,t''j]. Thus (5.2) applies when replacing $s_{\scriptscriptstyle R}$ by .Please note that - as in the case of condensed IP-tables - it is

for . (5.1) is not affected as it does not involve $s_{\scriptscriptstyle R}$.


  
Figure: The IP-table I''(R) (in bold typeface) for the intervals in figure 7.2 plus the values and .

We created the respective endpoint IP-table for the dataset examples that have been described in section 7.3.2. Table 7.3 shows the figures. Having more distributed temporal data, i.e. a longer underlying lifespan of the temporal relation (with a constant number of tuples), causes a greater difference between the sizes of the endpoint IP-table and a complete IP-table. This means that more space can be saved in these cases. An obvious example can be seen by comparing the figures in the Month-Lifespan columns of tables 7.2 and 7.3.


 
Table: Endpoint IP-table characteristics of some real-world temporal relations.
    2|c||Day-Lifepan 2|c||Week-Lifespan 2|c|Month-Lifespan      
Dataset R Size |R| |V''(R)| |V''(R)|:|R| |V''(R)| |V''(R)|:|R| |V''(R)| |V''(R)|:|R|
EPCC 125185 1306 0.01 7804 0.06 26370 0.21
DEPT 27206 1313 0.05 7010 0.26 16754 0.62
STUD 27431 1218 0.04 6363 0.23 15731 0.57
FRANKFURT 1995 210 0.11        


next up previous contents index
Next: Maintaining IP-Tables Up: Size Considerations Previous: Condensation of IP-Tables

Thomas Zurek