
Realistic Examples

  In order to discover realistic values for the |V(R)| : |R| ratio, we analysed four real-world temporal relations:

1.
We retrieved accesses to a supercomputer at the Edinburgh Parallel Computing Centre (EPCC). Such login information can be found on the frontends, which are machines running the UNIX operating system. On these frontend machines, the last command provides access information. Its output typically looks as shown in figure 7.4.
In this example, most accesses occur during office hours, but some also occur at the weekend or late at night. The dataset comprises 125185 tuples. We refer to it as EPCC.
2.
Similarly, we looked at a cluster of departmental workstations, mainly used by staff during office hours. The set comprises 27206 tuples. It is referred to as DEPT. 

3.
Next, the logins of a workstation cluster in a student computer laboratory were analysed. Here, the access characteristic is different and mainly influenced by the students' timetables: for example, one can recognise an accumulation of accesses at times when lectures have just finished. The set comprises 27431 tuples and is referred to as STUD.

4.
Finally, we analysed the flight schedule to and from Frankfurt Airport. This example differs from the others in that departure and arrival times follow certain rules. For example, scheduled times use five-minute steps, i.e. there is no departure or arrival time such as 15:03, only times like 15:00, 15:05, 15:10, 15:15 etc. In the case of the computer cluster accesses, by contrast, times were arbitrary. Here, a man-made schedule rather than a random process generates the temporal data. Figure 7.5 shows an extract of the schedule. For the measurements, the times were converted to Central European Time (CET). The dataset comprises 1995 tuples and is referred to as FRANKFURT.

In the first three cases, there are various possibilities to interpret a timestamp: it can be considered as a daily timestamp (ignoring weekday and date information if this is irrelevant); it can also be a timestamp inside a week-long lifespan, thus ignoring the date; or we can ignore the month, thus considering the timestamp to define a point within a month-long lifespan. There are more possibilities. We mapped the access data into these three lifespans (day, week, month) with the respective lengths 1440, 10080 and 44640 (minutes), thus producing three different temporal relations out of each dataset. The fourth set, FRANKFURT, imposed a day-long lifespan of length 1440. In total, we had ten temporal relations, for each of which we computed the two values |V(R)| and |V(R)|:|R|. The results are shown in table 7.2. The figures indicate that for these real-world examples one can expect the corresponding IP-table to be of a reasonable size. The ratios |V(R)|:|R| are far from the worst-case scenario and suggest that IP-table sizes can be expected to correspond to the situations described by the left part of table 7.1.
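The mapping into day, week and month lifespans can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used for the measurements. It assumes one timestamp per tuple, minute granularity, and that |V(R)| counts the distinct time points occurring in R after the mapping; the precise definition of V(R) is not restated in this section. The names LIFESPAN_MINUTES, to_lifespan_point and v_ratio are hypothetical.

```python
from datetime import datetime

# Lifespan lengths in minutes, as given in the text.
LIFESPAN_MINUTES = {"day": 1440, "week": 10080, "month": 44640}


def to_lifespan_point(ts: datetime, lifespan: str) -> int:
    """Map a timestamp to a minute offset inside the chosen lifespan."""
    minute_of_day = ts.hour * 60 + ts.minute
    if lifespan == "day":
        # Ignore weekday and date.
        return minute_of_day
    if lifespan == "week":
        # Ignore the date; Monday is day 0.
        return ts.weekday() * 1440 + minute_of_day
    if lifespan == "month":
        # Ignore the month; the first day of a month is day 0.
        return (ts.day - 1) * 1440 + minute_of_day
    raise ValueError(f"unknown lifespan: {lifespan}")


def v_ratio(timestamps, lifespan):
    """Return (|V(R)|, |V(R)|:|R|) under the assumptions stated above."""
    if not timestamps:
        raise ValueError("relation must not be empty")
    points = {to_lifespan_point(ts, lifespan) for ts in timestamps}
    return len(points), len(points) / len(timestamps)
```

Every mapped point lies in the range 0 to LIFESPAN_MINUTES[lifespan] - 1, so |V(R)| can never exceed the lifespan length under these assumptions, regardless of |R|.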


  
Figure 7.4: A typical example of login information.


  
Figure 7.5: An extract of a flight schedule of Frankfurt Airport.


 
Table 7.2: Characteristics of some real-world temporal relations.

                      Day-Lifespan          Week-Lifespan         Month-Lifespan
Dataset R    |R|      |V(R)|  |V(R)|:|R|    |V(R)|  |V(R)|:|R|    |V(R)|  |V(R)|:|R|
EPCC         125185   1411    0.01          8286    0.07          29525   0.24
DEPT         27206    1408    0.05          8036    0.30          23877   0.88
STUD         27431    1360    0.05          7228    0.26          21379   0.78
FRANKFURT    1995     288     0.14          -       -             -       -
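
The FRANKFURT row can be checked against the five-minute grid of the schedule. The short sketch below assumes, as an interpretation not stated in the text, that |V(R)| counts distinct minute values within the 1440-minute day; the names DAY_MINUTES, STEP and grid_points are hypothetical.

```python
DAY_MINUTES = 1440  # length of the day-long lifespan, in minutes
STEP = 5            # scheduled times fall on five-minute steps

# All minute offsets on which a scheduled time can fall.
grid_points = list(range(0, DAY_MINUTES, STEP))
max_points = len(grid_points)

# Ratio for FRANKFURT with |R| = 1995 tuples, rounded as in the table.
frankfurt_ratio = round(max_points / 1995, 2)
```

Under this assumption the grid allows at most 288 distinct points, which matches the |V(R)| value reported for FRANKFURT and would mean that every five-minute slot of the day occurs in the schedule.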



Thomas Zurek