To obtain realistic values for the |V(R)|:|R| ratio, we analysed
four real-world datasets:
1. We retrieved accesses to a supercomputer at the Edinburgh Parallel
Computing Centre (EPCC). Such login information can be found on
the frontends, which are machines running the UNIX operating system.
On these frontend machines, the "last" command provides access
information. Its output typically looks as shown in figure 7.4.
In this example, most accesses occur during office hours, but some
also fall on weekends or late at night. The dataset
comprises 125185 tuples. We refer to it as EPCC.
2. Similarly, we looked at a cluster of departmental workstations,
mainly used by staff during office hours. The dataset comprises
27206 tuples. It is referred to as DEPT.
3. Next, the logins of a workstation cluster in a student
computer laboratory were analysed. Here, the access characteristic is
different and mainly influenced by the students' timetables: for
example, one can recognise an accumulation of accesses at times when
lectures have just finished. The dataset comprises 27431 tuples and is
referred to as STUD.
4. Finally, we analysed the flight schedule to and from Frankfurt
Airport. This example differs from the others because departure and
arrival times follow certain rules. For example, scheduled times use
five-minute steps, i.e. there is no departure or arrival time such
as 15:03, only times like 15:00, 15:05, 15:10, 15:15 etc. In the case of
the computer cluster accesses, times were arbitrary. Here, in contrast,
a man-made schedule rather than a random process generates the
temporal data. Figure 7.5 shows an extract
of the schedule. For the measurements, the times were converted to
Central European Time (CET). The dataset comprises 1995 tuples and is
referred to as FRANKFURT.
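The effect of the five-minute rule can be made concrete with a small sketch. It assumes a day-long lifespan at one-minute granularity (1440 points); the names STEP, max_distinct and on_grid are hypothetical and chosen for illustration only. Under this assumption, a schedule restricted to five-minute steps can use at most 1440 / 5 = 288 distinct time points per day.

```python
MINUTES_PER_DAY = 24 * 60   # 1440 points in a day-long lifespan
STEP = 5                    # scheduled times use five-minute steps

# Number of grid points 0, 5, 10, ... that fit into one day.
max_distinct = len(range(0, MINUTES_PER_DAY, STEP))


def on_grid(hhmm):
    """Return True if a time given as 'HH:MM' lies on the five-minute grid."""
    hours, minutes = map(int, hhmm.split(":"))
    return (hours * 60 + minutes) % STEP == 0


print(max_distinct)                         # 288
print(on_grid("15:05"), on_grid("15:03"))   # True False
```

Login times, in contrast, are not restricted to such a grid and can fall on any of the 1440 minutes of a day.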
In the first three cases, there are various possibilities to interpret
a timestamp: it can be considered as a daily
timestamp (ignoring weekday and date information if this is
irrelevant); it can also be a timestamp inside a week-long lifespan,
thus ignoring the date; or we can ignore the month, thus considering the
timestamp to define a point within a month-long lifespan. Other
interpretations are possible. We mapped the access data into these three
lifespans (day, week, month) with the respective lengths 1440, 10080 and
44640 (minutes), thus producing three different temporal relations out
of each dataset. The fourth set, FRANKFURT, imposed a day-long
lifespan of length 1440. In total, we had ten temporal relations, for
each of which we computed the two values |V(R)| and |V(R)|:|R|.
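The mapping into the three lifespans can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used for the measurements: it assumes that each tuple carries a start and an end timestamp, that V(R) denotes the set of distinct mapped timestamp values occurring in R, and that seconds are truncated. The function names and the sample data are hypothetical.

```python
from datetime import datetime

MINUTES_PER_DAY = 24 * 60                  # 1440
MINUTES_PER_WEEK = 7 * MINUTES_PER_DAY     # 10080
MINUTES_PER_MONTH = 31 * MINUTES_PER_DAY   # 44640


def to_day(ts):
    """Minute offset within a day-long lifespan (ignores weekday and date)."""
    return ts.hour * 60 + ts.minute


def to_week(ts):
    """Minute offset within a week-long lifespan (ignores the date)."""
    return ts.weekday() * MINUTES_PER_DAY + to_day(ts)


def to_month(ts):
    """Minute offset within a month-long lifespan (ignores the month)."""
    return (ts.day - 1) * MINUTES_PER_DAY + to_day(ts)


def distinct_values(tuples, mapping):
    """Return (|V(R)|, |V(R)|:|R|) for a list of (start, end) tuples.

    Assumption: V(R) collects both endpoints of every tuple.
    """
    values = {mapping(ts) for tup in tuples for ts in tup}
    return len(values), len(values) / len(tuples)


# Hypothetical sample: three login sessions on two Mondays and a Tuesday.
sample = [
    (datetime(1997, 3, 3, 9, 0), datetime(1997, 3, 3, 17, 30)),
    (datetime(1997, 3, 4, 9, 0), datetime(1997, 3, 4, 12, 15)),
    (datetime(1997, 3, 10, 9, 0), datetime(1997, 3, 10, 17, 30)),
]

for name, mapping in [("day", to_day), ("week", to_week), ("month", to_month)]:
    print(name, distinct_values(sample, mapping))
```

In the sample, the same three sessions yield 3, 4 and 6 distinct values for the day, week and month lifespans respectively: the longer the lifespan, the fewer timestamps coincide after the mapping.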
The results are shown in table 7.2. The numbers
show that for these real-world examples one can expect the
corresponding IP-table to be of a reasonable size. The ratios
|V(R)|:|R| are far from the worst case scenario and suggest
that IP-table sizes can be expected to correspond to the situations
described by the left part of table 7.1.
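As a cross-check, the ratios reported in table 7.2 can be recomputed from the |R| and |V(R)| counts given there. The sketch below only reuses those published counts; the dictionary layout and variable names are hypothetical.

```python
# dataset -> (|R|, {lifespan: |V(R)|}), counts taken from table 7.2
counts = {
    "EPCC": (125185, {"day": 1411, "week": 8286, "month": 29525}),
    "DEPT": (27206, {"day": 1408, "week": 8036, "month": 23877}),
    "STUD": (27431, {"day": 1360, "week": 7228, "month": 21379}),
    "FRANKFURT": (1995, {"day": 288}),
}

# Ratio |V(R)|:|R| for each of the ten temporal relations, to two decimals.
ratios = {
    (name, lifespan): f"{distinct / size:.2f}"
    for name, (size, per_lifespan) in counts.items()
    for lifespan, distinct in per_lifespan.items()
}

for (name, lifespan), ratio in ratios.items():
    print(f"{name:<10} {lifespan:<6} {ratio}")
```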
Figure 7.4: A typical example of login information.

Figure 7.5: An extract of a flight schedule of Frankfurt Airport.
Table 7.2: Characteristics of some real-world temporal relations.

                       Day-Lifespan         Week-Lifespan        Month-Lifespan
Dataset R    |R|       |V(R)|  |V(R)|:|R|   |V(R)|  |V(R)|:|R|   |V(R)|  |V(R)|:|R|
EPCC         125185    1411    0.01         8286    0.07         29525   0.24
DEPT         27206     1408    0.05         8036    0.30         23877   0.88
STUD         27431     1360    0.05         7228    0.26         21379   0.78
FRANKFURT    1995      288     0.14
Thomas Zurek