Introduction

Timestamps of a temporal relation are influenced by various statistical processes. Let us reconsider the phone call scenario: the start times are dictated by many factors, such as

daily routines, e.g. the times when we wake up, work, have lunch, sleep etc.,
business hours,
whether it is a working day or a public holiday, etc.

Furthermore, the lengths of the phone calls result from pricing or from the nature of the calls, e.g. business calls as opposed to calls to a friend or a relative. Evening calls may well be longer than daytime calls, either because of lower prices or because one tends to chat longer with friends or relatives than with customers, bank managers, travel agents etc.
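To see how quickly parameters accumulate when one tries to imitate such data artificially, the following toy generator combines only two of the processes mentioned above: a time band for the start (daily routine, business hours, working day or not) and a band-dependent call length. It is a sketch under loudly stated assumptions: the bands, weights, mean durations, the exponential length distribution and all names (`BANDS`, `sample_call`, etc.) are invented for illustration and are not derived from any real call data.

```python
import random
from dataclasses import dataclass

# HYPOTHETICAL parameters, chosen only to show how many are needed.
BANDS = {  # band -> (start hour, end hour); "night" wraps past midnight
    "business": (8.0, 18.0),
    "evening": (18.0, 23.0),
    "night": (23.0, 30.0),
}
BAND_WEIGHTS = {  # probability of each band, in the order of BANDS
    True: [0.60, 0.35, 0.05],   # working day
    False: [0.30, 0.60, 0.10],  # weekend or public holiday
}
MEAN_DURATION_MIN = {"business": 4.0, "evening": 12.0, "night": 4.0}


@dataclass
class Call:
    band: str
    start_hour: float    # hour of day in [0, 24)
    duration_min: float  # call length in minutes


def sample_call(rng, working_day):
    # Process 1: daily routine and business hours pick a time band.
    band = rng.choices(list(BANDS), weights=BAND_WEIGHTS[working_day])[0]
    lo, hi = BANDS[band]
    start = rng.uniform(lo, hi) % 24.0  # fold the night band back into [0, 24)
    # Process 2: pricing / nature of the call sets the length (assumed exponential).
    duration = rng.expovariate(1.0 / MEAN_DURATION_MIN[band])
    return Call(band, start, duration)


rng = random.Random(42)
calls = [sample_call(rng, working_day=True) for _ in range(5_000)]
```

Even this crude model already has 15 free parameters (6 band bounds, 6 weights, 3 means), and it still ignores public-holiday calendars, tariff changes and the mix of business and private calls.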

This is only one of many examples that illustrate how a set of real-life timestamps can be the result of a variety of statistical processes. We note that this feature is not restricted to transaction time but applies to many valid time scenarios as well. Just imagine the bookings database of a travel agent, tour operator, car rental company or hotel. Here, start and end times, i.e. the timestamp intervals, are dictated by the dates of holiday seasons, public holidays or sports, theatre and music events, by special promotional offers and possibly even by the weather.

The high statistical complexity behind the creation of timestamps is a significant difference from atomic data, and it makes it much more difficult to create artificial temporal test data with realistic properties. In the case of atomic data, many situations with a non-uniform distribution of the attribute values (i.e. data skew) have been successfully modelled using a Zipf distribution [Zipf, 1949]; [Wolf et al., 1993] is an example of a paper that describes such experiments. A similar approach for temporal data would be either

unrealistic, if the statistical model is too simplistic, or
too complex, because a huge number of statistical parameters would have to be used; the resulting combinatorial effect would make the experiments very hard to manage and to evaluate.
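For contrast, skewed atomic data can be generated from a single distribution with very few parameters. Under a Zipf distribution over the values $1, \dots, n$, the value of rank $k$ occurs with probability $(1/k^s) / \sum_{j=1}^{n} 1/j^s$, so the number of distinct values $n$ and the skew exponent $s$ essentially suffice. The sketch below is only an illustration: the function names and the use of Python's `random.choices` are our own assumptions, not the generator used in [Wolf et al., 1993].

```python
import random
from collections import Counter


def zipf_weights(n, s=1.0):
    """Unnormalised Zipf weights: the value of rank k (1..n) gets weight 1 / k**s."""
    return [1.0 / k ** s for k in range(1, n + 1)]


def zipf_sample(n, size, s=1.0, seed=0):
    """Draw `size` attribute values from {1, ..., n} with Zipf-distributed frequencies."""
    rng = random.Random(seed)
    # random.choices normalises the weights internally.
    return rng.choices(range(1, n + 1), weights=zipf_weights(n, s), k=size)


values = zipf_sample(n=100, size=10_000, s=1.0)
counts = Counter(values)
print(counts.most_common(3))  # the smallest ranks dominate
```

Raising `s` increases the skew and `s = 0` gives a uniform distribution; no comparably compact knob exists for timestamps shaped by routines, calendars and pricing at the same time.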

For these reasons, we decided to take an alternative approach for our experiments. It is based on real temporal data that we manipulate in order to control the experiments. The following section describes the data set and the manipulations that were performed.

Thomas Zurek