Next: The Architectural Model Up: Performance Model Previous: Performance Model

Outline

The task of creating an effective performance model is not straightforward because there are many factors that affect the costs of a temporal join. Amongst these are, for example, characteristics of the hardware architecture, issues of the (parallel) programming paradigm and the choice of the temporal join algorithm. If too many of these issues are incorporated into the model then it might be specific to one particular hardware architecture, one particular programming paradigm and/or one particular temporal join algorithm. On the other hand, if we omit or generalise too many of these issues then we probably miss out important factors that affect the join performance. This means that there is not one, single performance model which can be considered as appropriate but many. Our intention is to create one that seeks a good tradeoff between covering as many situations (with respect to hardware and software configurations) as possible while still being specific enough to achieve a reasonable performance prediction. In other words: it will necessarily be a compromise model. We divided the task of creating a performance model into three subtasks (see also figure 8.1):

In section 8.2, we consider the hardware issues that affect the performance. An architectural model is presented. It is parameterised by two variables: depending on the values of these variables we get a single-processor architecture, a parallel shared-everything (SMP) architecture, a parallel shared-nothing architecture or a parallel hybrid, two-level architecture that incorporates elements of the shared-everything and the shared-nothing approaches. This covers a broad spectrum of possible architectures and therefore supports our goal to be as general as possible.
In section 8.3, we look at the software aspect, i.e. the temporal join algorithm and how it works on the architectural model. In chapter 4, we have presented a wide range of temporal join algorithms of which only those are relevant that employ explicit and symmetric partitioning. Please recall that the performance of all the other temporal join algorithms is not affected by the choice of a partition. This still leaves us with a a variety of possible algorithms. We necessarily have to compromise here. However, this does not imply that we cannot create a model that provides a realistic feedback on the performance impacts of partitioning: we can pick an algorithm which represents a family of algorithms whose performances are similarly affected by the choice of a partition.
The cost model for partitioned temporal join processing is derived in section 8.4. This model incorporates the assumptions and compromise decisions that have been taken in sections 8.2 and 8.3.

The chapter is concluded in section 8.5 where we evaluate the performance model on top of a uniform workload, i.e. we assume that the temporal intervals are uniformly distributed and of a uniform length. In this case a uniform partition of the data is optimal. Therefore we can draw conclusions about performance characteristics that are partition-independent. This will allow us to draw a variety of partition-independent conclusions, in particular it will prove that the assumptions that we had to make about the underlying program paradigms only have minor impacts on the cost model.

This analytical way of modeling the performance has two major advantages in comparison to alternative approaches for determining processing costs, such as simulation or implementation on a real hardware platform:

1.: It can be used not only for the evaluation of our techniques but provides a tool for an optimiser to estimate the performance on which it can base its decisions. Simulation or real implementations can only cater for the first purpose.
2.: As already elaborated above, we want to obtain a model for a variety of hardware platforms. Implementations and simulations produce results that are specific to the respective hardware or range of hardware platforms.

These advantages are accompanied by the disadvantage that the absolute cost figures that are obtained probably do not compare directly with the ones that are achieved in reality. A simulation or an implementation of the operation would achieve preciser results. However, as we are concerned with comparing possible partitions in a platform-independent manner, we will use an analytical approach. In future research it might be interesting to validate our analytical model by simulating or implementing the operation.

**Figure:** The structure of the performance model and the modeling process.

Next: The Architectural Model Up: Performance Model Previous: Performance Model

Thomas Zurek