Next: The Architectural Model
Up: Performance Model
Previous: Performance Model
The task of creating an effective performance model is not
straightforward because there are many factors that affect the costs
of a temporal join. Amongst these are, for example, characteristics of
the hardware architecture, issues of the (parallel) programming
paradigm and the choice of the temporal join algorithm. If too many of
these issues are incorporated into the model then it might be specific
to one particular hardware architecture, one particular programming
paradigm and/or one particular temporal join algorithm. On the other
hand, if we omit or generalise too many of these issues then we
probably miss out important factors that affect the join performance.
This means that there is not one, single performance model which
can be considered as appropriate but many. Our intention is to
create one that seeks a good tradeoff between covering as many
situations (with respect to hardware and software configurations) as
possible while still being specific enough to achieve a reasonable
performance prediction. In other words: it will necessarily be a
compromise model. We divided the task of creating a performance model
into three subtasks (see also figure 8.1):
- In section 8.2, we consider the hardware
issues that affect the performance. An architectural model is
presented. It is parameterised by two variables: depending on the
values of these variables we get a single-processor architecture, a
parallel shared-everything (SMP) architecture, a parallel
shared-nothing architecture or a parallel hybrid, two-level
architecture that incorporates elements of the shared-everything and
the shared-nothing approaches. This covers a broad spectrum of
possible architectures and therefore supports our goal to be as
general as possible.
- In section 8.3, we look at the software aspect,
i.e. the temporal join algorithm and how it works on the
architectural model. In chapter 4, we have presented a
wide range of temporal join algorithms of which only those are
relevant that employ explicit and symmetric partitioning. Please
recall that the performance of all the other temporal join algorithms
is not affected by the choice of a partition. This still leaves us
with a a variety of possible algorithms. We necessarily have to
compromise here. However, this does not imply that we cannot create a
model that provides a realistic feedback on the performance impacts of
partitioning: we can pick an algorithm which represents a family of
algorithms whose performances are similarly affected by the choice of
a partition.
- The cost model for partitioned temporal join processing
is derived in section 8.4. This model incorporates the
assumptions and compromise decisions that have been taken in
sections 8.2 and 8.3.
The chapter is concluded in section 8.5 where we
evaluate the performance model on top of a uniform workload, i.e. we
assume that the temporal intervals are uniformly distributed and of a
uniform length. In this case a uniform partition of the data is
optimal. Therefore we can draw conclusions about performance
characteristics that are partition-independent. This will allow us to
draw a variety of partition-independent conclusions, in particular it
will prove that the assumptions that we had to make about the
underlying program paradigms only have minor impacts on the cost
model.
This analytical way of modeling the performance has two major
advantages in comparison to alternative approaches for determining
processing costs, such as simulation or
implementation on a real hardware platform:
- 1.
- It can be used not only for the evaluation of our techniques but
provides a tool for an optimiser to estimate the performance on which
it can base its decisions. Simulation or real implementations can
only cater for the first purpose.
- 2.
- As already elaborated above, we want to obtain a model for a
variety of hardware platforms. Implementations and simulations
produce results that are specific to the respective hardware or range
of hardware platforms.
These advantages are accompanied by the disadvantage that the absolute
cost figures that are obtained probably do not compare directly with
the ones that are achieved in reality. A simulation or an
implementation of the operation would achieve preciser results.
However, as we are concerned with comparing possible partitions in a
platform-independent manner, we will use an analytical approach. In
future research it might be interesting to validate our analytical
model by simulating or implementing the operation.
Figure:
The structure of the performance model and the modeling process.
|
Next: The Architectural Model
Up: Performance Model
Previous: Performance Model
Thomas Zurek