Next: Experiments
Up: Evaluation of Characteristics
Previous: Evaluation of Characteristics
In practice, uniform workloads are a very rare exception. Therefore
they cannot be used to draw realistic conclusions. However, they can
help to simplify the evaluation of the components of the performance
model that are not susceptible to data skew and other
forms of non-uniformity. To prepare such a preliminary evaluation is
the focus of this section. Under the notion of uniformity of the
workload we assume the following:
- All intervals in R and Q have the same length .
- The startpoints of these intervals are uniformly distributed
over the timeline, i.e. at any time there is
the same number of intervals starting as at a time with .
- The only input parameter to define a partition is m. The
(m-1) breakpoints are supposed to be at equal distances.
- Both relations have the same span, i.e.
- The previous points imply that
- We also assume that both relations have the same number of
tuples, i.e.
- Tuples in R and Q are supposed to be of an equal size, i.e.
for and .
Following these assumptions on uniform data, we can derive the
necessary parameters for the costs model.
First, we want to approximate the size of a fragment Rk with . For that purpose we can use
which gives the average number of
fragments to which a tuple is assigned. Assuming a partition
of into m equally sized segments, i.e. any two
subsequent breakpoints, pk and pk+1, are at equal distance.
The latter can be calculated by . If is the average length of a timestamp interval then
the interval occupies a portion of
of a segment. As interval startpoints are uniformly distributed over a
segment, there are some intervals starting in close enough proximity
to the end of a segment to overlap a breakpoint. Therefore we have to
add 1 to the value obtained by (8.30) in order to
get an estimate for :
Figure 8.14 shows an example for , and one interval in R starting at each point . (8.30) gives a value of 0.4
which means that 40% of the intervals starting within a segment
overlap the right breakpoint. In the example of
figure 8.14, these comprise 4 out of 10 intervals. From
a global point of view this implies that 40% of the relation's
tuples overlap a breakpoint. Therefore it is
Thus in this example.
The quotient in (8.30) can also result in values
beyond 1 which means that which
means that the (average) length of a segment is smaller than the
average length of an interval. Consequently, most (in the general
case) or all (under the assumption of uniformity) intervals overlap
the right breakpoint. This is a bad choice for a partition.
Figure:
An example for the approximation of for
chronons and
chronons.
|
Now we can approximate the size of a fragment Rk for :
As each tuple can only be assigned to one R'k, the sizes of these
are given by
From |Rk| = |R'k| + |R''k| we can conclude that
In the same way, the values for the |Qk|, |Q'k| and |Q''k| can
be derived. Table 8.8 summarises the approximations
made under the assumption of uniformity.
Next: Experiments
Up: Evaluation of Characteristics
Previous: Evaluation of Characteristics
Thomas Zurek