For the recursive function evaluation system, it can be predicted in advance how much useful parallelism can be obtained from a given problem. By `useful parallelism' I mean the point at which adding more processors and splitting the problem up amongst them using my method, does not result in any performance improvement. First we look at an example to explain the reasoning. Consider the example where we have 6 sub functions, 2 of which commute, see fig. 6.2. The directions that commute create a part of the structure that, if split up amongst many processors, require much communication to evaluate, and dominate the time taken for the whole computation. Also, since the processors involved in this part of the tree will both have to traverse the same part of the structure, very little of the computation is actually being shared between them. So, if the parallelisation requires splitting up this sub section we will not have any benefit from using the extra processors. As we increase the number of processors to which we map the sub trees from 3 to 6, there is be a point at which the two commuting sub trees are split.
Consider a recursive function that is defined in terms of k
sub functions. This leads to a call graph of degree k. Now suppose
that r of the functions commute with at least one other function, so
that k-r of the sub trees are distinct. These distinct sub trees can
be exploited to provide the parallelism. So if we are mapping to p
processors, if , then the commuting directions will
be split and we cannot expect any improvement in the
performance. Thus, with a function for which k and r are defined
as above, we will recieve no more speedup if we run the evaluation on
more than
processors.
The method is therefore more suited for functions where there are many
sub functions, few of which commute. However, specifying functions
that do commute as not commuting will not change the correctness of
the evaluation, and will only decrease the amount of redundancy that
can be exploited. We can artificially make the number of directions
that are reported as commuting as small as we choose, and thus
increase the potential for parallelism. This can be exploited in the
situation where we have a fixed number of processors p available to
us, and an evaluation call graph for which we know the values of k
and r. We then find the minimum r such that is not
greater than p. We then use that level of commutativity to evaluate
the tree. This will provide the optimal performance.