Next: Experimental Results Up: Analysis of Method Previous: Redundancy in a Data

Predicting the amount of parallelism in the evaluation

For the recursive function evaluation system, it can be predicted in advance how much useful parallelism can be obtained from a given problem. By `useful parallelism' I mean the point at which adding more processors and splitting the problem up amongst them using my method, does not result in any performance improvement. First we look at an example to explain the reasoning. Consider the example where we have 6 sub functions, 2 of which commute, see fig. 6.2. The directions that commute create a part of the structure that, if split up amongst many processors, require much communication to evaluate, and dominate the time taken for the whole computation. Also, since the processors involved in this part of the tree will both have to traverse the same part of the structure, very little of the computation is actually being shared between them. So, if the parallelisation requires splitting up this sub section we will not have any benefit from using the extra processors. As we increase the number of processors to which we map the sub trees from 3 to 6, there is be a point at which the two commuting sub trees are split.

Consider a recursive function that is defined in terms of k sub functions. This leads to a call graph of degree k. Now suppose that r of the functions commute with at least one other function, so that k-r of the sub trees are distinct. These distinct sub trees can be exploited to provide the parallelism. So if we are mapping to p processors, if $p \gt \frac{k}{r}$ , then the commuting directions will be split and we cannot expect any improvement in the performance. Thus, with a function for which k and r are defined as above, we will recieve no more speedup if we run the evaluation on more than $\frac{k}{r}$ processors.

The method is therefore more suited for functions where there are many sub functions, few of which commute. However, specifying functions that do commute as not commuting will not change the correctness of the evaluation, and will only decrease the amount of redundancy that can be exploited. We can artificially make the number of directions that are reported as commuting as small as we choose, and thus increase the potential for parallelism. This can be exploited in the situation where we have a fixed number of processors p available to us, and an evaluation call graph for which we know the values of k and r. We then find the minimum r such that $\frac{k}{r}$ is not greater than p. We then use that level of commutativity to evaluate the tree. This will provide the optimal performance.

**Figure 6.2:** Dividing out the sub trees
$\begin{figure} \begin{center} \includegraphics {parallelism.eps} \end{center}\end{figure}$

Next: Experimental Results Up: Analysis of Method Previous: Redundancy in a Data

Timothy Lewis
11/12/1997