Parallel Algorithm Design

The parallel algorithm for a given problem attempts to divide it into sub-problems which can then be solved concurrently on the different processors of a parallel computer.

The design of a parallel algorithm can be viewed as consisting of four stages - Partitioning, Communication Analysis, Granularity Control and Mapping. The following simple example illustrates some of the issues involved in each of the stages.

Consider the following scenario: n answer-scripts have to be marked, each of which contains the answers to m questions. The scripts could be viewed as the data, and the marking process itself as the computation to be performed on it.

In order to design a parallel solution to this problem, it must first be decomposed into smaller tasks which can be executed simultaneously. This is referred to as the partitioning stage.

This can be done in one of two ways. Each script could be marked by a different marker - this would require n markers. Alternatively, marking each question could be viewed as a task. This would result in m such tasks, each of which could be tackled by a separate marker, implying that every script passes through every marker.

In the first approach, the data (scripts) is first decomposed and then the computation (marking) is associated with it. This technique is called domain decomposition.

In the second approach, the computation to be performed (marking) is first decomposed and then the data (scripts) is associated with it. This technique is called functional decomposition. The partitioning technique that will be chosen often depends on the nature of the problem.

Suppose one needs to compute the average mark of the n scripts. If domain decomposition was chosen, then the marks from each of the markers would be required. If the markers are at different physical locations, then some form of communication is needed, in order to obtain the sum of the marks.

The nature of the information flow is specified in the communication analysis stage of the design. In this case, each marker can proceed independently and communicate the marks at the end. However, other situations would require communication between two concurrent tasks before computation can proceed.

It may be the case that the time to communicate the marks between two markers is much greater than the time to mark a question. In which case, it is more efficient to reduce the number of markers and have a marker work on a number of scripts, thereby decreasing the amount of communication.

Effectively, several small tasks are combined to produce larger ones, which results in a more efficient solution. This is called granularity control. For example, k markers could mark n/k scripts each. The problem here is to determine the best value of k.

The mapping stage specifies where each task is to execute. In this example, all tasks are of equal size and the communication is uniform, so any task can be mapped to any marker. However, in more complex situations, mapping strategies may not be obvious, requiring the use of more sophisticated techniques.

Parallel algorithm design is an interesting and challenging area of computer science which requires a combination of creative and analytical skills.