Data-parallel programming: Can we have high-level languages *and* high-performance?

Jan F. Prins

Nested arrays and nested data-parallelism combine to provide a uniquely expressive mechnism for the specification of irregular parallel computations. The flattening transformation first described by Blelloch in 1990 provides a technique to realize all nested parallelism to within a constant factor of that specified without increasing total work and with perfect load balance in execution. The price for all these advantages is that the absolute performance and performance predictability fall precipitously in the presence of non-uniform memory access costs, and this appears, in some sense, to be inherent.

Yet uniform memory access (UMA) on real machines can be guaranteed through a bulk-reference condition that is easily satisfied by flattened parallel programs. Current commercial machines (such as the Cray and NEC parallel vector processors) provide UMA for vector operations. Using these machines and the flattening technique, some important irregular applications, such as the solution of unstructured sparse linear systems, may achieve the highest absolute performance currently available.