Collective communication means all processes within a communicator call the same routine. Portable applications should assume that collective routines include a global synchronization.
The following simple code fragment employs four basic collective routines to manipulate a statically partitioned regular domain (one-dimensional in this case). The full domain length is broadcast from a root process to all others. The initial dataset is distributed (scattered) among the processes. After each compute step, a global maximum is determined for use by the root. The root then gathers the final dataset.
#include <mpi.h>
#include <stdlib.h>

{
    int i;
    int myrank;
    int size;
    int root;
    int full_domain_length;
    int sub_domain_length;
    double global_max;
    double local_max;
    double *full_domain;
    double *sub_domain;

    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    root = 0;

    /*
     * Root obtains the full domain and broadcasts its length.
     */
    if (myrank == root) {
        get_full_domain(&full_domain, &full_domain_length);
    }

    MPI_Bcast(&full_domain_length, 1, MPI_INT, root, MPI_COMM_WORLD);

    /*
     * Allocate subdomain memory.
     * Scatter the initial dataset among the processes.
     */
    sub_domain_length = full_domain_length / size;
    sub_domain = (double *) malloc(sub_domain_length * sizeof(double));

    MPI_Scatter(full_domain, sub_domain_length, MPI_DOUBLE,
            sub_domain, sub_domain_length, MPI_DOUBLE,
            root, MPI_COMM_WORLD);

    /*
     * Loop computing and determining max values.
     */
    for (i = 0; i < NSTEPS; ++i) {
        compute(sub_domain, sub_domain_length, &local_max);

        MPI_Reduce(&local_max, &global_max, 1, MPI_DOUBLE,
                MPI_MAX, root, MPI_COMM_WORLD);
    }

    /*
     * Gather the final dataset.
     */
    MPI_Gather(sub_domain, sub_domain_length, MPI_DOUBLE,
            full_domain, sub_domain_length, MPI_DOUBLE,
            root, MPI_COMM_WORLD);
}
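The fragment assumes scaffolding that is not shown: a value for NSTEPS, the application routines get_full_domain() and compute(), and the usual MPI_Init()/MPI_Finalize() calls. A minimal hosting sketch, with these names treated as placeholders, might look like this:

#include <mpi.h>

#define NSTEPS 100                                       /* placeholder step count */

void get_full_domain(double **data, int *length);        /* application-supplied */
void compute(double *data, int length, double *max);     /* application-supplied */

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    /* ... the collective code fragment shown above ... */

    MPI_Finalize();
    return 0;
}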
MPI_Bcast(void *buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm);

All processes use the same count, datatype, root, and communicator. Before the operation, the root buffer contains a message. After the operation, all buffers contain the message from the root process.
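A minimal sketch of the call pattern: every process, root and non-root alike, executes the same MPI_Bcast() line, and afterwards all of them hold the root's value. The variable name and the value 100 are illustrative only.

    int myrank, nsteps;

    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    if (myrank == 0) {
        nsteps = 100;            /* e.g., read from a configuration file */
    }
    MPI_Bcast(&nsteps, 1, MPI_INT, 0, MPI_COMM_WORLD);
    /* Every process in MPI_COMM_WORLD now has nsteps == 100. */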
MPI_Scatter(void *sndbuf, int sndcnt, MPI_Datatype sndtype, void *rcvbuf, int rcvcnt, MPI_Datatype rcvtype, int root, MPI_Comm comm);
All processes use the same send and receive counts, datatypes, root, and communicator. Before the operation, the root send buffer contains a message of length `sndcnt * N', where N is the number of processes. After the operation, the message is divided equally and dispersed to all processes (including the root) following rank order.
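As an illustration of the rank-order rule, a sketch in which the root scatters four values, one per process; the array contents and the assumption of exactly four processes are for the example only.

    double send[4] = { 10.0, 20.0, 30.0, 40.0 };   /* significant only at the root */
    double recv;

    /* With 4 processes: rank 0 receives 10.0, rank 1 receives 20.0, and so on. */
    MPI_Scatter(send, 1, MPI_DOUBLE,
            &recv, 1, MPI_DOUBLE,
            0, MPI_COMM_WORLD);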
MPI_Reduce(void *sndbuf, void *rcvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm);

All processes use the same count, datatype, reduction operation, root, and communicator. After the operation, the root process has in its receive buffer the result of the pair-wise reduction of the send buffers of all processes, including its own. MPI predefines reduction operations, including: MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD, MPI_LAND, MPI_BAND, MPI_LOR, MPI_BOR, MPI_LXOR, MPI_BXOR.
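A sketch of a maximum reduction like the one in the fragment above; each process's rank stands in for a locally computed value, and only the root sees the result.

    int myrank;
    double local_max, global_max;      /* global_max is meaningful only at the root */

    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    local_max = (double) myrank;       /* stand-in for a locally computed value */

    MPI_Reduce(&local_max, &global_max, 1, MPI_DOUBLE,
            MPI_MAX, 0, MPI_COMM_WORLD);
    /* At rank 0, global_max now equals the largest rank, i.e. size - 1. */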
MPI_Gather(void *sndbuf, int sndcnt, MPI_Datatype sndtype, void *rcvbuf, int rcvcnt, MPI_Datatype rcvtype, int root, MPI_Comm comm);

All processes use the same send and receive counts, datatypes, root, and communicator. This routine is the reverse of MPI_Scatter(): after the operation the root process has in its receive buffer the concatenation of the send buffers of all processes (including its own), with a total message length of `rcvcnt * N', where N is the number of processes. The message is gathered following rank order.
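And the reverse of the scatter sketch above: each process contributes one value and the root assembles them in rank order. The buffer size again assumes a four-process run.

    int myrank;
    double contribution;
    double all[4];                     /* used only at the root; size == number of processes */

    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    contribution = (double) myrank;    /* one value per process */

    /* At rank 0, all[] becomes { 0.0, 1.0, 2.0, 3.0 }, in rank order. */
    MPI_Gather(&contribution, 1, MPI_DOUBLE,
            all, 1, MPI_DOUBLE,
            0, MPI_COMM_WORLD);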