MPI: Everyday Datatypes

Heterogeneous computing requires that the data constituting a message be typed or described somehow so that its machine representation can be converted between computer architectures. MPI can thoroughly describe message datatypes, from the simple primitive machine types to complex structures, arrays and indices.

MPI messaging functions accept a datatype parameter, whose C typedef is MPI_Datatype:

    int MPI_Send(void* buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm);

Basic Datatypes

Everybody uses the primitive machine datatypes. Some C examples are listed below (with the corresponding C datatype in parentheses):

    MPI_CHAR (char)
    MPI_INT (int)
    MPI_FLOAT (float)
    MPI_DOUBLE (double)

The count parameter in MPI_Send( ) refers to the number of elements of the given datatype, not the total number of bytes.
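
As a small illustration of the distinction, the sketch below uses a hypothetical 100-element array named samples (it does not appear elsewhere in this document). The value to pass as count is the number of elements, which for a true array can be derived with sizeof; the byte size is computed only for contrast.

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array (not valid for a pointer). */
#define ELEMENT_COUNT(a) (sizeof(a) / sizeof((a)[0]))

double samples[100];    /* hypothetical message payload */

/* The value MPI_Send() expects as count when the datatype is MPI_DOUBLE. */
int samples_count(void)
{
    size_t n = ELEMENT_COUNT(samples);

    assert(n == 100);   /* elements, regardless of sizeof(double) */
    return (int)n;
}

/* The size in bytes; passing this as count would overrun the array. */
size_t samples_bytes(void)
{
    return sizeof(samples);
}
```

With these helpers, the send would take the form MPI_Send(samples, samples_count(), MPI_DOUBLE, dest, tag, comm).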

For messages consisting of a homogeneous, contiguous array of basic datatypes, this is the end of the datatype discussion. For messages that contain more than one datatype or whose elements are not stored contiguously in memory, something more is needed.

Strided Vector

Consider a mesh application with patches of a 2D array assigned to different processes. The internal boundary rows and columns are transferred between north/south and east/west neighbors in the overall mesh. In C, transferring a row of a 2D array is simple: a row is a contiguous vector whose length equals the number of columns in the array. By contrast, the elements of a single column are dispersed in memory; each element is separated from the next and the previous one by the size of one entire row.
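
The memory layout described above can be checked in plain C, without MPI. The following is a minimal sketch assuming a 10 x 20 array of float; the names ROWS, COLS, mesh_offset and gather_column are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 10
#define COLS 20

/* Offset, in elements, of mesh[row][col] from &mesh[0][0] (row-major). */
size_t mesh_offset(size_t row, size_t col)
{
    assert(row < ROWS && col < COLS);
    return row * COLS + col;
}

/* Copy one column into a contiguous buffer. Successive column entries
 * lie one full row (COLS elements) apart in memory. */
void gather_column(const float *flat, size_t col, float out[ROWS])
{
    for (size_t row = 0; row < ROWS; row++)
        out[row] = flat[mesh_offset(row, col)];
}
```

Calling gather_column(&mesh[0][0], 19, column) collects the last column by hand. A strided derived datatype lets MPI perform the same walk through memory, so no intermediate copy is needed in the application.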

An MPI derived datatype is a good solution for a non-contiguous data structure. A code fragment to derive an appropriate datatype matching this strided vector and then transmit the last column is listed below:

#include <mpi.h>

{
    float		mesh[10][20];
    int			dest, tag;
    MPI_Datatype	newtype;

/*
 * Do this once.
 */
    MPI_Type_vector(10,		/* # blocks = # rows */
	    1,			/* 1 element per block */
	    20,			/* stride: 20 elements = 1 row */
	    MPI_FLOAT,		/* elements are float */
	    &newtype);		/* MPI derived datatype */

    MPI_Type_commit(&newtype);
/*
 * Do this for every new message.
 */
    MPI_Send(&mesh[0][19], 1, newtype,
	    dest, tag, MPI_COMM_WORLD);
}

A derived datatype must be committed with MPI_Type_commit( ) before it can be used in communication. Intermediate datatypes that serve only as building blocks on the way to a more complex datatype need not be committed.

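A nice feature of MPI derived datatypes is that once created and committed, they can be used repeatedly with no further set-up code. MPI has other derived datatype constructors as well, such as MPI_Type_contiguous( ), MPI_Type_indexed( ) and MPI_Type_create_struct( ).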

C Structure

Consider an imaging application that transfers fixed-length scan lines of eight-bit color pixels. Coupled with the pixel array is the scan line number, an integer. The message might be described in C as a structure:

    struct {
	int		lineno;
	char		pixels[1024];
    } scanline;

In addition to a derived datatype, message packing is a useful method for sending non-contiguous and/or heterogeneous data. A code fragment to pack and send the above structure is listed below:

#include <stdlib.h>
#include <mpi.h>

{
    int			membersize, maxsize;
    int			position;
    int			dest, tag;
    char		*buffer;
/*
 * Do this once.
 */
    MPI_Pack_size(1, 		/* one element */
	    MPI_INT,		/* datatype integer */
	    MPI_COMM_WORLD,	/* consistent comm. */
	    &membersize);	/* max packing space req'd */

    maxsize = membersize;
    MPI_Pack_size(1024, MPI_CHAR, MPI_COMM_WORLD, &membersize);
    maxsize += membersize;
    buffer = malloc(maxsize);
/*
 * Do this for every new message.
 */
    position = 0;

    MPI_Pack(&scanline.lineno,	/* pack this element */
	    1,			/* one element */
	    MPI_INT,		/* datatype int */
	    buffer,		/* packing buffer */
	    maxsize,		/* buffer size */
	    &position,		/* next free byte offset */
	    MPI_COMM_WORLD);	/* consistent comm. */

    MPI_Pack(scanline.pixels, 1024, MPI_CHAR,
	    buffer, maxsize, &position, MPI_COMM_WORLD);

    MPI_Send(buffer, position, MPI_PACKED,
	    dest, tag, MPI_COMM_WORLD);
}

A buffer large enough to hold the packed structure is allocated once. Its size must be computed with MPI_Pack_size( ), not sizeof, because the packed representation may include implementation-dependent overhead. Variable-sized messages can be handled by allocating a buffer large enough for the largest possible message. On return from MPI_Pack( ), the position parameter holds the number of bytes packed so far, which is why it is passed as the count to MPI_Send( ).

A code fragment to unpack the message, assuming a receive buffer has been allocated, is listed below:

{
    int             src;
    int             msgsize;
    MPI_Status      status;

    MPI_Recv(buffer, maxsize, MPI_PACKED,
	    src, tag, MPI_COMM_WORLD, &status);

    position = 0;
    MPI_Get_count(&status, MPI_PACKED, &msgsize);

    MPI_Unpack(buffer,		/* packing buffer */
	    msgsize,		/* buffer size */
	    &position,		/* next element byte offset */
	    &scanline.lineno,	/* unpack this element */
	    1,			/* one element */
	    MPI_INT,		/* datatype int */
	    MPI_COMM_WORLD);	/* consistent comm. */

    MPI_Unpack(buffer, msgsize, &position,
	    scanline.pixels, 1024, MPI_CHAR, MPI_COMM_WORLD);
}

You should be able to adapt the above code fragments to any structure. The number of elements to unpack can also depend on application information unpacked earlier in the same message, such as a leading element count.

LAM / MPI Parallel Computing / Ohio Supercomputer Center / lam@tbag.osc.edu