MPI Cubix: Collective I/O

MPI Cubix is a collective I/O library for POSIX and MPI. It is a _portable_ library, not specific to any implementation of MPI. MPI Cubix has been tested with LAM and MPICH on most UNIX workstations.

The following description is distilled from Ohio Supercomputer Center Technical Report OSC-TR-1995-10, which can be obtained from the LAM ftp site. The library itself (mpi-cubix*.tar.Z) can be obtained from the same location.

Send bug reports on MPI Cubix to lam@tbag.osc.edu.

Description

MPI Cubix is an I/O library for MPI applications. Its semantics and language binding reflect POSIX in the sequential aspects and MPI in the parallel aspects. The library is built on a few POSIX I/O functions, and each POSIX-like Cubix function translates directly to a POSIX operation on a file system somewhere in the parallel machine. Because the library is otherwise built on MPI, it is portable to any machine that supports both MPI and POSIX.

The functionality of MPI Cubix closely follows the I/O functionality of the Cubix parallel software developed at the California Institute of Technology.

The I/O in MPI Cubix could be described as parallel, but it is more accurately described as collective. As in MPI, all processes in a communicator participate in each operation. However, each operation acts on only one file on one file system. (Straight POSIX calls to the local environment are one way to operate on multiple files.) MPI Cubix aims to simplify the manipulation, distribution, and collection of data between a single file and a parallel application.

Typical Usage

Many parallel applications read and write data from and to a single file. Typically, a separate "master" or "host" program or program module is written to perform the actual file operations and move the data among the other processes via communication primitives. Within this scenario, two specific types of file operations are common:

MULTI

Each process moves a variable amount of unique data to/from a file through the master. An example is distributing a 2D matrix, where each process works on a subset of the rows or columns.


SINGL

Each process moves an identical amount of identical data to/from a file through the master. An example is every process printing the same error message. Another example is every process reading the size of a matrix, before going on to calculate subdomain sizes.


The contribution of MPI Cubix is to encapsulate these operations as collective I/O functions, with the role of the master hidden inside the library. The user writes no special master code and no global message-passing code to accomplish the same effect. The MPI Cubix operations look and feel like sequential I/O operations.
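
For a sense of the look and feel, here is a minimal sketch of the SINGL case: every process makes the same call and a single copy of the message reaches the file. The header name cbx.h and the open flag CBX_O_SINGL are assumptions for illustration; only the CBX_* functions listed under Features below are taken from this page.

    #include <fcntl.h>
    #include <mpi.h>
    #include <cbx.h>        /* assumed header name for MPI Cubix */

    int main(int argc, char *argv[])
    {
        int  fd;
        char msg[] = "starting solver\n";

        MPI_Init(&argc, &argv);

        /* Collective open; CBX_O_SINGL (an assumed flag name)
         * selects the singl access method.  Rank 0 owns the file. */
        fd = CBX_Open("run.log", O_WRONLY | O_CREAT | CBX_O_SINGL,
                0644, 0, MPI_COMM_WORLD);

        /* All processes call with identical arguments; one copy of
         * the message is written - no master code is needed. */
        CBX_Write(fd, msg, sizeof(msg) - 1, MPI_CHAR);

        CBX_Close(fd);
        MPI_Finalize();
        return 0;
    }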

Features

Only the basic file I/O operations are provided:

    int CBX_Open(const char *name, int flags, int mode,
                 int owner, MPI_Comm comm);

    int CBX_Close(int fd);

    int CBX_Read(int fd, void *buffer, int count,
                 MPI_Datatype dtype);

    int CBX_Write(int fd, void *buffer, int count,
                  MPI_Datatype dtype);

    off_t CBX_Lseek(int fd, off_t offset, int whence);

The I/O operations are collective over the group of processes in the communicator used to open the file. Two access methods are provided. In the singl method, all processes must provide identical arguments: input data is broadcast to all processes, and output data is taken from only one process. In the multi method, processes may provide individually varying lengths to input, output, and seek operations: separate input data is read for each process, and output data is taken from all processes. The access method is initially established when the file is opened, by adding a special MPI Cubix flag to the open flags, and it can be changed between singl and multi at any time with another collective function call.
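
As a sketch of mixing the two methods, a program might read a matrix header in the singl method and the matrix body in the multi method. The flag CBX_O_SINGL and the method-switching call CBX_Multi() are assumed names here; the collective semantics are as described above.

    #include <stdlib.h>
    #include <fcntl.h>
    #include <mpi.h>
    #include <cbx.h>              /* assumed header name */

    #define NCOLS 8               /* columns per matrix row */

    int main(int argc, char *argv[])
    {
        int    fd, nrows, myrows, nprocs, myrank;
        double *rows;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

        /* Open in the singl method (assumed flag name). */
        fd = CBX_Open("matrix.dat", O_RDONLY | CBX_O_SINGL,
                0, 0, MPI_COMM_WORLD);

        /* Singl read: the row count is read once and broadcast. */
        CBX_Read(fd, &nrows, 1, MPI_INT);

        /* Switch to the multi method (assumed function name). */
        CBX_Multi(fd);

        /* Multi read: each process gives its own count and receives
         * its own rows, taken from the file in rank order. */
        myrows = nrows / nprocs + (myrank < nrows % nprocs);
        rows = malloc(myrows * NCOLS * sizeof(double));
        CBX_Read(fd, rows, myrows * NCOLS, MPI_DOUBLE);

        /* ... compute on the local rows ... */

        free(rows);
        CBX_Close(fd);
        MPI_Finalize();
        return 0;
    }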

When a file is opened, a single process in the communicator group is designated, via the owner argument, to own and hence operate on the file. No other process actually touches the file, and the owner cannot change without closing the file and re-opening it with a different owner (possible only if the file is accessible in the new owner's file space).
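
For example, to place the file in the file space of the process with rank 2 (a fragment, reusing the assumed flag name from the sketches above):

    /* Rank 2 performs every POSIX operation on its local file
     * system; all processes still call CBX_Open collectively. */
    fd = CBX_Open("/tmp/out.dat", O_WRONLY | O_CREAT | CBX_O_SINGL,
            0644, 2, MPI_COMM_WORLD);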

MULTI Ordering

Process ordering is irrelevant in the singl method. In the multi method, data is read from or written to the file in the rank order of the communicator used to open the file. A function is provided to change the ordering of an already open file.
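
A fragment illustrating a re-ordering, continuing the multi sketch above, with a heavy caveat: the function name CBX_Order and its signature are assumptions, not confirmed by this page. The sketch reverses the rank order before a multi write:

    /* Assumed call: give this process a new position in the
     * ordering, so rank nprocs-1 now writes its rows first. */
    CBX_Order(fd, nprocs - 1 - myrank);
    CBX_Write(fd, rows, myrows * NCOLS, MPI_DOUBLE);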

Data Representation

Like MPI message-passing functions, MPI Cubix input and output functions transfer data that is described by MPI datatypes. Any MPI basic or derived datatype can be used. The full extent of the datatype, as it is represented on the file owner's machine, is transferred to/from the file. This includes any internal gaps.
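
For instance, a strided vector type has internal gaps, and those gap bytes occupy file space too. A fragment assuming an already open file descriptor, using only standard MPI datatype calls plus CBX_Write from the list above:

    MPI_Datatype vec;
    double       buf[7];    /* large enough for the type's extent */

    /* 4 doubles selected with stride 2: one-double gaps inside. */
    MPI_Type_vector(4, 1, 2, MPI_DOUBLE, &vec);
    MPI_Type_commit(&vec);

    /* Unlike an MPI message, which skips the gaps, the write
     * transfers the full extent (7 doubles here), gap bytes
     * included, exactly as laid out in the owner's memory. */
    CBX_Write(fd, buf, 1, vec);

    MPI_Type_free(&vec);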

Acknowledgements

  1. J. Salmon, "CUBIX: Programming Hypercubes without Programming Hosts," in Hypercube Multiprocessors 1987, M. T. Heath (ed.), SIAM, 1987.
  2. G. Fox et al., Solving Problems on Concurrent Processors, vol. 1, Prentice Hall, 1988.