MPI Cubix: Collective I/O

MPI Cubix is a collective I/O library for POSIX and MPI.
It is a _portable_ library, not specific to any implementation of MPI.
MPI Cubix has been tested with LAM
and MPICH
on most UNIX workstations.
The following description is distilled from Ohio Supercomputer Center
Technical Report OSC-TR-1995-10, which can be obtained from the
LAM ftp site.
The library itself (mpi-cubix*.tar.Z) can be obtained from the same
location.
Send bug reports on MPI Cubix to lam@tbag.osc.edu.
Description
MPI Cubix is an I/O library for MPI applications. The semantics and
language binding reflect POSIX in its sequential aspects and MPI in its
parallel aspects. The library is built on a few POSIX I/O functions
and each of the POSIX-like Cubix functions translates directly to a
POSIX operation on a file system somewhere in the parallel machine.
The library is also built on MPI and is therefore portable to any
machine that supports both MPI and POSIX.
The functionality of MPI Cubix closely follows the I/O functionality of
the Cubix parallel software developed at the California Institute of
Technology.
The I/O in MPI Cubix could be described as parallel, but it is more
accurately described as collective. As in MPI, all processes in a
communicator participate in each operation. However, only one file on
one file system is operated upon by each operation. (Straight POSIX
calls to the local environment are one way to operate on multiple files.)
MPI Cubix aims to simplify the manipulation, distribution, and collection
of data between a single file and a parallel application.
Typical Usage
Many parallel applications read and write data from and to a single
file. Typically, a separate ``master'' or ``host'' program or program
module is written to perform the actual file operations and move the
data among the other processes via communication primitives. Within
this scenario, two specific types of file operations are common:
MULTI
Each process moves a variable amount of unique data to/from a file
through the master. An example is distributing a 2D matrix, where each
process works on a subset of the rows or columns.

SINGL
Each process moves an identical amount of identical data to/from a file
through the master. An example is every process printing the same
error message. Another example is every process reading the size of a
matrix, before going on to calculate subdomain sizes.

The contribution of MPI Cubix is to encapsulate these operations as
collective I/O functions, with the role of the master hidden within the
library. The user does not have to write dedicated master code and global
message-passing code to achieve the same effect. The MPI Cubix
operations look and feel like sequential I/O operations.
Features
Only the basic file I/O operations are provided:
int CBX_Open(const char *name, int flags, int mode,
             int owner, MPI_Comm comm);
int CBX_Close(int fd);
int CBX_Read(int fd, void *buffer, int count,
             MPI_Datatype dtype);
int CBX_Write(int fd, void *buffer, int count,
              MPI_Datatype dtype);
off_t CBX_Lseek(int fd, off_t offset, int whence);
The I/O operations are collective upon the group of processes in the
communicator used to open the file. Two access methods are provided.
In the Singl method, all processes must provide identical arguments. Input
data is broadcast to all processes; output data is taken from only one
process. In the Multi method, processes can provide individually varying
lengths to input, output, and seek operations. Separate input data is
read for each process, and output data is taken from all processes. The
access method is initially established when the file is opened, via a
special MPI Cubix open flag, and can be switched between Singl
and Multi at any time with another collective function call.
When a file is opened, a single process in the communicator group is
chosen to own, and hence operate on, the file. No other process actually
operates on the file, and the owner cannot change without closing and
re-opening the file (assuming the file is also visible in the file
space of another process).
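A short sketch of typical use, based on the prototypes above. Note that
the Multi-mode open flag and header name are assumptions for
illustration (the excerpt above does not list the flag constants), and
error handling is elided:

    /* Each process writes its own block of doubles to one shared
     * file, in rank order, via the collective Multi method.
     * CBX_O_MULTI and "cbx.h" are assumed names, not confirmed by
     * this document. */
    #include <fcntl.h>
    #include <mpi.h>
    #include "cbx.h"

    void dump_rows(double *rows, int nrows, int ncols, MPI_Comm comm)
    {
        /* Collective open: every process in comm participates;
         * rank 0 is designated as the file owner. */
        int fd = CBX_Open("matrix.out",
                          O_CREAT | O_WRONLY | CBX_O_MULTI,
                          0644, 0, comm);

        /* Multi method: each process passes its own count; the data
         * lands in the file in communicator rank order. */
        CBX_Write(fd, rows, nrows * ncols, MPI_DOUBLE);

        CBX_Close(fd);   /* collective as well */
    }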
MULTI Ordering
Process ordering is irrelevant in the Singl method. In the Multi method, data
is read or written from/to a file in the rank order of the communicator
used in the file open operation. A function is provided to change the
ordering on an already open file.
Data Representation
Like MPI message-passing functions, MPI Cubix input and output
functions transfer data that is described by MPI datatypes. Any MPI
basic or derived datatype can be used. The full extent of the
datatype, as it is represented on the file owner's machine, is
transferred to/from the file. This includes any internal gaps.
Acknowledgements
- J. Salmon, CUBIX: Programming Hypercubes without Programming
Hosts in "Hypercube Multiprocessors, 1987", M.T. Heath (ed.), SIAM, 1987
- G. Fox et al., Solving Problems on Concurrent Processors,
vol. 1, Prentice Hall, 1988
LAM / MPI Parallel Computing
/ Ohio Supercomputer Center / lam@tbag.osc.edu