A poll was conducted to better understand how programmers are using the rich MPI functionality, what extensions are needed, and how to prioritize our work to better serve MPI users and the HPC community at large.
These are the verbatim responses to the question on missing MPI functionality.
Dynamic task creation. One-sided communications, active message types, remote execution, interrupt messages
Dynamic creation of tasks for client-server programs. Parallel I/O
Better standard I/O facilities
Some simple forms of remote actions without explicit polling by the remote program:
- remote memory access
- fully asynchronous collective operations
- active messages
Full-fledged multithreading is not so important. I would rather prefer to reduce the functionality of MPI in some areas. In particular, I would propose to standardize a *very* small subset of functions which are easy to implement (efficiently). These could then serve as a basis for generic implementations of MPI. This would improve the availability of MPI on new machines and research machines. Has anybody thought about standardizing benchmarks? Just as an assembler handbook for a CISC computer is incomplete without a table of cycle times, I consider a library implementation without timings for the most common operations incomplete. One way to force vendors to disclose this vital information would be to make a set of benchmarks part of some kind of certification process.
* For porting existing parallel programs to MPI it would be useful to be able to have processes ranked from 1->n instead of 0->n-1. There does not appear to be a way of doing this (as far as I can make out from the standard!).
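Since the standard fixes ranks at 0..n-1, the usual workaround for this request is a thin translation layer in the application. A minimal sketch (the helper names are invented for illustration; they are not MPI calls):

```c
/* Present MPI's 0..n-1 ranks to the application as 1..n.
 * These helpers are the application's own, not part of MPI. */
int app_rank_of(int mpi_rank) { return mpi_rank + 1; }  /* 0..n-1 -> 1..n */
int mpi_rank_of(int app_rank) { return app_rank - 1; }  /* 1..n -> 0..n-1 */
```

Every rank obtained from MPI is passed through `app_rank_of` before the ported code sees it, and through `mpi_rank_of` on the way back into MPI calls.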
MPI_Pprobe(int source, int tagcount, int *tagarray, MPI_Comm comm, MPI_Status *status)
This will wait for any message from a given array of tags, but given the choice between two or more messages will return the status of the message with the lowest tag-index.
MPI_Ipprobe(...)
As above, but does not block.
MPI_Discard(int source, int tag, MPI_Comm comm)
Especially when writing task farms and pipelines, there are many cases where the existence of a message gives enough information and the content can be discarded. There are also cases where the message itself is out of date. Creating a receive buffer for a message that is not going to be read is a waste of resources, and can also be quite complicated for the programmer.
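The tie-breaking rule proposed for MPI_Pprobe can be isolated as a small selection function. A sketch, under the assumption that `pending[i]` records whether a probe found a message waiting on `tagarray[i]` (the name and representation are invented here):

```c
/* Tie-breaking rule of the proposed MPI_Pprobe: among the listed tags
 * that have a pending message, choose the lowest tag-index.
 * `pending[i]` is a hypothetical stand-in for "a probe on tagarray[i]
 * found a message". */
int lowest_pending_tag_index(const int *pending, int tagcount) {
    for (int i = 0; i < tagcount; i++)
        if (pending[i])
            return i;
    return -1; /* no message pending on any listed tag */
}
```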
Dynamic startup of processes, C++ binding
Nothing that I can think of at the moment... Parallel I/O would be nice, though. We can get by without it (obviously), but it might make things a little easier and more streamlined.
An easy way to send a message from one group to ALL members of another group.
Some facility for inter-application communication. I would like to be able to set up an inter-communicator between two applications that are started at different times. Specifically, I would like the ability to create persistent servers and transient clients.
Our current NX implementation uses hrecvx() -- whether we can reasonably convert this to multiple threads or processes is as yet unclear to us. We also start processes at initialization (only). A portable way to do this would be very useful.
- Spawning as it is possible in PVM: one process creates another.
- An asynchronous receive function should have a pointer to a function as a parameter. This function is called when the message is received (this is possible e.g. in CMMD for the CM-5).
Dynamic process creation. The ability for a new program to join to a running parallel application. Calls to provide host and platform information.
1. Support for atomicity, e.g., an atomic transaction
2. Support for a process to asynchronously send a message to a group of processes, such as the eureka mechanism in the Cray T3D
1) Congestion-free multi-broadcasting for balanced data distribution
2) I/O :)
MPI_SPAWN - ability to create a process -- may not really fit in MPI
Guaranteed access to stdio.
I would like to have the facility to set up a non-blocking receive with a pointer to a handling function. Upon arrival of the message, the function would be called. The facility for supplying a pointer to a function that returns a pointer to a receive buffer would be very useful. Upon arrival of the message, the function would be called and the message DMA'ed into the memory pointed to by the function (which would have access to all of the message's header information). A null pointer returned by the function would indicate that the message should be ignored.
Dynamic creation or termination of processes; dynamic relabeling of processes; simulation of fault tolerance.
I need to investigate MPI further before answering this
- support for threads - active messages - integrated MPI-IO
- Multicast operation that replaces replicated pt-to-pt sends.
- An operation to create new communicators without a requirement that it be a collective call across an existing communicator (i.e. like the existing comm_create, but where only the members of the new communicator need to make the call; when all processes to be in the new comm have made the call, then everyone completes).
It would be nice to have a standard, class-based C++ interface...
SCATTER & GATHER functions to decompose 2-dim. arrays over a grid.
* Myself and other users of the AP1000 machine, which has a broadcast network, use a multicast-type operation very often. It is hoped that the current proposal within the MPI-2 forum for multicast will be accepted.
* Remote memory operations - like the current put/get proposal.
* Interrupt-driven send/recv calls - again, these have been proposed within the MPI-2 forum.
IO - this is a big pain!
Advanced reduce (SUM) subroutine, where only those values different from zero are summed up.
Interrupt-driven handled messages, a la Intel's NX, would be useful for certain of my requirements.
Remote operations: load, store, [op] to memory (e.g., add to memory.) As an alternative, "active messages" (which can be used to implement remote operations.) This functionality is *very* important to my applications. Without it, MPI is not terribly interesting.
A handler-based receive function, so that my code will be interrupted asynchronously and a specified handler will be run when a message arrives. This could make a big difference to efficiency (our local MPI guru is in the process of adding MPI_Hrecv()).
Non-blocking extension of collective communication for different buffers. For example:
   CALL MPI_REDUCE(A, ...)
   CALL MPI_REDUCE(B, ...)
I need:
   CALL MPI_REDUCE_NONBLOCKING(A, ...)
   CALL MPI_REDUCE_NONBLOCKING(B, ...)
   CALL MPI_BARRIER()
Remote procedure calls.
A more efficient global sum!!
I'm pretty new to this game, so I may not understand all the implications, but I think a non-blocking version of MPI_Bcast() would be useful.
Haven't run across any yet... Libraries I may want to use (mostly from PNL HPCC group) don't yet run under MPI, but they're working on it... I believe they would be happier if there were explicit shared memory (get/put) support.
Receive-add (or more general receive-op) would be very nice for all the different types of receives. A native complex type for C/C++ is needed. It would be nice to be able to do a scatter where the origin (not the destination, as it seems to be now) creates the derived datatype that is used on the receiving end. After all, how can the destination create the indexed derived datatype if it doesn't know what is coming?
Addition/deletion of dynamically created processes to/from MPI_COMM_WORLD at runtime.
Dynamic Process Management
1) The standard for C should include complex numbers; it used to be there and then was taken out. I need this for C++ and global operations. Now I have to ifdef code to use complex or real; with a complex datatype I wouldn't need to. 2) Receives should support add-to-location, or more generally merge-into-location, rather than just copy-into-buffer.
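The merge-into-location semantics requested in item 2 amounts to combining incoming data with the destination instead of overwriting it. A minimal sketch for an add-combine over doubles (the function name and types are assumptions, not MPI calls):

```c
/* "Merge into location" receive semantics: combine the incoming buffer
 * into the destination element-wise instead of copying over it.
 * Shown here for addition over doubles; names are illustrative. */
void merge_add_into(double *dest, const double *incoming, int n) {
    for (int i = 0; i < n; i++)
        dest[i] += incoming[i];
}
```

A receive-op interface would apply such a combine directly as the message arrives, rather than requiring the user to receive into a scratch buffer and merge by hand.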
Active messages, parallel IO, task creation
C++ binding/support (I do know about MPI++)
Parallel I/O - especially disk read/writes
I/O
None right now, but in my opinion it would be interesting to add task (or thread) management functions.
Task support like PVM (spawn, ...)
General high-level permutation routines (like CM-2/CM-5 CMF_send/CMF_get)
The ability to expand and contract the working space, or number of nodes.
1) A boolean function that returns whether you are in the communicator or not. 2) A mapping function to tell the rank in communicator "A" corresponding to a rank in communicator "B".
Those things that MPI-2 is designed to address, like I/O.
Ability to query unused capacity of user buffer for buffered communication (so that I know whether a buffered send request will be successful).
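Absent such a query, applications typically keep their own ledger of attached-buffer usage. A sketch of that bookkeeping (all names and the per-message overhead parameter are the application's own assumptions, not MPI calls):

```c
/* User-side ledger for the attached buffered-send buffer: track bytes
 * attached and bytes held by outstanding buffered sends, so that
 * "will this buffered send fit?" can be answered before sending. */
typedef struct {
    int attached_bytes;  /* size handed to the buffer-attach call */
    int in_use_bytes;    /* held by not-yet-delivered buffered sends */
} buffer_ledger;

int bytes_free(const buffer_ledger *b) {
    return b->attached_bytes - b->in_use_bytes;
}

int will_fit(const buffer_ledger *b, int msg_bytes, int per_msg_overhead) {
    return msg_bytes + per_msg_overhead <= bytes_free(b);
}
```

The ledger has to be updated when sends are posted and when their buffer space is known to be reclaimed, which is exactly the information a standard query function would make unnecessary.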
A send that already implies a receive (a little bit like in data-parallel programming, e.g. C*)
Dynamic process control!
Parallel I/O is definitely needed. I'm having problems using very simple I/O with flushing/seeking to EOF on an SP-2. Of course, this isn't really supported so it shouldn't work, but parallel I/O would fix it.
Full functionality in an implementation. To actually do what we ask, not default to another mode that is safe.
None
* I need some mechanism for informing programs about the interconnection topology, so the application can make partitioning decisions. For example, on a machine with a communication hierarchy, such as a cluster of shared-memory machines, my apps need to know which tasks are in the same box and which are in different boxes to make partitioning decisions. Presumably the problem is general.
* Related to this, I need a collective communication operation that exchanges common task environmental information. For example, most of my programs begin with an Allgather of hostname/pid/IP address/... info, so every task knows common info about the other tasks. Some MPE function to provide similar environmental info, if it were possible in a relatively portable way, would be ideal.
I am in urgent need of communication between the workstation cluster and the massively parallel systems. Therefore it would be good to have MPI supporting different protocols simultaneously. Spawning processes and integrating already-running processes into an existing MPI_COMM_WORLD would be a good feature, too.
A) Dynamic creation of processes B) MPI I/O standard
Exclusive scans (rather than inclusive ones) would be nice
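For invertible operations such as sum, each process can recover the exclusive prefix locally from MPI_Scan's inclusive result by removing its own contribution. A sketch, assuming integer sums (the function name is invented here):

```c
/* MPI_Scan is inclusive: process i receives v_0 + ... + v_i. For sum,
 * the exclusive prefix v_0 + ... + v_(i-1) follows by subtracting the
 * local value. This trick works only for invertible operations, which
 * is why a built-in exclusive scan would still be welcome (e.g. for
 * MAX or MIN it does not apply). */
int exclusive_from_inclusive_sum(int inclusive_result, int my_value) {
    return inclusive_result - my_value;
}
```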
Multi-threaded functionality, but the proposals for MPI-2 may cover this missing thing (requests with handler functions)
C++ bindings would really help... An easy way to debug and trace ALL MPI traffic in an application.
Threads
process control -- creation (spawning) and deletion (kill)
Parallel I/O to files
In a master-slave context one often has to use gatherv etc. because there is no data to be sent from the master. This complicates the code, requiring creation of recvcounts and displs arrays which are all the same except for the first element! I would like `gatherx' (for "exclusive") etc. to behave the same as gather but with no contribution from root.
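This is exactly the bookkeeping the respondent wants to avoid: emulating a root-excluded gather with plain MPI_Gatherv means building counts/displs arrays in which only the root's slot differs. A sketch of that setup (the function name is invented; `nprocs` is the communicator size, `root` the excluded rank, `count` the per-process element count):

```c
/* Build the recvcounts/displs arguments for an MPI_Gatherv that
 * emulates the proposed "gatherx": every process contributes `count`
 * elements except `root`, which contributes nothing. */
void build_gatherx_args(int nprocs, int root, int count,
                        int *recvcounts, int *displs) {
    int offset = 0;
    for (int i = 0; i < nprocs; i++) {
        recvcounts[i] = (i == root) ? 0 : count;
        displs[i] = offset;
        offset += recvcounts[i];
    }
}
```

A built-in `gatherx` would collapse this setup (and the matching scatter-side variant) into a single call with a scalar count.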