MUST can be used to detect and report MPI errors.
The MUST software consists of three individual packages: PnMPI, GTI, and MUST.
PnMPI provides the basic infrastructure and collects data by intercepting all MPI calls of the target application. GTI provides the tool structure, and the MUST package performs the correctness checking. All three packages are configured and built together using CMake and should only be used with the same compiler and MPI library that were used to build them.
The two main use cases for MUST are application development and porting an application to a new system. MUST can single out new errors as well as errors that do not manifest as an application crash. It can also detect violations of the MPI standard on the target system.
MUST provides checks for the following classes of errors:
- Constants and integer values
- Communicator usage
- Datatype usage
- Group usage
- Operation usage
- Request usage
- Leak checks (MPI resources not freed before calling MPI_Finalize)
- Type mismatches
- Overlapping buffers passed to MPI
- Deadlocks resulting from MPI calls
- Basic checks for thread level usage (MPI_Init_thread)
The scalability of MUST depends on the scalability of the application. So far, it has been successfully tested with up to 16000 parallel processors.
For installation instructions and the latest release, please visit the MUST project website, which also provides detailed documentation.
Usage on the RWTH Cluster
The MUST tool is available as a module on all login nodes. Several versions are installed; they can be listed using:
module whatis must
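Before MUST can be run, the module typically has to be loaded into the current shell. The following is a minimal sketch, assuming the module is named `must` as in the listing command and that loading it without a version selects a site default. The helper name `ensure_must` is hypothetical and not part of MUST or the module system.

```shell
# Hypothetical helper: make sure mustrun is usable in this shell.
# Assumption: the module is called "must"; append a version from the
# "module whatis must" listing if a specific one is needed.
ensure_must() {
    # Nothing to do if mustrun is already available.
    if command -v mustrun >/dev/null 2>&1; then
        return 0
    fi
    # Otherwise try to load the module, if the module command exists.
    if command -v module >/dev/null 2>&1; then
        module load must
    else
        echo "module command not available in this shell" >&2
        return 1
    fi
}
```

Calling `ensure_must` once at the start of a session or job script is enough; it returns a non-zero status if MUST could not be made available.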
An executable MPI program can be analysed with MUST using:
mustrun -np <num-processes> <executable>
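As an illustration, the call can be wrapped in a small shell function that validates its arguments before starting the run. This is only a sketch: the function name `run_with_must` and the `MUST_DRY_RUN` variable are hypothetical conveniences, and only the `mustrun -np <num-processes> <executable>` form shown above is assumed.

```shell
# Hypothetical wrapper around mustrun; not part of MUST itself.
run_with_must() {
    np="${1:-}"
    exe="${2:-}"
    # Reject an empty, non-numeric, or zero process count.
    case "$np" in
        ''|*[!0-9]*|0)
            echo "usage: run_with_must <num-processes> <executable>" >&2
            return 2
            ;;
    esac
    # Reject a missing executable argument.
    if [ -z "$exe" ]; then
        echo "usage: run_with_must <num-processes> <executable>" >&2
        return 2
    fi
    # With MUST_DRY_RUN set, only print the command instead of running it.
    if [ -n "${MUST_DRY_RUN:-}" ]; then
        printf 'mustrun -np %s %s\n' "$np" "$exe"
    else
        mustrun -np "$np" "$exe"
    fi
}
```

For example, `MUST_DRY_RUN=1 run_with_must 4 ./my_app` prints `mustrun -np 4 ./my_app` without launching anything, where `./my_app` stands for any MPI executable.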
After the application run, MUST generates an HTML report (usually called "MUST_Output.html") that lists and describes all detected errors. This file is located in the current directory and can be inspected like this: