The Message Passing Interface (MPI) is an open standard for distributed memory [[Parallel_Programming|parallelization]]. It consists of a library API (Application Programming Interface) specification for C and Fortran; unofficial bindings exist for many other popular programming languages. The first standard document was released in 1994. MPI has become the de facto standard for programming HPC cluster systems and is often the only option available. Many optimized implementations exist, both open source and proprietary. The latest version of the standard is [https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf MPI 3.1] (released in 2015).

MPI allows writing portable parallel programs for all kinds of parallel systems, from small shared memory nodes to petascale cluster systems. Although many criticize its bloated API and complicated function interface, no alternative proposal has so far gained a significant share in the HPC application domain. Optimized implementations exist for virtually every platform and architecture, along with a wealth of tools and libraries. Common implementations are [https://www.open-mpi.org Open MPI], [https://www.mpich.org MPICH] and [https://software.intel.com/en-us/mpi-library Intel MPI]. Because MPI has been available for such a long time and almost every HPC application is implemented with it, it is the safest bet for a solution that will remain supported and stable on mid- to long-term future systems.

Information on how to run an existing MPI program can be found in the [[How to Use MPI]] section.
== API overview ==

The standard specifies interfaces to the following functionality (the list is not complete):

* Point-to-point communication,
* Datatypes,
* Collective operations,
* Process groups,
* Process topologies,
* One-sided communication,
* Parallel file I/O.

While the standard document has 836 pages describing more than 100 MPI functions, a working and useful MPI program can be implemented using just a handful of them. As with other standards, new and uncommon features are often not implemented efficiently in the available MPI libraries.
=== Basic conventions ===

A process is the smallest unit of parallel execution in MPI. MPI offers a very generic and flexible way to manage subgroups of parallel workers using so-called communicators. A communicator is part of every MPI communication routine's signature. There is a predefined communicator called '''MPI_COMM_WORLD''' that includes all processes within a job, and it is possible to create new communicators containing subsets of those processes. Still, many applications can be implemented using only '''MPI_COMM_WORLD'''. Processes are assigned consecutive ranks (integer numbers), and a process can query its own rank as well as the total number of ranks in a communicator at runtime. This information is already sufficient to implement work sharing strategies and communication structures. Messages can be sent to another process given its rank, while collective communication (e.g. a broadcast) involves all processes in a communicator.
 
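As an illustration, the following minimal sketch (an example only, assuming the C bindings and a job started with several processes) queries its rank and the number of ranks in '''MPI_COMM_WORLD''' and derives a simple static work distribution from them:

<syntaxhighlight lang="c">
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of ranks */

    /* simple static work sharing: split N iterations into contiguous blocks */
    const int N = 1000;
    int chunk = (N + size - 1) / size;     /* iterations per rank, rounded up */
    int begin = rank * chunk;
    int end   = begin + chunk;
    if (begin > N) begin = N;              /* ranks beyond the work get an empty range */
    if (end   > N) end   = N;

    printf("Rank %d of %d handles iterations [%d, %d)\n", rank, size, begin, end);

    MPI_Finalize();
    return 0;
}
</syntaxhighlight>

Started with, for example, four processes, every rank computes and prints its own contiguous block of the iteration space.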
=== Point-to-point communication ===

Sending and receiving of messages between processes is the basic MPI communication mechanism. The basic point-to-point communication operations are send and receive:
  
* <code>MPI_Send</code> for sending a message
<syntaxhighlight lang="c">
int MPI_Send (const void* buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
</syntaxhighlight>
* <code>MPI_Recv</code> for receiving a message
<syntaxhighlight lang="c">
int MPI_Recv (void* buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status* status)
</syntaxhighlight>

Although there are more than 100 MPI functions defined in the standard (e.g. for non-blocking or collective communication, see the [[#References|References]] for more details), you can write meaningful MPI applications with fewer than 20 of them. Programs written with these functions have to be compiled with a specific [[compiler]] (options) and executed with a special startup program, as detailed [[How_to_Use_MPI|here]].

MPI messages consist of the message data and a message envelope. The message data is specified by a pointer to a memory buffer, an MPI datatype and a count; the count may be zero to indicate that the message buffer is empty. The message envelope consists of the source (implicitly specified by the sender), the destination, a tag and the communicator. The destination is the rank of the receiving process, and the tag is an integer that allows different message types to be distinguished.
 
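As a sketch of how these arguments fit together (assuming a job started with at least two processes), the following code sends a single integer from rank 0 to rank 1; the buffer, count and datatype form the message data, while destination/source, tag and communicator form the envelope:

<syntaxhighlight lang="c">
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, value = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        /* data: buffer &value, count 1, type MPI_INT; envelope: dest 1, tag 0, MPI_COMM_WORLD */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Status status;
        /* envelope on the receiving side: source 0, tag 0, MPI_COMM_WORLD */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("Rank 1 received %d (source %d, tag %d)\n",
               value, status.MPI_SOURCE, status.MPI_TAG);
    }

    MPI_Finalize();
    return 0;
}
</syntaxhighlight>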
=== Collective operations ===

== Example: A minimal MPI program ==
 
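A minimal MPI program only needs initialization, the rank and size queries and finalization. The following sketch prints one line per process; the file and command names used below (<code>hello.c</code>, <code>mpicc</code>, <code>mpirun</code>) are only examples, the details depend on the installed MPI implementation:

<syntaxhighlight lang="c">
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);                 /* initialize the MPI library */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();                         /* clean up before exiting */
    return 0;
}
</syntaxhighlight>

It could, for example, be compiled with <code>mpicc hello.c -o hello</code> and started with <code>mpirun -np 4 ./hello</code>; how to do this on a given cluster is described in [[How_to_Use_MPI]].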
== MPI machine model ==

== MPI+X ==

== Alternatives to MPI ==
  
Please check the more detailed tutorials in the [[#References|References]].

== References ==

* Introduction to MPI from PPCES (@RWTH Aachen) Part 1
* Introduction to MPI from PPCES (@RWTH Aachen) Part 2
* Introduction to MPI from PPCES (@RWTH Aachen) Part 3