Hybrid Parallel Programming: Performance Problems and Chances

Rolf Rabenseifner
High-Performance Computing-Center Stuttgart (HLRS)
University of Stuttgart
Allmandring 30
D-70550 Stuttgart
Germany
rabenseifner@hlrs.de
http://www.hlrs.de/people/rabenseifner/

ABSTRACT:
Most HPC systems are clusters of shared memory nodes. Parallel programming must combine distributed memory parallelization across the node interconnect with shared memory parallelization inside each node. This paper compares various hybrid MPI+OpenMP programming models with pure MPI, analyzes the strengths and weaknesses of several parallel programming models on clusters of SMP nodes, and presents benchmark results from several platforms. The benchmark results show that the hybrid-masteronly programming model can be used more efficiently on some vector-type systems, although this model suffers from application threads that sleep while the master thread communicates. The paper also analyzes strategies to overcome the typical drawbacks of this easy-to-use programming scheme on systems with weaker interconnects. The best performance can be achieved by overlapping communication and computation, but this scheme is much harder to use.
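The trade-off between the masteronly scheme and overlapping communication with computation can be illustrated with a deliberately idealized per-time-step cost model. The model and all of its names (`masteronly_time`, `overlapping_time`, `t_comp`, `t_comm`, `n_threads`) are hypothetical and not taken from the paper. It assumes perfect thread scaling, that all computation is independent of the data being communicated, and that in the overlapping scheme exactly one thread is dedicated to communication while the others compute.

```python
"""Idealized per-time-step cost model for two hybrid MPI+OpenMP schemes.

Hypothetical assumptions (illustration only, not the paper's model):
  * one SMP node runs n_threads threads;
  * t_comp is the computation time when all n_threads threads compute;
  * t_comm is the time one thread needs for the node's MPI communication;
  * computation scales perfectly with the number of threads and does not
    depend on the data currently being communicated.
"""


def masteronly_time(t_comp: float, t_comm: float) -> float:
    # Masteronly: MPI is called only outside parallel regions. All threads
    # compute, then the master thread communicates while the others sleep.
    return t_comp + t_comm


def overlapping_time(t_comp: float, t_comm: float, n_threads: int) -> float:
    # Overlapping: one thread is dedicated to communication, so the
    # remaining n_threads - 1 threads must absorb its share of the work.
    if n_threads < 2:
        raise ValueError("overlapping needs at least 2 threads")
    t_comp_reduced = t_comp * n_threads / (n_threads - 1)
    # Communication and computation proceed concurrently.
    return max(t_comp_reduced, t_comm)


if __name__ == "__main__":
    n = 8
    for t_comp, t_comm in [(1.0, 0.1), (1.0, 0.5), (1.0, 2.0)]:
        mo = masteronly_time(t_comp, t_comm)
        ov = overlapping_time(t_comp, t_comm, n)
        print(f"t_comp={t_comp} t_comm={t_comm}: "
              f"masteronly={mo:.3f} overlapping={ov:.3f}")
```

Under these assumptions, overlapping beats masteronly exactly when `t_comm > t_comp / (n_threads - 1)`. With 8 threads and `t_comp = 1.0`, a small communication time of 0.1 favors masteronly (1.100 vs. about 1.143), while larger communication times, as on weaker interconnects, favor overlapping (for example 3.000 vs. 2.000 at `t_comm = 2.0`).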

KEYWORDS:
OpenMP, MPI, Hybrid Parallel Programming, Threads and MPI, HPC, Performance.

GLOBAL LINKS:
Full paper as reference, PDF document, PostScript, gzipped PostScript.
Slides as reference, PDF document.
Used benchmark code mpi_bench4
Information about MPI from the author