Rolf Rabenseifner, HLRS, University of Stuttgart
Slides
A copy of the slides can be found here.
Abstract
Most HPC systems are clusters of shared memory nodes.
Parallel programming must combine distributed-memory
parallelization across the node interconnect with shared-memory
parallelization inside each node.
Various hybrid MPI+OpenMP programming models are compared with pure MPI.
Benchmark results of several platforms are presented.
This paper analyzes the strengths and weaknesses of several parallel programming
models on clusters of SMP nodes.
There are several mismatch problems between the (hybrid) programming schemes
and the hybrid hardware architectures.
Benchmark results on recent Cray, NEC, IBM, Hitachi, SUN and SGI platforms
show that the hybrid masteronly programming model can be used more efficiently
on some vector-type systems.
Best performance can be achieved by overlapping communication and computation,
but this scheme lacks ease of use.
This tutorial analyzes strategies to overcome typical drawbacks of the
easily usable masteronly scheme on systems with weaker interconnects.
Outline
This tutorial is presented at the High Performance Computing in Science and Engineering - The sixth Results and Review Workshop of the HPC Center Stuttgart (HLRS) (October 6-7, 2003).
Further information on hybrid programming can be found on the author's publication list.