Rolf Rabenseifner, HLRS, University of Stuttgart
Slides
A copy of the slides can be found here.
Abstract
Most HPC systems are clusters of shared memory nodes.
Parallel programming must combine distributed-memory
parallelization across the node interconnect with shared-memory
parallelization inside each node.
Various hybrid MPI+OpenMP programming models are compared with pure MPI.
Benchmark results of several platforms are presented.
This paper analyzes the strengths and weaknesses of several parallel programming
models on clusters of SMP nodes.
There are several mismatch problems between the (hybrid) programming schemes
and the hybrid hardware architectures.
Benchmark results on recent Cray, NEC, IBM, Hitachi, SUN and SGI platforms
show that the hybrid masteronly programming model can be used more efficiently
on some vector-type systems.
Best performance can be achieved by overlapping communication and computation,
but this scheme lacks ease of use.
This tutorial analyzes strategies to overcome typical drawbacks of the
easily usable masteronly scheme on systems with weaker interconnects.
Outline
This tutorial is presented at the High Performance Computing in Science and Engineering - The sixth Results and Review Workshop of the HPC Center Stuttgart (HLRS) (October 6-7, 2003).
Further information on hybrid programming can be found on the author's publication list.