Alice Koniges, Berkeley Lab, NERSC
Katherine Yelick, UC Berkeley and Berkeley Lab, NERSC
Rolf Rabenseifner, High Performance Computing Center Stuttgart (HLRS)
Reinhold Bader, Leibniz Supercomputing Center Munich (LRZ)
David Eder, Lawrence Livermore National Laboratory (LLNL)
A full-day tutorial at SC10
PGAS (Partitioned Global Address Space) languages offer both an alternative
to traditional parallelization approaches (MPI and OpenMP),
and the possibility of being combined with MPI for a multicore
hybrid programming model. In this tutorial we cover PGAS concepts and two
commonly used PGAS languages, Coarray Fortran (CAF,
as specified in the Fortran standard) and the
extension to the C standard, Unified Parallel C (UPC). Hands-on exercises to
illustrate important concepts are interspersed with the lectures. Attendees
will work in pairs to accommodate those without laptops. Basic
PGAS features, syntax for data distribution, intrinsic functions and
synchronization primitives are discussed. Additional topics include parallel
programming patterns, future extensions of both CAF and UPC, and hybrid
programming. In the hybrid programming section we show how to combine PGAS languages
with MPI, and contrast this approach with combining OpenMP
with MPI. Details:
https://fs.hlrs.de/projects/rabenseifner/publ/SC2010-PGAS.html
This tutorial represents
a unique collaboration between the Berkeley PGAS/UPC group and experienced
hands-on PGAS and hybrid instructors. Participants will be provided with the
technical foundations necessary to write library or application codes using CAF
or UPC, and an introduction to experimental techniques for combining MPI with
PGAS languages.
The tutorial will stress some of the advantages of PGAS programming models,
including:
· potentially easier programmability, and therefore higher productivity, than
with purely MPI-based programming, due to one-sided communication semantics and
the integration of the type system and other language features with the
parallel facilities
· optimization potential for the language processor (compiler + runtime system)
· improved scalability compared to OpenMP at the same level of usage
complexity, due to better locality control
· flexibility with respect to architectures: PGAS may be deployed on shared
memory multi-core systems as well as (with some care required) on large-scale
MPP architectures
The tutorial's strategy of providing an integrated view of both CAF and UPC
will give the audience a clear picture of the similarities and differences
between these two approaches to PGAS programming. Hybrid programming that
combines MPI with OpenMP and with PGAS will be illustrated and compared.
The PGAS user base is growing, and this tutorial targets a wide range of SC
attendees: application programmers, vendors, and library designers from both
C and Fortran backgrounds. Multicore architectures are now the norm, from
high-end systems to desktops. This tutorial therefore addresses computer
professionals with access to a very wide variety of programming platforms.
30% introductory, 40%
intermediate, 30% advanced
Participants should have
knowledge of at least one of the Fortran 95 and C
programming languages, possibly both, and be comfortable with running example
programs in a Linux environment. Technical assistants and other personnel will
be available for help with the exercises. In addition, a basic knowledge of
traditional parallel programming models (MPI and OpenMP)
is useful for the more advanced parts of the tutorial. Attendees will work
in pairs to accommodate those without laptops. If you have a
laptop, a secure shell client (e.g. OpenSSH or PuTTY) should be installed
so that you can log in to the parallel compute server that will be provided
for the exercises (see also http://www.nersc.gov/nusers/help/access/ssh_apps.php).
After an introduction to
general PGAS concepts as well as to the status of the standardization efforts,
the basic syntax for declaration and use of shared data is presented; the
requirements and rules for synchronization of accesses to shared data are
explained (PGAS memory model). This is followed by the topic of dynamic memory
management for shared entities. Then, advanced synchronization mechanisms such as
locks, atomic procedures, and collective procedures are discussed, along with
their usefulness for implementing certain parallel programming
patterns. The section on hybrid programming explains the way MPI makes
allowances for hybrid models, and how this can be matched with PGAS-based
implementations. Finally, remaining deficiencies in the present language
definitions of CAF and UPC will be indicated, and an outlook will be provided on
possible future extensions, which are still under discussion among
language developers and should overcome most of these deficiencies.
The hands-on sessions
are interspersed with the presentations such that approximately one hour of
presentation is followed by 30 minutes of exercises. The exercises will come
from a pool of exercises that have been tested in courses given throughout
Europe, as well as additional exercises for the newest material.
Presently planned examples include:
· basic exercises to understand the principles of UPC and CAF
· parallelization of a matrix-vector multiplication
· parallelization of a simple 2-dimensional Jacobi code
· parallelization of a ray tracing code
Dr. Alice Koniges is a Physicist and
Computer Scientist at the National Energy Research Scientific Computing Center
(NERSC) at the Berkeley Lab. Before joining the Berkeley Lab, she held
various positions at the Lawrence Livermore National Laboratory, including
management of the Lab’s institutional computing. She recently led the effort to
develop a new code that is used to predict the impacts of target shrapnel and
debris on the operation of the National Ignition Facility (NIF), the world’s
most powerful laser. Her current research interests include parallel computing
and benchmarking, Arbitrary Lagrangian-Eulerian methods
for time-dependent PDEs, and applications in plasma
physics and material science. She was the first woman to receive a PhD in
Applied and Computational Mathematics at Princeton University and also has MSE
and MA degrees from Princeton and a BA in Applied Mechanics from the University
of California, San Diego. She is editor and lead author of the book “Industrial
Strength Parallel Computing” (Morgan Kaufmann Publishers, 2000) and has
published more than 80 refereed technical papers.
Dr.
Katherine Yelick is the Director of the National Energy Research Scientific Computing
Center (NERSC) at Lawrence Berkeley National Laboratory and a Professor of
Electrical Engineering and Computer Sciences at the University of California at
Berkeley. She is the author or co-author of two books and more
than 100 refereed technical papers on parallel languages, compilers,
algorithms, libraries, architecture, and storage. She co-invented the UPC and
Titanium languages and demonstrated their applicability across architectures
through the use of novel runtime and compilation methods. She also co-developed
techniques for self-tuning numerical libraries, including the first self-tuned
library for sparse matrix kernels, which automatically adapts the code to
properties of the matrix structure and machine. Her work includes performance
analysis and modeling as well as optimization techniques for memory
hierarchies, multicore processors, communication
libraries, and processor accelerators. She has worked with
interdisciplinary teams on application scaling, and her own applications work
includes parallelization of a model for blood flow in the heart. She earned her
Ph.D. in Electrical Engineering and Computer Science from MIT and has been a
professor of Electrical Engineering and Computer Sciences at UC Berkeley since
1991 with a joint research appointment at Berkeley Lab since 1996. She has
received multiple research and teaching awards and is a member of the
California Council on Science and Technology and a member of the National
Academies committee on Sustaining Growth in Computing Performance.
Dr. Rolf
Rabenseifner studied mathematics and physics at
the University of Stuttgart. Since 1984, he has worked at the High Performance
Computing Center Stuttgart (HLRS). He led the projects DFN-RPC, a remote
procedure call tool, and MPI-GLUE, the first metacomputing
MPI combining different vendors' MPIs without loss
of full MPI functionality. In his dissertation, he developed a controlled
logical clock as global time for trace-based profiling of parallel and
distributed applications. Since 1996, he has been a member of the MPI-2 Forum,
and since Dec. 2007 he has been on the steering committee of the MPI-3 Forum. From
January to April 1999, he was an invited researcher at the Center for
High-Performance Computing at Dresden University of Technology. Currently, he
is head of Parallel Computing - Training and Application Services at HLRS. He
is involved in MPI profiling and benchmarking, e.g., in the HPC Challenge Benchmark
Suite. In recent projects, he studied parallel I/O, parallel programming models
for clusters of SMP nodes, and optimization of MPI collective routines. In
workshops and summer schools, he teaches parallel programming models at many
universities and labs in Germany.
Homepage: http://www.hlrs.de/people/rabenseifner/
List of publications: https://fs.hlrs.de//projects/rabenseifner/publ/
International teaching: https://fs.hlrs.de//projects/rabenseifner/publ/#tutorials
Dr. Reinhold
Bader studied physics and mathematics at the Ludwig-Maximilians University in Munich, completing his
studies with a PhD in theoretical solid-state physics in 1998. Since the
beginning of 1999, he has worked at Leibniz Supercomputing Centre (LRZ) as a
member of the scientific staff, being involved in HPC user support, procurements
of new systems, benchmarking of prototypes in the context of the PRACE project,
courses for parallel programming, and configuration management for the HPC
systems deployed at LRZ. As a member of the German delegation to WG5, the
international Fortran Standards Committee, he also takes part in the
discussions on further development of the Fortran
language. He has published a number of contributions to ACM's
Fortran Forum and is responsible for development and
maintenance of the Fortran interface to the GNU Scientific Library.
Sample of national teaching:
· LRZ Munich / RRZE Erlangen 2001-2010 (5 days) - G. Hager, R. Bader et al.:
Parallel Programming and Optimization on High Performance Systems
· LRZ Munich (2009) (5 days) - R. Bader: Advanced Fortran topics -
object-oriented programming, design patterns, coarrays and C interoperability
· LRZ Munich (2010) (1 day) - A. Block and R. Bader: PGAS programming with
coarray Fortran and UPC
Dr.
David Eder is a computational physicist and group
leader at the Lawrence Livermore National Laboratory in California. He has
extensive experience with application codes for the study of multiphysics problems. His latest endeavors include ALE
(Arbitrary Lagrangian-Eulerian) methods on unstructured and
block-structured grids for simulations that span many orders of magnitude. He
was awarded a research prize in 2000 for use of advanced codes to design the
National Ignition Facility 192-beam laser currently under construction. He has
a PhD in Astrophysics from Princeton University and a BS in Mathematics and
Physics from the University of Colorado. He has published approximately 80 research
papers.
· Languages
· Parallel Programming
· Performance
· Applications
URL of this page:
https://fs.hlrs.de/projects/rabenseifner/publ/SC2010-PGAS.html
Exercises:
https://fs.hlrs.de/projects/rabenseifner/publ/SC2010-PGAS.zip
https://fs.hlrs.de/projects/rabenseifner/publ/SC2010-PGAS.tar.gz