Alice Koniges, Berkeley Lab, NERSC
Katherine Yelick, UC Berkeley and Berkeley Lab, NERSC
Rolf Rabenseifner, High Performance Computing Center Stuttgart
Reinhold Bader, Leibniz Supercomputing Center Munich
David Eder, Lawrence Livermore National Laboratory
A full-day tutorial at SC12
PGAS (Partitioned Global Address Space) languages offer both an alternative
to traditional parallelization approaches (MPI and OpenMP) and the possibility
of improved performance on modern and heterogeneous architectures. In this tutorial we cover general PGAS
concepts and give an in-depth presentation of two commonly used PGAS languages,
Coarray Fortran (CAF) and Unified Parallel C (UPC). Hands-on exercises to
illustrate important concepts are interspersed with the lectures. Basic PGAS
features, syntax for data distribution, intrinsic functions and synchronization
primitives are discussed. Advanced topics include optimization and correctness
checking of PGAS codes, with an emphasis on emerging and planned PGAS language
extensions targeted at improving scalability and usability. A hybrid programming
section addresses the migration of MPI codes and the performance improvements
obtainable from both CAF and UPC. Longer examples, tools and performance
data on the latest petascale systems round out the presentations.
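To give a flavor of the basic PGAS features covered, the following minimal UPC sketch (the array name and the computation are made up for this illustration and are not taken from the tutorial exercises) declares a shared array distributed across all threads, lets each thread write its own element, and synchronizes before thread 0 reads the remote elements:

    #include <upc.h>
    #include <stdio.h>

    /* One element per thread, distributed cyclically across
       the threads' memory partitions. */
    shared int data[THREADS];

    int main(void) {
        data[MYTHREAD] = MYTHREAD * MYTHREAD; /* local write to own element */
        upc_barrier;                          /* synchronize before remote reads */
        if (MYTHREAD == 0) {
            for (int i = 0; i < THREADS; i++)
                printf("data[%d] = %d\n", i, data[i]); /* one-sided remote reads */
        }
        return 0;
    }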
This tutorial represents
a unique collaboration between the Berkeley PGAS/UPC group and experienced
hands-on PGAS and hybrid instructors. Participants will be provided with the
technical foundations necessary to write library or application codes using CAF
or UPC, and an introduction to experimental techniques for combining MPI with
PGAS languages.
The tutorial will stress some of the advantages of PGAS programming models, including:
· potentially easier programmability, and therefore higher productivity, than with purely MPI-based programming, due to one-sided communication semantics (see the sketch after this list) and the integration of the type system and other language features with the parallel facilities
· optimization potential for the language processor (compiler + runtime system)
· improved scalability compared to OpenMP at the same level of usage complexity, due to better locality control
· flexibility with respect to architectures – PGAS may be deployed on shared-memory multi-core systems as well as (with some care required) on large-scale MPP architectures
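As an illustration of the one-sided semantics mentioned in the first bullet, the following UPC fragment (the variable names are invented for this sketch) deposits a value directly into a neighbor thread's partition with an ordinary assignment; no matching receive is posted, only a barrier before the data is consumed:

    #include <upc.h>
    #include <stdio.h>

    shared int inbox[THREADS];  /* one slot per thread */

    int main(void) {
        int right = (MYTHREAD + 1) % THREADS;
        /* One-sided "put": a plain assignment writes into the
           right-hand neighbor's slot; no receive is needed. */
        inbox[right] = MYTHREAD;
        upc_barrier;  /* ensure all puts complete before reading */
        printf("Thread %d received %d\n", MYTHREAD, inbox[MYTHREAD]);
        return 0;
    }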
The tutorial's strategy of providing an integrated view of both CAF and UPC
will give the audience a clear picture of the similarities and differences
between these two approaches to PGAS programming. Hybrid programming using both
OpenMP and PGAS will be illustrated and compared.
The PGAS user base is growing, and the tutorial targets a wide range of SC
attendees: application programmers, vendors and library designers coming from
both C and Fortran backgrounds. Multicore architectures are now the norm, from
high-end systems to desktops, so the tutorial addresses computer professionals
with access to a very wide variety of programming platforms. Covering two
languages attracts more participants than a tutorial specialized to only one
PGAS language; in HPC, many applications are multi-language, e.g., with parts
in Fortran, C and C++. The tutorial covers general PGAS aspects (30%) as well
as the more specific UPC implementation (35%) and CAF implementation (35%). In
the exercises, the participants can concentrate on their preferred language or
use both.
Participants should have knowledge of at least one of the Fortran 95 and C
programming languages, and should be comfortable with running example programs
in a Linux environment. Technical assistants (TAs) and other personnel will be
available to help with the exercises. In addition, basic knowledge of the
traditional parallel programming models (MPI and OpenMP) is useful, but not
essential, for the more advanced parts of the tutorial. Attendees are paired
in groups of two, or with a TA, when attendees without laptops need to be
accommodated. Laptops should have an SSH client installed so that attendees
can log in to the compute nodes provided for the exercises; alternate servers
are available in both the US and Germany in case of outages.
See also http://www.nersc.gov/users/data-and-networking/connecting-to-nersc/ .
After an introduction to general PGAS concepts, as well as to the status of
the standardization efforts, the basic syntax for the declaration and use of
shared data is presented; the requirements and rules for synchronization of
accesses to shared data are explained (the PGAS memory model). This is followed
by the topic of dynamic memory management for shared entities. Then, advanced
synchronization mechanisms such as locks, atomic procedures and collective
procedures are discussed, as well as their usefulness for implementing certain
parallel programming patterns (a brief sketch follows below). The section on
hybrid programming explains the allowances MPI makes for hybrid models and how
these can be matched with PGAS-based implementations. Finally, deficiencies
still existing in the present language definitions of CAF and UPC will be
indicated, and an outlook will be provided on possible future extensions, which
are presently still under discussion among the language developers and should
make it possible to overcome most of the above-mentioned deficiencies.
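As a hedged illustration of two of these topics, dynamic management of shared data and lock-based synchronization, the following UPC sketch (the block size and names are chosen for illustration only) collectively allocates a shared array and uses a lock to serialize updates of a shared counter:

    #include <upc.h>
    #include <stdio.h>

    shared int counter;   /* has affinity to thread 0's partition */
    upc_lock_t *lock;     /* each thread holds a pointer to one shared lock */

    int main(void) {
        /* Collective allocation: THREADS blocks of 4 ints, one block
           per thread; every thread receives the same shared pointer. */
        shared int *a = (shared int *) upc_all_alloc(THREADS, 4 * sizeof(int));

        lock = upc_all_lock_alloc();  /* collective lock allocation */

        upc_lock(lock);               /* serialize the read-modify-write */
        counter += 1;
        upc_unlock(lock);

        upc_barrier;
        if (MYTHREAD == 0) {
            printf("counter = %d\n", counter);  /* expect THREADS */
            upc_lock_free(lock);
            upc_free(a);              /* release the shared array */
        }
        return 0;
    }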
The hands-on sessions
are interspersed with the presentations such that approximately one hour of
presentation is followed by 30 minutes of exercises. The exercises will come
from a pool that has been tested in courses given throughout Europe,
supplemented by additional exercises for the newest material.
The NERSC computer
center will make available a special partition of their Cray XT machines and a
set of accounts to accommodate the hands-on exercises. This model has already
been successfully deployed at the previous SC10 and SC11 PGAS tutorials. In the
event that a natural disaster or a system crash takes this planned system down,
the users will have access to the same exercises on an SGI UltraViolet system
at LRZ. Attendees will use laptops that can open an SSH session; they will be
grouped in pairs to accommodate people without a laptop and to handle any
other account issues that come up. Attendees may do the exercises in pairs in
both UPC and CAF, to allow comparison of the two languages. When possible, C
programmers will be paired with Fortran programmers. For advanced programmers
or those who want to stay in one language, additional exercise material will be
provided for efficient use of the exercise time. UC Berkeley teaching
assistants from the course CS 267, “Applications of Parallel Computers,” may be
available as needed to help with the hands-on exercises.
The list of presently planned examples will be updated as the tutorial
material is finalized.
Dr. Alice Koniges is a physicist and computer scientist at the National Energy Research
Scientific Computing Center (NERSC) at the Berkeley Lab, where she leads the
Petascale Computing Initiative and various science research projects including
preparing codes for exascale. Her current research interests include programming
models, benchmarking and optimization, applications in plasma physics, material
science, energy research, and arbitrary Lagrangian-Eulerian methods for
time-dependent PDEs. Before working at the Berkeley Lab, she held various
positions at the Lawrence Livermore National Laboratory, including management
of the Lab's institutional computing. She recently led the effort to develop a
new code that is used to predict the impacts of target shrapnel and debris on the
operation of the National Ignition Facility (NIF), the world’s most powerful
laser. She was the first woman to receive a PhD in Applied and Computational
Mathematics at Princeton University and also has MSE and MA degrees from
Princeton and a BA in Applied Mechanics from the University of California, San
Diego. She is editor and lead author of the book “Industrial Strength Parallel
Computing,” (Morgan Kaufmann Publishers 2000) and has published more than 80
refereed technical papers.
Dr. Katherine Yelick is the Associate Laboratory Director for
Computing Sciences at Lawrence Berkeley National Laboratory, Director of the
National Energy Research Scientific Computing (NERSC) Center and a Professor of
Electrical Engineering and Computer Sciences at the University of California at
Berkeley. She is the author or co-author of two books and more than 100
refereed technical papers on parallel languages, compilers, algorithms,
libraries, architecture, and storage. She co-invented the UPC and Titanium
languages and demonstrated their applicability across architectures through the
use of novel runtime and compilation methods. She also co-developed techniques
for self-tuning numerical libraries, including the first self-tuned library for
sparse matrix kernels, which automatically adapts the code to properties of the
matrix structure and the machine. Her work includes performance analysis and
modeling as well as optimization techniques for memory hierarchies, multicore
processors, communication libraries, and processor accelerators. She earned her Ph.D. in Electrical
Engineering and Computer Science from MIT and has been a professor of
Electrical Engineering and Computer Sciences at UC Berkeley since 1991 with a
joint research appointment at Berkeley Lab since 1996. She has received
multiple research and teaching awards and is a member of the California Council
on Science and Technology and a member of the National Academies committee on
Sustaining Growth in Computing Performance.
Dr. Rolf
Rabenseifner studied mathematics and physics at
the University of Stuttgart. Since 1984, he has worked at the High-Performance
Computing-Center Stuttgart (HLRS). He led the projects DFN-RPC, a remote
procedure call tool, and MPI-GLUE, the first metacomputing MPI combining
different vendors' MPIs without loss of full MPI functionality. In his
dissertation, he developed a controlled logical clock as global time for
trace-based profiling of parallel and distributed applications. He has been a
member of the MPI-2 Forum since 1996, and since December 2007 he has been on
the steering committee of the MPI-3 Forum. From January to April 1999, he was an invited
researcher at the Center for High-Performance Computing at Dresden University
of Technology. Currently, he is head of Parallel Computing - Training and
Application Services at HLRS. He is involved in MPI profiling and benchmarking,
e.g., in the HPC Challenge Benchmark Suite. In recent projects, he studied
parallel I/O, parallel programming models for clusters of SMP nodes, and optimization
of MPI collective routines. In workshops and summer schools, he teaches
parallel programming models at many universities and labs in Germany. In
January 2012, the Gauss Centre for Supercomputing (GCS), with HLRS, LRZ in
Garching and the Jülich Supercomputing Centre as members, was selected as one
of six PRACE Advanced Training Centres (PATCs), and he was appointed as GCS'
PATC director.
Dr. Reinhold
Bader studied physics and mathematics at the
Ludwig-Maximilians University in Munich, completing his studies with a PhD
(“Electronic Properties of Boron Nitride and Gallium Arsenide under hydrostatic
pressure and tetragonal deformation”) in theoretical solid-state physics in
1998. Since the beginning of 1999, he has worked at Leibniz Supercomputing
Centre (LRZ) as a member of the scientific staff, being involved in HPC user
support, procurements of new systems, benchmarking of prototypes in the context
of the PRACE project, courses for parallel programming, and configuration
management for the HPC systems deployed at LRZ. Since May 2012, he has been leader of
the HPC services group at LRZ, which is responsible for operation of all
HPC-related systems and system software packages at LRZ. As a member of the
German delegation to WG5, the international Fortran Standards Committee, he has
contributed to the further development of the Fortran language, in particular
the Technical Specification TS 29113 (Further Interoperability of Fortran with
C), and a future Technical Specification of extended coarray facilities. In
connection with the work on TS 29113, he has participated in the discussion and
proofreading of the new MPI-3.0 Fortran interfaces, which are presently being
finalized under the auspices of the MPI Forum. He has published a number of
contributions to ACM's Fortran Forum and is responsible for development and
maintenance of the Fortran interface to the GNU Scientific Library.
Dr.
David Eder is a computational physicist and group
leader at the Lawrence Livermore National Laboratory in California. He has
extensive experience with application codes for the study of multiphysics
problems, including codes in various programming languages. His latest
endeavors include ALE (Arbitrary Lagrangian-Eulerian) simulations on unstructured and
block-structured grids that span many orders of magnitude. He
was awarded a research prize in 2000 for the use of advanced codes to design the
National Ignition Facility (NIF) 192-beam laser, which is now operational.
He is currently designing and performing large-scale simulations to
predict the performance of the NIF. He has a track record of giving SC tutorials to
a broad audience, with a particular emphasis on real applications and
performance issues. He has a PhD in
Astrophysics from Princeton University and a BS in Mathematics and Physics from
the Univ. of Colorado. He has published approximately 80 research papers.
· Languages
· Parallel Programming
· Performance
· Applications
URL of this page (shortened):
https://fs.hlrs.de/projects/rabenseifner/publ/SC2012-PGAS.html