Real World Techniques for Scientific Applications of Scale

Alice Koniges, Lawrence Livermore National Laboratory (LLNL),
Mark Seager, Lawrence Livermore National Laboratory (LLNL),
Rolf Rabenseifner, High Performance Computing Center Stuttgart (HLRS), University of Stuttgart,
David Eder, Lawrence Livermore National Laboratory (LLNL)

Full-day Tutorial at Supercomputing 2003 (SC2003).

Abstract

Teraflop performance is no longer a thing of the future as complex integrated 3D simulations drive supercomputer development. Today, most HPC systems are clusters of SMP nodes, ranging from dual-CPU PC clusters to the largest systems at the world's major computing centers.

What are the major issues facing application code developers today? How do the challenges vary from cluster computing to complex hybrid architectures with superscalar and vector processors? Finally, what is our path, both architecturally and algorithmically, to petaflop performance? What skills and tools are required, both of the application developer and of the system itself?

In this tutorial we address these questions and give tips, tricks, and tools of the trade for large-scale application development. A special emphasis is given to mixed-mode (combined MPI/OpenMP) programming. In the introduction, we provide an overview of terminology, hardware and performance. We describe the latest issues in implementing scalable parallel programming. We draw from a series of large application suites and discuss specific challenges and problems encountered in parallelizing these applications. Additional topics cover parallel I/O, scripting languages and code wrappers. We conclude with a road map for the possible paths to petaflop computing.
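To make the mixed-mode emphasis concrete, the following minimal C sketch (an illustration for this description, not an excerpt from the tutorial materials) assumes one MPI process per SMP node and requests the MPI_THREAD_FUNNELED support level, so that OpenMP threads share the node-local work while only the master thread calls MPI:

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank, nranks, i;
        const int n = 1000000;
        double local_sum = 0.0, global_sum = 0.0;

        /* Request FUNNELED support: only the master thread will call MPI. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        /* OpenMP threads share the work assigned to this MPI process
           (here a simple round-robin split of a reduction loop). */
        #pragma omp parallel for reduction(+:local_sum)
        for (i = rank; i < n; i += nranks)
            local_sum += 1.0 / (double)(i + 1);

        /* MPI communication happens outside the parallel region. */
        MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                   0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum = %f  (%d MPI processes x %d OpenMP threads)\n",
                   global_sum, nranks, omp_get_max_threads());

        MPI_Finalize();
        return 0;
    }

Launched, for example, with one process per 4-way SMP node and OMP_NUM_THREADS=4, such a code exposes the basic trade-off between intra-node threading and pure MPI; how that trade-off plays out on specific hybrid architectures is one of the topics covered in the tutorial.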


Detailed Description

Tutorial Goals:

The skill set required of application programmers and their support teams at major computing centers continues to grow with a curve similar to Moore’s Law. This tutorial covers all of the basic ideas necessary for major application development. Specific examples are used so that developers and managers can understand what constitutes good application performance and what tools are needed to attain it. The latest material on mixed-mode programming (combined MPI/OpenMP) is covered in some depth, with emphasis on performance tricks for a variety of hybrid architectures available at major computer centers throughout the world. Additional material on the nuts and bolts of application programming, from debuggers like TotalView and performance tools like Vampir to scripting languages such as Yorick and Python, helps those interested in developing practical applications sort through the variety of tools and resources available.

Who should attend?

Those interested in high-end applications of parallel computing should attend this tutorial. The topic appeals to a variety of SC attendees, as seen from our past tutorial attendance records; attendees range from managers at industrial firms to graduate students. The introductory material should provide enough background for beginners to understand the basic issues of parallel code development, with references for further study. The more advanced material is aimed at both researchers in parallel computing methodology and applications programmers who want an overview of what constitutes good parallel performance and how to attain it.

Content Level

20% Introductory, 45% Intermediate, 35% Advanced

Audience Prerequisites

The audience is assumed to have a basic understanding of parallel computing. From past attendance records, 88% of attendees felt that the tutorial met their expectations. Full-day tutorials generally draw fewer attendees than half-day tutorials; nevertheless, at the end of the day at SC2002 our tutorial had a full room, including many who had left their original tutorial choice and switched to our presentation. Many of these attendees did not fill out survey forms since they arrived later in the day, but they told us directly that they enjoyed the presentations, and many participated in the final discussion and question session.

Sample Material

Parallel programming materials, including mixed-mode programming, are adapted from the on-line course (http://www.hlrs.de/organization/par/par_prog_ws/). A sample of the applications is given on the web site for Industrial Strength Parallel Computing (http://www.bhusa.com/computing/us/subindex.asp?maintarget=/bookscat/search/details.asp&isbn=1558605401) and in some Gordon Bell Prize-winning studies (http://www.cs.odu.edu/~keyes/bell.html and http://www.llnl.gov/CASC/asciturb). Other applications discussed include Earth Simulator codes and coupled ALE simulations.

Tutorial Outline


Authors' Biographies

Alice E. Koniges is a member of the Accelerated Strategic Computing Initiative (ASCI) research team at the Lawrence Livermore National Laboratory in California. She has recently returned from a loan to the Max-Planck Institute in Garching, Germany (Computer Center and Plasma Physics Institute), where she consulted with users on converting application codes to MPP computers. From 1995 to 1997, she was leader of the Parallel Applications Technology Program at Lawrence Livermore Lab, Livermore's portion of the largest ($40 million) CRADA (Cooperative Research and Development Agreement) ever undertaken by the Dept. of Energy. The scope of the agreement provided for the design of parallel industrial supercomputing codes on MPP platforms. She is also the editor of the book "Industrial Strength Parallel Computing" (Morgan Kaufmann Publishers, San Francisco). She has a Ph.D. in Applied and Numerical Mathematics from Princeton University, an MA and an MSME from Princeton, and a BA in Engineering Sciences from the University of California, San Diego. (http://www.rzg.mpg.de/~ack/)


Mark Seager is a recognized leader in tera-scale computing systems design, procurement, and integration, with 19 years of experience in parallel computing. He played a significant role in developing the US DOE Accelerated Strategic Computing Initiative's computing and problem solving environment (PSE) strategies, including shaping the cluster-of-SMPs approach of "Option Blue" and "Option White." He developed the computational strategy and integrated architecture for LLNL multi-programmatic and institutional computing, as well as the LLNL high-performance commodity (IA-32/Linux/Open Source) clustering strategy. He is the current principal investigator for the ASCI Platforms at Livermore, responsible for executing the tri-laboratory efforts in tera-scale computing strategy, integration, and support; he led the ASCI Purple ($290M) procurement team and has negotiated over $300M in contracts for scalable system procurements with multiple vendors. He was previously PI for the ASCI/Problem Solving Environment, covering coordinated computing strategy, platform support, applications development support, the distributed computing environment, visualization and numerical methods, and tri-laboratory networking. He led the planning effort to develop the ten-year strategic vision "Full Spectrum Computing," and he defined the vision and architecture for the Scaleable I/O Facility and obtained its funding. He drove the technical specification and evaluation for the first Federal MPP competitive procurement, supervised the design and implementation of major center networks, and defined and supervised programming teams responsible for Crays running the NLTSS and UNICOS operating systems, networks, super mini-computers, workstations and desktop systems, department databases, and computer resource utilization accounting. He was the primary contact for gathering customer requirements for future department products and coordinated aspects of the migration from locally developed products to industry-standard operating systems and network protocols. He developed scheduling methodologies and evaluation models for MPPs, techniques for visualization of parallel application execution, and SLAP, a major sparse linear algebra package for the solution of symmetric and non-symmetric sparse linear systems, working with large code groups to integrate SLAP into production applications. He has also developed numerical methods for parallel architectures.


Rolf Rabenseifner studied mathematics and physics at the University of Stuttgart. Since 1984, he has worked at the High Performance Computing Center Stuttgart (HLRS). He led the projects DFN-RPC, a remote procedure call tool, and MPI-GLUE, the first metacomputing MPI combining different vendors' MPIs without losing the full MPI interface. In his dissertation, he developed a controlled logical clock as global time for trace-based profiling of parallel and distributed applications. Since 1996, he has been a member of the MPI-2 Forum. From January to April 1999, he was an invited researcher at the Center for High-Performance Computing at Dresden University of Technology. Currently, he is head of the parallel computing department of the HLRS and is involved in MPI profiling and benchmarking. In workshops and summer schools, he teaches parallel programming models at many universities and labs in Germany. (http://www.hlrs.de/people/rabenseifner/)


David Eder is a computational physicist at the Lawrence Livermore National Laboratory in California. He has extensive experience with application codes for the study of multiphysics problems. His latest endeavors include ALE (Arbitrary Lagrangian-Eulerian) simulations on unstructured and block-structured grids that span many orders of magnitude. He was awarded a research prize in 2000 for the use of advanced codes to design the National Ignition Facility's 192-beam laser, currently under construction. He has a PhD in Astrophysics from Princeton University and a BS in Mathematics and Physics from the University of Colorado.


URL of this page: http://www.hlrs.de/people/rabenseifner/publ/SC2003-tutorial.html.