## **HLRS Workshop**

## Tuesday 16<sup>th</sup> April – Friday 19<sup>th</sup> April

The four-day workshop gives attendees the knowledge required to port, optimize and execute applications efficiently on the HLRS Cray XE/XC service. The workshop is a mixture of lectures and practical sessions. Example exercises will be provided, but attendees are encouraged to bring their own applications. Although the workshop specifically targets the Cray architecture and programming environment, many of the lessons learned will be more generally useful.

During the first three days, specialists from Cray will support you in porting and optimizing your application on our Cray XE6. On the fourth day, Georg Hager and Jan Treibig from RRZE will present detailed information on optimizing codes on the multicore AMD Interlagos and Intel Sandy Bridge processors.

The presenters for the workshop are:

Stefan Andersson, Aniello Esposito and Stephen Sachs (HLRS on-site application support, Cray Computer Deutschland GmbH), and Georg Hager and Jan Treibig (RRZE)

# **First Day**

Attendees will learn about the Cray XE/XC architecture and its programming environment. They will gain an initial understanding of potential causes of application performance bottlenecks and of how to identify some of these bottlenecks using the Cray performance tools. The attendees will then use these tools to profile their applications.

```
09:00 – 09:30 Registration
09:30 – 09:40 Welcome                                           HLRS
09:40 – 10:30 Overview of the Cray XE/XC Architecture           Stefan Andersson
              (HLRS specifics, system, nodes, processors, network,
              network performance, packaging and cooling,
              I/O system)
10:30 – 10:45 Break
10:45 – 11:30 Programming Environment for the Cray system       Stefan Andersson
              (Cray compilation environment, modules, linking...)
11:30 – 11:45 Break
11:45 – 13:00 Cray Linux Environment                            Stefan Andersson
              (node Linux, ALPS, running jobs)
13:00 – 14:00 Lunch
14:00 – 15:00 Short introduction to tools on the Cray system    Aniello Esposito
              (STAT, ATP, Fast track debugging, profiling,
              DDT)
15:00 – 15:15 Break
15:00 –       Coffee and beverage available
15:15 – 15:30 Introduction to the Hands-On session              Aniello Esposito
15:30 – 18:00 Hands on Lab Profiling Applications               Cray/attendees
18:00         Social Event: guided city tour and dinner (self-paying)
```

# **Second Day**

We will finalize the presentations on how to identify performance bottlenecks. The attendees will use Cray Reveal and Cray Apprentice2 for performance visualization and will learn various optimization techniques. In the hands-on lab, the attendees will start to tune their applications.

| … – 10:00 Performance Analysis and Visualization I<br>(…ng, sampling, tracing, tools)            | Stephen Sachs    |
|---------------------------------------------------------------------------------------------------|------------------|
| 10:00 – 10:15 Break                                                                               |                  |
| 10:15 – 11:00 Performance Analysis and Visualization II<br>(load imbalance, PGAS/OpenMP, Reveal)  | Aniello Esposito |
| 11:00 – 11:30 Reveal Demo                                                                         | Aniello Esposito |
| 11:30 – 13:00 Hands on Lab Tuning applications                                                    | Cray/attendees   |
| 13:00 – 14:00 Lunch                                                                               |                  |
| 14:00 – 15:00 Optimization Techniques I (CPU)<br>(… optimization, cache, vectorization)           | Stephen Sachs    |
| 15:00 – 15:15 Break                                                                               |                  |
| 15:15 – 15:45 Optimization Techniques II (MPI)<br>(MPI life of a message, reordering, huge pages) | Stephen Sachs    |
| 15:45 – 17:00 Hands on Lab Tuning applications<br>15:45 – Coffee and beverage available           | Cray/attendees   |

# **Third Day**

The attendees will learn advanced techniques for dealing with scaling problems and how to access the online documentation for user help. In the hands-on lab, the attendees will continue to tune their applications.

| 09:00 – 10:30 I/O Optimization<br>(Parallel I/O, filesystems, Lustre, MPI-IO)           | Stefan Andersson |
|-----------------------------------------------------------------------------------------|------------------|
| 10:30 – 10:45 Break                                                                     |                  |
| 10:45 – 11:45 Cray Scientific and Math Libraries                                        | Stephen Sachs    |
| 11:45 – 13:00 Hands on Lab Tuning applications<br>Possibility to do a computer room tour | Cray/attendees   |
| 13:00 – 14:00 Lunch                                                                     |                  |
| 14:00 – 17:00 Hands on Lab Tuning applications<br>15:00 – Coffee and beverage available | Cray/attendees   |

# **Fourth Day**

The fourth day is dedicated to single-core and single-node performance and optimization in the multicore, multisocket environment of the Interlagos and Sandy Bridge CPUs in Cray XE6/XC30 systems. After introducing the basic architectural features, we demonstrate simple performance modeling techniques so that attendees get a clear view of the typical bottlenecks and performance patterns present on multicore nodes.

09:00 – 10:15 Lectures and hands-on

10:15 – 10:30 Break

10:30 – 11:45 Lectures and hands-on

11:45 – 12:00 Break

12:00 – 13:00 Lectures and hands-on

13:00 – 14:00 Lunch

14:00 – 15:00 Lectures and hands-on

15:00 – 15:15 Break

15:15 – 16:30 Lectures and hands-on

#### Introduction

- Architecture of multisocket multicore systems, with a special focus on the Interlagos and Sandy Bridge chips in Cray XE/XC
- Node topology: cores, caches, chips, ccNUMA
- Core architecture and typical bottlenecks
- Data parallelism: SIMD and its implications
- Data transfer through the cache hierarchy
- Performance composition

#### Multicore performance and tools

- Affinity enforcement
- Performance counter measurements
- Basics and best practices for performance counter profiling
- Microbenchmarking for architectural exploration
- Roadblocks for scalability on multicore chips
  - Scaling properties and typical OpenMP overhead
  - Bandwidth saturation in cache and main memory

#### Optimal utilization of parallel resources

- Programming for SIMD parallelism
- Programming in ccNUMA environments

#### Simple performance modeling: The Roofline Model

- Introduction to the Roofline Model
- Example: "Simple" streaming loops
- Example: Understanding and optimizing the performance of a Jacobi stencil code
- Example: Sparse matrix-vector multiplication
- Outlook: Extending the roofline model
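As a rough preview of the streaming-loop example, the basic Roofline bound can be sketched in a few lines. In its simplest form the model limits performance to the smaller of the peak floating-point rate and the product of computational intensity and memory bandwidth. The function name and the machine numbers below are placeholders chosen for illustration only; they are not measured values for Interlagos or Sandy Bridge and are not taken from the course material.

```python
def roofline(peak_gflops, bandwidth_gbs, intensity_flop_per_byte):
    """Upper performance bound in GFlop/s: min(peak, intensity * bandwidth)."""
    return min(peak_gflops, intensity_flop_per_byte * bandwidth_gbs)


# Hypothetical machine parameters, for illustration only (not measured values)
PEAK_GFLOPS = 100.0    # assumed peak floating-point rate in GFlop/s
BANDWIDTH_GBS = 40.0   # assumed saturated memory bandwidth in GB/s

# Streaming "triad" loop a[i] = b[i] + s * c[i] in double precision:
# 2 flops per iteration and 3 * 8 bytes of memory traffic
# (the extra write-allocate transfer for a[i] is ignored here).
triad_intensity = 2.0 / 24.0

triad_bound = roofline(PEAK_GFLOPS, BANDWIDTH_GBS, triad_intensity)
print(f"triad bound: {triad_bound:.2f} GFlop/s")
```

With these assumed numbers the loop is bound by memory bandwidth, far below the assumed peak, which is the typical situation for streaming kernels.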

#### Literature:

Georg Hager and Gerhard Wellein: Introduction to High Performance Computing for Scientists and Engineers. Chapman & Hall / CRC Press, 2010, 356 pages. ISBN 978-1-4398-1192-4.