Parallel Numerics

14.01.02



Content

Author: Uwe Küster

  1. Parallel Numerics
  2.     overview
  3.     Partial Differential Equations
  4.     Solution Methods for Partial Differential Equations
  5.     different aspects of parallelism
  6.     Parallelism in hardware view
  7.     Parallelism in software view
  8.     Parallelism in performance view
  9.     Parallelism in algorithmic view
  10.     parallel operations on independent sets of data 1
  11.     Vectorization: multiple gather
  12.     Vectorization: scattering data
  13.     F90: array syntax
  14.     F90: index vectors
  15.     F95: forall ... end forall different from do ... enddo
  16.     parallel operations on independent sets of data 2
  17.     recursive task generation 1
  18.     recursive task generation 2
  19.     functional decomposition 1
  20.     functional decomposition 2
  21.     neighbourhoods
  22.     Finite Differences (Approximation of Differential Operators by Differences)
  23.     Finite Differences 2
  24.     Matrix for regular grid with 5-points stencil
  25.     Finite Volumes (Flux Balance over neighbouring sides)
  26.     finite volume and finite differences
  27.     Finite Volume
  28.     Triangular Grid with Relation Matrix
  29.     Finite Elements (local matrices form a global matrix)
  30.     unstructured self adaptive grids: CEQ 1
  31.     unstructured self adaptive grids: CEQ 2
  32.     unstructured self adaptive grids: CEQ 3
  33.     unstructured self adaptive grids: CEQ 4
  34.     domain decomposition: overlapping grids and communication
  35.     domain decomposition: properties
  36.     domain decomposition: communicated surface for cube
  37.     regular grid partitioning of regular grid
  38.     domain decomposition: irregular grid partitioning of regular grid
  39.     domain decomposition: recursive Spectral Bisection
  40.     domain decomposition:
  41.     Recursive Spectral Bisection
  42.     Recursive Spectral Bisection
  43.     Partitioning Software
  44. Large Systems
  45.     Large Systems
  46.     two different approaches
  47.     main directions of linear solvers
  48.     direct solvers
  49.     simple iterative procedures: Gauss-Seidel
  50.     simple iterative procedures: Jacobi
  51.     Krylov space algorithms
  52.     Conjugate Gradient Squared (CGS)
  53.     BiCGSTAB(2)
  54.     operations of Krylov space algorithms
  55.     operations of Krylov space algorithms
  56.     preconditioning
  57.     MILU for general matrix
  58.     overlapping domain decomposition
  59.     non overlapping domain decomposition
  60.     nonoverlapping Domain Decomposition
  61.     nonoverlapping DD: several 2D domains
  62.     nonoverlapping DD:
  63.     nonoverlapping DD: separation of inner and boundary nodes
  64.     nonoverlapping Domain Decomposition
  65.     nonoverlapping DD: Schur complement
  66.     nonoverlapping DD:
  67.     nonoverlapping DD: dividing boundary nodes
  68.     nonoverlapping DD: Preconditioner 1
  69.     nonoverlapping DD: Preconditioner 2
  70.     nonoverlapping DD: reference
  71.     coloring: defining independent sets
  72.     coloring of a graph: any neighbour has different color
  73.     coloring
  74.     Multigrid 1
  75.     Multigrid 2
  76.     Hierarchy of triangles
  77.     hardware and implementation
  78.     High Performance for Fast Methods
  79.     parameters of performance limits
  80.     bandwidth
  81.     latency
  82.     overcoming latency
  83.     Latency from processor to various locations
  84.     dependency of bandwidth and latency
  85.     dependency of bandwidth and latency 2
  86.     Bandwidth-Performance relation for multi-block grid with cubes
  87.     Example multiblock technique (optimal case)
  88.     distributions in quadratic patches
  89.     distribution in line patches
  90.     High Performance Systems
  91.     ASCI White IBM[2000]
  92.     Earth-Simulator Project: end 2001
  93.     programming techniques
  94.     getting processor performance by loops with arrays
  95.     flexible but expensive: linked list 1
  96.     flexible but expensive: linked list 2
  97.     nice but expensive: recursive calls
  98.     recursive calls 2
  99.     modifying a hierarchical grid
  100.     inhibiting processor performance
  101.     good for performance
  102.     cache reuse for heat transfer: different implementations on SGI R10000
  103.     object oriented programming
  104.     Formulation of flexible neighbourhoods
  105.     discrete operators and sparse matrices
  106.     handling sparse matrices
  107.     Sparse matrix example
  108.     Sparse matrix: row ordered example
  109.     Row ordered and jagged diagonal matrices: data structure
  110.     Row ordered: matrix vector multiplication
  111.     Sparse matrices: jagged diagonal example
  112.     jagged diagonal matrix: matrix vector multiplication
  113.     Row ordered and jagged diagonal matrix formulation: performance
  114.     Conclusion
  115.     Books and URLs 1
  116.     Books and URLs 2
  117.     End
