Practical: Working environment
------------------------------ 

 1. Your working directory: cd ~/CGV/{nr}  with {nr} = number of your group (01..19)
 
 2. Choose your task: {task} (out of 01 02 04)
 
 3. Fetch your skeleton:   cp  ~/CGV/skel/cgv_{task}.F90  . 
 
 4. Add your code, compile, run and test it (correct result? same as the
    serial result?); a possible test sequence is sketched after this list
 
 5. If your task works:
    extract your part (from  /*== task_ii begin ==*/  to  /*== task_ii end ==*/)
    into cgvp{task}.F90; one way to do this is sketched after this list
 
 6. When all groups have finished, everyone can check which parts exist and
    assemble the complete program:
     ls -l ../*/cgvp*.F90
     cat ../00/cgvp00.F90 ../*/cgvp08.F90 ../*/cgvp01.F90  ../*/cgvp02.F90  ../*/cgvp03.F90 \
         ../*/cgvp04.F90 ../*/cgvp05.F90  ../*/cgvp06.F90  ../*/cgvp07.F90 > cgv_all.F90
 
    Caution: - duplicate parts must be selected by hand ({nr} instead of *)
             - missing parts may also be fetched from ../source/parts/cgvp{task}.F90

 7. Compile and run cgv_all.F90:

    - on T3E:
      f90 -o cgv_all cgv_all.F90 -lm           (compile)
       fpart                                    (to check whether there are free CPUs)
      mpirun -np 2 ./cgv_all                   (run parallel)

     - on many other platforms:
      f90 -o cgv_all cgv_all.F90 -lm -lmpi     (compile)
      or  mpif90 -o cgv_all cgv_all.F90 -lm    (compile)
      mpirun -np 2 ./cgv_all                   (run parallel)
       mpirun -np 4 ./cgv_all -n 100 -m 100     (run parallel with a larger dataset)
 
    - on non-MPI platforms:
      f90 -Dserial -o cgv_all cgv_all.F90 -lm  (compile)
      ./cgv_all                                (run serial)
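
 A possible test sequence for step 4 (assuming the skeleton accepts the same
 compile options as cgv_all in step 7; the output file names are only examples):

       f90 -Dserial -o cgv_{task}_ser cgv_{task}.F90 -lm    (serial reference)
       mpif90 -o cgv_{task} cgv_{task}.F90 -lm              (parallel version)
       ./cgv_{task}_ser            > out_ser.txt
       mpirun -np 2 ./cgv_{task}   > out_par.txt
       diff out_ser.txt out_par.txt       (apart from round-off, results should agree)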
 
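 One way to do the extraction in step 5, assuming the begin/end markers appear
 literally in the skeleton (replace ii by your task number):

       sed -n '/task_ii begin/,/task_ii end/p' cgv_{task}.F90 > cgvp{task}.F90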

Practical: Options
------------------ 

Compile-time options [default]:
-Dserial              compile without MPI and without distribution [parallel]
-DWITH_MEMOPS         use memcpy and memset functions instead of loops
                      for memory copy and set operations [loops]
-DFASTMEMXS           run through the physical area during the matrix-vector multiplication,
                      computing two rows per loop pass for better memory bandwidth usage
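
The compile-time options are added to the compile line, in the same way as
-Dserial in step 7, for example with the mpif90 variant:

  mpif90 -DWITH_MEMOPS -DFASTMEMXS -o cgv_all cgv_all.F90 -lm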

Run-time options [default]:
-m {m}                vertical dimension of physical heat area [4]
-n {n}                horizontal dimension  [4]
-imax {iter_max}      maximum number of iterations in the CG solver [500]
-eps {epsilon}        abort criterion of the solver for the norm of the residual vector [1e-6]
-twodims              choose 2-dimensional domain decomposition [1-dim]
-mprocs {m_procs}     number of processes in the vertical direction (-twodims required)
-nprocs  {n_procs}    number of processes in the horizontal direction [given by MPI_Dims_create]
-prtlev  0|1|2|3|4|5  printing and debug level [1]; each level includes the previous ones:
                        1 = only || result - exact solution || and partial result matrix
                        2 = additionally the residual norm after each iteration
                        3 = additionally the result of the physical heat matrix
                        4 = additionally all vector and matrix information in the 1st iteration
                        5 = additionally all vector and matrix information in all iterations

Example: mpirun -np 4 ./cgv_all -m 200 -n 200 -twodims
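
A run combining several of the options above might look like this (the 2 x 2
process grid is only an example; -mprocs/-nprocs require -twodims):

  mpirun -np 4 ./cgv_all -m 200 -n 200 -twodims -mprocs 2 -nprocs 2 -prtlev 2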