The effective bandwidth beff measures the accumulated bandwidth of the communication network of a parallel and/or distributed computing system. Several message sizes, communication patterns and methods are used. The algorithm uses an average to take into account that in real applications short and long messages result in different bandwidth values.
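$$
b_{eff} = \operatorname{logavg}\Big(\;
\operatorname{logavg}_{\text{ring patterns}}\Big(\sum_{L}\,\max_{mthd}\,\max_{rep}\; b(\text{ring pat.},L,mthd,rep)\,/\,21\Big),\;\;
\operatorname{logavg}_{\text{random patterns}}\Big(\sum_{L}\,\max_{mthd}\,\max_{rep}\; b(\text{random pat.},L,mthd,rep)\,/\,21\Big)
\;\Big)
$$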
with
In each ring, the processes are sorted by their ranks in the topology mentioned above.
The first approach, by Karl Solchenbach, Hans-Joachim Plum and Gero Ritzenhoefer [1,2], was based on the bisection bandwidth.
Because of several problems, the benchmark was redesigned. The redesign tries not to violate the rules defined by Rolf Hempel in [3] and by William Gropp and Ewing Lusk in [4].
Each run of the benchmark on a particular system results in an output file. The last line of this output file reports e.g.
b_eff = 9709.549 MB/s = 37.928 * 256 PEs with 128 MB/PE on sn6715 hwwt3e 2.0.4.71 unicosmk CRAY T3E
This line reports
If you use this benchmark, please send us back the following information:
In addition, for your own use, b_eff.c also writes the last summary line to stderr.
Some examples of how to compile and start b_eff.c are given in the first lines of b_eff.c.
Please send the mail to rabenseifner@rus.uni-stuttgart.de.
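As a small illustration of the summary line shown earlier, the following C sketch extracts its numeric fields and checks that the per-PE value matches b_eff divided by the number of PEs. The field layout is inferred only from the single example line on this page, so it is an assumption rather than a specification of the output format. The struct and function names (`beff_summary`, `parse_beff_summary`, `beff_summary_consistent`) are hypothetical and are not part of b_eff.c.

```c
#include <math.h>
#include <stdio.h>

/* Numeric fields of the summary line (hypothetical helper type). */
typedef struct {
    double b_eff;      /* accumulated bandwidth in MB/s */
    double per_pe;     /* bandwidth per PE in MB/s */
    int    pes;        /* number of PEs */
    int    mem_per_pe; /* memory per PE in MB */
} beff_summary;

/* Parse a line such as
 *   "b_eff = 9709.549 MB/s = 37.928 * 256 PEs with 128 MB/PE on ..."
 * Returns 1 if all four numeric fields were read, 0 otherwise.
 * The layout is assumed from the one example line in this document. */
int parse_beff_summary(const char *line, beff_summary *out)
{
    int n = sscanf(line, "b_eff = %lf MB/s = %lf * %d PEs with %d MB/PE",
                   &out->b_eff, &out->per_pe, &out->pes, &out->mem_per_pe);
    return n == 4;
}

/* Check that per_pe equals b_eff / pes up to the three printed decimals. */
int beff_summary_consistent(const beff_summary *s)
{
    return s->pes > 0 && fabs(s->b_eff / s->pes - s->per_pe) < 0.001;
}
```

For the example line above, 9709.549 / 256 is about 37.928, so the consistency check succeeds. The same relation holds for the beff/size column of the result tables.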
On Nov. 7, 1999, on sn6715 hwwt3e 2.0.4.71 unicosmk CRAY T3E, with 128 MB/PE: The measurements with 2 to 256 PEs were done while another application was running on the first 256 PEs. Currently the 512 PEs value must be computed on the basis of former measurements with release 3.1, using the 1-dimensional cyclic and the random values. The MPI implementation mpt.1.3.0.2 was used and the environment variable MPI_BUFFER_MAX=4099 was set.
Used commands:
    module switch mpt mpt.1.3.0.2
    cc -o b_eff -D MEMORY_PER_PROCESSOR=128 b_eff.c
    export MPI_BUFFER_MAX=4099
    mpirun -np size ./b_eff > result_3.2_t3e_size
On Nov. 9, 1999, on SUPER-UX hwwsx4 9.1 Rev1 SX-4, with 256 MB/PE: The measurement was done on a (dedicated) resource block with 16 processors while other applications were running on the other processors (exception: the benchmark on 4 processors was done interactively with time-sharing).
Used commands:
    mpicc -o b_eff -D MEMORY_PER_PROCESSOR=256 b_eff.c -lm
    mpirun -np size ./b_eff > result_3.2_sx4_size
On Nov. 9, 1999, on SUPER-UX sx5 9.2 k SX-5/8B, preliminary measurements were done with 256 MB/PE and without the non-blocking communication method:
size | beff MByte/s | beff/size MByte/s | summary | full protocol |
4 | 5439.199 | 1359.800 | result_3.2_sx5_256MB_004.shrt | result_3.2_sx5_256MB_004.gz |
2 | 2662.468 | 1331.234 | result_3.2_sx5_256MB_002.shrt | result_3.2_sx5_256MB_002.gz |
Used commands:
    mpicc -o b_eff -D MEMORY_PER_PROCESSOR=256 b_eff_mthd1+2.c -lm
    mpirun -np size ./b_eff > result_3.2_sx4_size
On Nov. 9, 1999, on HP-UX hwwhpv B.11.00 A 9000/800, with 1024 MB/PE: The measurement was done while another application was running, but with reduced priority (nice=39).
size | beff MByte/s | beff/size MByte/s | summary | full protocol |
7 | 435.041 | 62.149 | result_3.2_hpv_007c.shrt | result_3.2_hpv_007c.gz |
Used commands:
    mpicc -o b_eff -D MEMORY_PER_PROCESSOR=1024 b_eff.c -lm
    mpirun -np size ./b_eff > result_3.2_hpv_size
On Nov. 9, 1999, on HI-UX/MPP hitachi 02-03 0 SR2201, with 256 MB/PE: The measurement was done while another application was running on the other 16 PEs. All PEs were used as dedicated PEs.
size | beff MByte/s | beff/size MByte/s | summary | full protocol |
16 | 527.805 | 32.988 | result_3.2_SR2201_016.shrt | result_3.2_SR2201_016.gz |
8 | 276.903 | 34.613 | result_3.2_SR2201_008.shrt | result_3.2_SR2201_008.gz |
4 | 151.928 | 37.982 | result_3.2_SR2201_004.shrt | result_3.2_SR2201_004.gz |
2 | 80.086 | 40.043 | result_3.2_SR2201_002.shrt | result_3.2_SR2201_002.gz |
Used commands:
    mpicc -o b_eff -D MEMORY_PER_PROCESSOR=1024 b_eff.c -lm
    mpirun -n size ./b_eff > result_3.2_SR2201_size
On Nov. 15, 1999, on OSF1 toneb7 V5.0 910 alpha, with 512 MB/PE: The measurement was done while other applications were running on PEs that weren't used by this benchmark. All PEs were used as dedicated PEs.
size | beff MByte/s | beff/size MByte/s | summary | full protocol |
12 | ???.??? | ??.??? | result_3.3_SwissTX1baby_012a.shrt | result_3.3_SwissTX1baby_012a.gz |
8 | 97.497 | 12.187 | result_3.3_SwissTX1baby_008a.shrt | result_3.3_SwissTX1baby_008a.gz |
4 | 49.394 | 12.348 | result_3.3_SwissTX1baby_004a.shrt | result_3.3_SwissTX1baby_004a.gz |
2 | 25.792 | 12.896 | result_3.3_SwissTX1baby_002a.shrt | result_3.3_SwissTX1baby_002a.gz |
Used commands:
    tnetcc -o b_eff -DMEMORY_PER_PROCESSOR=512 b_eff.c -lm
    bsub -Is -n size txrun b_eff > result_3.3_SwissTX1baby_size
This page: www.hlrs.de/mpi/b_eff/b_eff_3.2/