Effective Bandwidth (beff) Benchmark - Implementation Details
Implementation Details of beff (version 3.3)
- The measurements are done in nested loops:
    loop over all repetitions
      loop over all message lengths
        loop over all patterns
          loop over all methods
            a barrier
            one message exchange (except for the 3 largest message lengths)
            time_begin = MPI_Wtime()
            loop over looplength
            time_end = MPI_Wtime()
- Two buffers are used, one for sending and the other for receiving
the messages, each of size 6*(Lmax+16).
- The messages are placed cyclically in the buffers to
prevent them from being read from the cache.
- Before sending a message, a value is assigned to its first and
last byte. After receiving a message, these two bytes are always
checked. If an error is detected, the benchmark is aborted
immediately and a detailed error report is written to standard
output.
- If the estimated execution time for the two longest
message lengths exceeds 60 seconds, then
only the fastest method (based on the third longest
message length) is used and the number of repetitions is reduced.
- On a system with a latency of less than 50 microseconds
and nearly the same bandwidth for all patterns and methods,
the total execution time is about
1..4 * max( 60 sec, 3*MEMORY_PER_PROCESSOR/asymptotic bandwidth )
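
The nested measurement loops from the first item can be sketched in Python as follows. This is a minimal illustration, not the benchmark's own code: `time.perf_counter` stands in for `MPI_Wtime`, `exchange` and `barrier` are hypothetical callables supplied by the caller, and the default `looplength` is arbitrary. Placing one message exchange inside the timed inner loop is also an assumption, since the list does not spell out the loop body.

```python
import time


def measure(repetitions, lengths, patterns, methods, exchange,
            barrier=lambda: None, looplength=3, clock=time.perf_counter):
    """Run the nested loops; return {(rep, length, pattern, method): elapsed seconds}."""
    # The untimed warm-up exchange is skipped for the 3 largest message lengths.
    largest = set(sorted(lengths)[-3:])
    results = {}
    for rep in range(repetitions):
        for length in lengths:
            for pattern in patterns:
                for method in methods:
                    barrier()  # placeholder for a barrier across all processes
                    if length not in largest:
                        exchange(length, pattern, method)  # one untimed exchange
                    time_begin = clock()
                    for _ in range(looplength):
                        # Assumption: the timed loop repeats the message exchange.
                        exchange(length, pattern, method)
                    time_end = clock()
                    results[(rep, length, pattern, method)] = time_end - time_begin
    return results
```

Passing a function that only records its arguments as `exchange` is enough to check how often each message length is exercised.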
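
The buffer handling and the first/last byte check can be illustrated with a short sketch. It assumes that the size 6*(Lmax+16) corresponds to six slots of Lmax+16 bytes that are used in rotation; the text does not state how the cyclic placement is computed, so `slot_offset` is a hypothetical layout. The send/receive step is simulated by a plain copy, and all function names are illustrative.

```python
def make_buffer(lmax):
    """Allocate one buffer of size 6*(Lmax+16) bytes, as stated in the text."""
    return bytearray(6 * (lmax + 16))


def slot_offset(index, lmax):
    """Assumed layout: six slots of Lmax+16 bytes, visited cyclically."""
    return (index % 6) * (lmax + 16)


def stamp(buf, offset, length, value):
    """Assign a value to the first and last byte of the message."""
    buf[offset] = value
    buf[offset + length - 1] = value


def check(buf, offset, length, value):
    """Verify the first and last byte; raise on mismatch (the benchmark aborts)."""
    first, last = buf[offset], buf[offset + length - 1]
    if first != value or last != value:
        raise RuntimeError(
            f"message check failed at offset {offset}: "
            f"first={first}, last={last}, expected={value}"
        )


def exchange_once(send, recv, index, length, lmax, value):
    """Stamp, 'transfer' (simulated by a copy), and check one message."""
    off = slot_offset(index, lmax)
    stamp(send, off, length, value)
    recv[off:off + length] = send[off:off + length]  # stands in for send/receive
    check(recv, off, length, value)
```

Because consecutive messages land in different slots, a message is not re-read from the same addresses as the previous one, which is the stated purpose of the cyclic placement.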
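
The 60-second rule for the two longest message lengths can be sketched as a small selection function. The method names and timings in the usage example are hypothetical, and the sketch leaves out the reduction of the repetition count because the text does not say by how much it is reduced.

```python
def select_methods(estimated_seconds, times_third_longest, limit=60.0):
    """Return the methods to run for the two longest message lengths.

    times_third_longest maps a method name to its measured time (seconds)
    at the third longest message length; a smaller time means a faster method.
    """
    if estimated_seconds > limit:
        fastest = min(times_third_longest, key=times_third_longest.get)
        return [fastest]
    return list(times_third_longest)


# Hypothetical timings for three methods at the third longest message length.
timings = {"method_a": 0.012, "method_b": 0.020, "method_c": 0.009}
print(select_methods(75.0, timings))  # ['method_c']
print(select_methods(30.0, timings))  # ['method_a', 'method_b', 'method_c']
```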
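
As a worked example of the execution time estimate, the formula can be evaluated directly. The sketch assumes that MEMORY_PER_PROCESSOR is given in bytes and the asymptotic bandwidth in bytes per second, and it reads "1..4" as a factor between 1 and 4; the function name is illustrative.

```python
def estimate_total_runtime(memory_per_processor, asymptotic_bandwidth):
    """Return (low, high) in seconds for 1..4 * max(60, 3*memory/bandwidth)."""
    base = max(60.0, 3.0 * memory_per_processor / asymptotic_bandwidth)
    return 1.0 * base, 4.0 * base


MIB = 2 ** 20
# 512 MiB per processor at 100 MiB/s: 3*512/100 = 15.36 s, so the 60 s floor applies.
print(estimate_total_runtime(512 * MIB, 100 * MIB))  # (60.0, 240.0)
```

With 4096 MiB per processor at the same bandwidth, the memory term dominates (3*4096/100 = 122.88 s), giving a range of roughly 123 to 492 seconds.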
Rolf Rabenseifner