Effective I/O Bandwidth (beff_io) Benchmark
This page refers to the old release, b_eff_io version 1.1.
The effective I/O bandwidth benchmark (b_eff_io) has two goals:
(1) to obtain a characteristic average number for the I/O bandwidth
achievable with parallel MPI-I/O applications, and (2) to obtain
detailed information about several access patterns and buffer lengths.
The benchmark examines "first write", "rewrite" and "read" access,
strided (individual and shared pointers) and segmented collective
patterns on one file per application, as well as non-collective access
to one file per process. The number of parallel accessing processes
is also varied, and well-formed I/O is compared with non-well-formed I/O.
On systems that meet the rule that the total memory can be written to
disk in 10 minutes, the benchmark should not need more than
15 minutes for a first pass over all patterns.
The benchmark is designed analogously to the effective
bandwidth benchmark for message passing (b_eff),
which characterizes the message-passing capabilities of a system in
a few minutes.
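To illustrate what these pattern families look like at the MPI-I/O level, the following minimal sketch shows a collective "first write" and "read" of one chunk per process to a single file per application, using explicit offsets (i.e., individual file pointers, as in the segmented and strided non-shared patterns). It is not taken from b_eff_io.c; the file name, chunk length, and omitted error handling are placeholder assumptions. The shared-file-pointer variant is sketched later, in the installation notes.

#include <mpi.h>
#include <string.h>

#define CHUNK (1 << 16)          /* placeholder chunk length: 64 KBytes per process */

int main(int argc, char **argv)
{
    static char buf[CHUNK];
    int rank;
    MPI_File fh;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(buf, rank & 0xff, CHUNK);

    /* one scratch file accessed by all processes of the application */
    MPI_File_open(MPI_COMM_WORLD, "/my/fast/scratch/dir/b_eff_io_example",
                  MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);

    /* collective "first write": each process writes its chunk at an
       explicit offset (individual file pointers, non-shared pattern) */
    MPI_File_write_at_all(fh, (MPI_Offset)rank * CHUNK, buf, CHUNK,
                          MPI_BYTE, &status);

    /* collective "read" access of the same data */
    MPI_File_read_at_all(fh, (MPI_Offset)rank * CHUNK, buf, CHUNK,
                         MPI_BYTE, &status);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}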
Releases:
- Release as gzip'ed tar archive:
b_eff_io_v1.1.tar.gz
- Files of the current release:
  b_eff_io.c, b_eff_io_eps, man page (formatted), man/man1/b_eff_io.1
- Helper files for b_eff_io_eps:
  b_eff_io_eps.gnuplot, b_eff_io_eps_on1page.dvi
- Source of the "on 1 page" sheet:
  b_eff_io_eps_on1page.tex
- Old releases:
  1.0, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1.
A detailed report with first results of b_eff_io release 0.5
can be obtained here.
A summary of this report (.ps.gz) was contributed to Euro-Par 2000.
Usage
Installation and the first test
- Download the tar file of the current release
b_eff_io_v1.1.tar.gz
- Unpack with: gunzip -c b_eff_io_v1.1.tar.gz | tar -xvf -
- Change directory: cd b_eff_io
- Compile it:
mpicc -o b_eff_io b_eff_io.c -lm
If you are using an old ROMIO without shared file pointers, then use
(a sketch of the calls this flag disables follows at the end of this subsection):
mpicc -o b_eff_io -D WITHOUT_SHARED b_eff_io.c -lm
- Test it:
mpirun -np 4 ./b_eff_io -MB 256 -MT 1024 -T 30 -p /my/fast/scratch/dir
This means that you are using 4 MPI processes, that each processor has
256 MBytes of memory (-MB 256) and the system has a total memory of
1024 MBytes (-MT 1024), and that you want to run only a short test,
scheduled to run at least 30 seconds (-T 30), i.e. it should complete
in no more than about 2 minutes.
This I/O benchmark uses large scratch files; they are stored
in /my/fast/scratch/dir (-p option).
You will get back:
- on standard output -- the b_eff_io value
- on b_eff_io.sum -- a human readable summary
- on b_eff_io.prot -- the full benchmark protocol
- Print the summary, e.g., with:
a2ps -C -1 -l 120 b_eff_io.sum
If gnuplot and dvips are available,
you can generate some plots from the summary:
b_eff_io_eps 4
- Print the summary sheet, e.g., with:
lpr b_eff_io_on1page.ps
CAUTION: Because this first test is scheduled with only
30 seconds, the results will not tell you
anything about the I/O bandwidth of your system.
The test should only tell you whether the benchmark
runs on your system.
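For reference, the fragment below sketches the kind of guard that the WITHOUT_SHARED compile flag mentioned above could correspond to: the strided pattern type that uses the shared file pointer is skipped when the MPI-I/O implementation (e.g., an old ROMIO) does not provide it. This is a hypothetical sketch under that assumption, not the actual code of b_eff_io.c.

#include <mpi.h>

/* Hypothetical helper, not copied from b_eff_io.c: writes one chunk with the
   strided, collective shared-file-pointer pattern, unless the benchmark was
   built with -D WITHOUT_SHARED (old ROMIO without shared file pointers). */
static void write_shared_chunk(MPI_File fh, char *buf, int len)
{
#ifndef WITHOUT_SHARED
    MPI_Status status;
    MPI_File_write_ordered(fh, buf, len, MPI_BYTE, &status);
#else
    (void)fh; (void)buf; (void)len;   /* pattern type skipped */
#endif
}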
To get a first b_eff_io impression
- Before you start with a realistic scheduled time,
you should use correct values for the memory sizes and at least
1/8 of the real number of nodes of your system, but still
30 seconds of scheduled time,
e.g. on a 64-processor system with 32 GB of memory:
mpirun -np 8 ./b_eff_io -MB 512 -MT 32768 -T 30 -p /my/fast/scratch/dir -f my_system_08pe_0030sec
The last option (-f) defines the prefix of your protocol files.
- Now you can test larger scheduled time frames, e.g. 15 minutes (= 900 seconds):
mpirun -np 8 ./b_eff_io -MB 512 -MT 32768 -T 900 -p /my/fast/scratch/dir -f my_system_08pe_0900sec
b_eff_io_eps 8 my_system_08pe_0900sec
Choosing optimal parameters
The definition of b_eff_io allows you to choose an optimal
test case, i.e. you may choose:
- the number of MPI processes,
- how many processors are used or allocated for each MPI process,
- the scheduled time (-T option) under the constraints:
- at least 900 seconds (-T 900), and
- the amount of data written by b_eff_io to the scratch files
must be at least 90 percent of the total memory of the system;
- the filesystem and filesystem parameters,
- parameters of the MPI system.
According to the rules defined above for the scheduled time,
you normally have to increase the value of the -T option. For example,
if your last test with -T 900 (seconds)
tells you in my_system_08pe_0900sec.sum:
b_eff_io of these measurements = 58.032 MB/s on 8 processes with 512 MByte/PE and scheduled time=15.0 min
NOT VALID for comparison of different systems
criterion 1: scheduled time 15.0 min >= 15 min -- reached
criterion 2: transferred data / total memory = 31.4 % >= 90 % -- NOT reached
criterion 3: shared file pointers must be used for pattern type 1 -- reached
criterion 4: error count (0) == 0 -- reached
then you should increase your scheduled time to at least
900 seconds * 90 % / 31.4 % = 2580 seconds.
Because caching effects are normally reduced with a larger scheduled time,
you should expect a somewhat lower I/O bandwidth and therefore, in this case,
I would propose at least 3300 seconds, i.e.:
- mpirun -np 8 ./b_eff_io -MB 512 -MT 32768 -T 3300 -p /my/fast/scratch/dir -f my_system_08pe_3300sec
- b_eff_io_eps 8 my_system_08pe_3300sec
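The scaling used above follows directly from criterion 2 of the previous run: multiply the old -T value by 90 % divided by the reported "transferred data / total memory" percentage. The small helper below (hypothetical, not part of b_eff_io) merely restates that arithmetic.

#include <stdio.h>

/* Hypothetical helper, not part of b_eff_io: derive a new -T value from
   criterion 2 of a previous run.  old_T is in seconds, transferred_pct is
   the reported "transferred data / total memory" percentage.            */
static double required_schedule_time(double old_T, double transferred_pct)
{
    return old_T * 90.0 / transferred_pct;  /* scale so that 90 % is reached */
}

int main(void)
{
    /* example from the text: -T 900 transferred only 31.4 % of total memory */
    printf("new -T >= %.0f seconds\n", required_schedule_time(900.0, 31.4));
    return 0;                               /* prints: new -T >= 2580 seconds */
}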
Publishing the results
If the content of my_system_08pe_3300sec.sum tells you
that this was a valid measurement, then you can publish this
value together with all commands and parameters that you used
to run this benchmark, and together with the protocol file
(my_system_08pe_3300sec.prot)
and the summary file
(my_system_08pe_3300sec.sum).
The Top Cluster initiative of the TFCC Open Forum
has nominated this benchmark for evaluating the I/O performance of
clusters (see the discussion archive).
It is planned to include the b_eff_io results in the
TOPClusters list.
References:
- [1] Karl Solchenbach:
  Benchmarking the Balance of Parallel Computers.
  SPEC Workshop on Benchmarking Parallel and High-Performance Computing Systems
  (copy of the slides), Wuppertal, Germany, Sept. 13, 1999.
- [2] Karl Solchenbach, Hans-Joachim Plum and Gero Ritzenhoefer:
  Pallas Effective Bandwidth Benchmark - source code and sample results
  (EFF_BW.tar.gz, 43 KB).
- [3] Rolf Hempel:
  Basic message passing benchmarks, methodology and pitfalls.
  SPEC Workshop on Benchmarking Parallel and High-Performance Computing Systems
  (copy of the slides), Wuppertal, Germany, Sept. 13, 1999.
- [4] William Gropp and Ewing Lusk:
  Reproducible Measurement of MPI Performance Characteristics.
  In J. Dongarra et al. (eds.), Recent Advances in Parallel Virtual Machine and
  Message Passing Interface, proceedings of the 6th European PVM/MPI Users' Group
  Meeting, EuroPVM/MPI'99, Barcelona, Spain, Sept. 26-29, 1999, LNCS 1697, pp. 11-18.
  (Summary on the web)
Links: Rolf Rabenseifner