Effective I/O Bandwidth (beff_io) Benchmark
This page refers to the old release, b_eff_io version 1.1.
The effective I/O bandwidth benchmark (b_eff_io) has two goals:
(1) to obtain a characteristic average number for the I/O bandwidth
achievable with parallel MPI-I/O applications, and (2) to obtain
detailed information about several access patterns and buffer lengths.
The benchmark examines "first write", "rewrite" and "read" access,
strided (individual and shared pointers) and segmented collective
patterns on one file per application, as well as non-collective access
to one file per process. The number of parallel accessing processes
is also varied, and well-formed I/O is compared with non-well-formed I/O.
On systems that meet the rule that the total memory can be written to
disk in 10 minutes, the benchmark should not need more than
15 minutes for a first pass over all patterns.
The benchmark is designed analogously to the effective
bandwidth benchmark for message passing (b_eff),
which characterizes the message-passing capabilities of a system in
a few minutes.
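To illustrate what these pattern families look like at the MPI-I/O level, the following minimal sketch shows a collective "first write" and "read" of one chunk per process to a single file per application, using explicit offsets (i.e., individual file pointers, as in the segmented and strided non-shared patterns). It is not taken from b_eff_io.c; the file name, chunk length, and omitted error handling are placeholder assumptions. The shared-file-pointer variant is sketched later, in the installation notes.

#include <mpi.h>
#include <string.h>

#define CHUNK (1 << 16)          /* placeholder chunk length: 64 KBytes per process */

int main(int argc, char **argv)
{
    static char buf[CHUNK];
    int rank;
    MPI_File fh;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(buf, rank & 0xff, CHUNK);

    /* one scratch file accessed by all processes of the application */
    MPI_File_open(MPI_COMM_WORLD, "/my/fast/scratch/dir/b_eff_io_example",
                  MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);

    /* collective "first write": each process writes its chunk at an
       explicit offset (individual file pointers, non-shared pattern) */
    MPI_File_write_at_all(fh, (MPI_Offset)rank * CHUNK, buf, CHUNK,
                          MPI_BYTE, &status);

    /* collective "read" access of the same data */
    MPI_File_read_at_all(fh, (MPI_Offset)rank * CHUNK, buf, CHUNK,
                         MPI_BYTE, &status);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}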
Releases:
- Release as gzip'ed tar archive:
b_eff_io_v1.1.tar.gz
- Files of the current release:
  b_eff_io.c, b_eff_io_eps, man page (formatted), man/man1/b_eff_io.1
- Helper files for b_eff_io_eps:
  b_eff_io_eps.gnuplot, b_eff_io_eps_on1page.dvi
- Source of the "on 1 page" sheet:
  b_eff_io_eps_on1page.tex
- Old releases:
  1.0, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1.
A detailed report with first results of b_eff_io release 0.5
can be obtained here.
A summary of this report (.ps.gz) was contributed to Euro-Par 2000.
Usage
Installation and the first test
- Download the tar file of the current release
b_eff_io_v1.1.tar.gz
- Unpack with: gunzip -c b_eff_io_v1.1.tar.gz | tar -xvf -
- Change directory: cd b_eff_io
- Compile it:
mpicc -o b_eff_io b_eff_io.c -lm
If you are using an old ROMIO without shared file pointers, then use
(a sketch of the calls this flag disables follows at the end of this subsection):
mpicc -o b_eff_io -D WITHOUT_SHARED b_eff_io.c -lm
- Test it:
mpirun -np 4 ./b_eff_io -MB 256 -MT 1024 -T 30 -p /my/fast/scratch/dir
This means that you are using 4 MPI processes, that each processor has
256 MBytes of memory (-MB 256) and the system has a total memory of
1024 MBytes (-MT 1024), and that you want to run only a short test,
scheduled to run at least 30 seconds (-T 30), i.e. it should complete
in no more than about 2 minutes.
This I/O benchmark uses large scratch files; they are stored
in /my/fast/scratch/dir (-p option).
You will get back:
- on standard output -- the b_eff_io value
- on b_eff_io.sum -- a human readable summary
- on b_eff_io.prot -- the full benchmark protocol
- Print the summary, e.g., with:
a2ps -C -1 -l 120 b_eff_io.sum
If gnuplot and dvips are available,
you can generate some plots from the summary:
b_eff_io_eps 4
- Print the summary sheet, e.g., with:
lpr b_eff_io_on1page.ps
CAUTION: Because this first test is scheduled with only
30 seconds, the results will not tell you
anything about the I/O bandwidth of your system.
The test should only tell you whether the benchmark
runs on your system.
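For reference, the fragment below sketches the kind of guard that the WITHOUT_SHARED compile flag mentioned above could correspond to: the strided pattern type that uses the shared file pointer is skipped when the MPI-I/O implementation (e.g., an old ROMIO) does not provide it. This is a hypothetical sketch under that assumption, not the actual code of b_eff_io.c.

#include <mpi.h>

/* Hypothetical helper, not copied from b_eff_io.c: writes one chunk with the
   strided, collective shared-file-pointer pattern, unless the benchmark was
   built with -D WITHOUT_SHARED (old ROMIO without shared file pointers). */
static void write_shared_chunk(MPI_File fh, char *buf, int len)
{
#ifndef WITHOUT_SHARED
    MPI_Status status;
    MPI_File_write_ordered(fh, buf, len, MPI_BYTE, &status);
#else
    (void)fh; (void)buf; (void)len;   /* pattern type skipped */
#endif
}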
To get a first b_eff_io impression
- Before you start with a realistic scheduled time,
you should use correct values for the memory sizes and at least
1/8 of the real number of nodes of your system, but still
30 seconds of scheduled time,
e.g. on a 64-processor system with 32 GB of memory:
mpirun -np 8 ./b_eff_io -MB 512 -MT 32768 -T 30 -p /my/fast/scratch/dir -f my_system_08pe_0030sec
The last option (-f) defines the prefix of your protocol files.
- Now you can test larger scheduled time frames, e.g. 15 minutes (= 900 seconds):
mpirun -np 8 ./b_eff_io -MB 512 -MT 32768 -T 900 -p /my/fast/scratch/dir -f my_system_08pe_0900sec
b_eff_io_eps 8 my_system_08pe_0900sec
Choosing optimal parameters
The definition of b_eff_io allows you to choose an optimal
test case, i.e. you may choose:
- the number of MPI processes,
- how many processors are used or allocated for each MPI process,
- the scheduled time (-T option) under the constraints:
- at least 900 seconds (-T 900), and
- the amount of data written by b_eff_io to the scratch files
must be at least 90 percent of the total memory of the system;
- the filesystem and filesystem parameters,
- parameters of the MPI system.
According to the rules defined above for the scheduled time,
you normally have to increase the value of the -T option. For example,
if your last test with -T 900 (seconds)
tells you in my_system_08pe_0900sec.sum:
b_eff_io of these measurements = 58.032 MB/s on 8 processes with 512 MByte/PE and scheduled time=15.0 min
NOT VALID for comparison of different systems
criterion 1: scheduled time 15.0 min >= 15 min -- reached
criterion 2: transferred data / total memory = 31.4 % >= 90 % -- NOT reached
criterion 3: shared file pointers must be used for pattern type 1 -- reached
criterion 4: error count (0) == 0 -- reached
then you should increase your scheduled time to at least
900 seconds * 90 % / 31.4 % = 2580 seconds.
Because caching effects are normally reduced with a larger scheduled time,
you should expect a somewhat lower I/O bandwidth and therefore, in this case,
I would propose at least 3300 seconds, i.e.:
- mpirun -np 8 ./b_eff_io -MB 512 -MT 32768 -T 3300 -p /my/fast/scratch/dir -f my_system_08pe_3300sec
- b_eff_io_eps 8 my_system_08pe_3300sec
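The scaling used above follows directly from criterion 2 of the previous run: multiply the old -T value by 90 % divided by the reported "transferred data / total memory" percentage. The small helper below (hypothetical, not part of b_eff_io) merely restates that arithmetic.

#include <stdio.h>

/* Hypothetical helper, not part of b_eff_io: derive a new -T value from
   criterion 2 of a previous run.  old_T is in seconds, transferred_pct is
   the reported "transferred data / total memory" percentage.            */
static double required_schedule_time(double old_T, double transferred_pct)
{
    return old_T * 90.0 / transferred_pct;  /* scale so that 90 % is reached */
}

int main(void)
{
    /* example from the text: -T 900 transferred only 31.4 % of total memory */
    printf("new -T >= %.0f seconds\n", required_schedule_time(900.0, 31.4));
    return 0;                               /* prints: new -T >= 2580 seconds */
}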
Publishing the results
If the content of my_system_08pe_3300sec.sum tells you
that this was a valid measurement, then you can publish this
value together with all commands and parameters that you used
to run this benchmark, and together with the protocol file
(my_system_08pe_3300sec.prot)
and the summary file
(my_system_08pe_3300sec.sum).
The Top Cluster initiative of the TFCC Open Forum
has nominated this benchmark for evaluating the I/O performance of
clusters (see the discussion archive).
It is planned to include the b_eff_io results in the
TOPClusters list.
References:
- [1] Karl Solchenbach:
  Benchmarking the Balance of Parallel Computers.
  SPEC Workshop on Benchmarking Parallel and High-Performance Computing Systems
  (copy of the slides), Wuppertal, Germany, Sept. 13, 1999.
- [2] Karl Solchenbach, Hans-Joachim Plum and Gero Ritzenhoefer:
  Pallas Effective Bandwidth Benchmark - source code and sample results
  (EFF_BW.tar.gz, 43 KB).
- [3] Rolf Hempel:
  Basic message passing benchmarks, methodology and pitfalls.
  SPEC Workshop on Benchmarking Parallel and High-Performance Computing Systems
  (copy of the slides), Wuppertal, Germany, Sept. 13, 1999.
- [4] William Gropp and Ewing Lusk:
  Reproducible Measurement of MPI Performance Characteristics.
  In J. Dongarra et al. (eds.), Recent Advances in Parallel Virtual Machine and
  Message Passing Interface, proceedings of the 6th European PVM/MPI Users' Group
  Meeting, EuroPVM/MPI'99, Barcelona, Spain, Sept. 26-29, 1999, LNCS 1697, pp. 11-18.
  (Summary on the web)
Links: Rolf Rabenseifner