Effective I/O Bandwidth (beff_io) Benchmark
This page refers to the old release, b_eff_io version 1.3.
The effective I/O bandwidth benchmark (b_eff_io) covers two goals:
(1) to determine a characteristic average number for the I/O bandwidth
achievable with parallel MPI-I/O applications, and (2) to get
detailed information about several access patterns and buffer lengths.
The benchmark examines "first write", "rewrite" and "read" access,
strided (individual and shared pointers) and segmented collective
patterns on one file per application, and non-collective access
to one file per process. The number of parallel accessing processes
is also varied, and wellformed I/O is compared with non-wellformed I/O.
On systems meeting the rule that the total memory can be written to
disk in 10 minutes, the benchmark should not need more than
15 minutes for a first pass of all patterns.
The benchmark is designed analogously to the effective
bandwidth benchmark for message passing (b_eff)
that characterizes the message passing capabilities of a system in
a few minutes.
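As an illustration only (this is not code taken from b_eff_io itself), the
following minimal MPI program sketches two of the access styles the
benchmark distinguishes: a collective write at explicit offsets
(individual file pointers) and a collective write through the shared
file pointer; the file name, buffer size and use of MPI_BYTE are
arbitrary assumptions for the sketch.

/* Sketch only -- not taken from b_eff_io: a collective write at
   explicit offsets (individual file pointers), followed by a
   collective write through the shared file pointer. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    const int len = 1 << 20;                 /* 1 MByte chunk per process */
    char *buf;
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    buf = calloc(len, 1);

    MPI_File_open(MPI_COMM_WORLD, "scratch_file",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* segment 1: each process writes one chunk at an explicit offset */
    MPI_File_write_at_all(fh, (MPI_Offset)rank * len, buf, len,
                          MPI_BYTE, MPI_STATUS_IGNORE);

    /* segment 2: the same chunks, written in rank order through the
       shared file pointer, appended behind segment 1 */
    MPI_File_seek_shared(fh, (MPI_Offset)size * len, MPI_SEEK_SET);
    MPI_File_write_ordered(fh, buf, len, MPI_BYTE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}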
The latest releases:
- Current release as gzip'ed tar archive:
b_eff_io_v1.3.tar.gz
- Files of this release:
b_eff_io.c,
b_eff_io_eps,
man page (formatted),
man/man1/b_eff_io.1
Helper for b_eff_io_eps:
b_eff_io_eps.gnuplot,
b_eff_io_eps_on1page.dvi
Source of "on 1 page":
b_eff_io_eps_on1page.tex
- Old releases: 1.2, 1.1, 1.0, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1.
A detailed report with first results of b_eff_io release 0.5
is also available.
Latest publications:
Rolf Rabenseifner and
Alice E. Koniges:
Effective File-I/O Bandwidth Benchmark.
In proceedings,
Arndt Bode, Thomas Ludwig, Roland Wismüller (editors),
Euro-Par
2000 -- Parallel Processing,
Aug. 29 - Sept. 1, 2000,
München, Germany,
LNCS 1900,
pp 1273-1283.
Files:
full paper as
PDF and
gzip'ed postscript,
slides as
gzip'ed postscript.
Rolf Rabenseifner and
Alice E. Koniges:
The Effective I/O Bandwidth Benchmark (b_eff_io).
In proceedings of the
Message Passing Interface and High-Performance
Clusters Developer's and User's Conference
(MPIDC 2000),
March 20-23, 2000, Ithaca, NY, USA.
Files:
full paper as
US paper, gzip'ed postscript,
A4 gzip'ed postscript;
slides as
gzip'ed postscript.
Usage
Installation and the first test
- Download the tar file of the current release
b_eff_io_v1.3.tar.gz
- Unpack with: gunzip -c b_eff_io_v1.3.tar.gz | tar -xvf -
- Change directory: cd b_eff_io
- Compile it:
mpicc -o b_eff_io b_eff_io.c -lm
If you are using an old ROMIO version without shared file pointers,
then for a first test you can compile with:
mpicc -o b_eff_io -D WITHOUT_SHARED b_eff_io.c -lm
However, to achieve valid b_eff_io results,
you have to use an MPI library that supports shared file pointers
(a small check program is sketched at the end of this section).
- Test it:
mpirun -np 4 ./b_eff_io -MB 256 -MT 1024 -T 30 -p /my/fast/scratch/dir
This means that you are using 4 MPI processes,
that your system has 256 MBytes of memory per processor
and a total memory of 1024 MBytes,
and that you want to run only a quick test, scheduled to run for at least
30 seconds, i.e., it should finish in no more than 2 minutes.
This I/O benchmark uses large scratch files; they are stored
in /my/fast/scratch/dir.
You will get back:
- on standard output -- the b_eff_io value
- on b_eff_io.sum -- a human readable summary
- on b_eff_io.prot -- the full benchmark protocol
- Print the summary, e.g., with:
a2ps -C -1 -l 120 b_eff_io.sum
- If there are serious problems, e.g., the benchmark takes more than
2 minutes on a dedicated system, then please feel free to contact
the author; send him all the commands you used and attach the files
b_eff_io.sum and b_eff_io.prot.
- If gnuplot and dvips are available,
you can generate some plots from the summary:
b_eff_io_eps 4
- Print the summary sheet, e.g., with:
lpr b_eff_io_on1page.ps
CAUTION: Because this first test is scheduled with only
30 seconds, the results will not tell you
anything about the I/O bandwidth of your system.
The test should only tell you whether the benchmark
runs on your system.
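As mentioned above, valid results require shared file pointers. The
following throw-away program (a hypothetical helper, not part of the
b_eff_io distribution; file name and output text are arbitrary) can be
used to check whether the shared file pointer of your MPI library works:

/* Hypothetical helper, not part of b_eff_io: quick check whether
   the MPI library provides working shared file pointers.
   Compile with: mpicc -o check_shared check_shared.c
   Run with:     mpirun -np 2 ./check_shared                        */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, err;
    char byte = 'x';
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_File_open(MPI_COMM_WORLD, "shared_ptr_test",
                  MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
    MPI_File_set_errhandler(fh, MPI_ERRORS_RETURN);

    /* non-collective write through the shared file pointer */
    err = MPI_File_write_shared(fh, &byte, 1, MPI_BYTE, MPI_STATUS_IGNORE);

    if (rank == 0)
        printf("shared file pointers %s\n",
               err == MPI_SUCCESS ? "seem to work" : "are NOT supported");

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}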
To get a first b_eff_io impression
- Before you start with a realistic scheduled time,
you should use correct values for the memory sizes and at least
1/8 of the real number of nodes of your system, but still
30 seconds of scheduled time,
e.g., on a 64-processor system with 32 GB of memory:
mpirun -np 8 ./b_eff_io -MB 512 -MT 32768 -T 30 -p /my/fast/scratch/dir -f my_system_08pe_0030sec
The last option (-f) defines the prefix of your protocol files.
- Now, you can test larger scheduled time frames, e.g. 15 minutes (=900 sec):
mpirun -np 8 ./b_eff_io -MB 512 -MT 32768 -T 900 -p /my/fast/scratch/dir -f my_system_08pe_0900sec
b_eff_io_eps 8 my_system_08pe_0900sec
- #MPROC is used in the next section to abbreviate the memory size
of each MPI process (in MBytes), e.g., in our example #MPROC=512.
- #MTOTAL is used to abbreviate the total memory size of the system
(in MBytes), e.g., in our example #MTOTAL=32768.
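- Note that in the example above, #MPROC is simply #MTOTAL divided by
the number of processors: 32768 / 64 = 512.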
The official execution of b_eff_io
The b_eff_io benchmark has to be run with three mandatory
process counts and one optional process count:
- First, #MPI_PROC_PER_SMP_NODE must be chosen by the
person running the benchmark. This is the number
of MPI processes that should run on each SMP node.
This number is tunable but must not exceed the
number of processors of each SMP node.
On systems that are not a cluster of SMP nodes,
#MPI_PROC_PER_SMP_NODE is 1.
- Next, #NODES_FULL is defined as the number of SMP nodes
of the system.
On systems that are not a cluster of SMP nodes,
#NODES_FULL is the number of processors available for
parallel computation.
- Thus you have to compute the following numbers
(a small sketch that evaluates these formulas follows after this list):
#NODES_MEDIUM = 2 ** ( round ( log_2(#NODES_FULL) * 0.70 ) )
#NODES_SMALL = 2 ** ( round ( log_2(#NODES_FULL) * 0.35 ) )
Examples: The following table shows the MEDIUM and SMALL values
for given FULL values
#NODES_FULL = 2048 1024 512 256 128 64 32 16 8 4 2 1
==> #NODES_MEDIUM = 256 128 64 64 32 16 16 8 4 2 2 1
==> #NODES_SMALL = 16 16 8 8 4 4 4 2 2 2 1 1
- And last, you can freely choose #NODES_TUNED as the
number of nodes that gives the best b_eff_io value.
- Each of these b_eff_io measurements should be done with
30 minutes scheduled time (-T 1800),
i.e., we would like to see the results of
mpirun 'on #NODES_SMALL with #MPI_PROC_PER_SMP_NODE' \
./b_eff_io -MB #MPROC -MT #MTOTAL -T 1800 \
-p /my/fast/scratch/dir \
-f my_system_#NODES_SMALL_1800sec
mpirun 'on #NODES_MEDIUM with #MPI_PROC_PER_SMP_NODE' \
./b_eff_io -MB #MPROC -MT #MTOTAL -T 1800 \
-p /my/fast/scratch/dir \
-f my_system_#NODES_MEDIUM_1800sec
mpirun 'on #NODES_FULL with #MPI_PROC_PER_SMP_NODE' \
./b_eff_io -MB #MPROC -MT #MTOTAL -T 1800 \
-p /my/fast/scratch/dir \
-f my_system_#NODES_FULL_1800sec
mpirun 'on #NODES_TUNED with #MPI_PROC_PER_SMP_NODE' \
./b_eff_io -MB #MPROC -MT #MTOTAL -T 1800 \
-p /my/fast/scratch/dir \
-f my_system_#NODES_TUNED_1800sec
#NODES_TUNED may be chosen as the one of the
three other values that gave the best b_eff_io result
(i.e., #NODES_TUNED is optional).
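The two formulas above can be evaluated with a few lines of C. The
following stand-alone sketch (a hypothetical helper, not part of
b_eff_io; compile, e.g., with "cc -o nodes nodes.c -lm") reproduces the
table above:

/* Hypothetical helper: evaluates the formulas for #NODES_MEDIUM and
   #NODES_SMALL for a given #NODES_FULL.
   Example: ./nodes 64  prints  #NODES_MEDIUM=16  #NODES_SMALL=4      */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int full = (argc > 1) ? atoi(argv[1]) : 64;

    /* 2 ** round( log_2(full) * factor ) */
    int medium = 1 << (int)lround(log2((double)full) * 0.70);
    int small  = 1 << (int)lround(log2((double)full) * 0.35);

    printf("#NODES_FULL=%d  #NODES_MEDIUM=%d  #NODES_SMALL=%d\n",
           full, medium, small);
    return 0;
}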
Filesystem parameters should be chosen as for normal users.
The size of data written to /my/fast/scratch/dir
by each of these four benchmarks is about
real_execution_time * accumulated_write_bandwidth / 3.
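For example, assuming purely for illustration an accumulated write
bandwidth of 300 MBytes/s and a real execution time of 1800 seconds,
roughly 1800 * 300 / 3 = 180,000 MBytes (about 180 GBytes) of scratch
data would be written.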
The real execution time may differ from the scheduled time (30 min.)
for the following reasons:
- The sync operation is outside of the time-driven loop
and may consume time after the scheduled iterations.
- The loop is finished by an iteration that consumed
much more time than the previous iterations.
- The pattern types 3 (segmented) and 4 (seg-coll)
are not time-driven. The estimation of adequate
repeating factors is based on results with pattern
types 0-2. This estimation may be too high if the
implementation of pattern types 3 and 4 is worse
than that of pattern types 0-2.
- The same reason is valid for all patterns with the access
methods "rewrite" and "read".
Publishing the results
Now, you can publish these four b_eff_io values together
with all commands and parameters you have used
to run these benchmarks and together with the protocol files
(my_system_..._1800sec.prot)
and the summary files
(my_system_..._1800sec.sum).
The
Top Cluster initiative
of the
TFCC Open Forum
has nominated this benchmark for evaluating the I/O performance of
clusters (see discussion
archive).
It is planned to include the b_eff_io results in the
TOPClusters list.