Effective Bandwidth (beff) Benchmark - Background


Background


First Results

T3E-900/512 (sn6715 hwwt3e 2.0.4.71 unicosmk CRAY T3E)

All benchmarks were run on partitions starting at BasePE 0xb8. At least three other applications were running on the system: 120 PEs at 0x0, 64 PEs at 0x78, and 8 PEs at 0x1f8.

PEs  b_eff          bi-section   bi-section
     version 3.1    even nodes   all nodes (extrapolated)

 32  1869.713       1253.264     1253.264   result_3.1_032a
 33  1786.342       1213.136     1251.046   result_3.1_033a
 34  1896.591       1370.455     1370.455   result_3.1_034a
 35  1873.812       1344.275     1383.812   result_3.1_035 
  "  1823.632       1338.750     1378.125   result_3.1_035a
 36  1780.339       1368.684     1368.684   result_3.1_036 
  "  1825.823       1371.348     1371.348   result_3.1_036a
 37  1868.371       1335.060     1372.145   result_3.1_037 
  "  1855.221       1332.900     1369.925   result_3.1_037a
 38  1842.249       1442.594     1442.594   result_3.1_038 
  "  1838.336       1447.135     1447.135   result_3.1_038a
 39  1891.028       1406.456     1443.468   result_3.1_039 
  "  1806.554       1405.525     1442.512   result_3.1_039a
 40  1834.020       1460.460     1460.460   result_3.1_040 
 41  1920.499       1422.660     1458.226   result_3.1_041 
 42  2019.678       1596.252     1596.252   result_3.1_042 
 43  2052.604       1561.686     1598.869   result_3.1_043 
 44  2223.587       1675.564     1675.564   result_3.1_044 
 45  2127.454       1631.718     1668.802   result_3.1_045 
 46  2205.946       1865.806     1865.806   result_3.1_046 
 47  2211.938       1836.090     1876.005   result_3.1_047 
 48  2070.615       1861.824     1861.824   result_3.1_048 
  "  2081.355       1872.696     1872.696   result_3.1_048a
128  5225.635       4857.472     4857.472   result_3.1_128a
256  9709.549       8268.800     8268.800   result_3.1_256a
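
The "all nodes (extrapolated)" column differs from the "even nodes" column only for odd PE counts. The following is a small sketch that checks one plausible rule against the table, namely scaling the even-node value by N/(N-1). The rule itself is an assumption inferred from the numbers, not something stated by the benchmark, and the function name `extrapolate` is made up for this illustration.

```python
# Odd-PE rows copied from the T3E table above:
# (PEs, bi-section on even nodes, bi-section on all nodes as listed).
rows = [
    (33, 1213.136, 1251.046),
    (35, 1344.275, 1383.812),
    (37, 1335.060, 1372.145),
    (41, 1422.660, 1458.226),
    (47, 1836.090, 1876.005),
]


def extrapolate(pes, even_value):
    """Assumed rule: scale the result measured on pes-1 nodes up to pes nodes."""
    return even_value * pes / (pes - 1)


for pes, even_value, listed in rows:
    # Print the computed value next to the value listed in the table.
    print(pes, round(extrapolate(pes, even_value), 3), listed)
```

For these rows the computed values agree with the listed ones to within the last printed digit, which supports the assumed rule but does not prove it.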

The results show that beff version 3.1 does not increase monotonically with the number of PEs (see 44-48 PEs). Further investigation on a dedicated system could show whether this is an artefact of the non-zero BasePE.
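
The non-monotonic behaviour can be located mechanically. The sketch below uses the first b_eff value listed for each PE count between 42 and 48 and reports every step where the value drops; the helper name `decreases` is hypothetical.

```python
# b_eff values copied from the T3E table (first run listed per PE count).
b_eff = {
    42: 2019.678, 43: 2052.604, 44: 2223.587, 45: 2127.454,
    46: 2205.946, 47: 2211.938, 48: 2070.615,
}


def decreases(series):
    """Return (pes_before, pes_after) pairs where the value drops."""
    pes = sorted(series)
    return [(a, b) for a, b in zip(pes, pes[1:]) if series[b] < series[a]]


drops = decreases(b_eff)
print(drops)  # [(44, 45), (47, 48)]
```

Both drops fall inside the 44-48 PE range mentioned above.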

SX-4/32 (SUPER-UX hwwsx4 9.1 Rev1 SX-4)

The benchmarks were run on 8 and on 16 processors that a resource block dedicated to the beff program. The other CPUs were used by other applications at the same time.

PEs  b_eff          bi-section   bi-section
     version 3.1    even nodes   all nodes (extrapolated)

  8  5161.896       3215.200     3215.200   result_sx4_3.1_008
 16  9996.344       6368.176     6368.176   result_sx4_3.1_016

These results clearly show that the former bi-section benchmark could not make full use of the communication network.
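
One way to quantify this is the ratio of b_eff to the bi-section value for each run. The sketch below assumes that both columns are reported in the same unit, which the table does not state explicitly; the function name `ratio` is made up for this illustration.

```python
# (PEs, b_eff, bi-section) copied from the SX-4 table above.
sx4 = [
    (8, 5161.896, 3215.200),
    (16, 9996.344, 6368.176),
]


def ratio(b_eff_value, bisection_value):
    """How many times larger b_eff is than the bi-section result."""
    return b_eff_value / bisection_value


for pes, b_eff_value, bisection_value in sx4:
    print(pes, round(ratio(b_eff_value, bisection_value), 2))
```

Under that assumption, b_eff is roughly 1.6 times the bi-section value on both partition sizes.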


Rolf Rabenseifner