MADbench2
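MADbench2 is a tool for measuring the integrated performance of the I/O, communication, and computation subsystems of massively parallel architectures under the stress of a real scientific application.
MADbench2 is derived from the MADspec code, which computes the maximum-likelihood angular power spectrum of the cosmic microwave background (CMB) from a noisy pixelized map of the sky and its pixel-pixel noise correlation matrix. MADbench2 retains the full computational complexity of its parent scientific application, but uses self-generated pseudo-data so that it can bypass the myriad computationally irrelevant details involved in processing real CMB datasets.
MADbench2 can be run in two modes: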
- regular mode, in which the full code is run.
- IO mode, in which all calculation/communication is replaced with busy-work.
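In addition, MADbench2 can be run as a single gang or as multiple gangs. In the former, all matrix operations are distributed over all processors; in the latter, the matrices are built, summed, and inverted over all processors (S & D) and then redistributed over subsets of processors (gangs) for their subsequent manipulations (W & C). This gang parallelism keeps the data dense on the processors during the dominant matrix-matrix multiplication (W) phase, even when the number of processors is very large.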
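The gang bookkeeping can be illustrated with a few lines of C. This is a minimal sketch, assuming ranks are assigned to gangs in contiguous blocks and that each gang arranges its processors as a square grid; `gang_info` and `gang_layout` are hypothetical names, and MADbench2.c may assign ranks differently.

```c
#include <assert.h>

/* Hypothetical summary of where one rank sits once matrices are
 * redistributed over gangs (not a struct from MADbench2.c). */
struct gang_info {
    int pe_per_gang;   /* processors in each gang */
    int gang;          /* gang this rank belongs to */
    int grid_side;     /* side of the square process grid within a gang */
    int bins_per_gang; /* component matrices each gang works on */
};

/* Integer square root by linear search; fine for processor counts. */
static int isqrt(int n)
{
    int r = 0;
    while ((r + 1) * (r + 1) <= n)
        r++;
    return r;
}

/* Assumes ranks are assigned to gangs in contiguous blocks and that
 * no_pe and no_bin divide evenly by no_gang. */
static struct gang_info gang_layout(int my_pe, int no_pe, int no_gang, int no_bin)
{
    struct gang_info g;
    assert(no_gang > 0 && no_pe % no_gang == 0 && no_bin % no_gang == 0);
    assert(my_pe >= 0 && my_pe < no_pe);
    g.pe_per_gang = no_pe / no_gang;
    g.gang = my_pe / g.pe_per_gang;
    g.grid_side = isqrt(g.pe_per_gang);
    g.bins_per_gang = no_bin / no_gang;
    return g;
}
```

For example, with 16 processors in 4 gangs, each gang works on a 2 x 2 process grid and a quarter of the bins, and rank 5 falls in gang 1.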
Official website
https://crd.lbl.gov/divisions/scidata/c3/c3-research/madbench2/
Download
https://crd.lbl.gov/assets/Uploads/MADbench2.tar
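Compilation
To run in regular mode, MADbench2 must be linked against the ScaLAPACK and LAPACK libraries and their dependencies (BLAS, PBLAS, BLACS). The file MADbench2.h contains system-specific definitions and declarations; extend it as needed and compile the code with -D SYSTEM.
To run in IO mode, compile MADbench2 with -D IO (in addition to -D SYSTEM). All library calls are then redefined as busy-work, so no libraries are required.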
```bash
mpicc -D SYSTEM -D COLUMBIA -D IO -o MADbench2.x MADbench2.c -lm
```
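The way -D IO swaps library calls for busy-work can be sketched with the C preprocessor. This illustrates the technique only: `mat_mult` is a hypothetical stand-in for a ScaLAPACK routine, and the actual macros in MADbench2.h are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* mat_mult is a hypothetical stand-in for a library routine such as a
 * ScaLAPACK matrix multiply; it is not a MADbench2 function. */
#ifdef IO
/* IO mode: burn roughly n^3 iterations of busy-work instead of calling
 * into a numerical library, so nothing needs to be linked. */
static volatile double busy_sink;

static void mat_mult(const double *a, const double *b, double *c, size_t n)
{
    (void)a;
    (void)b;
    (void)c;
    for (size_t i = 0; i < n * n * n; i++)
        busy_sink += 1.0;
}
#else
/* Regular mode: perform the real product c = a * b (row-major n x n). */
static void mat_mult(const double *a, const double *b, double *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}
#endif
```

Building this file with -D IO selects the busy-work body and building it without selects the real product; MADbench2 uses the same kind of compile-time switch for its library calls, which is why IO mode needs no libraries.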
Modifying the file system path
The path is hard-coded in the source, so before compiling, edit MADbench2.c (lines 271, 275, and 276):
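```c
for (n=0; n<no_pe; n++) {
  /* line 271: create the data directory; stat() checks the same path mkdir() creates */
  if (my_pe==n && stat("/mnt/gkfs/files", &buf)!=0) mkdir("/mnt/gkfs/files", S_IRWXU);
  PMPI_Barrier(MPI_COMM_WORLD);
}
/* lines 275-276: one file per rank (UNIQUE) or a single shared file */
if (strcmp(FILETYPE, "UNIQUE")==0) sprintf(filename, "/mnt/gkfs/files/data_%d", my_pe);
else sprintf(filename, "/mnt/gkfs/files/data");
```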
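Since the same directory appears in several places, one option is to define it once. The sketch below mirrors the UNIQUE/SHARED file-naming logic of the excerpt; `DATA_DIR` and `make_filename` are hypothetical, and setting `DATA_DIR` on the compiler command line is an assumption, not an existing MADbench2 option.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: override at build time, e.g. -D 'DATA_DIR="/mnt/gkfs/files"' */
#ifndef DATA_DIR
#define DATA_DIR "/mnt/gkfs/files"
#endif

/* Build the data file name like the excerpt: UNIQUE gives one file per
 * rank, any other FILETYPE gives one shared file. Returns the snprintf count. */
static int make_filename(char *buf, size_t len, const char *filetype, int my_pe)
{
    assert(buf != NULL && filetype != NULL);
    if (strcmp(filetype, "UNIQUE") == 0)
        return snprintf(buf, len, "%s/data_%d", DATA_DIR, my_pe);
    return snprintf(buf, len, "%s/data", DATA_DIR);
}
```

The directory-creation loop would then call stat(DATA_DIR, &buf) and mkdir(DATA_DIR, S_IRWXU), so changing the target file system means changing a single definition.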
Running
Command-line arguments:
```bash
MADbench2.x $NO_PIX $NO_BIN $NO_GANG $SBLOCKSIZE $FBLOCKSIZE $RMOD $WMOD
```
Parameter | Description |
---|---|
NO_PIX | Sets the size of the pseudo-data: all the component matrices have NO_PIX x NO_PIX elements. |
NO_BIN | Sets the size of the pseudo-dataset: there are NO_BIN component matrices. |
NO_GANG | Sets the level of gang parallelism: there are NO_GANG gangs. |
SBLOCKSIZE | Sets the ScaLAPACK blocksize: all matrices are block-cyclically distributed in blocks of side SBLOCKSIZE. |
FBLOCKSIZE | Sets the file blocksize: all IO starts at a file offset that is an integer multiple of FBLOCKSIZE. |
RMOD | Sets the degree of simultaneous reading: 1 in every RMOD processors reads at once. |
WMOD | Sets the degree of simultaneous writing: 1 in every WMOD processors writes at once. |
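The FBLOCKSIZE and RMOD/WMOD rows describe simple arithmetic, sketched below in C. This assumes FBLOCKSIZE is in bytes, that offsets are rounded up to the next multiple of it, and that rank p does its IO in round p % RMOD; `aligned_offset` and `io_round` are hypothetical helpers, not functions from MADbench2.c.

```c
#include <assert.h>

/* Round a byte offset up to the next multiple of the file blocksize. */
static long aligned_offset(long offset, long fblocksize)
{
    assert(offset >= 0 && fblocksize > 0);
    return ((offset + fblocksize - 1) / fblocksize) * fblocksize;
}

/* Assumed throttling scheme: with RMOD rounds, rank my_pe does its IO in
 * round my_pe % rmod, so 1 in every rmod processors is active at once. */
static int io_round(int my_pe, int rmod)
{
    assert(my_pe >= 0 && rmod > 0);
    return my_pe % rmod;
}
```

Under this assumed scheme, RMOD = 1 puts every rank in round 0, so all processors read together, while RMOD = 4 on 4 processors lets each rank read alone.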
Running MADbench2 requires:
- a square number of processors
- a uniform square number of processors per gang
- a uniform number of bins per gang
- a ScaLAPACK blocksize that distributes some data to every processor
- a file blocksize that is a whole number of doubles
- a number of gangs that is exactly divisible by the read-modulus and the write-modulus
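A pre-flight check of these rules can be sketched in C. Each rule is encoded under a stated reading: FBLOCKSIZE is taken to be in bytes, and "distributes some data to every processor" is read as each row and column of a gang's square process grid owning at least one SBLOCKSIZE block. `check_config` is hypothetical. The read/write-modulus rule is left out because, read literally, the example run below (NO_GANG = 1 with RMOD = WMOD = 4) would violate it; MADbench2.c has the exact condition.

```c
#include <assert.h>
#include <stddef.h>

/* Integer square root by linear search. */
static int isqrt(int n)
{
    assert(n >= 0);
    int r = 0;
    while ((r + 1) * (r + 1) <= n)
        r++;
    return r;
}

/* Returns NULL if the arguments satisfy the rules checked here, otherwise
 * a message naming the first rule that fails. */
static const char *check_config(int no_pe, int no_pix, int no_bin, int no_gang,
                                int sblocksize, int fblocksize)
{
    if (no_pe <= 0 || no_pix <= 0 || no_bin <= 0 || no_gang <= 0 ||
        sblocksize <= 0 || fblocksize <= 0)
        return "all arguments must be positive";
    int root = isqrt(no_pe);
    if (root * root != no_pe)
        return "number of processors is not a square";
    if (no_pe % no_gang != 0)
        return "processors do not divide evenly among gangs";
    int side = isqrt(no_pe / no_gang);
    if (side * side != no_pe / no_gang)
        return "processors per gang is not a square";
    if (no_bin % no_gang != 0)
        return "bins do not divide evenly among gangs";
    /* Assumed reading: each row/column of a gang's process grid must own at
     * least one SBLOCKSIZE block of the NO_PIX x NO_PIX matrix. */
    if ((no_pix + sblocksize - 1) / sblocksize < side)
        return "ScaLAPACK blocksize leaves some processors without data";
    /* Assumes FBLOCKSIZE is given in bytes. */
    if (fblocksize % (int)sizeof(double) != 0)
        return "file blocksize is not a whole number of doubles";
    return NULL;
}
```

With the example arguments (4 processors, 640 pixels, 80 bins, 1 gang, SBLOCKSIZE = FBLOCKSIZE = 8) every check passes, whereas 16 processors in 2 gangs fails because 8 processors per gang is not a square.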
```bash
fakerth@fakerth-IdeaCentre-GeekPro-17IRB:~$ mpirun -np 4 MADbench2.x 640 80 1 8 8 4 4
MADbench 2.0 IO-mode
no_pe = 4 no_pix = 640 no_bin = 80 no_gang = 1 sblocksize = 8 fblocksize = 8 r_mod = 4 w_mod = 4
IOMETHOD = POSIX IOMODE = SYNC FILETYPE = UNIQUE REMAP = CUSTOM
S_cc 0.00 [ 0.00: 0.00]
S_bw 0.01 [ 0.01: 0.01]
S_w 0.09 [ 0.09: 0.09]
-------
S_total 0.09 [ 0.09: 0.09]
W_cc 0.01 [ 0.01: 0.01]
W_bw 3.96 [ 3.96: 3.96]
W_r 0.03 [ 0.03: 0.03]
W_w 0.03 [ 0.03: 0.03]
-------
W_total 4.03 [ 4.03: 4.03]
C_cc 0.00 [ 0.00: 0.00]
C_bw 0.01 [ 0.01: 0.01]
C_r 0.05 [ 0.05: 0.05]
-------
C_total 0.06 [ 0.06: 0.06]
dC[0] = 0.00000e+00
```
Environment variables
Variable | Allowed Values | Default |
---|---|---|
IOMETHOD | POSIX, MPI | POSIX |
IOMODE | SYNC, ASYNC | SYNC |
FILETYPE | UNIQUE, SHARED | UNIQUE |
REMAP | CUSTOM, SCALAPACK | CUSTOM |
BWEXP | Any number | None |
For example, to use MPI-IO, set IOMETHOD to MPI, not MPIIO; export IOMETHOD=MPIIO produces an error.
```bash
fakerth@fakerth-IdeaCentre-GeekPro-17IRB:~$ export IOMETHOD=MPI
fakerth@fakerth-IdeaCentre-GeekPro-17IRB:~$ mpirun -np 4 MADbench2.x 640 80 1 8 8 4 4
MADbench 2.0 IO-mode
no_pe = 4 no_pix = 640 no_bin = 80 no_gang = 1 sblocksize = 8 fblocksize = 8 r_mod = 4 w_mod = 4
IOMETHOD = MPI IOMODE = SYNC FILETYPE = UNIQUE REMAP = CUSTOM
S_cc 0.00 [ 0.00: 0.00]
S_bw 0.01 [ 0.01: 0.01]
S_w 0.09 [ 0.09: 0.09]
-------
S_total 0.10 [ 0.10: 0.10]
W_cc 0.01 [ 0.01: 0.01]
W_bw 3.96 [ 3.96: 3.96]
W_r 0.03 [ 0.03: 0.03]
W_w 0.03 [ 0.03: 0.03]
-------
W_total 4.03 [ 4.03: 4.03]
C_cc 0.00 [ 0.00: 0.00]
C_bw 0.01 [ 0.01: 0.01]
C_r 0.03 [ 0.03: 0.03]
-------
C_total 0.04 [ 0.04: 0.04]
dC[0] = 0.00000e+00
```
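The MPIIO pitfall suggests each variable is matched against a fixed list of values. Below is a minimal sketch of that kind of check, assuming an exact, case-sensitive comparison against the Allowed Values column and a fallback to the default when the variable is unset; `pick_setting` is hypothetical, and the check and error message in MADbench2.c may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Validate a setting read with getenv(): an unset variable (NULL) gives the
 * default, an allowed value is returned as-is, anything else gives NULL. */
static const char *pick_setting(const char *value, const char *const allowed[],
                                size_t n_allowed, const char *fallback)
{
    assert(allowed != NULL && fallback != NULL);
    if (value == NULL)
        return fallback;
    for (size_t i = 0; i < n_allowed; i++)
        if (strcmp(value, allowed[i]) == 0)
            return allowed[i];
    return NULL; /* e.g. IOMETHOD=MPIIO is not in {POSIX, MPI} */
}
```

A caller would pass getenv("IOMETHOD") (declared in <stdlib.h>) as the value, {"POSIX", "MPI"} as the allowed list, and "POSIX" as the default, and abort with an error message when the result is NULL.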