一个计算密集小程序在不同CPU下的表现

本文比较了几款CPU对同一测试程序的比较结果,用的是Oracle公有云OCI上的计算实例,均分配的1 OCPU,内存用的默认值,不过内存对此测试程序运行结果不重要。

本文只列结果,不做任何评价。下表中,最后一列为测试程序运行5次的平均耗时。

OCI shape名称 CPU 型号 基本频率(GHz) 测试程序运行耗时平均值(秒)
VM.Standard3.Flex Intel Xeon Platinum 8358 2.6 135.084
VM.Optimized3 Intel Xeon 6354 3.0 123.65
VM.Standard.E4.Flex AMD EPYC 7J13 2.55 62.766
VM.Standard.E5.Flex AMD EPYC 7J13 2.4 53.22
VM.Standard.A1.Flex Ampere Altra Q80-30 3.0 107.206

测试程序:

c 复制代码
#include <stdio.h>
#include <math.h>

void main()
{
        double r;
        int i, j;
        for (i=0; i< 100000; i++)
                for (j=0; j< 100000; j++)
                        r = r + sqrt(sqrt(i));

}

编译:

bash 复制代码
cc -lm a.c

test.sh运行a.out 5次:

bash 复制代码
for i in 1 2 3 4 5; do
        time -p ./a.out
done

求平均值可以将以上输出存于临时文件,例如/tmp/1,然后运行一下:

bash 复制代码
cat /tmp/1|grep real|sed  's/real //'|awk '{s+=$1} END {print s/5}'

Intel Xeon Platinum 8358

bash 复制代码
$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  2
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               106
Model name:          Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
Stepping:            6
CPU MHz:             2594.024
BogoMIPS:            5188.04
Virtualization:      VT-x
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            4096K
L3 cache:            16384K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cm                                                       ov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm consta                                                       nt_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq v                                                       mx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer                                                        aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invp                                                       cid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid e                                                       pt_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdse                                                       ed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveop                                                       t xsavec xgetbv1 xsaves nt_good wbnoinvd arat vnmi avx512vbmi umip pku ospke avx                                                       512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 r                                                       dpid md_clear arch_capabilities

测试结果:

bash 复制代码
$ ./test.sh
real 135.08
user 134.69
sys 0.00
real 135.04
user 134.67
sys 0.00
real 135.14
user 134.67
sys 0.02
real 135.10
user 134.68
sys 0.00
real 135.06
user 134.69
sys 0.00

通过grep real|sed 's/real //'可以得到所有real time统计:

复制代码
135.08
135.04
135.14
135.10
135.06

直接求平均值可以用grep real|sed 's/real //'|awk '{s+=$1} END {print s/5}', 因此平均值为135.084

Intel Xeon 6354

bash 复制代码
$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  2
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               106
Model name:          Intel(R) Xeon(R) Gold 6354 CPU @ 3.00GHz
Stepping:            6
CPU MHz:             2993.064
BogoMIPS:            5986.12
Virtualization:      VT-x
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            4096K
L3 cache:            16384K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves nt_good wbnoinvd arat vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid fsrm md_clear arch_capabilities

测试结果:

bash 复制代码
$ ./test.sh
real 123.69
user 123.40
sys 0.00
real 123.66
user 123.37
sys 0.00
real 123.65
user 123.38
sys 0.00
real 123.62
user 123.38
sys 0.00
real 123.63
user 123.38
sys 0.01

通过grep real|sed 's/real //'可以得到所有real time统计:

复制代码
123.69
123.66
123.65
123.62
123.63

直接求平均值可以用grep real|sed 's/real //'|awk '{s+=$1} END {print s/5}', 因此平均值为123.65

AMD EPYC 7J13

bash 复制代码
$ ./test.sh
real 60.26
user 60.25
sys 0.00
real 60.51
user 60.50
sys 0.00
real 64.45
user 64.44
sys 0.00
real 67.76
user 66.29
sys 0.13
real 60.85
user 60.80
sys 0.00

测试结果:

bash 复制代码
$ ./test.sh
real 60.26
user 60.25
sys 0.00
real 60.51
user 60.50
sys 0.00
real 64.45
user 64.44
sys 0.00
real 67.76
user 66.29
sys 0.13
real 60.85
user 60.80
sys 0.00

通过grep real|sed 's/real //'可以得到所有real time统计:

复制代码
60.26
60.51
64.45
67.76
60.85

直接求平均值可以用grep real|sed 's/real //'|awk '{s+=$1} END {print s/5}', 因此平均值为62.766

AMD EPYC 9J14

bash 复制代码
$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  2
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           AuthenticAMD
CPU family:          25
Model:               17
Model name:          AMD EPYC 9J14 96-Core Processor
Stepping:            1
CPU MHz:             2596.100
BogoMIPS:            5192.20
Virtualization:      AMD-V
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           64K
L1i cache:           64K
L2 cache:            512K
L3 cache:            16384K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core invpcid_single ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves nt_good avx512_bf16 clzero xsaveerptr wbnoinvd arat npt nrip_save avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid arch_capabilities

测试结果:

bash 复制代码
$ ./test.sh
real 52.63
user 52.62
sys 0.00
real 53.29
user 53.19
sys 0.00
real 52.13
user 52.12
sys 0.00
real 52.28
user 52.27
sys 0.00
real 55.77
user 54.79
sys 0.01

通过grep real|sed 's/real //'可以得到所有real time统计:

复制代码
52.63
53.29
52.13
52.28
55.77

直接求平均值可以用grep real|sed 's/real //'|awk '{s+=$1} END {print s/5}', 因此平均值为53.22

Ampere Altra Q80-30

bash 复制代码
$ lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              1
On-line CPU(s) list: 0
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           ARM
Model:               1
Model name:          Neoverse-N1
Stepping:            r3p1
BogoMIPS:            50.00
L1d cache:           unknown size
L1i cache:           unknown size
L2 cache:            unknown size
NUMA node0 CPU(s):   0
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs

测试结果:

bash 复制代码
$ ./test.sh
real 113.46
user 103.23
sys 0.23
real 103.77
user 103.02
sys 0.03
real 109.15
user 103.01
sys 0.14
real 105.11
user 103.29
sys 0.02
real 104.54
user 103.06
sys 0.02

通过grep real|sed 's/real //'可以得到所有real time统计:

复制代码
113.46
103.77
109.15
105.11
104.54

直接求平均值可以用grep real|sed 's/real //'|awk '{s+=$1} END {print s/5}', 因此平均值为107.206

参考

相关推荐
下一页盛夏花开15 分钟前
ubuntu 20中安装QT以后出现红色空心断点
linux·运维·ubuntu
sanshanjianke1 小时前
Thunderobot 911ME 笔记本 Linux 风扇控制研究
linux
fengyehongWorld4 小时前
TeraTerm ttl脚本登录wsl
linux·teraterm
乌托邦的逃亡者4 小时前
Linux中如何检测IP冲突
linux·运维·tcp/ip
一曦的后花园4 小时前
linux搭建promethes并对接node-exporter指标
linux·运维·服务器
乌托邦的逃亡者5 小时前
CentOS/Openeuler主机中,为一个网卡设置多个IP地址
linux·运维·网络·tcp/ip·centos
念恒123066 小时前
进程控制---自定义Shell
linux·c语言
风曦Kisaki6 小时前
# Linux Shell 编程入门 Day02:条件测试、if 判断、循环与随机数
linux·运维·chrome
李日灐6 小时前
< 6 > Linux 自动化构建工具:makefile 详解 + 进度条实战小项目
linux·运维·服务器·后端·自动化·进度条·makefile
v_JULY_v7 小时前
ARM——用于长时序操作的优势奖励建模:采用三态标注策略(前进/后退/停滞),实现对相对优势的估计(含SARM详解)
arm·优势奖励建模·三态标注策略·相对优势的估计·sarm·阶段感知奖励建模·ra-bc