How a small compute-intensive program performs on different CPUs

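This post compares how several CPUs perform on the same test program. The machines are compute instances on Oracle's public cloud (OCI), each allocated 1 OCPU with the default amount of memory; memory size does not matter for this test program.

This post only lists the results and offers no evaluation. In the table below, the last column is the mean elapsed time over five runs of the test program.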

| OCI shape | CPU model | Base frequency (GHz) | Mean run time (s) |
| --- | --- | --- | --- |
| VM.Standard3.Flex | Intel Xeon Platinum 8358 | 2.6 | 135.084 |
| VM.Optimized3 | Intel Xeon Gold 6354 | 3.0 | 123.65 |
| VM.Standard.E4.Flex | AMD EPYC 7J13 | 2.55 | 62.766 |
| VM.Standard.E5.Flex | AMD EPYC 9J14 | 2.4 | 53.22 |
| VM.Standard.A1.Flex | Ampere Altra Q80-30 | 3.0 | 107.206 |

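Test program:

#include <stdio.h>
#include <math.h>

int main(void)
{
        double r = 0.0;  /* accumulator for the square-root sums */
        int i, j;

        /* 100000 x 100000 = 10^10 nested square-root evaluations */
        for (i = 0; i < 100000; i++)
                for (j = 0; j < 100000; j++)
                        r = r + sqrt(sqrt(i));

        return 0;
}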

Compile:

cc a.c -lm

test.sh runs a.out five times:

#!/bin/bash
# Time five consecutive runs; -p selects the POSIX real/user/sys format.
for i in 1 2 3 4 5; do
        time -p ./a.out
done
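
One detail when saving these timings: bash's `time` keyword reports on standard error, so a plain `>` redirect leaves the file empty. A minimal sketch of one way to capture the report, assuming bash; `run_timed` and its arguments are hypothetical names used only for illustration:

```shell
#!/bin/bash
# run_timed CMD N OUT: run CMD N times and collect the `time -p` reports in OUT.
# (run_timed is a hypothetical helper, not part of the original test.sh.)
run_timed() {
    local cmd=$1 n=$2 out=$3 i=0
    : > "$out"                          # start with an empty file
    while [ "$i" -lt "$n" ]; do
        # The braces let 2>> apply to the report printed by `time` itself.
        { time -p "$cmd"; } 2>> "$out"
        i=$((i + 1))
    done
}
```

For example, `run_timed ./a.out 5 /tmp/1` would leave fifteen lines (real, user and sys for each run) in /tmp/1. Anything the command itself writes to standard error ends up in the same file.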

To compute the mean, save the output above to a temporary file such as /tmp/1, then run:

cat /tmp/1|grep real|sed  's/real //'|awk '{s+=$1} END {print s/5}'
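
The pipeline above divides by a hard-coded 5, so it is only correct for exactly five runs. A minimal sketch of a variant that counts the `real` lines itself; `avg_real` is a hypothetical helper name, not part of the original setup, and the demo input reuses the five Intel Xeon Platinum 8358 runs reported later in this post:

```shell
# avg_real FILE: print the mean of the "real <seconds>" lines in FILE.
# (avg_real is a hypothetical helper, not part of the original test setup.)
avg_real() {
    awk '$1 == "real" { s += $2; n++ } END { if (n) print s / n }' "$1"
}

# Demo: the `time -p` output of the five Intel Xeon Platinum 8358 runs.
demo=$(mktemp)
cat > "$demo" <<'EOF'
real 135.08
user 134.69
sys 0.00
real 135.04
user 134.67
sys 0.00
real 135.14
user 134.67
sys 0.02
real 135.10
user 134.68
sys 0.00
real 135.06
user 134.69
sys 0.00
EOF
avg_real "$demo"    # prints 135.084
rm -f "$demo"
```

Matching on the first field instead of using `grep real` also avoids picking up unrelated lines that merely contain the word "real".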

Intel Xeon Platinum 8358

$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  2
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               106
Model name:          Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
Stepping:            6
CPU MHz:             2594.024
BogoMIPS:            5188.04
Virtualization:      VT-x
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            4096K
L3 cache:            16384K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves nt_good wbnoinvd arat vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid md_clear arch_capabilities

Test results:

$ ./test.sh
real 135.08
user 134.69
sys 0.00
real 135.04
user 134.67
sys 0.00
real 135.14
user 134.67
sys 0.02
real 135.10
user 134.68
sys 0.00
real 135.06
user 134.69
sys 0.00

Piping this output through grep real|sed 's/real //' extracts the real (wall-clock) times:

135.08
135.04
135.14
135.10
135.06

The mean can be computed directly with grep real|sed 's/real //'|awk '{s+=$1} END {print s/5}', which gives 135.084 seconds.

Intel Xeon Gold 6354

$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  2
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               106
Model name:          Intel(R) Xeon(R) Gold 6354 CPU @ 3.00GHz
Stepping:            6
CPU MHz:             2993.064
BogoMIPS:            5986.12
Virtualization:      VT-x
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            4096K
L3 cache:            16384K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves nt_good wbnoinvd arat vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid fsrm md_clear arch_capabilities

Test results:

$ ./test.sh
real 123.69
user 123.40
sys 0.00
real 123.66
user 123.37
sys 0.00
real 123.65
user 123.38
sys 0.00
real 123.62
user 123.38
sys 0.00
real 123.63
user 123.38
sys 0.01

Piping this output through grep real|sed 's/real //' extracts the real (wall-clock) times:

123.69
123.66
123.65
123.62
123.63

The mean can be computed directly with grep real|sed 's/real //'|awk '{s+=$1} END {print s/5}', which gives 123.65 seconds.

AMD EPYC 7J13

Test results:

$ ./test.sh
real 60.26
user 60.25
sys 0.00
real 60.51
user 60.50
sys 0.00
real 64.45
user 64.44
sys 0.00
real 67.76
user 66.29
sys 0.13
real 60.85
user 60.80
sys 0.00

Piping this output through grep real|sed 's/real //' extracts the real (wall-clock) times:

60.26
60.51
64.45
67.76
60.85

The mean can be computed directly with grep real|sed 's/real //'|awk '{s+=$1} END {print s/5}', which gives 62.766 seconds.

AMD EPYC 9J14

$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  2
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           AuthenticAMD
CPU family:          25
Model:               17
Model name:          AMD EPYC 9J14 96-Core Processor
Stepping:            1
CPU MHz:             2596.100
BogoMIPS:            5192.20
Virtualization:      AMD-V
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           64K
L1i cache:           64K
L2 cache:            512K
L3 cache:            16384K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core invpcid_single ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves nt_good avx512_bf16 clzero xsaveerptr wbnoinvd arat npt nrip_save avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid arch_capabilities

Test results:

$ ./test.sh
real 52.63
user 52.62
sys 0.00
real 53.29
user 53.19
sys 0.00
real 52.13
user 52.12
sys 0.00
real 52.28
user 52.27
sys 0.00
real 55.77
user 54.79
sys 0.01

Piping this output through grep real|sed 's/real //' extracts the real (wall-clock) times:

52.63
53.29
52.13
52.28
55.77

The mean can be computed directly with grep real|sed 's/real //'|awk '{s+=$1} END {print s/5}', which gives 53.22 seconds.

Ampere Altra Q80-30

$ lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              1
On-line CPU(s) list: 0
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           ARM
Model:               1
Model name:          Neoverse-N1
Stepping:            r3p1
BogoMIPS:            50.00
L1d cache:           unknown size
L1i cache:           unknown size
L2 cache:            unknown size
NUMA node0 CPU(s):   0
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs

Test results:

$ ./test.sh
real 113.46
user 103.23
sys 0.23
real 103.77
user 103.02
sys 0.03
real 109.15
user 103.01
sys 0.14
real 105.11
user 103.29
sys 0.02
real 104.54
user 103.06
sys 0.02

Piping this output through grep real|sed 's/real //' extracts the real (wall-clock) times:

113.46
103.77
109.15
105.11
104.54

The mean can be computed directly with grep real|sed 's/real //'|awk '{s+=$1} END {print s/5}', which gives 107.206 seconds.
