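Iluvatar CoreX MR50 Inference Card Test

1. MR50 Functional Testing

1.1 Inference functional test with the resnet18 model (residual neural network)

Figure 1 shows a tiger cat. We run inference on this image with resnet18.onnx, the ONNX file of the resnet18 model (ONNX is a cross-platform, general-purpose model file format), to determine which animal the image shows.

Figure 1

When running sampleResNet, a run mode must be specified, as shown in Figure 2.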

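tex_i8: quantizes the model to int8 for inference. Each 32-bit value is represented as an 8-bit integer, which reduces the size of the data and improves inference performance.
tex_fp16: runs inference in fp16 precision.
tex_fp32: runs inference in fp32 precision.
tex_s_onnx: runs inference by calling the runtime function ExecuteFromSerializedONNX.
ten: runs inference by calling the runtime function Enqueue.
tmc: runs inference by calling the runtime function MultiContext.
ted: runs inference by calling the runtime function DynamicShape.
tcmd: runs inference by calling the runtime function DynamicShapeMultiContext.
hook: runs inference by calling the runtime function ExecuteWithHook.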
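
To illustrate what quantizing to int8 means, here is a minimal sketch of symmetric per-tensor quantization in plain Python. The source does not describe the scheme IxRT actually uses, so the scaling rule, the function names, and the sample values below are all assumptions made for illustration.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale (assumed scheme)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the int8 representation."""
    return [q * scale for q in quantized]


# Illustrative values only; they are not taken from any model in this report.
weights = [0.4, -1.0, 0.2, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)
print(restored)
```

Each restored value differs from the original by at most half a quantization step, which is the source of the accuracy loss that int8 inference trades for speed.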

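Figure 2

This document uses two of these modes, tex_fp16 and ted, to run inference on the image.

1.1.1 Inference with the resnet18 model in tex_fp16 mode

Run ./build/bin/sampleResNet tex_fp16. The result is shown in Figure 3, which lists the top 5 predictions. The top-ranked prediction is tiger cat, which matches the image.

Figure 3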
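
A top-5 listing like the one in Figure 3 is typically obtained by applying softmax to the classifier's raw outputs and sorting by probability. The sketch below shows that step in plain Python. The logits are made-up illustrative numbers, not the output of sampleResNet, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(labels, probs, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Illustrative labels and logits only; not real model output.
labels = ["tiger cat", "tabby", "Egyptian cat", "lynx", "tiger", "Persian cat"]
logits = [9.1, 8.3, 6.0, 4.2, 3.9, 1.0]

probs = softmax(logits)
top5 = top_k(labels, probs)
for rank, (label, prob) in enumerate(top5, start=1):
    print(f"{rank}. {label}: {prob:.4f}")
```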

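1.1.2 Inference with the resnet18 model in ted mode

Run ./build/bin/sampleResNet ted. The result is shown in Figure 4, which lists the top 5 predictions. The top-ranked prediction is tiger cat, which matches the image.

Figure 4

1.2 Inference functional test with the yolov5 model (convolutional neural network)

Figure 5 is an image containing a dog, a bicycle, and a car. We run inference on this image with yolov5.onnx, the ONNX file of the yolov5 model, to identify the objects it contains.

Figure 5

Run ./build/bin/sampleYoloV5 --precision fp16 --data_dir ./data/yolov5. The result is shown in Figure 6: the detections are dog with 89% confidence, car with 80%, and bicycle with 44%, which matches the image.

The indices 0-123 in Figure 6 indicate that this inference executed 124 operators. An operator is a basic unit of work made up of mathematical and logical operations.

Figure 6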

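1.3 Accuracy test of MR50's IxRT against ONNX Runtime

Measured against the open-source high-performance engine ONNX Runtime, the IxRT inference engine provided with MR50 reaches 99.999% accuracy or higher in fp16. In int8, however, accuracy drops to 99% for the resnet18 model and to 98% or 97% for some operators of the resnet50 model, so int8 accuracy needs further optimization.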
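
The int8 commands in this section pass --cosine_sim 0.99, which suggests that operator outputs are compared by cosine similarity against a threshold. That reading is inferred from the flag name and is not confirmed by the source. Under that assumption, the comparison can be sketched as follows; the vectors and function names are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def passes(reference, candidate, threshold=0.99):
    """True when the candidate output is close enough to the reference output."""
    return cosine_similarity(reference, candidate) >= threshold


# Illustrative outputs: a reference vector and a slightly perturbed copy.
reference = [0.12, 0.85, 0.03, 0.40]
candidate = [0.11, 0.86, 0.04, 0.39]
similarity = cosine_similarity(reference, candidate)
print(f"cosine similarity: {similarity:.6f}, pass: {passes(reference, candidate)}")
```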

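1.3.1 IxRT accuracy against ONNX Runtime in fp16

(1) resnet18 model

Run ixrtexec --onnx ./resnet18.onnx --verify_acc --precision fp16. The result is shown in Figure 7: every operator reaches 99.99999% accuracy.

Figure 7

(2) resnet50 model

Run ixrtexec --onnx ./resnet50.onnx --verify_acc --precision fp16. The result is shown in Figure 8: every operator reaches 99.999% accuracy.

Figure 8

(3) yolov5 model

Run ixrtexec --onnx ./yolov5s.onnx --verify_acc --precision fp16. The result is shown in Figure 9: every operator reaches 99.9999% accuracy.

Figure 9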

1.3.2 IxRT accuracy against ONNX Runtime in int8

(1) resnet18 model

Run ixrtexec --onnx ./resnet18.onnx --verify_acc --precision int8 --cosine_sim 0.99. The result is shown in Figure 10: every operator reaches 99% accuracy.

Figure 10

(2) resnet50 model

Run ixrtexec --onnx ./resnet50.onnx --verify_acc --precision int8 --cosine_sim 0.99. The result is shown in Figure 11: most operators reach 99% accuracy, but a few reach only 98% or 97%.

Figure 11

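2. MR50 Performance Testing

2.1 Math library performance test

2.1.1 Addition performance test

Run ./GPU_ADD_test to measure addition performance for lengths from 512K to 128M, as shown in Figure 12.

Figure 12

2.1.2 Multiplication performance test

Run ./GPU_MULTI_test to measure multiplication performance for lengths from 512K to 128M, as shown in Figure 13.

Figure 13

2.1.3 Trigonometric function performance test

Run ./GPU_TRIGG_test to measure trigonometric function performance for lengths from 512K to 128M, as shown in Figure 14.

Figure 14

2.1.4 Fourier transform performance test

Run ./GPU_IFFT_test to measure Fourier transform performance for lengths from 1K to 128M, as shown in Figure 15.

Figure 15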

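2.2 Frame rate (fps) test

The frame rate (fps) is the number of image frames the GPU can process per second; the higher the frame rate, the smoother the processing. With IxRT, the resnet18 model achieves the highest frame rate at 883.51 fps, followed by the resnet50 model at 549.96 fps and the yolov5 model at 316.17 fps. All three peak values were measured in int8.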
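
As a reminder of how a frame rate is derived, the sketch below times a fixed number of calls and divides the count by the elapsed time. It does not reproduce how ixrtexec measures fps, which the source does not describe. The dummy workload and function names are placeholders, so the number it prints says nothing about MR50.

```python
import time


def measure_fps(run_once, iterations):
    """Call run_once `iterations` times and return calls per second."""
    start = time.perf_counter()
    for _ in range(iterations):
        run_once()
    elapsed = time.perf_counter() - start
    return iterations / elapsed


def dummy_inference():
    """Placeholder workload standing in for one inference on one frame."""
    return sum(i * i for i in range(1000))


fps = measure_fps(dummy_inference, iterations=200)
print(f"{fps:.2f} fps (placeholder workload)")
```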

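2.2.1 resnet18 model frame rate test

(1) resnet18 frame rate in fp16

Run ixrtexec --onnx ./resnet18.onnx --precision fp16. The measured frame rate is 525.88 fps, as shown in Figure 16.

Figure 16

(2) resnet18 frame rate in int8

Run ixrtexec --onnx ./resnet18.onnx --precision int8. The measured frame rate is 883.51 fps, as shown in Figure 17.

Figure 17

2.2.2 resnet50 model frame rate test

(1) resnet50 frame rate in fp16

Run ixrtexec --onnx ./resnet50.onnx --precision fp16. The measured frame rate is 345.01 fps, as shown in Figure 18.

Figure 18

(2) resnet50 frame rate in int8

Run ixrtexec --onnx ./resnet50.onnx --precision int8. The measured frame rate is 549.96 fps, as shown in Figure 19.

Figure 19

2.2.3 yolov5 model frame rate test

(1) yolov5 frame rate in fp16

Run ixrtexec --onnx ./yolov5s.onnx --precision fp16. The measured frame rate is 261.78 fps, as shown in Figure 20.

Figure 20

(2) yolov5 frame rate in int8

Run ixrtexec --onnx ./yolov5s.onnx --precision int8. The measured frame rate is 316.17 fps, as shown in Figure 21.

Figure 21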
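
The six frame rates reported in section 2.2 can be summarized as the ratio of int8 to fp16 throughput per model. The short script below computes that ratio from the figures quoted in the text; the variable and function names are illustrative.

```python
# Frame rates (fps) as reported in section 2.2 of this document.
FPS = {
    "resnet18": {"fp16": 525.88, "int8": 883.51},
    "resnet50": {"fp16": 345.01, "int8": 549.96},
    "yolov5": {"fp16": 261.78, "int8": 316.17},
}


def int8_speedup(fps_table):
    """Return the int8 / fp16 frame-rate ratio for each model."""
    return {model: rates["int8"] / rates["fp16"] for model, rates in fps_table.items()}


speedups = int8_speedup(FPS)
for model, ratio in speedups.items():
    print(f"{model}: int8 runs at {ratio:.2f}x the fp16 frame rate")
```

On these numbers, int8 gives roughly 1.68x for resnet18, 1.59x for resnet50, and 1.21x for yolov5.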

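2.3 Model performance test

As an example, we run the resnet50 model in fp16 for 10 iterations and save the profile to resnet50_fp16_10.csv. Run ixrtexec --onnx ./resnet50.onnx --precision fp16 --run_profiler --iterations 10 --export_profiler resnet50_fp16_10.csv, which produces the data shown in Figure 22. The data includes the total run time and average run time of each operator.

Figure 22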
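
Once a profile has been exported, the slowest operators can be found by sorting on total time. The sketch below does this with Python's csv module. The source does not show the layout of the exported file, so the column names (name, total_ms, avg_ms) and the sample rows are hypothetical and would need to be adapted to the real CSV header.

```python
import csv
import io

# Hypothetical sample in an assumed layout; not real profiler output.
SAMPLE_CSV = """name,total_ms,avg_ms
conv1,12.0,1.2
relu1,1.5,0.15
fc,3.0,0.3
"""


def slowest_operators(csv_text, top_n=2):
    """Return the top_n (operator name, total time) pairs, slowest first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: float(row["total_ms"]), reverse=True)
    return [(row["name"], float(row["total_ms"])) for row in rows[:top_n]]


for name, total in slowest_operators(SAMPLE_CSV):
    print(f"{name}: {total} ms total")
```

To analyze a real export, the file contents can be read with open(path).read() and passed in place of SAMPLE_CSV after adjusting the column names.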


The same profiling command was repeated with 10, 50, 100, 500, and 1000 iterations for each model and precision. For resnet50:

ixrtexec --onnx ./resnet50.onnx --precision fp16 --run_profiler --iterations 10 --export_profiler resnet50_fp16_10.csv
ixrtexec --onnx ./resnet50.onnx --precision fp16 --run_profiler --iterations 50 --export_profiler resnet50_fp16_50.csv
ixrtexec --onnx ./resnet50.onnx --precision fp16 --run_profiler --iterations 100 --export_profiler resnet50_fp16_100.csv
ixrtexec --onnx ./resnet50.onnx --precision fp16 --run_profiler --iterations 500 --export_profiler resnet50_fp16_500.csv
ixrtexec --onnx ./resnet50.onnx --precision fp16 --run_profiler --iterations 1000 --export_profiler resnet50_fp16_1000.csv
ixrtexec --onnx ./resnet50.onnx --precision int8 --run_profiler --iterations 10 --export_profiler resnet50_int8_10.csv
ixrtexec --onnx ./resnet50.onnx --precision int8 --run_profiler --iterations 50 --export_profiler resnet50_int8_50.csv
ixrtexec --onnx ./resnet50.onnx --precision int8 --run_profiler --iterations 100 --export_profiler resnet50_int8_100.csv
ixrtexec --onnx ./resnet50.onnx --precision int8 --run_profiler --iterations 500 --export_profiler resnet50_int8_500.csv
ixrtexec --onnx ./resnet50.onnx --precision int8 --run_profiler --iterations 1000 --export_profiler resnet50_int8_1000.csv

For resnet18, run from the resnet18 directory:

cd resnet18
ixrtexec --onnx ./resnet18.onnx --precision fp16 --run_profiler --iterations 10 --export_profiler resnet18_ft16_10.csv
ixrtexec --onnx ./resnet18.onnx --precision fp16 --run_profiler --iterations 50 --export_profiler resnet18_ft16_50.csv
ixrtexec --onnx ./resnet18.onnx --precision fp16 --run_profiler --iterations 100 --export_profiler resnet18_ft16_100.csv
ixrtexec --onnx ./resnet18.onnx --precision fp16 --run_profiler --iterations 500 --export_profiler resnet18_ft16_500.csv
ixrtexec --onnx ./resnet18.onnx --precision fp16 --run_profiler --iterations 1000 --export_profiler resnet18_ft16_1000.csv
ixrtexec --onnx ./resnet18.onnx --precision int8 --run_profiler --iterations 10 --export_profiler resnet18_int8_10.csv
ixrtexec --onnx ./resnet18.onnx --precision int8 --run_profiler --iterations 50 --export_profiler resnet18_int8_50.csv
ixrtexec --onnx ./resnet18.onnx --precision int8 --run_profiler --iterations 100 --export_profiler resnet18_int8_100.csv
ixrtexec --onnx ./resnet18.onnx --precision int8 --run_profiler --iterations 500 --export_profiler resnet18_int8_500.csv
ixrtexec --onnx ./resnet18.onnx --precision int8 --run_profiler --iterations 1000 --export_profiler resnet18_int8_1000.csv

For yolov5, run from the yolov5 directory:

cd ..
cd yolov5/
ixrtexec --onnx ./yolov5s.onnx --precision int8 --run_profiler --iterations 10 --export_profiler yolov5s_int8_10.csv
ixrtexec --onnx ./yolov5s.onnx --precision int8 --run_profiler --iterations 50 --export_profiler yolov5s_int8_50.csv
ixrtexec --onnx ./yolov5s.onnx --precision int8 --run_profiler --iterations 100 --export_profiler yolov5s_int8_100.csv
ixrtexec --onnx ./yolov5s.onnx --precision int8 --run_profiler --iterations 500 --export_profiler yolov5s_int8_500.csv
ixrtexec --onnx ./yolov5s.onnx --precision int8 --run_profiler --iterations 1000 --export_profiler yolov5s_int8_1000.csv
ixrtexec --onnx ./yolov5s.onnx --precision fp16 --run_profiler --iterations 10 --export_profiler yolov5s_fp16_10.csv
ixrtexec --onnx ./yolov5s.onnx --precision fp16 --run_profiler --iterations 50 --export_profiler yolov5s_fp16_50.csv
ixrtexec --onnx ./yolov5s.onnx --precision fp16 --run_profiler --iterations 100 --export_profiler yolov5s_fp16_100.csv
ixrtexec --onnx ./yolov5s.onnx --precision fp16 --run_profiler --iterations 500 --export_profiler yolov5s_fp16_500.csv
ixrtexec --onnx ./yolov5s.onnx --precision fp16 --run_profiler --iterations 1000 --export_profiler yolov5s_fp16_1000.csv
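
Because the profiling runs follow one pattern (model, precision, iteration count), the command lines can be generated rather than typed by hand, which avoids the kind of flag typos that are easy to make interactively. The sketch below only builds the command strings and does not execute them. It uses only the ixrtexec flags that appear in this document, and it assumes each command is run from the directory that holds the corresponding .onnx file. The output naming model_precision_iterations.csv is a convention chosen here for illustration.

```python
MODELS = ["resnet50", "resnet18", "yolov5s"]
PRECISIONS = ["fp16", "int8"]
ITERATIONS = [10, 50, 100, 500, 1000]


def build_commands(models=MODELS, precisions=PRECISIONS, iterations=ITERATIONS):
    """Build one ixrtexec profiling command per (model, precision, iterations)."""
    commands = []
    for model in models:
        for precision in precisions:
            for count in iterations:
                output = f"{model}_{precision}_{count}.csv"
                commands.append(
                    f"ixrtexec --onnx ./{model}.onnx --precision {precision} "
                    f"--run_profiler --iterations {count} --export_profiler {output}"
                )
    return commands


commands = build_commands()
for command in commands:
    print(command)
```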