ONNX Quantization

https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html
Quantization Overview
Quantization in ONNX Runtime refers to 8-bit linear quantization of an ONNX model.

During quantization, the floating point values are mapped to an 8-bit quantization space of the form:

val_fp32 = scale * (val_quantized - zero_point)
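To make the mapping concrete, here is a minimal sketch of the dequantization direction in Python with NumPy. The function name `dequantize` and the sample values (scale 0.1, zero_point 128) are illustrative assumptions, not part of the ONNX Runtime API.

```python
import numpy as np

def dequantize(val_quantized, scale, zero_point):
    """Apply val_fp32 = scale * (val_quantized - zero_point)."""
    # Cast to float32 first so that subtracting zero_point from
    # unsigned 8-bit values cannot wrap around.
    q = np.asarray(val_quantized, dtype=np.float32)
    return scale * (q - zero_point)

# Illustrative values only: uint8 codes with an assumed scale and zero_point.
codes = np.array([0, 128, 255], dtype=np.uint8)
values = dequantize(codes, scale=0.1, zero_point=128)
print(values)  # approximately [-12.8, 0.0, 12.7]
```

The code equal to zero_point maps back to exactly 0.0, whatever the scale is.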

scale is a positive real number used to map the floating point numbers to a quantization space. It is calculated as follows:

For asymmetric quantization:

scale = (data_range_max - data_range_min) / (quantization_range_max - quantization_range_min)

For symmetric quantization:

scale = max(abs(data_range_max), abs(data_range_min)) * 2 / (quantization_range_max - quantization_range_min)
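The two scale formulas above can be sketched directly in Python. The function names and the sample data range [-1.0, 3.0] are assumptions chosen for illustration. The quantization ranges used are the usual uint8 (0..255) and int8 (-128..127) ranges.

```python
def asymmetric_scale(data_min, data_max, qmin, qmax):
    # scale = (data_range_max - data_range_min)
    #         / (quantization_range_max - quantization_range_min)
    return (data_max - data_min) / (qmax - qmin)

def symmetric_scale(data_min, data_max, qmin, qmax):
    # scale = max(abs(data_range_max), abs(data_range_min)) * 2
    #         / (quantization_range_max - quantization_range_min)
    return max(abs(data_max), abs(data_min)) * 2 / (qmax - qmin)

# Illustrative data range; not taken from any real model.
data_min, data_max = -1.0, 3.0

s_asym = asymmetric_scale(data_min, data_max, qmin=0, qmax=255)    # 4 / 255
s_sym = symmetric_scale(data_min, data_max, qmin=-128, qmax=127)   # 6 / 255
print(s_asym, s_sym)
```

For this skewed range the symmetric scale is larger (6/255 versus 4/255), so each quantization step is coarser. That is the price of centering the range on zero.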

zero_point represents zero in the quantization space. It is important that the floating point zero value be exactly representable in quantization space, because zero padding is used in many CNNs. If 0 cannot be represented exactly after quantization, every padded value carries a small nonzero error, which results in accuracy errors.
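The sketch below shows one way to check that zero survives a quantize/dequantize round trip. The text above does not give a formula for zero_point. The one used here, `round(qmin - data_min / scale)` with the data range first widened to include 0.0, is a commonly used convention and is an assumption of this example. The helper names are hypothetical as well.

```python
import numpy as np

def asymmetric_params(data_min, data_max, qmin=0, qmax=255):
    # Widen the range so that 0.0 always lies inside it (assumed convention).
    data_min = min(data_min, 0.0)
    data_max = max(data_max, 0.0)
    # Note: an all-zero range (data_min == data_max == 0) is not handled here.
    scale = (data_max - data_min) / (qmax - qmin)
    # Assumed formula: the integer code that real 0.0 maps to.
    zero_point = int(round(qmin - data_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    q = np.round(np.asarray(x, dtype=np.float64) / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.int32)

def dequantize(q, scale, zero_point):
    # val_fp32 = scale * (val_quantized - zero_point)
    return scale * (np.asarray(q, dtype=np.float64) - zero_point)

scale, zero_point = asymmetric_params(-1.0, 3.0)
q_zero = quantize(0.0, scale, zero_point)
restored = dequantize(q_zero, scale, zero_point)
print(zero_point, int(q_zero), float(restored))
```

Because zero_point is an integer inside the quantized range, 0.0 quantizes to exactly zero_point and dequantizes back to exactly 0.0, so zero padding introduces no error.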
