TVM: An Automated End-to-End Optimizing Compiler for Deep Learning

TVM performs both graph-level and operator-level optimization.

Graph-level Optimization

At the graph level, TVM performs operator fusion, constant folding, static memory planning (pre-allocating memory for intermediate tensors), and data layout transformation.
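To make one of these passes concrete, here is a minimal sketch of constant folding on a toy expression tree. It is not TVM's implementation: the tuple-based expression format and the `fold` function are assumptions made up for this illustration.

```python
import operator

# Hypothetical operator table for the toy expression format below.
OPS = {"add": operator.add, "mul": operator.mul}


def fold(expr):
    """Fold constant subexpressions.

    expr is a number, a variable name (str), or a tuple (op, lhs, rhs).
    """
    if not isinstance(expr, tuple):
        return expr  # a constant or a variable: nothing to fold
    op, lhs, rhs = expr
    lhs, rhs = fold(lhs), fold(rhs)
    # If both operands are known at compile time, evaluate the node now.
    if isinstance(lhs, (int, float)) and isinstance(rhs, (int, float)):
        return OPS[op](lhs, rhs)
    return (op, lhs, rhs)


folded = fold(("add", "x", ("mul", 2, 3)))
print(folded)
```

The subtree `("mul", 2, 3)` depends only on constants, so it is evaluated once ahead of time instead of on every run of the graph.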

Operator Fusion

I want to emphasize operator fusion. TVM classifies graph operators into four categories:

  • injective (one-to-one map, e.g., add)
  • reduction (e.g., sum)
  • complex-out-fusable (element-wise operators can be fused to its output, e.g., conv2d)
  • opaque (cannot be fused, e.g., sort)

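TVM fuses as many operators as these categories allow: multiple injective operators can be fused into one injective operator, a reduction can be fused with the injective operators that feed it, and element-wise operators can be fused onto the output of a complex-out-fusable operator.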
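The four categories can be sketched as a small greedy fusion pass over a linear chain of operators. This is only an illustration, not TVM's fusion pass: the `OP_KIND` table, `can_fuse`, and `fuse_chain` are hypothetical names, real graphs are DAGs rather than chains, and the sketch assumes that injective operators count as element-wise.

```python
INJECTIVE = "injective"
REDUCTION = "reduction"
COMPLEX_OUT_FUSABLE = "complex-out-fusable"
OPAQUE = "opaque"

# Hypothetical operator table, for illustration only.
OP_KIND = {
    "add": INJECTIVE, "relu": INJECTIVE, "exp": INJECTIVE,
    "sum": REDUCTION,
    "conv2d": COMPLEX_OUT_FUSABLE,
    "sort": OPAQUE,
}


def can_fuse(group_kind, next_kind):
    """Return the kind of the fused group, or None if fusion is not allowed."""
    if group_kind == INJECTIVE and next_kind == INJECTIVE:
        return INJECTIVE            # injective + injective stays injective
    if group_kind == INJECTIVE and next_kind == REDUCTION:
        return REDUCTION            # a reduction absorbs its injective inputs
    if group_kind == COMPLEX_OUT_FUSABLE and next_kind == INJECTIVE:
        return COMPLEX_OUT_FUSABLE  # element-wise op fused onto the output
    return None                     # opaque ops and all other pairs


def fuse_chain(ops):
    """Greedily group a linear chain of operator names into fused kernels."""
    groups, current, kind = [], [], None
    for op in ops:
        op_kind = OP_KIND[op]
        fused = can_fuse(kind, op_kind) if current else None
        if fused is not None:
            current.append(op)
            kind = fused
        else:
            if current:
                groups.append(current)
            current, kind = [op], op_kind
    if current:
        groups.append(current)
    return groups


print(fuse_chain(["conv2d", "add", "relu", "sort", "exp", "add", "sum", "exp"]))
```

In this chain, `conv2d` absorbs the following `add` and `relu`, `sort` stays alone, and `exp` and `add` are folded into the `sum` reduction, so eight operators become four kernels.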

These graph-level optimizations are common across deep learning compilers.

Operator-level Optimization

TVM separates the compute definition from the schedule, so the same computation can be mapped to different devices. On top of the usual loop transformations, TVM introduces three new schedule primitives: special memory scope, tensorization, and latency hiding.

  • Special memory scope: makes maximum use of the shared memory in the GPU.
  • Tensorization: splits a large computation into small tensor-level units (micro-kernels) that map onto hardware tensor instructions, much as vectorization maps loops onto SIMD instructions.
  • Latency hiding: overlaps computation with memory operations. On CPUs this is achieved through multithreading or hardware prefetching; GPUs rely on rapid context switching among many warps of threads.
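The separation of compute and schedule, and the idea behind tensorization, can be sketched in plain Python. This does not use the TVM API. `matmul_compute`, `schedule_naive`, `schedule_tensorized`, and `micro_kernel` are hypothetical names, and `micro_kernel` is only a software stand-in for a hardware tensor instruction.

```python
def matmul_compute(A, B, i, j):
    """Compute rule: what C[i][j] is, with no statement about loop order."""
    return sum(A[i][k] * B[k][j] for k in range(len(B)))


def schedule_naive(A, B):
    """One schedule: visit every output element in row-major order."""
    n, m = len(A), len(B[0])
    return [[matmul_compute(A, B, i, j) for j in range(m)] for i in range(n)]


def micro_kernel(A, B, C, i0, j0, k0, t):
    """Stand-in for a tensor intrinsic: a t x t x t block multiply-accumulate."""
    for i in range(i0, i0 + t):
        for j in range(j0, j0 + t):
            acc = C[i][j]
            for k in range(k0, k0 + t):
                acc += A[i][k] * B[k][j]
            C[i][j] = acc


def schedule_tensorized(A, B, t=2):
    """Another schedule: tile all three loops and hand each tile to the micro-kernel."""
    n, kdim, m = len(A), len(B), len(B[0])
    assert n % t == 0 and kdim % t == 0 and m % t == 0, "sketch needs divisible sizes"
    C = [[0] * m for _ in range(n)]
    for i0 in range(0, n, t):
        for j0 in range(0, m, t):
            for k0 in range(0, kdim, t):
                micro_kernel(A, B, C, i0, j0, k0, t)
    return C


A = [[i + j for j in range(4)] for i in range(4)]
B = [[i * j + 1 for j in range(4)] for i in range(4)]
print(schedule_naive(A, B) == schedule_tensorized(A, B))
```

Both schedules produce the same matrix; only the loop structure differs. On real hardware the body of `micro_kernel` would be replaced by a single tensor instruction, which is the substitution that tensorization performs.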

Automating Optimization

Finding the optimal schedule parameters is very important. The paper proposes an ML-based cost model, a gradient tree boosting model based on XGBoost, that takes the lowered loop program of a kernel as input and predicts its running time, so that promising configurations can be chosen without measuring every candidate on hardware. The input features include the memory access count and the reuse ratio of each memory buffer, as well as a one-hot encoding of loop annotations such as "vectorize", "unroll", and "parallel". As shown in the following graph, the measurements collected during the search can be used to train the model again, so the model is updated periodically.
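The feature encoding and the predict, measure, and retrain loop can be sketched as follows. Everything here is an assumption made for illustration: the nearest-neighbor predictor is a stand-in for the gradient tree boosting model (it is not XGBoost), `measure` is a made-up cost function in place of a real hardware measurement, and the `"none"` annotation is added only to give un-annotated loops an encoding.

```python
import random

# "vectorize", "unroll", "parallel" come from the text; "none" is an assumption.
ANNOTATIONS = ["none", "vectorize", "unroll", "parallel"]


def loop_features(access_count, reuse_ratio, annotation):
    """Encode one loop as [access count, reuse ratio, one-hot annotation]."""
    one_hot = [1.0 if annotation == a else 0.0 for a in ANNOTATIONS]
    return [float(access_count), float(reuse_ratio)] + one_hot


class NearestNeighborCostModel:
    """Stand-in predictor: returns the cost of the closest measured sample."""

    def __init__(self):
        self.data = []  # list of (features, measured_cost)

    def fit(self, features, costs):
        self.data = list(zip(features, costs))

    def predict(self, f):
        if not self.data:
            return 0.0  # untrained model: every candidate looks the same
        nearest = min(self.data,
                      key=lambda d: sum((a - b) ** 2 for a, b in zip(d[0], f)))
        return nearest[1]


def measure(config):
    """Hypothetical 'hardware measurement' with a made-up cost formula."""
    access_count, reuse_ratio, annotation = config
    speedup = {"none": 1.0, "unroll": 1.2, "vectorize": 2.0, "parallel": 4.0}
    return access_count / (1.0 + reuse_ratio) / speedup[annotation]


def tune(candidates, rounds=3, batch=4, seed=0):
    """Rank untried candidates by predicted cost, measure a batch, retrain."""
    rng = random.Random(seed)
    model = NearestNeighborCostModel()
    measured = {}
    for _ in range(rounds):
        untried = [c for c in candidates if c not in measured]
        if not untried:
            break
        rng.shuffle(untried)  # random tie-breaking before the stable sort
        untried.sort(key=lambda c: model.predict(loop_features(*c)))
        for c in untried[:batch]:
            measured[c] = measure(c)
        # The collected measurements become training data for the next round.
        model.fit([loop_features(*c) for c in measured], list(measured.values()))
    best = min(measured, key=measured.get)
    return best, measured


candidates = [(n, r, a) for n in (100, 200) for r in (0.0, 1.0) for a in ANNOTATIONS]
best, measured = tune(candidates, rounds=4, batch=4)
print(best, measured[best])
```

The point of the sketch is the feedback loop: each batch of measurements is added to the training set, so the predictor is refit as the search proceeds rather than being trained once up front.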

Consequently, TVM lowers the barrier to writing relatively high-performance kernels. I think two points deserve further study: the schedule primitives and the prediction model.
