TVM: An Automated End-to-End Optimizing Compiler for Deep Learning

TVM can process graph-level and operator-level optimization.

graph-level optimization

As for the graph-level optimization, it can do operator fusion, constant folding, static memory pre-allocation, and data transformate pass.

operator fusion

Now I want emphasis operator fusion, It split the operator into 4 type:

  • injective (one on one map, e.g., add)
  • reduction
  • complex out fusable(can fuse element-wise op to output)
  • opaque(can't be fused, e.g., sort)

TVM will fuse as much as possible.

These optimization methods are very common.

Operator-level Optimization

TVM seperate schedule and compute. So it can detribute different devices. There are 3 schedule primitives in TVM, Special Memory Scope, Tensorizaiton, Latency Hiding.

  • Special Memory Scope, to utilize maxmium the shaped memory in GPU.
  • Tensorization, spliting a bigger data into micro-data to fully utiize the vectorization.
  • Latency Hiding. Overlaping the computation and transition. On CPU, it is achieving by using multi-threading or hardward prefetching. GPU relys on repid context switching of many wraps of threads.

Automating Optimization

How to find the optimal parameter is very important. It proposed a ML-based cost model, which is a gradient tree boosting model based on XGboost, to predict these prameters by giving the loop pragram in the kernel, which include the memory access count, and the resue ratio of each memory buffer, as well as one-hot encoding of loop annotation such as "vectorize", "unroll" and "parallel". As shown in the following graph, the collected data can be train the model again. So the TVM matainer will updated this model periodicly.

Consequently, TVM lowers the threshold for writing a relavely high-performance kernel. I think there are 2 points deserved us to learn more, which are the schedule primitive and the prediction model.

相关推荐
啊森要自信8 小时前
CANN ops-cv:AI 硬件端视觉算法推理训练的算子性能调优与实战应用详解
人工智能·算法·cann
要加油哦~8 小时前
AI | 实践教程 - ScreenCoder | 多agents前端代码生成
前端·javascript·人工智能
玄同7658 小时前
从 0 到 1:用 Python 开发 MCP 工具,让 AI 智能体拥有 “超能力”
开发语言·人工智能·python·agent·ai编程·mcp·trae
新缸中之脑8 小时前
用RedisVL构建长期记忆
人工智能
J_Xiong01178 小时前
【Agents篇】07:Agent 的行动模块——工具使用与具身执行
人工智能·ai agent
SEO_juper8 小时前
13个不容错过的SEO技巧,让您的网站可见度飙升
人工智能·seo·数字营销
小瑞瑞acd8 小时前
【小瑞瑞精讲】卷积神经网络(CNN):从入门到精通,计算机如何“看”懂世界?
人工智能·python·深度学习·神经网络·机器学习
CoderJia程序员甲8 小时前
GitHub 热榜项目 - 日榜(2026-02-06)
人工智能·ai·大模型·github·ai教程
wukangjupingbb8 小时前
AI多模态技术在创新药研发中的结合路径、机制及挑战
人工智能
CoderIsArt8 小时前
三大主流智能体框架解析
人工智能