TVM: An Automated End-to-End Optimizing Compiler for Deep Learning

TVM performs both graph-level and operator-level optimizations.

Graph-level Optimization

At the graph level, TVM applies operator fusion, constant folding, static memory pre-allocation, and a data layout transformation pass.
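To make one of these passes concrete, here is a minimal sketch of constant folding. The tuple-based mini-IR and the `fold_constants` helper are my own illustration, not TVM's actual IR or API; the point is only that sub-expressions with all-constant inputs can be evaluated once at compile time.

```python
import operator

# Hypothetical mini-IR (not TVM's): a node is a number, a variable name (str),
# or a tuple (op, lhs, rhs).
OPS = {"add": operator.add, "mul": operator.mul}


def fold_constants(node):
    """Recursively replace sub-expressions whose inputs are all constants."""
    if not isinstance(node, tuple):
        return node  # a constant or a variable: nothing to fold
    op, lhs, rhs = node
    lhs, rhs = fold_constants(lhs), fold_constants(rhs)
    if isinstance(lhs, (int, float)) and isinstance(rhs, (int, float)):
        return OPS[op](lhs, rhs)  # evaluated once, ahead of execution
    return (op, lhs, rhs)


# (2 * 3) + (x * (1 + 1)): both constant sub-trees can be pre-computed.
expr = ("add", ("mul", 2, 3), ("mul", "x", ("add", 1, 1)))
folded = fold_constants(expr)
```

After folding, only the multiplication by the variable `x` and the final addition remain to be executed at run time.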

Operator Fusion

I want to emphasize operator fusion here. TVM splits operators into 4 types:

  • injective (one-to-one map, e.g., add)
  • reduction (e.g., sum)
  • complex-out-fusable (element-wise ops can be fused to its output, e.g., conv2d)
  • opaque (can't be fused, e.g., sort)

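TVM fuses as many operators as possible under generic rules: multiple injective operators can be fused into one injective operator, a reduction operator can be fused with its input injective operators, and a complex-out-fusable operator can absorb the element-wise operators applied to its output.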
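The four categories and the fusion rules can be sketched as a greedy pass over a linear chain of operators. This is a simplified illustration under my own assumptions: the `OP_KIND` table and the `can_fuse` and `fuse_chain` helpers are hypothetical, a real graph is a DAG rather than a chain, and every injective operator is treated as element-wise here.

```python
INJECTIVE = "injective"
REDUCTION = "reduction"
COMPLEX_OUT = "complex-out-fusable"
OPAQUE = "opaque"

# Hypothetical operator table for illustration only.
OP_KIND = {
    "add": INJECTIVE, "relu": INJECTIVE, "scale": INJECTIVE,
    "sum": REDUCTION, "conv2d": COMPLEX_OUT, "sort": OPAQUE,
}


def can_fuse(group_kind, next_kind):
    """Return the kind of the fused group, or None if fusion is not allowed."""
    if group_kind == INJECTIVE and next_kind == INJECTIVE:
        return INJECTIVE      # injective + injective stays injective
    if group_kind == INJECTIVE and next_kind == REDUCTION:
        return REDUCTION      # a reduction absorbs its injective inputs
    if group_kind == COMPLEX_OUT and next_kind == INJECTIVE:
        return COMPLEX_OUT    # element-wise op fused onto the output
    return None               # opaque ops and all other pairs stay separate


def fuse_chain(ops):
    """Greedily group a linear chain of operator names into fused kernels."""
    groups, kind = [], None
    for op in ops:
        fused = can_fuse(kind, OP_KIND[op]) if groups else None
        if fused is None:
            groups.append([op])       # start a new kernel
            kind = OP_KIND[op]
        else:
            groups[-1].append(op)     # extend the current kernel
            kind = fused
    return groups


kernels = fuse_chain(["conv2d", "add", "relu", "sort", "scale", "sum"])
```

In this example the six operators collapse into three kernels: conv2d with its two element-wise followers, the opaque sort on its own, and scale fused into the sum reduction.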

These graph-level optimizations are common in deep learning compilers.

Operator-level Optimization

TVM separates the schedule from the compute definition, so the same computation can be mapped to different devices. The paper introduces 3 new schedule primitives: Special Memory Scope, Tensorization, and Latency Hiding.

  • Special Memory Scope. Makes the most of the shared memory on GPUs.
  • Tensorization. Decomposes a large computation into micro-kernels that map onto hardware tensor compute instructions, much as vectorization maps loops onto SIMD instructions.
  • Latency Hiding. Overlaps computation with memory transfers. On CPUs, this is achieved with multi-threading or hardware prefetching. GPUs rely on rapid context switching among many warps of threads.
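The separation of compute and schedule, and the idea behind tensorization, can be sketched in plain Python. This is not TVM code: `matmul_naive`, `matmul_tiled`, and `micro_kernel` are hypothetical names, and `micro_kernel` only stands in for a hardware tensor intrinsic. Both functions describe the same computation; only the loop structure (the schedule) differs.

```python
def matmul_naive(A, B):
    """Reference schedule: a plain i, j, k loop nest."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C


def micro_kernel(A, B, C, i0, j0, k0, t):
    """Hypothetical stand-in for a tensor intrinsic: a t x t x t
    block multiply-accumulate into C."""
    for i in range(i0, i0 + t):
        for j in range(j0, j0 + t):
            for k in range(k0, k0 + t):
                C[i][j] += A[i][k] * B[k][j]


def matmul_tiled(A, B, t=2):
    """Same computation, different schedule: the loops are tiled and
    each inner tile is delegated to micro_kernel."""
    n, m, p = len(A), len(B), len(B[0])
    assert n % t == 0 and m % t == 0 and p % t == 0, "sizes must divide by t"
    C = [[0] * p for _ in range(n)]
    for i0 in range(0, n, t):
        for j0 in range(0, p, t):
            for k0 in range(0, m, t):
                micro_kernel(A, B, C, i0, j0, k0, t)
    return C


A = [[i + j for j in range(4)] for i in range(4)]
B = [[i * j + 1 for j in range(4)] for i in range(4)]
same_result = matmul_tiled(A, B) == matmul_naive(A, B)
```

Because the two schedules produce identical results, a compiler is free to pick whichever loop structure suits the target hardware, which is what makes the schedule a tunable parameter.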

Automating Optimization

Finding good schedule parameters is very important. The paper proposes an ML-based cost model, a gradient tree boosting model built on XGBoost, that predicts the relative running time of a candidate configuration from features of the lowered loop program in the kernel. The features include the memory access count and the reuse ratio of each memory buffer, as well as a one-hot encoding of loop annotations such as "vectorize", "unroll", and "parallel". As shown in the following graph, the measurements collected during tuning can be used to train the model again, so the model is updated periodically as more data is collected.
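A rough sketch of the feature encoding and the measure-then-retrain loop might look like the following. Everything here is an assumption for illustration: the dictionary keys, the definition of the reuse ratio (accesses divided by distinct elements touched), the `fake_measure` timing function, and the nearest-neighbour model, which is only a stand-in for the XGBoost model. The greedy top-k selection is also a simplification of how candidates are explored.

```python
ANNOTATIONS = ("vectorize", "unroll", "parallel")


def loop_features(loop):
    """Encode one loop: access count, reuse ratio, one-hot annotation.
    `loop` is a dict with hypothetical keys 'accesses', 'unique', 'annotation'."""
    accesses = loop["accesses"]
    reuse = accesses / loop["unique"]  # assumed definition of reuse ratio
    one_hot = [1.0 if loop["annotation"] == a else 0.0 for a in ANNOTATIONS]
    return [float(accesses), reuse] + one_hot


def program_features(loops):
    """Concatenate per-loop features into one flat vector."""
    return [x for loop in loops for x in loop_features(loop)]


class NearestNeighbourModel:
    """Stand-in cost model (NOT XGBoost): predicts the cost of the
    closest previously measured feature vector."""

    def __init__(self):
        self.data = []

    def fit(self, data):
        self.data = list(data)

    def predict(self, feats):
        if not self.data:
            return 0.0  # no measurements yet: every candidate looks the same
        dist = lambda d: sum((a - b) ** 2 for a, b in zip(d[0], feats))
        return min(self.data, key=dist)[1]


def tune(candidates, measure, rounds=2, batch=2):
    """Rank candidates with the model, measure the top batch, then retrain."""
    model, history, remaining = NearestNeighbourModel(), [], list(candidates)
    for _ in range(rounds):
        if not remaining:
            break
        remaining.sort(key=lambda c: model.predict(program_features(c)))
        picked, remaining = remaining[:batch], remaining[batch:]
        history += [(program_features(c), measure(c)) for c in picked]
        model.fit(history)  # periodic update with the newly collected data
    return min(history, key=lambda h: h[1])


def fake_measure(loops):
    """Hypothetical timing: vectorized loops cost half as much per access."""
    return sum(l["accesses"] * (0.5 if l["annotation"] == "vectorize" else 1.0)
               for l in loops)


candidates = [[{"accesses": 64, "unique": 16, "annotation": ann}]
              for ann in ("unroll", "parallel", "vectorize")]
best_features, best_cost = tune(candidates, fake_measure)
```

The design choice worth noting is that real measurements are only taken for the candidates the model ranks highest, and each batch of measurements feeds back into the model.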

Consequently, TVM lowers the barrier to writing a relatively high-performance kernel. I think two points deserve further study: the schedule primitives and the prediction model.
