TVM: An Automated End-to-End Optimizing Compiler for Deep Learning

TVM can process graph-level and operator-level optimization.

graph-level optimization

As for the graph-level optimization, it can do operator fusion, constant folding, static memory pre-allocation, and data transformate pass.

operator fusion

Now I want emphasis operator fusion, It split the operator into 4 type:

  • injective (one on one map, e.g., add)
  • reduction
  • complex out fusable(can fuse element-wise op to output)
  • opaque(can't be fused, e.g., sort)

TVM will fuse as much as possible.

These optimization methods are very common.

Operator-level Optimization

TVM seperate schedule and compute. So it can detribute different devices. There are 3 schedule primitives in TVM, Special Memory Scope, Tensorizaiton, Latency Hiding.

  • Special Memory Scope, to utilize maxmium the shaped memory in GPU.
  • Tensorization, spliting a bigger data into micro-data to fully utiize the vectorization.
  • Latency Hiding. Overlaping the computation and transition. On CPU, it is achieving by using multi-threading or hardward prefetching. GPU relys on repid context switching of many wraps of threads.

Automating Optimization

How to find the optimal parameter is very important. It proposed a ML-based cost model, which is a gradient tree boosting model based on XGboost, to predict these prameters by giving the loop pragram in the kernel, which include the memory access count, and the resue ratio of each memory buffer, as well as one-hot encoding of loop annotation such as "vectorize", "unroll" and "parallel". As shown in the following graph, the collected data can be train the model again. So the TVM matainer will updated this model periodicly.

Consequently, TVM lowers the threshold for writing a relavely high-performance kernel. I think there are 2 points deserved us to learn more, which are the schedule primitive and the prediction model.

相关推荐
ekprada2 小时前
Day 39 - 图像数据与显存
人工智能·python
oraen2 小时前
深度学习基础与概念笔记
人工智能·深度学习
Maynor9962 小时前
Claude vs ChatGPT vs Gemini: 기능 비교, 사용 경험, 적합 인군
人工智能·chatgpt
IT_陈寒2 小时前
JavaScript 开发者必知的 7 个 ES2023 新特性,第5个能让代码量减少50%
前端·人工智能·后端
winner88812 小时前
从 “碗状函数” 到 “坑坑洼洼”:机器学习的凸与非凸之战
人工智能·机器学习
q_30238195562 小时前
Atlas200赋能水稻病虫害精准识别:AI+边缘计算守护粮食安全
人工智能·边缘计算
芥末章宇2 小时前
TimeGAN论文精读
论文阅读·人工智能·论文笔记
腾飞开源2 小时前
40_Spring AI 干货笔记之 Transformers (ONNX) 嵌入
人工智能·huggingface·onnx·transformers·嵌入模型·spring ai·句子转换器
平凡之路无尽路2 小时前
google11月agent发展白皮书
人工智能·语言模型·自然语言处理·nlp·aigc·ai编程·agi