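Compilers and Compiler Principles
Fixed-point optimization: multi-fold performance gains; code analysis with llvm-mca; clock cycles; pipelines
SPA (Static Program Analysis): a book on static program analysis
A source-code vulnerability detection plugin based on the Clang Static Analyzer
Oracle compiler suite: Oracle Solaris Studio 12.4 Information Library (Simplified Chinese): C/C++ user's guides, Numerical Computation Guide, Code Analyzer, Performance Analyzer, Thread Analyzer
Reference implementation of the Ark Compiler (方舟编译器) runtime, from the Intelligent Software Research Center, Institute of Software, Chinese Academy of Sciences
Building your own compiler with LLVM and Clang, with sample code (highly recommended)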
Compiler
Lexical analysis;
Top-down parsing;
Symbol table;
Stack-based virtual machine;
Code generation;
Implementation of arrays and objects.
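To make the first few components above concrete, here is a minimal sketch of a lexer, a top-down (recursive-descent) parser that generates code, and a stack-based virtual machine for integer expressions. The grammar, the instruction names (PUSH, ADD, SUB, MUL), and all function names are assumptions chosen for illustration; a symbol table and arrays/objects are left out.

```python
import re

def tokenize(src):
    """Lexical analysis: split the source text into number and operator tokens."""
    tokens = []
    for text in re.findall(r"[0-9]+|\S", src):
        if text[0] in "0123456789":
            tokens.append(("num", int(text)))
        elif text in "+-*()":
            tokens.append(("op", text))
        else:
            raise SyntaxError(f"unexpected character {text!r}")
    return tokens

class Compiler:
    """Recursive-descent parser that emits stack-machine code while parsing."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.code = []

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("eof", None)

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):      # expr := term (('+' | '-') term)*
        self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            op = self.take()[1]
            self.term()
            self.code.append(("ADD",) if op == "+" else ("SUB",))

    def term(self):      # term := factor ('*' factor)*
        self.factor()
        while self.peek() == ("op", "*"):
            self.take()
            self.factor()
            self.code.append(("MUL",))

    def factor(self):    # factor := NUM | '(' expr ')'
        kind, value = self.take()
        if kind == "num":
            self.code.append(("PUSH", value))
        elif (kind, value) == ("op", "("):
            self.expr()
            if self.take() != ("op", ")"):
                raise SyntaxError("expected ')'")
        else:
            raise SyntaxError(f"unexpected token {value!r}")

def compile_expr(src):
    """Front end plus code generation: source text to a list of instructions."""
    compiler = Compiler(tokenize(src))
    compiler.expr()
    if compiler.peek()[0] != "eof":
        raise SyntaxError("trailing input")
    return compiler.code

def run(code):
    """Stack-based virtual machine: execute the instructions, return the result."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            right, left = stack.pop(), stack.pop()
            if instr[0] == "ADD":
                stack.append(left + right)
            elif instr[0] == "SUB":
                stack.append(left - right)
            else:  # MUL
                stack.append(left * right)
    return stack.pop()
```

For example, `compile_expr("1 + 2 * 3")` yields PUSH 1, PUSH 2, PUSH 3, MUL, ADD, and `run` on that code returns 7. Emitting code directly from the parser keeps the sketch short; a real compiler would usually build a syntax tree first.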
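Preprocessing phase. The preprocessor (cpp) modifies the original C program according to directives that begin with the # character.
For example, the #include <stdio.h> directive on the first line of hello.c tells the preprocessor to read the contents of the system header file stdio.h
and insert it directly into the program text. The result is another C program, typically with the .i file extension.
Compilation phase. The compiler (cc1) translates the text file hello.i into the text file hello.s, which contains an assembly-language program.
Each statement in an assembly-language program describes exactly one low-level machine-language instruction in a standard text form.
Assembly language is useful because it provides a common output language for different compilers of different high-level languages.
For example, C compilers and Fortran compilers both generate output files in the same assembly language.
Assembly phase. Next ...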
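The phases above can be driven one at a time through the gcc driver. The sketch below only builds the command lines and does not run them; the helper name `stage_commands` is hypothetical, while the flags -E, -S, -c and -o are standard gcc options.

```python
from pathlib import Path

def stage_commands(source):
    """Return one gcc command per phase for a C source file such as hello.c."""
    stem = Path(source).stem
    return [
        ["gcc", "-E", source, "-o", f"{stem}.i"],       # preprocess: hello.c -> hello.i
        ["gcc", "-S", f"{stem}.i", "-o", f"{stem}.s"],  # compile: hello.i -> hello.s
        ["gcc", "-c", f"{stem}.s", "-o", f"{stem}.o"],  # assemble: hello.s -> hello.o
    ]
```

If gcc is installed, each list can be passed to `subprocess.run(cmd, check=True)` to produce the intermediate files and inspect them.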
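Instrumentation
Instrumentation means injecting extra code into a program to collect runtime information. It comes in two kinds:
(1) Source Code Instrumentation (SCI): the extra code is injected into the program's source code.
(2) Binary Instrumentation: the extra code is injected into the binary executable. It takes two forms:
● Static Binary Instrumentation (SBI): the extra code and data are inserted before the program runs, producing a permanently modified executable.
● Dynamic Binary Instrumentation (DBI): the extra code and data are inserted on the fly while the program runs, with no permanent change to the executable.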
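As a small illustration of the source-level idea, the sketch below injects a call counter around a function with a decorator. This is a Python analogy under stated assumptions, not a binary instrumentation tool; `instrument`, `call_counts`, and `fib` are hypothetical names.

```python
import functools
from collections import Counter

call_counts = Counter()  # runtime information gathered by the injected code

def instrument(func):
    """Wrap func so that every call is counted before the original body runs."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[func.__name__] += 1  # the injected "extra code"
        return func(*args, **kwargs)
    return wrapper

@instrument
def fib(n):
    # Recursive calls go through the module-level name, so they are counted too.
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```

After calling `fib(5)`, `call_counts["fib"]` holds the number of calls the recursion made, which is the kind of runtime profile instrumentation is meant to collect.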
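What can DBI do?
(1) Access the process's memory
(2) Override functionality while the application is running
(3) Call functions from imported classes
(4) Find object instances on the heap and use them
(5) Hook, trace, and intercept functions, and so on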
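The hooking and overriding items above can be imitated inside a single Python process by swapping a method at runtime. This is only an analogy for what a DBI framework does to native code; the `Account` class and `install_hook` helper are hypothetical.

```python
import functools

class Account:
    """Stand-in for code that belongs to the target application."""

    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        self.balance -= amount
        return self.balance

def install_hook(owner, name, log):
    """Replace owner.name at runtime with a wrapper that records each call.

    Returns a function that removes the hook again, so nothing is changed
    permanently.
    """
    original = getattr(owner, name)

    @functools.wraps(original)
    def hooked(*args, **kwargs):
        result = original(*args, **kwargs)
        log.append((name, args[1:], result))  # args[0] is the instance
        return result

    setattr(owner, name, hooked)
    return lambda: setattr(owner, name, original)
```

Objects created before the hook was installed are affected as well, because the method is looked up on the class at call time. Calling the returned function restores the original behaviour.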
GCC
Compilation options
Common options
- Common commands
- Compilation options
  - -c: Compile or assemble the source files, but do not link.
  - -S: Stop after the stage of compilation proper; do not assemble.
  - -o file: Place the output in file. This applies to whatever sort of output is being produced, whether it be an executable file, an object file, an assembler file or preprocessed C code.
- Optimization options
  - -O: equivalent to -O1.
  - -O2: already fairly aggressive optimization.
  - -O3: more aggressive still.
  - -Os: differs somewhat from -O2, mainly in that it keeps the size of the generated code under control (the default optimization option on Android).
  - -ffast-math: does a lot more than just break strict IEEE compliance.
    - First, of course, it does break strict IEEE compliance, allowing e.g. the reordering of instructions to something which is mathematically the same (ideally) but not exactly the same in floating point.
    - Second, it disables setting errno after single-instruction math functions, which means avoiding a write to a thread-local variable (this can make a 100% difference for those functions on some architectures).
    - Third, it makes the assumption that all math is finite, which means that no checks for NaN (or zero) are made in place where they would have detrimental effects. It is simply assumed that this isn't going to happen.
    - Fourth, it enables reciprocal approximations for division and reciprocal square root.
    - Further, it disables signed zero (code assumes signed zero does not exist, even if the target supports it) and rounding math, which enables among other things constant folding at compile-time.
    - Last, it generates code that assumes that no hardware interrupts can happen due to signalling/trapping math (that is, if these cannot be disabled on the target architecture and consequently do happen, they will not be handled).
    - It includes -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans, -fcx-limited-range and -fexcess-precision=fast.
    - Reference: What does gcc's ffast-math actually do?
- Special attributes (__attribute__)
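To see why the reordering permitted by -ffast-math matters, the sketch below evaluates the same three-term sum in two mathematically equal orders. Python always follows IEEE 754 double arithmetic here, so it only shows that the two orders differ; it does not reproduce what GCC actually emits, and the chosen values are an assumption picked to make the effect visible.

```python
def sum_left(a, b, c):
    """Evaluate (a + b) + c."""
    return (a + b) + c

def sum_right(a, b, c):
    """Evaluate a + (b + c), equal to sum_left only in exact arithmetic."""
    return a + (b + c)

a, b, c = 1e16, -1e16, 1.0
left = sum_left(a, b, c)    # the large terms cancel first, so the 1.0 survives
right = sum_right(a, b, c)  # the 1.0 is lost when it is added to -1e16 first

nan = float("nan")
nan_detected = nan != nan   # IEEE 754: NaN compares unequal to itself
```

A compiler that is allowed to reassociate may pick either order, so the result can change between builds. Likewise, a self-comparison such as `nan != nan` is the kind of NaN check that a compiler may assume is never true once finite math only is assumed.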
Special options
- -fomit-frame-pointer
  - The frame pointer (fp) records the history stack of outstanding calls. Most smaller functions don't need a frame pointer.
  - Larger functions MAY need one. This option makes one extra register available for general-purpose use. In Thumb it is R7.
  - See: Trying to understand gcc option -fomit-frame-pointer
- -fvisibility (affects linking)
  - Hides symbols in a library.
  - -fvisibility=default|internal|hidden|protected
  - You can hide all symbols by default and expose only selected ones:
    - Mark each exposed symbol with __attribute__ ((visibility ("default"))) and pass -fvisibility=hidden to the compiler.
  - Or you can expose all symbols by default and hide only selected ones:
    - Mark each hidden symbol with __attribute__ ((visibility ("hidden"))).
  - Reference: how-to-hide-the-exported-symbols-name-within-a-shared-library
- -fpic
  - Generate position-independent code.
  - All objects in a shared library should be built consistently: either all with -fpic or all without (GCC -fPIC option).
  - Difference between -fpic and -fPIC
    - Reference: difference between fpic and fPIC
    - Use -fPIC or -fpic to generate position independent code. Whether to use -fPIC or -fpic to generate position independent code is target-dependent. The -fPIC choice always works, but may produce larger code than -fpic (mnemonic to remember this is that PIC is in a larger case, so it may produce larger amounts of code). Using -fpic option usually generates smaller and faster code, but will have platform-dependent limitations, such as the number of globally visible symbols or the size of the code. The linker will tell you whether it fits when you create the shared library. When in doubt, I choose -fPIC, because it always works.
    - -fpic: Generate position-independent code (PIC) suitable for use in a shared library, if supported for the target machine. Such code accesses all constant addresses through a global offset table (GOT). The dynamic loader resolves the GOT entries when the program starts (the dynamic loader is not part of GCC; it is part of the operating system). If the GOT size for the linked executable exceeds a machine-specific maximum size, you get an error message from the linker indicating that -fpic does not work; in that case, recompile with -fPIC instead. (These maximums are 8k on the SPARC, 28k on AArch64 and 32k on the m68k and RS/6000. The x86 has no such limit.) Position-independent code requires special support, and therefore works only on certain machines. For the x86, GCC supports PIC for System V but not for the Sun 386i. Code generated for the IBM RS/6000 is always position-independent.
- -rpath
  - Linker option; takes effect at run time.
  - Add a directory to the runtime library search path. This is used when linking an ELF executable with shared objects. All -rpath arguments are concatenated and passed to the runtime linker, which uses them to locate shared objects at runtime.
- -L
  - Linker option; takes effect at link time during the build.
  - --library-path=searchdir
  - Add path searchdir to the list of paths that ld will search for archive libraries and ld control scripts.
  - So, -L tells ld where to look for libraries to link against when linking. You use this (for example) when you're building against libraries in your build tree, which will be put in the normal system library paths by make install. --rpath, on the other hand, stores that path inside the executable, so that the runtime dynamic linker can find the libraries. You use this when your libraries are outside the system library search path.
  - Reference: What's the difference between -rpath and -L?
- -Wl
  - The -Wl,xxx option for gcc passes a comma-separated list of tokens as a space-separated list of arguments to the linker. So gcc -Wl,aaa,bbb,ccc eventually becomes a linker call ld aaa bbb ccc.
  - Reference: I don't understand -Wl,-rpath -Wl,
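The -Wl rule described above is simple enough to model in a few lines. The sketch below covers only the comma splitting, not everything the gcc driver does (for instance, gcc also forwards -L and -l to the linker on its own); `expand_wl`, `linker_args`, and the paths in the example are hypothetical.

```python
def expand_wl(option):
    """Split one -Wl,... option into the arguments the linker receives."""
    prefix = "-Wl,"
    if not option.startswith(prefix):
        raise ValueError(f"not a -Wl option: {option!r}")
    return option[len(prefix):].split(",")

def linker_args(gcc_args):
    """Collect the linker arguments from every -Wl option, in order."""
    args = []
    for arg in gcc_args:
        if arg.startswith("-Wl,"):
            args.extend(expand_wl(arg))
    return args
```

With this model, `-Wl,-rpath,/opt/foo/lib` and the two-option form `-Wl,-rpath -Wl,/opt/foo/lib` both reach the linker as `-rpath /opt/foo/lib`, which is why the two spellings are interchangeable.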
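Debugging with GDB
- Inspecting state
  - Show the call stack: bt
  - Show the current register state: info registers
  - Print a variable: p your_variable
- Controlling execution
  - Step into (single step): step or s
  - Step over (one statement): n
  - Step one assembly instruction: si or stepi
  - Continue running: c
  - Add a breakpoint: b xxx.c:line_num, for example b main.c:97
  - Run: r
- Debug script: ./gdbtest -command=gdbtest.sh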
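A command file for GDB is just the commands above, one per line. The sketch below generates such a file's text from a list of breakpoints; the helper name `gdb_commands` and its parameters are hypothetical, and only the long forms of the commands already listed (break, run, bt, info registers, continue) are used.

```python
def gdb_commands(breakpoints, show_registers=False):
    """Build the text of a GDB command file from (file, line) breakpoints."""
    lines = [f"break {path}:{line}" for path, line in breakpoints]
    lines.append("run")                 # start the program under the debugger
    lines.append("bt")                  # print the call stack at the stop
    if show_registers:
        lines.append("info registers")  # dump the register state once
    lines.append("continue")            # resume execution
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file and loaded with GDB's -x (or -command) option, for example `gdb -x gdbtest.sh ./gdbtest`, assuming the program under test is ./gdbtest.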
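Transpiler
- all python to cpp transpiler projects
- Analysis of CPython-based approaches
  - Hard to verify: the functions built on PyObject and its derivatives are hard to read, which makes it difficult to verify that the generated code is correct.
  - Poor compatibility: C code written in the PyObject style is hard to integrate with other compiler features, including hotspot analysis, capturing or replacing hot functions, and model-based classification of code blocks.
  - Poor performance: every object, even in a simple sequence operation, is wrapped as a PyObject, so the performance of the generated C code is questionable.
- shedskin
  - Deployment analysis
    - Runtime environment: the libgc and libpcre3 libraries must be added to the embedded platform (distributing binaries)
      - libgc: license: FSF (the terms are fairly complex and need closer analysis); dependencies: none
      - libpcre3: license: BSD, can be used commercially as-is; dependencies: none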
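TVM
- Graph representation
  - NNVM
    - NNVM is the deep-learning counterpart of LLVM: a fairly high-level intermediate representation for neural networks, usually called the computation graph. A front-end framework only needs to express its computation in the NNVM intermediate representation; NNVM then applies graph optimizations that are independent of the specific hardware and framework, including memory allocation, data type and shape inference, and operator fusion.
  - Relay
    - Relay resolves the conflict between static graphs and dynamic graphs. It is a domain-specific language dedicated to differentiable programming.
  - Graph optimization methods
    - OpFusion: operator fusion
    - FoldConstant: constant folding
    - CombineParallelConv2D: combine parallel convolution operations
    - FoldScaleAxis: fold the scale axis
    - AlterOpLayout: alter operator layout
    - CanonicalizeOps: canonicalize operators
    - EliminateCommonSubexpr: eliminate common subexpressions
- Operator optimization
  - Characteristics of TVM's low-level intermediate representation
    - Halide & HalideIR
    - Auto-Tuning
    - loopy loop transformation tool, polyhedral model analysis
    - Python as the host language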
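Two of the graph optimizations listed above, FoldConstant and EliminateCommonSubexpr, can be sketched on a toy expression graph. This is not TVM's API or implementation; the nested-tuple node format (`("const", v)`, `("var", name)`, `("add", a, b)`, `("mul", a, b)`) and both function names are assumptions made for illustration.

```python
def fold_constants(node):
    """Replace operator nodes whose inputs are all constants with one constant."""
    kind = node[0]
    if kind in ("const", "var"):
        return node
    lhs, rhs = fold_constants(node[1]), fold_constants(node[2])
    if lhs[0] == "const" and rhs[0] == "const":
        value = lhs[1] + rhs[1] if kind == "add" else lhs[1] * rhs[1]
        return ("const", value)
    return (kind, lhs, rhs)

def eliminate_common_subexpr(node, table=None):
    """Give each distinct subexpression a single id; return (root id, table).

    The table maps a structural key to its id, so two identical subtrees
    end up sharing one entry instead of being computed twice.
    """
    if table is None:
        table = {}
    if node[0] in ("const", "var"):
        key = node
    else:
        left, _ = eliminate_common_subexpr(node[1], table)
        right, _ = eliminate_common_subexpr(node[2], table)
        key = (node[0], left, right)
    if key not in table:
        table[key] = len(table)
    return table[key], table

# (x + (1 + 2)) * (x + (1 + 2))
shared = ("add", ("var", "x"), ("add", ("const", 1), ("const", 2)))
expr = ("mul", shared, shared)
folded = fold_constants(expr)
root_id, nodes = eliminate_common_subexpr(folded)
```

After folding, each operand becomes `x + 3`; after common-subexpression elimination the graph has four distinct nodes (x, 3, x + 3, and the product) rather than seven tree nodes.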