The relationship between gcc -O optimization levels and compiled program size

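Overview

I am debugging firmware and want to find an optimization level that neither inflates code size nor gets in the way of single-stepping.

I am not concerned with individual optimization flags here, only with the top-level -O optimization level switches.

Individual optimization flags (-fx) depend on the optimization level switch (-Ox). At -O0 most optimization passes are disabled outright, so a flag specified in the makefile generally has no effect unless the level actually turns optimization on.

If no optimization level is specified, the default is -O0.

I then compare the code size that different optimization levels produce for the same real project.

GCC online documentation: https://gcc.gnu.org/onlinedocs/

Notes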

-O0

-O0
Reduce compilation time and make debugging produce the expected results. This is the default.

At -O0, GCC completely disables most optimization passes; they are not run even if you explicitly enable them on the command line, or are listed by -Q --help=optimizers as being enabled by default. Many optimizations performed by GCC depend on code analysis or canonicalization passes that are enabled by -O, and it would not be useful to run individual optimization passes in isolation.

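In short: -O0 keeps compilation fast and makes debugging behave as expected, and it is the default.

Most optimization passes are switched off entirely at -O0. Enabling one explicitly on the command line does not run it, even when -Q --help=optimizers lists it as enabled, because many passes rely on analysis or canonicalization passes that only -O turns on.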
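The excerpt above names `-Q --help=optimizers` as the way to list the flags GCC reports for a given level. As a minimal sketch, the Python below parses that kind of listing into a dictionary so two levels can be compared. The sample text and the enabled/disabled states in it are made up for illustration (real output is much longer and depends on the GCC version and target), and `parse_optimizer_flags` and `changed_flags` are hypothetical helper names.

```python
import re

# Illustrative sample only: lines in the style printed by
# `gcc -O0 -Q --help=optimizers`. The states shown are invented.
SAMPLE_OUTPUT = """\
The following options control optimizations:
  -O<number>
  -falign-functions           \t\t[disabled]
  -fdce                       \t\t[enabled]
  -ftree-vectorize            \t\t[disabled]
"""

# Matches indented lines such as "  -fdce   [enabled]".
# Flags that take a value (e.g. "-ffoo=...") are deliberately skipped.
LINE_RE = re.compile(r"^\s+(-f[\w-]+)\s+\[(enabled|disabled)\]\s*$")


def parse_optimizer_flags(text):
    """Map each reported -f flag to True (enabled) or False (disabled)."""
    flags = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            flags[match.group(1)] = match.group(2) == "enabled"
    return flags


def changed_flags(before, after):
    """Return the sorted flags whose reported state differs between two listings."""
    return sorted(f for f in before if f in after and before[f] != after[f])


if __name__ == "__main__":
    for flag, enabled in parse_optimizer_flags(SAMPLE_OUTPUT).items():
        print(flag, "enabled" if enabled else "disabled")
```

On a real toolchain the text would come from running the gcc command and capturing its standard output. Keep in mind the point made in the excerpt: a flag reported as enabled at -O0 is still not run.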

-O1

-O
-O1
Optimize. Optimizing compilation takes somewhat more time, and a lot more memory for a large function.

With -O, the compiler tries to reduce code size and execution time, without performing any optimizations that take a great deal of compilation time.

-O is the recommended optimization level for large machine-generated code as a sensible balance between time taken to compile and memory use: higher optimization levels perform optimizations with greater algorithmic complexity than at -O.

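-O and -O1 are equivalent.

In short: optimizing costs somewhat more compile time, and much more memory for large functions. This level aims for both smaller and faster code while skipping the expensive optimizations, which is why it is the recommended level for large machine-generated code.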

-O2

-O2
Optimize even more. GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff. As compared to -O, this option increases both compilation time and the performance of the generated code.

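-O2 optimizes further than -O1: it runs nearly every optimization that does not trade space for speed. Compared with -O1 it takes longer to compile but produces better-performing code.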

-O3

-O3
Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the following optimization flags:

-O3 optimizes further on top of -O2.

-Os

-Os
Optimize for size. -Os enables all -O2 optimizations except those that often increase code size:

-falign-functions  -falign-jumps
-falign-labels  -falign-loops
-fprefetch-loop-arrays  -freorder-blocks-algorithm=stc
It also enables -finline-functions, causes the compiler to tune for code size rather than execution speed, and performs further optimizations designed to reduce code size.

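-Os starts from -O2 but optimizes for size: it enables every -O2 optimization except those that often increase code size. The excluded flags are:

-falign-functions: align the start of functions

-falign-jumps: align jump targets

-falign-labels: align labels

-falign-loops: align loops

-fprefetch-loop-arrays: prefetch arrays accessed in loops

-freorder-blocks-algorithm=stc: reorder basic blocks with the stc algorithm

It also enables -finline-functions (function inlining), tunes for code size rather than execution speed, and performs further optimizations aimed at reducing code size.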

-Ofast

-Ofast
Disregard strict standards compliance. -Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on -ffast-math, -fallow-store-data-races and the Fortran-specific -fstack-arrays, unless -fmax-stack-var-size is specified, and -fno-protect-parens. It turns off -fsemantic-interposition.

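-Ofast enables all -O3 optimizations and, on top of that, optimizations that are not valid for every standard-compliant program. It disregards strict standards compliance, so it can cause problems in code that depends on it.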

-Og

-Og
Optimize debugging experience. -Og should be the optimization level of choice for the standard edit-compile-debug cycle, offering a reasonable level of optimization while maintaining fast compilation and a good debugging experience. It is a better choice than -O0 for producing debuggable code because some compiler passes that collect debug information are disabled at -O0.

Like -O0, -Og completely disables a number of optimization passes so that individual options controlling them have no effect. Otherwise -Og enables all -O1 optimization flags except for those that may interfere with debugging:

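In short: -Og is the level to prefer for the everyday edit-compile-debug cycle. It gives a reasonable amount of optimization with fast compilation and good debuggability, and it is a better choice than -O0 for debuggable code. In effect it is -O1 minus the optimizations that interfere with debugging.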

-Oz

-Oz
Optimize aggressively for size rather than speed. This may increase the number of instructions executed if those instructions require fewer bytes to encode. -Oz behaves similarly to -Os including enabling most -O2 optimizations.

If you use multiple -O options, with or without level numbers, the last such option is the one that is effective.

Options of the form -fflag specify machine-independent flags. Most flags have both positive and negative forms; the negative form of -ffoo is -fno-foo. In the table below, only one of the forms is listed---the one you typically use. You can figure out the other form by either removing 'no-' or adding it.

The following options control specific optimizations. They are either activated by -O options or are related to ones that are. You can use the following flags in the rare cases when "fine-tuning" of optimizations to be performed is desired.

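-Oz goes further than -Os in optimizing for size.

It puts code size ahead of speed, which may increase the number of instructions executed when those instructions take fewer bytes to encode. Like -Os, it enables most -O2 optimizations.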
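The rule quoted above, that the last -O option on the command line is the one in effect, matters when a makefile appends flags from several variables. A minimal sketch of that rule in Python follows; `effective_opt_level` is a hypothetical helper, and it only recognizes the levels discussed in these notes.

```python
# The -O spellings covered in these notes; anything else is ignored.
KNOWN_LEVELS = {"-O", "-O0", "-O1", "-O2", "-O3", "-Os", "-Ofast", "-Og", "-Oz"}


def effective_opt_level(args):
    """Return the -O level that takes effect for a gcc argument list."""
    level = "-O0"  # the default when no -O option is given
    for arg in args:
        if arg in KNOWN_LEVELS:
            # A bare -O is the same as -O1; later options override earlier ones.
            level = "-O1" if arg == "-O" else arg
    return level


if __name__ == "__main__":
    print(effective_opt_level(["-g", "-O2", "-Wall", "-Os"]))
```

A quick check like this can explain a surprising binary size when one makefile variable sets -O2 and a later one adds -Os.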

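Size comparison across optimization levels

The test subject is a single commit of an open-source project (e.g. the initial commit of a bootloader project), with 16 KB reserved for the compiled image.

Because the program space is nearly used up, the -O0 build does not fit, so the .ld linker script was changed to reserve the MCU's full code space instead.

Every build is a clean followed by a build.

To check whether one configuration always yields the same rom-size, ram-size, and build-time, each configuration is built twice for reference.

For a given configuration, rom-size and ram-size should be identical on every build, while build-time may vary slightly.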
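The two-build check can be automated by parsing the size report that this project's build prints (the FLASH and RAM lines shown in the results below) and comparing the numbers. This is a minimal sketch: `parse_sizes` and `same_sizes` are hypothetical helpers, and the regular expression assumes the report format seen in this article.

```python
import re

# Matches report lines such as "  FLASH     24540 bytes   5% of 512kb ...".
SIZE_RE = re.compile(r"^\s*(FLASH|RAM)\s+(\d+)\s+bytes", re.MULTILINE)


def parse_sizes(report):
    """Extract {'FLASH': bytes, 'RAM': bytes} from one build's size report."""
    return {name: int(value) for name, value in SIZE_RE.findall(report)}


def same_sizes(report_a, report_b):
    """True when two builds report identical FLASH and RAM usage."""
    return parse_sizes(report_a) == parse_sizes(report_b)


if __name__ == "__main__":
    report = (
        "  FLASH     24540 bytes   5% of 512kb    150% of  16kb\n"
        "  RAM        3248 bytes  20% of  16kb\n"
    )
    print(parse_sizes(report))
    print(same_sizes(report, report))
```

Matching sizes are a necessary condition, not proof that two binaries are byte-for-byte identical; comparing file hashes would be the stricter test.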

No -O switch specified

Build 1:
             SIZE        LPC1769         (bootloader)
  FLASH     24540 bytes   5% of 512kb    150% of  16kb
  RAM        3248 bytes  20% of  16kb

14:04:53 Build Finished. 0 errors, 14 warnings. (took 10s.208ms)
Build 2:
             SIZE        LPC1769         (bootloader)
  FLASH     24540 bytes   5% of 512kb    150% of  16kb
  RAM        3248 bytes  20% of  16kb

14:20:15 Build Finished. 0 errors, 14 warnings. (took 10s.63ms)

-O

Build 1:
             SIZE        LPC1769         (bootloader)
  FLASH     14120 bytes   3% of 512kb    87% of  16kb
  RAM        3248 bytes  20% of  16kb

14:06:35 Build Finished. 0 errors, 14 warnings. (took 11s.232ms)
Build 2:
             SIZE        LPC1769         (bootloader)
  FLASH     14120 bytes   3% of 512kb    87% of  16kb
  RAM        3248 bytes  20% of  16kb

14:29:29 Build Finished. 0 errors, 14 warnings. (took 11s.137ms)

-O0

Build 1:
             SIZE        LPC1769         (bootloader)
  FLASH     24540 bytes   5% of 512kb    150% of  16kb
  RAM        3248 bytes  20% of  16kb

14:11:23 Build Finished. 0 errors, 14 warnings. (took 10s.175ms)
Build 2:
             SIZE        LPC1769         (bootloader)
  FLASH     24540 bytes   5% of 512kb    150% of  16kb
  RAM        3248 bytes  20% of  16kb

14:30:25 Build Finished. 0 errors, 14 warnings. (took 10s.255ms)

-O1

Build 1:
             SIZE        LPC1769         (bootloader)
  FLASH     14120 bytes   3% of 512kb    87% of  16kb
  RAM        3248 bytes  20% of  16kb

14:12:32 Build Finished. 0 errors, 14 warnings. (took 11s.180ms)
Build 2:
             SIZE        LPC1769         (bootloader)
  FLASH     14120 bytes   3% of 512kb    87% of  16kb
  RAM        3248 bytes  20% of  16kb

14:31:22 Build Finished. 0 errors, 14 warnings. (took 11s.278ms)

-O2

Build 1:
             SIZE        LPC1769         (bootloader)
  FLASH     13348 bytes   3% of 512kb    82% of  16kb
  RAM        3248 bytes  20% of  16kb

14:13:14 Build Finished. 0 errors, 19 warnings. (took 12s.143ms)
Build 2:
             SIZE        LPC1769         (bootloader)
  FLASH     13348 bytes   3% of 512kb    82% of  16kb
  RAM        3248 bytes  20% of  16kb

14:32:15 Build Finished. 0 errors, 19 warnings. (took 11s.992ms)

-O3

Build 1:
             SIZE        LPC1769         (bootloader)
  FLASH     16428 bytes   4% of 512kb    101% of  16kb
  RAM        3248 bytes  20% of  16kb

14:13:56 Build Finished. 0 errors, 20 warnings. (took 13s.309ms)
Build 2:
             SIZE        LPC1769         (bootloader)
  FLASH     16428 bytes   4% of 512kb    101% of  16kb
  RAM        3248 bytes  20% of  16kb

14:33:08 Build Finished. 0 errors, 20 warnings. (took 13s.348ms)

-Os

Build 1:
             SIZE        LPC1769         (bootloader)
  FLASH     12308 bytes   3% of 512kb    76% of  16kb
  RAM        3248 bytes  20% of  16kb

14:14:57 Build Finished. 0 errors, 15 warnings. (took 11s.581ms)
Build 2:
             SIZE        LPC1769         (bootloader)
  FLASH     12308 bytes   3% of 512kb    76% of  16kb
  RAM        3248 bytes  20% of  16kb

14:33:51 Build Finished. 0 errors, 15 warnings. (took 11s.532ms)

-Ofast

Build 1:
             SIZE        LPC1769         (bootloader)
  FLASH     16428 bytes   4% of 512kb    101% of  16kb
  RAM        3248 bytes  20% of  16kb

14:15:47 Build Finished. 0 errors, 20 warnings. (took 13s.511ms)
Build 2:
             SIZE        LPC1769         (bootloader)
  FLASH     16428 bytes   4% of 512kb    101% of  16kb
  RAM        3248 bytes  20% of  16kb

14:34:44 Build Finished. 0 errors, 20 warnings. (took 13s.412ms)

-Og

Build 1:
             SIZE        LPC1769         (bootloader)
  FLASH     14188 bytes   3% of 512kb    87% of  16kb
  RAM        3248 bytes  20% of  16kb

14:16:43 Build Finished. 0 errors, 16 warnings. (took 10s.719ms)
Build 2:
             SIZE        LPC1769         (bootloader)
  FLASH     14188 bytes   3% of 512kb    87% of  16kb
  RAM        3248 bytes  20% of  16kb

14:35:43 Build Finished. 0 errors, 16 warnings. (took 10s.875ms)

-Oz

Build 1:
             SIZE        LPC1769         (bootloader)
  FLASH     12308 bytes   3% of 512kb    76% of  16kb
  RAM        3248 bytes  20% of  16kb

14:18:05 Build Finished. 0 errors, 15 warnings. (took 11s.632ms)
Build 2:
             SIZE        LPC1769         (bootloader)
  FLASH     12308 bytes   3% of 512kb    76% of  16kb
  RAM        3248 bytes  20% of  16kb

14:36:40 Build Finished. 0 errors, 15 warnings. (took 11s.735ms)

Building each configuration more than once shows that the output size is deterministic; only the build time varies slightly.

Summary

Optimization level   FLASH size (bytes)   RAM size (bytes)   Build time (first build)
No -O switch         24540                3248               10.208 s
-O0                  24540                3248               10.175 s
-Ofast               16428                3248               13.511 s
-O3                  16428                3248               13.309 s
-Og                  14188                3248               10.719 s
-O                   14120                3248               11.232 s
-O1                  14120                3248               11.180 s
-O2                  13348                3248               12.143 s
-Os                  12308                3248               11.581 s
-Oz                  12308                3248               11.632 s
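To make the table easier to read, the FLASH sizes can be expressed as a percentage saved relative to the -O0 build. The sketch below uses the byte counts measured above; `saving_vs_baseline` is a hypothetical helper name, and -O is left out because it matches -O1.

```python
# FLASH sizes in bytes, taken from the summary table above.
FLASH_BYTES = {
    "-O0": 24540,
    "-Ofast": 16428,
    "-O3": 16428,
    "-Og": 14188,
    "-O1": 14120,
    "-O2": 13348,
    "-Os": 12308,
    "-Oz": 12308,
}


def saving_vs_baseline(size, baseline):
    """Percentage of FLASH saved relative to the baseline, to one decimal."""
    return round(100.0 * (baseline - size) / baseline, 1)


if __name__ == "__main__":
    baseline = FLASH_BYTES["-O0"]
    for level, size in FLASH_BYTES.items():
        print(f"{level:<7} {size:>6} bytes  {saving_vs_baseline(size, baseline):>5}% smaller")
```

For this project the figures come out at roughly 33.1% for -O3/-Ofast, 42.2% for -Og, 42.5% for -O1, 45.6% for -O2, and 49.8% for -Os/-Oz.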

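From the figures above:

RAM usage was the same (3248 bytes) at every level in this project. To reduce RAM usage, the program has to use fewer or smaller variables and data structures; here the optimization switch made no difference to RAM.

Build time does differ between switches, but most of us do not care about that.

What we care about most is the size of the compiled program and its stability (the optimization must not break the program's logic).

Regarding program size versus the choice of switch:

Specifying no optimization switch is equivalent to -O0.

Ranking the switches by resulting program size shows that some of them produce the same result in this project. From largest to smallest:

No -O switch / -O0: largest code size

-Ofast / -O3: next

-Og: next

-O / -O1: next

-O2: next

-Os / -Oz: smallest code size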

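Remarks

A higher optimization level does not necessarily give a smaller program. In this experiment, -O3 produced a larger program than -O2.

For debugging, -Og gave the best experience: the code is still optimized (its size here falls between -O2 and -O3, close to -O1), yet single-stepping was unaffected. Every statement could be stepped and every variable inspected.

If the program no longer fits when built with -Og, that may mean the requirements or the implementation are unreasonable, can be optimized, or should be re-evaluated.

If the requirements cannot change and the implementation has little left to optimize, the only option is -Os/-Oz, at the cost of single-stepping (one step may skip over many source lines).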
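The decision described in these remarks (use -Og while it fits, fall back to -Os/-Oz when it does not) can be written down as a small helper. This is a sketch of that reasoning only, using the FLASH sizes measured in this article; `pick_level` and the 13 KB budget in the example are hypothetical.

```python
# FLASH sizes in bytes measured for this project (from the summary table).
FLASH_BYTES = {"-Og": 14188, "-Os": 12308, "-Oz": 12308}


def pick_level(flash_bytes, budget, debug_level="-Og", size_levels=("-Os", "-Oz")):
    """Prefer the debug-friendly level; fall back to a size level if needed.

    Returns None when nothing fits, which is the cue to revisit the
    requirements or the implementation.
    """
    if flash_bytes[debug_level] <= budget:
        return debug_level
    for level in size_levels:
        if flash_bytes[level] <= budget:
            return level
    return None


if __name__ == "__main__":
    print(pick_level(FLASH_BYTES, 16 * 1024))  # the 16 KB reserved in this project
    print(pick_level(FLASH_BYTES, 13 * 1024))  # a tighter, hypothetical budget
```

With the 16 KB budget of this project the helper picks -Og; with the tighter hypothetical 13 KB budget it falls back to -Os.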

END
