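Note: this is a compilation of material on "M4 macro programming".
The quoted English text was machine-translated without proofreading.
If anything looks wrong, consult the original.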
Introduction to the m4 Macro Processor
m4 is a macro processor for the COHERENT system. It is a powerful and flexible text processing tool. You can tell it, with a great degree of generality, to search for macro names and replace them with other strings. Macros can also take arguments.
m4 provides a useful front end for programming languages such as fourth-generation languages (4GLs), which commonly have no built-in macro facility. m4 also has powerful facilities for manipulating files, making decisions conditionally, selecting substrings, and performing arithmetic, so it is useful for processing forms.
The command
m4 [ file ... ]
invokes m4. m4 reads each file in the order given on the command line; if no file is given, m4 reads from the standard input. The file '-' also indicates the standard input; this allows you to perform interactive input while m4 is processing files. m4 reports any file that it cannot open, and eliminates it from the input stream.
m4 writes its output to the standard output stream. As with other COHERENT commands, the optional output redirection specification >outfile on the command line redirects the output into outfile. To leave m4, type <ctrl-D>.
Definitions and Syntax
m4 reads text one line at a time from its input stream. When it reads a line of text, it scans the line for a macro that you have defined. A legal macro name is a string of alphanumeric characters (letters, digits, underscore '_'), the first of which is not a digit. m4 recognizes the macro name only if it is surrounded by nonalphanumeric characters (e.g., spaces or newline characters) on both sides.
When m4 finds a macro, it removes it from the input stream and replaces it with its definition. It then writes the resulting modified text (called replacement text) onto the input stream. m4 then reads another line from the input stream, and continues processing.
Text that is contained within single quotation marks is quoted (i.e., is contained between a grave mark ` on the left and an apostrophe ' on the right). All other text is unquoted. m4 searches only unquoted text for macros.
A macro call can be either a macro or a macro immediately followed by a set of arguments:
macroname(arg1, ..., argn)
A set of arguments must start with a left parenthesis that follows the macro immediately (i.e., no space can come between the macro and the left parenthesis). The entire argument set must be enclosed by balanced, unquoted parentheses: parentheses may appear within the text of an argument, but they must always come in balanced pairs. A single left or right parenthesis may be passed by quoting it, e.g. `(' or `)'.
Arguments are separated by commas that are neither within quotation marks nor within an inner set of unquoted parentheses. m4 strips from each argument all leading unquoted spaces, tabs, and newlines. It processes the text of each argument in the same manner that it processes ordinary text; that is, it removes, evaluates, and replaces any recognized macro calls before it stores the argument text for possible use within the replacement text. If you wish to pass a macro name or an entire macro call as an argument, it must be quoted. m4 stores the values of the first nine arguments for possible use in the replacement text. It processes arguments after the ninth, but throws away the results.
m4 does not search quoted text for macros. Instead, it removes the quotation marks and copies the text to the standard output unchanged. Quotes can be nested; that is, quoted text can contain other blocks of quoted text. m4 removes only the outermost level of quotation marks each time it reads a piece of quoted text. This aids in delaying macro expansion in text until the second (or later) time the text is read by m4.
m4 includes numerous predefined macros, which perform various functions. The remainder of this document describes the predefined macros in detail. The Lexicon entry for m4 summarizes each predefined macro.
Defining Macros
The macro
define(`name', `definition')
defines a macro name and its replacement text definition. m4 replaces every subsequent unquoted occurrence of name with definition, as described above. For example, the m4 input
define(`her', `COHERENT')
To know, know, know her Is to love, love, love her ...
produces the output
To know, know, know COHERENT Is to love, love, love COHERENT ...
name should usually be quoted. If it is not quoted and it is being redefined, m4 sees its old definition as the first argument to define, which will not have the intended effect. Similarly, definition should be quoted if the macro names that occur in it should not be replaced.
Any legal macro name may be the first argument of a define. If you redefine a predefined macro, its original function is lost and cannot be recovered.
As noted above, m4 recognizes a macro name only if it is surrounded by non-alphanumeric characters. For example,
define(`her', `COHERENT')
Coherent software is reliable software.
produces the output
Coherent software is reliable software.
m4 does not recognize the characters her in the word Coherent as a macro name.
The value of the define macro is the null or empty string (the string which contains no characters). In other words, m4 puts nothing (the null string) back on its input stream when it processes a define call.
Like predefined macros, user-defined macros may take arguments. m4 replaces the string $n in the macro definition with the value of the nth argument, where n is a digit (1 to 9). It replaces $0 with the macro name. If the argument set contains fewer than n arguments, m4 replaces $n with the null string. m4 uses functional notation to specify argument sets. Unlike a normal function, however, an m4 macro does not require a fixed number of arguments. The same macro may be called with or without an argument set, or with argument sets containing different numbers of arguments.
The following macro concatenates its arguments:
define(`cat', $1$2$3$4$5$6$7$8$9)
Then
cat(one, `two', ``three'', `four, four ', five(also,),,seven)
becomes
onetwothreefour, four five(also,)seven
A more complex definition is:
define(`comma', ``$0 (which looks like `,')'')
This turns each subsequent unquoted occurrence of
comma
into
comma (which looks like `,')
Two sets of quotation marks around the replacement text are necessary. When m4 reads this call to macro define, the resultant argument text is:
comma
for the name and
`$0 (which looks like `,')'
for the definition. When m4 sees the text
comma that is not quoted
it evaluates and replaces the now-defined macro name comma to produce the text
`comma (which looks like `,')' that is not quoted
on the input stream. Because comma appears inside a set of quotation marks, m4 does not treat it as a macro name. For the same reason, the string `,' also passes through unmodified. The final output is:
comma (which looks like `,') that is not quoted
When the predefined macro dumpdef is used without arguments, it returns the names and definitions of all defined macros. For each macro, it returns its quoted name, a tab character, and then its quoted definition; no definition is given for a predefined macro. When used with arguments,
dumpdef(name)
returns the quoted definition of each macro name that appears as an argument.
The predefined macro
undefine(`name')
removes a macro definition. As noted for define above, the argument must be quoted to have the desired effect. undefine ignores arguments which are not defined macro names. The value of the undefine call is the null string. If a predefined macro is undefined, its original function cannot be recovered.
Input Control
The predefined macro changequote changes the quote characters. For example:
changequote({, })
makes the quote characters the left and right braces. It also removes the effect of the previously defined quotation characters. Missing arguments default to ` for open quotation and ' for close quotation. Thus, changequote without arguments restores the original quote characters ` and '. If the arguments are identical, the nesting ability of quotation marks is temporarily lost. Instead, the first instance of the new quote character turns on quoting and the next instance turns off quoting. The value of the changequote call is the null string.
The predefined macro dnl (delete to newline) "eats" all characters from the input stream up to and including the next newline and returns the null string. It is particularly useful in a string of define macro calls. Although m4 replaces each define by the null string, newlines often separate macro definitions, and m4 copies the newlines to the output stream unchanged. Two ways of using dnl are:
define(this, that)dnl
define(something, else)dnl
dnl(define(this, that), define(something, else))
The first two lines use dnl without arguments. The final example uses dnl with an argument set, which m4 processes (performing each define) and subsequently ignores. The following section describes an alternative (and generally preferable) method of eliminating extraneous newlines in a sequence of define calls.
m4 includes two decision-making macros: ifdef and ifelse.
ifdef checks whether a macro is defined. It has the following form:
ifdef(macro,defvalue,undefvalue)
If macro is defined, ifdef returns defvalue; otherwise, it returns undefvalue.
ifelse compares pairs of arguments. It has the following form:
ifelse(arg1,arg2,arg3, ... , arg9)
ifelse compares arg1 with arg2. If they are the same, it returns arg3. If not, and if arg4 is the last argument, it returns arg4. Otherwise, it repeats the process, comparing arg4 with arg5, and so on. Like other m4 macros, this takes a maximum of nine arguments.
In addition to each file specified in the command line, any other accessible file may be included in the input stream with the predefined macro
include(file)
m4 replaces this macro call on the input stream with the entire contents of the specified file. If file cannot be accessed, include causes a fatal error; m4 prints an error message and exits. The alternative predefined macro
sinclude(file)
functions exactly like include, except that it does not print an error message and stop processing if file is inaccessible.
Output Control
m4 maintains ten output streams, numbered zero through nine. Stream 0 is the standard output, where m4 normally directs its output. Streams 1 through 9 are temporary files. The predefined macro
divert(n)
diverts output away from stream 0, appending it instead to stream n. Any n outside the range 0 to 9 causes output to be thrown away until the next divert call. divert without any arguments or with a nonnumeric argument is equivalent to divert(0). The value of a divert call is the null string.
The preceding section described the use of dnl to eliminate extraneous newlines on the output stream when processing a sequence of define calls. A more readable method of eliminating the newlines is to precede the definitions with divert(-1) and follow them with divert. m4 then diverts the extraneous newlines to the nonexistent stream -1.
The predefined macro
undivert(streams)
fetches text diverted to one or more temporary streams. It appends the text from the specified streams in the given order to the current output stream. m4 does not allow diverted text to be undiverted back to the same stream. undivert with no arguments undiverts all diversions in numerical order. The value of undivert is the null string; undiverted text is not scanned for macro calls, but is simply moved from one place to another. m4 automatically undiverts all diversions in numerical order to the standard output (stream 0) at the end of processing.
To illustrate the use of divert and undivert, invoke m4 and type:
define(`count', $1$2)
And to see what macro count does, type:
count(one, three)
The output on the screen reads:
onethree
Now type:
divert(1)
This diverts the standard output to stream 1, a temporary file. Now type:
count(one, three)
Nothing appears on the screen. divert sent the output of the macro count(one, three) into a temporary file. Thus, the output is not lost, as you might have thought. To demonstrate the existence of that output, type:
divert
to reset the standard output to be the screen. See for yourself. Now, when you type
count(one, four)
m4 replies on the screen:
onefour
As you can see, the standard output is again directed to the screen. To retrieve the diverted output of count(one, three), and send it to the screen, type:
undivert(1)
which produces:
onethree
The predefined macro divnum returns the current diversion number.
The predefined macro
errprint(message)
sends the given message to the standard error stream. The value of errprint is the null string.
String Manipulation
The predefined macro
substr(string, start, count)
returns a substring of a string of characters. The first argument string can be anything. The second argument start is a number giving the starting position of the desired substring in string. Position 0 is the leftmost character of string, position 1 is the next character to the right, and so on. If start is negative, the orientation switches to the right. Position -1 is the rightmost character of string, position -2 is the character to its left, and so on. The third argument count specifies the length and direction of the substring. Zero returns the null string. A positive count returns a substring consisting of the character addressed by start and count-1 characters to the right of it. A negative number does the same thing, but to the left. If count is omitted, it is assumed to be of the same sign as start and large enough to extend to the end of string in that direction. If start is omitted, it is assumed to be 0 if count is positive or omitted, or -1 if count is negative. For example:
define(`alpha', `abcdefghijklmnopqrstuvwxyz')
substr(alpha, , )
returns
abcdefghijklmnopqrstuvwxyz
Here both start and count are omitted and are therefore assumed to be 0 and 26, respectively.
substr(alpha, 0, 6)
substr(alpha, , 6)
both return
abcdef
Similarly,
substr(alpha, , -6)
substr(alpha, 20, )
both return
uvwxyz
Finally,
substr(alpha, -6, )
substr(alpha, 0, 21)
both return
abcdefghijklmnopqrstu
The predefined macro
translit(string, characters, replacements)
transliterates single characters within a string. It returns string with every occurrence of a character specified in characters replaced with the corresponding character from replacements. If there is no corresponding character, translit simply deletes the character. For example:
define(liquorjugs, `pack my box with five dozen liquor jugs')
translit(liquorjugs, aeiou, 1234)
returns:
p1ck my b4x w3th f3v2 d4z2n l3q4r jgs
Numeric Manipulation
m4 can simulate the long integer variables typical of most programming languages by using define as the assignment operator. Whenever the defined macro name appears unquoted, m4 immediately replaces it by its numeric value.
The predefined macros incr and decr return their argument incremented or decremented by 1. Thus,
define(`x', 1234)
incr(x)
returns:
1235
Note that incr and decr do not change the value of the simulated variable x, or of any other variable. They return only that value plus or minus 1; x itself retains its value of 1234.
incr and decr initialize to zero all arguments that are omitted or not a valid number. Thus, the example
incr(a34/87)
returns 1; but
incr(123.67)
returns 124. As you can see, incr truncates floating-point numbers. The same applies to a variable that you have defined to have a floating-point value.
More generally, the predefined macro
eval(expression)
evaluates an integer-valued arithmetic expression and returns the resulting value. The operators available, in order of decreasing precedence, are:
| Operator | Description |
|---|---|
| ( ) | Parentheses for grouping |
| + - | Unary plus, negation |
| ^ ** | Exponentiation |
| * / % | Multiplication, division, modulus |
| + - | Addition, subtraction |
| > < >= <= == != | Comparisons |
| ! | Logical negation |
| && & | Logical and |
| \|\| \| | Logical or |
Comparison and logical operators return only 0 (false) or 1 (true). eval performs all arithmetic in long integers and reports an error if the expression is ill-formed.
The predefined macro
len(string)
returns a numeric value corresponding to the length of string.
The predefined macro
index(string, pattern)
returns a numeric value corresponding to the first position where pattern appears in string. If it does not appear, index returns -1. Both pattern and string may be arbitrary strings of any length.
The following example defines a macro repeat that repeats its first argument the number of times specified by its second argument.
define(`repeat', `ifelse(eval($2<=0),1,,`repeat($1,decr($2))'$1)')
The definition is recursive; that is, repeat calls itself within its own definition. The entire definition is quoted to defer the evaluation of ifelse from when m4 encounters the definition to when it encounters a repeat macro call. Similarly, the recursive repeat call is quoted to defer its evaluation within the ifelse. eval checks if the second argument is less than or equal to 0; if so, it returns 1 (true) and ifelse returns the null string. Otherwise, decr decrements the count, so each successive recursive call has a smaller second argument, and each call appends a copy of the first argument to the previous result.
The lowered value of the second argument, generated by the macro decr, is "kept in mind" from one recursive call to the next. Nevertheless, decr and incr never change the value of a variable. For example, consider:
define(`turns', 10)
We now have a variable called turns whose value is ten. Typing
repeat(`Ho! ', turns)
produces:
Ho! Ho! Ho! Ho! Ho! Ho! Ho! Ho! Ho! Ho!
Within repeat, decr lowered the current value of the second argument (i.e. turns), until it becomes zero. But when we type
turns
we see:
10
As you can see, the value of turns remained ten, despite that variable's having been used in a decr statement.
COHERENT System Interface
The predefined macro
maketemp(string)
creates a unique file name for a temporary file. string is a six-character string that is normally initialized to XXXXXX; maketemp replaces all of the Xs with a pattern of six numerals that form a unique file name in the directory where temporary files are being written. It is the same as the C library routine mktemp. It returns the null string if its argument is less than six characters long.
The predefined macro
syscmd(string)
performs the given COHERENT command and returns the null string. It is the same as the C library routine system.
A common use of syscmd is to create a file which m4 subsequently reads with an include. For example, to get the output from the COHERENT date command:
define(`tempfile', maketemp(/tmp/m4XXXXXX))
define(`get_date', `syscmd(date >tempfile)'`include(tempfile)')
In subsequent input, m4 replaces each occurrence of get_date with the system date information. The definition of tempfile is unquoted, so m4 executes the maketemp call only once (when it processes the define), and it creates only one temporary file. On the other hand, the definition of get_date is quoted, so m4 executes syscmd and include to get the current time and date each time it processes a call to get_date. The temporary file should be removed with
syscmd(rm tempfile)
at the end of the m4 program.
The following example is more complex. It defines a macro save, which appends a macro definition to a file:
define(`save',`syscmd(`cat >>$2 <<\#
define(`$1','dumpdef(`$1')`)
#
')')
The arguments to define are the name
save
and the definition
syscmd(`cat >>$2 <<\#
define(`$1','dumpdef(`$1')`)
#
')
(Note that the body of macro syscmd uses the shell operator << to create a "here document". For more information on here documents, see the tutorial Introducing sh, the Bourne Shell.) A typical call of this macro is:
save(`sample',`defs.m4')
which saves the macro definition of sample in a COHERENT file defs.m4 containing macro definitions. When m4 processes this call, the argument of syscmd becomes
cat >>defs.m4 <<\#
define(`sample',
followed by the definition of sample returned by dumpdef, followed by
)
#
Then syscmd executes the COHERENT cat command to append the here document delimited by # to the macro definition file defs.m4. The leading # delimiter of the here document is quoted with \ to prevent interpretation by the COHERENT shell. Because save uses the character # to delimit the here document, it does not work correctly for macro definitions containing #. For example,
save(`save',`defs.m4')
does not work as expected.
Note that you can only use save when you run m4 interactively; you cannot use it in a script. Furthermore, save does not always save a definition literally. For example:
save(`tempfile', `defs.m4')
saves the tempfile definition in defs.m4 as:
define(`tempfile', `/tmp/m400074a')
#
where, as you can see, the XXXXXX has been replaced with a hexadecimal number (yours may differ). Likewise, the definition of get_date will look like this:
define(`get_date', `syscmd(date >tempfile)include(tempfile)')
#
To load a saved definition into m4, simply type m4 at the shell's command-line prompt to invoke it interactively; and then type:
sinclude(defs.m4)
From now on, you can use any definition that you had saved into file defs.m4.
Errors
m4 reports all errors to the standard error stream. An error produces a line of the form
m4: line: message
where line is a decimal line number and message describes the error. For example, the error message
m4: 7: illegal macro name: ab*c
indicates an attempt to define a macro with the illegal macro name ab*c in line 7 of the input stream.
The following error messages may occur:
cannot open file
eval: invalid expression
eval: missing or unknown operator
eval: missing value
illegal macro name: name
out of space
/tmp open error
unexpected EOF
The file or name will be the file name or macro name which caused the error, or {NULL} if the required argument is omitted.
m4 does not recognize (and therefore does not report) the most common of m4 errors, namely invoking recursive macro definitions that never terminate. A simple example is the definition
define(`recursive', `recursive')
When m4 subsequently encounters a call of recursive in its input stream, it replaces it on the input stream with its definition. Because the definition is another call to recursive, m4 replaces it in turn with its definition; the process never terminates. More complicated examples may involve many macro definitions and may be difficult to discover. If m4 enters an endless loop, you can terminate it from the keyboard by typing the interrupt character (normally <ctrl-C>) or the kill character (normally <ctrl-\>). If m4 enters an endless loop while being run in the background, you can terminate it with the kill command.
For More Information
The Lexicon entry for m4 gives a summary of its functions and options.
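Note:
COHERENT (often written Coherent) was a UNIX-like operating system for early IBM PC compatibles, developed by the now-defunct Mark Williams Company (MWC). It was one of the earliest commercial UNIX clones on the x86 platform.
m4 itself was not first written for COHERENT: it originated on UNIX, and COHERENT shipped its own implementation. Versions of m4 are now a standard text preprocessing tool on UNIX and Linux systems.
- Not an official UNIX: it had no AT&T source code or trademark license and was an independent rewrite, with interfaces and behavior highly compatible with UNIX.
- Targeted early PCs such as the 8086 and 286; small and light on resources, it was popular in the 1980s and 1990s as a lightweight UNIX environment for the PC.
- Interface: command line, with KornShell (ksh) as the default shell.
- License: originally closed-source and commercial; released as open source under the BSD 3-Clause license in 2015.
- Toolchain: ships with a C compiler, m4, make, yacc, and other development tools, suited to lightweight software development and scripting.
- Compatibility: runs many UNIX commands, scripts, and tools; its behavior is close to UNIX V7.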
Macro Magic: M4 Complete Guide
Written By Jerry Peek
May 9, 2019
A macro processor scans input text for defined symbols (the macros) and replaces that text by other text, or possibly by other symbols. For instance, a macro processor can convert one language into another.
If you're a C programmer, you know cpp, the C preprocessor, a simple macro processor. m4 is a powerful macro processor that's been part of Unix for some 30 years, but it's almost unknown, except for special purposes such as generating the sendmail.cf file. It's worth knowing because you can do things with m4 that are hard to do any other way.
The GNU version of m4 has some extensions from the original V7 version. (You'll see some of them.) As of this writing, the latest GNU version was 1.4.2, released in August 2004. Version 2.0 is under development.
You won't become an m4 wizard in three pages (or in six, as the discussion of m4 continues next month), but you can master the basics. So, let's dig in.
Simple Macro Processing
基础宏处理
A simple way to do macro substitution is with tools like sed and cpp. For instance, the command sed 's/XPRESIDENTX/President Bush/' reads lines of text, changing every occurrence of XPRESIDENTX to President Bush. sed can also test and branch, for some rudimentary decision-making.
借助 sed、cpp 等工具可实现简易宏替换。例如命令 sed 's/XPRESIDENTX/President Bush/' 会逐行读取文本,将所有 XPRESIDENTX 字符串替换为 President Bush。sed 还支持条件判断与分支逻辑,可完成基础分支判定逻辑。
As another example, here's a C program with a cpp macro named ABSDIFF() that accepts two arguments, a and b.
再举一例,以下 C 语言代码定义了 cpp 宏 ABSDIFF(),该宏可接收 a、b 两个入参。
c
#define ABSDIFF(a, b) \
    ((a)>(b) ? (a)-(b) : (b)-(a))
Given that definition, cpp will replace the code...
基于上述宏定义,cpp 会将如下代码......
c
diff = ABSDIFF(v1, v2);
... with
......替换为:
c
diff = ((v1)>(v2) ? (v1)-(v2) : (v2)-(v1));
v1 replaces a everywhere, and v2 replaces b. ABSDIFF() saves typing --- and reduces the chance for error.
代码中所有 a 会被 v1 替换,所有 b 会被 v2 替换。ABSDIFF() 既减少重复编码工作量,也能降低手动编写产生的出错概率。
Introducing m4
m4 简介
Unlike sed and other languages, m4 is designed specifically for macro processing. m4 manipulates files, performs arithmetic, has functions for handling strings, and can do much more.
与 sed 及其他脚本语言不同,m4 专为宏处理场景设计,支持文件操作、算术运算、字符串处理等丰富能力。
m4 copies its input (from files or standard input) to standard output. It checks each token (a name, a quoted string, or any single character that's not a part of either a name or a string) to see if it's the name of a macro. If so, the token is replaced by the macro's value, and then that text is pushed back onto the input to be rescanned. (If you're new to m4, this repeated scanning may surprise you, but it's one key to m4's power.) Quoting text, like `text', prevents expansion. (See the section on "Quoting.")
m4 会将文件或标准输入的内容输出至标准输出。程序会逐个检查词法单元(标识符、引号字符串、不属于标识符或字符串的单个字符),判断是否匹配已定义宏名。若匹配,则将该单元替换为宏对应文本,并把替换后的文本重新送入输入流二次扫描。初次接触 m4 的使用者可能会对二次扫描机制感到陌生,这一机制是 m4 能力的重要组成部分。使用 `文本' 格式包裹内容可阻止宏展开,详见后文「引号机制」章节。
m4 comes with a number of predefined macros, or you can write your own macros by calling the define() function. A macro can have multiple arguments: up to 9 in original m4, and an unlimited number in GNU m4. Macro arguments are substituted before the resulting text is rescanned.
m4 内置大量预定义宏,也可通过 define() 函数自定义宏。宏支持多入参,原生 m4 最多支持 9 个入参,GNU m4 无数量限制。宏入参会先完成替换,再对替换后的文本执行二次扫描。
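As a rough sketch of argument substitution, the cpp ABSDIFF() macro shown earlier can be written as an m4 macro, with $1 and $2 standing in for the two arguments. The shell variable `out` and the use of a quoted here-document are choices made for this illustration only; the quotes and dnl in the m4 text are explained in the walkthrough that follows.

```shell
# Feed a small m4 script on standard input; the quoted heredoc ('EOF')
# stops the shell from interpreting m4's backquotes and $1/$2.
out=$(m4 <<'EOF'
define(`ABSDIFF', `(($1)>($2) ? ($1)-($2) : ($2)-($1))')dnl
diff = ABSDIFF(v1, v2);
EOF
)
printf '%s\n' "$out"
```

This should print the same expansion that cpp produced: `diff = ((v1)>(v2) ? (v1)-(v2) : (v2)-(v1));`.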
Here's a simple example (saved in a file named foo.m4):
以下是简易示例(保存为 foo.m4 文件):
one
define(`one', `ONE')dnl
one
define(`ONE', `two')dnl
one ONE oneONE
`one'
The file defines two macros named one and ONE. It also has four lines of text. If you feed the file to m4 using m4 foo.m4, m4 produces:
该文件定义了 one 与 ONE 两个宏,同时包含四行普通文本。执行 m4 foo.m4 命令处理该文件,输出结果如下:
one
ONE
two two oneONE
one
Here's what's happening:
执行逻辑解析:
-
Line 1 of the input, which is simply the characters one and a newline, doesn't match any macro (so far), so it's copied to the output as-is.
输入第一行仅包含 one 字符与换行符,暂无匹配的已定义宏,直接原样输出。
-
Line 2 defines a macro named one(). (The opening parenthesis before the arguments must come just after define with no whitespace between.) From this point on, any input string one will be replaced with ONE. (The dnl is explained below.)
第二行定义 one() 宏,宏入参前的左括号必须紧跟 define 关键字,中间不能有空格。自此之后,输入中的 one 字符串都会被替换为 ONE,dnl 用法后文说明。
-
Line 3, which is again the characters one and a newline, is affected by the just-defined macro one(). So, the text one is converted to the text ONE and a newline.
第三行同样为 one 字符与换行符,受刚定义的 one() 宏影响,one 被替换为 ONE 后输出。
-
Line 4 defines a new macro named ONE(). Macro names are case-sensitive.
第四行定义全新宏 ONE(),m4 宏名区分大小写。
-
Line 5 has three space-separated tokens. The first two are one and ONE. The first is converted to ONE by the macro named one(), then both are converted to two by the macro named ONE(). Rescanning doesn't find any additional matches (there's no macro named two()), so the first two words are output as two two. The rest of line 5 (a space, oneONE, and a newline) doesn't match a macro so it's output as-is. In other words, a macro name is only recognized when it's surrounded by non-alphanumerics.
第五行包含三个以空格分隔的词法单元,前两个为 one、ONE。one 先被 one() 宏替换为 ONE,随后两个 ONE 均被 ONE() 宏替换为 two。二次扫描未匹配到 two() 宏,前两个单元最终输出为 two two。该行剩余内容(空格、oneONE、换行符)无匹配宏,原样输出。简言之,仅当宏名被非字母数字字符包围时,才会被 m4 识别触发替换。
-
Line 6 contains the text one inside a pair of quotes, then a newline. (As you've seen, the opening quote is a backquote or grave accent; the closing quote is a single quote or acute accent.) Quoted text doesn't match any macros, so it's output as-is: one. Next comes the final newline.
第六行使用引号包裹 one 字符串并以换行符结尾,m4 左引号为反引号,右引号为单引号。被引号包裹的内容不会触发宏匹配,直接原样输出 one 及换行符。
Input text is copied to the output as-is and that includes newlines. The built-in dnl function, which stands for "delete to new line," reads and discards all characters up to and including the next newline. (One of its uses is to put comments into an m4 file.) Without dnl, the newline after each of our calls to define would be output as-is. We could demonstrate that by editing foo.m4 to remove the two dnl s. But, to stretch things a bit, let's use sed to remove those two calls from the file and pipe the result to m4:
输入文本会原样保留换行符并输出。内置函数 dnl 全称为 delete to new line,作用是读取并舍弃当前位置至下一个换行符(含换行符)的所有字符,常用来为 m4 文件添加注释。若不使用 dnl,define 语句后的换行符会直接输出。可编辑 foo.m4 移除两处 dnl 验证效果,也可通过 sed 命令批量删除 dnl 并管道传入 m4 处理:
bash
$ sed 's/dnl//' foo.m4 | m4
输出:
one

ONE

two two oneONE
one
If you compare this example to the previous one, you'll see that there are two extra newlines at the places where dnl used to be.
对比原始输出可发现,原 dnl 所在位置多出两个空行。
Let's summarize. You've seen that input is read from the first character to the last. Macros affect input text only after they're defined. Input tokens are compared to macro names and, if they match, replaced by the macro's value. Any input modified by a macro is pushed back onto the input and is rescanned for possible modification. Other text (that isn't modified by a macro) is passed to the output as-is.
总结规则:m4 按字符顺序读取输入内容;宏仅在定义完成后才会生效;输入词法单元匹配宏名时,会被替换为宏文本,替换内容重新送入输入流二次扫描;未被宏处理的文本直接原样输出。
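The two rules in this summary (a macro only affects text read after its definition, and expanded text is rescanned) can be checked with a short sketch. The macro names `color` and `shade` are made up for this example.

```shell
# "color" is plain text on the first line because it is not defined yet.
# On the last line it expands to "shade", which is rescanned and
# expands again to "blue".
out=$(m4 <<'EOF'
early: color
define(`color', `shade')dnl
define(`shade', `blue')dnl
late: color
EOF
)
printf '%s\n' "$out"
```

The first line comes out unchanged as `early: color`, while the last becomes `late: blue`.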
Quoting
引号机制
Any text surrounded by `' (a grave accent and an acute accent) isn't expanded immediately. Whenever m4 evaluates something, it strips off one level of quotes. When you define a macro, you'll often want to quote the arguments --- but not always. Listing One has a demo. It uses m4 interactively, typing text to its standard input.
由反引号与单引号包裹的内容不会立即展开。m4 执行表达式时,会自动剥离一层引号。定义宏时通常需要为入参添加引号,部分场景可省略。示例一为交互式 m4 引号用法演示,直接向标准输入输入指令即可。
Listing One:
示例一:引号用法演示
bash
$ m4
define(A, 100)dnl
define(B, A)dnl
define(C, `A')dnl
dumpdef(`A', `B', `C')dnl
A: 100
B: 100
C: A
dumpdef(A, B, C)dnl
stdin:5: m4: Undefined name 100
stdin:5: m4: Undefined name 100
stdin:5: m4: Undefined name 100
A B C
100 100 100
CTRL-D
$
The listing starts by defining three macros A, B, and C. A has the value 100. So does B: because its argument A isn't quoted, m4 replaces A with 100 before assigning that value to B. While defining C, though, quoting the argument means that its value becomes literal A.
示例中先定义 A、B、C 三个宏,宏 A 赋值为 100。宏 B 入参 A 未加引号,m4 先将 A 替换为 100,再赋值给宏 B;宏 C 入参添加引号,直接将字面量 A 设为宏值。
You can see the values of macros by calling the built-in function dumpdef with the names of the macros. As expected, A and B have the value 100, but C has A.
调用内置函数 dumpdef 并传入宏名,可查看宏定义值。执行结果符合预期,A、B 取值为 100,C 取值为字面量 A。
In the second call to dumpdef, the names are not quoted, so each name is expanded to 100 before dumpdef sees them. That explains the error messages, because there's no macro named 100. In the same way, if we simply enter the macro names, the three tokens are scanned repeatedly, and they all end up as 100.
第二次调用 dumpdef 时,宏名未加引号,宏名会先展开为 100 再传入函数,因无名为 100 的宏,故抛出未定义错误。直接输入三个宏名时,词法单元经多次扫描替换,最终均输出 100。
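The difference between B (defined from an unquoted A) and C (defined from a quoted `A') only becomes visible in output once A changes. The following sketch reuses the macro names from Listing One and adds a redefinition of A, which is not part of the original listing.

```shell
# B captured the value of A at definition time (100).
# C holds the literal text "A", so it follows A's current value.
out=$(m4 <<'EOF'
define(`A', `100')dnl
define(`B', A)dnl
define(`C', `A')dnl
define(`A', `200')dnl
B C
EOF
)
printf '%s\n' "$out"
```

Here B still yields 100, while C is rescanned as A and yields 200, so the output is `100 200`.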
You can change the quoting characters at any time by calling changequote. For instance, in text containing lots of quote marks, you could call changequote({,})dnl to change the quoting characters to curly braces. To restore the defaults, simply call changequote with no arguments.
可随时调用 changequote 函数修改引号标识。若文本包含大量引号字符,可执行 changequote({,})dnl 将引号替换为大括号;无入参调用 changequote 即可恢复默认引号规则。
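A small sketch of changequote in use, assuming square brackets as the replacement quote characters (the article's example uses curly braces; any pair works the same way). The macro name `note` is hypothetical.

```shell
# After changequote([,]) the default ` and ' are ordinary characters,
# so text containing them passes through untouched.
out=$(m4 <<'EOF'
changequote([,])dnl
define([note], [It's a `quoted' word])dnl
note
EOF
)
printf '%s\n' "$out"
```

The apostrophe and the backquoted word survive literally in the output because they no longer act as quote delimiters.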
In general, for safety, it's a good idea to quote all input text that isn't a macro call. This avoids m4 interpreting a literal word as a call to a macro. Another way to avoid this problem is by using the GNU m4 option --prefix-builtins or -P. It changes all built-in macro names to be prefixed by m4_. (The option doesn't affect user-defined macros.) So, under this option, you'd write m4_dnl and m4_define instead of dnl and define, respectively.
常规使用中,建议为非宏调用的普通文本添加引号,避免字面量被误识别为宏名。GNU m4 提供 --prefix-builtins 与 -P 选项,可为所有内置宏名添加 m4_ 前缀(不影响自定义宏),启用后需使用 m4_dnl、m4_define 替代原生 dnl、define。
Keep quoting and rescanning in mind as you use m4. Not to be tedious, but remember that m4 does rescan its input. For some in-depth tips, see "Web Paging: Tips and Hints on m4 Quoting" by R.K. Owen, Ph.D., at http://owen.sj.ca.us/rkowen/howto/webpaging/m4tipsquote.html.
使用 m4 时需牢记引号规则与二次扫描机制。如需进阶用法技巧,可查阅 R.K. Owen 博士的文章《Web Paging: Tips and Hints on m4 Quoting》:http://owen.sj.ca.us/rkowen/howto/webpaging/m4tipsquote.html
Decisions and Math
条件判断与数学运算
m4 can do arithmetic with its built-in functions eval, incr, and decr. m4 doesn't support loops directly, but you can combine recursion and the decision macro ifelse to write loops.
m4 依托内置函数 eval、incr、decr 实现算术运算,本身无原生循环语法,可通过递归逻辑与条件宏 ifelse 组合实现循环效果。
Let's start with an example adapted from the file /usr/share/doc/m4/examples/debug.m4 (on a Debian system). It defines the macro countdown(). Evaluating the macro with an argument of 5 --- as in countdown(5) --- outputs the text 5, 4, 3, 2, 1, 0, Liftoff!.
以下示例改编自 Debian 系统路径下 /usr/share/doc/m4/examples/debug.m4 文件,示例定义 countdown() 宏,传入入参 5 执行 countdown(5),会输出 5, 4, 3, 2, 1, 0, Liftoff!。
bash
$ cat countdown.m4
define(`countdown', `$1, ifelse(eval($1 > 0),
1, `countdown(decr($1))', `Liftoff!')')dnl
countdown(5)
$ m4 countdown.m4
5, 4, 3, 2, 1, 0, Liftoff!
The countdown() macro has a single argument. It's broken across two lines. That's fine in m4 because macro arguments are delimited by parentheses, which don't have to be on the same line. Here's the argument without its surrounding quotes:
countdown() 宏仅含一个入参,定义分为两行书写。m4 允许宏入参跨换行,括号为入参边界标识,无需同行书写。去除外层引号后的宏入参内容如下:
$1, ifelse(eval($1 > 0), 1,
`countdown(decr($1))', `Liftoff!')
$1 expands to the macro's first argument. When m4 evaluates that countdown macro with an argument of 5, the result is:
$1 代表宏的第一个入参,传入 5 执行 countdown 宏时,展开结果为:
5, ifelse(eval(5 > 0), 1,
`countdown(decr(5))', `Liftoff!')
The leading "5, " is plain text that's output as-is as the first number in the countdown. The rest of the argument is a call to ifelse. Ifelse compares its first two arguments. If they're equal, the third argument is evaluated; otherwise, the (optional) fourth argument is evaluated.
开头的 5, 为普通文本,直接作为倒计时首个数字输出。剩余内容调用 ifelse 宏,该宏会对比前两个入参,相等则执行第三个入参逻辑,不相等则执行可选的第四个入参逻辑。
Here, the first argument to ifelse, eval(5 > 0), evaluates to 1 (logical "true") because 5 is greater than 0. So the first two arguments are equal, and m4 evaluates countdown(decr(5)). This starts the recursion by calling countdown(4).
示例中 ifelse 首个入参 eval(5 > 0) 判定结果为 1(逻辑真),前后入参匹配,执行 countdown(decr(5)),递归调用 countdown(4)。
Once we reach the base condition of countdown(0), the test eval(0 > 0) fails and the ifelse call evaluates Liftoff!. (If recursion is new to you, you can read about it in books on computer science and programming techniques.)
递归至 countdown(0) 时,eval(0 > 0) 判定不成立,ifelse 执行分支输出 Liftoff!。不熟悉递归概念可查阅计算机科学与编程技术相关书籍。
Note that, with more than four arguments, ifelse can work like a case or switch in other languages. For instance, in ifelse(a,b,c,d,e,f,g), if a matches b, then c; else if d matches e then f; else g.
若传入超过四个入参,ifelse 可实现类其他语言 case、switch 多分支匹配逻辑。例如 ifelse(a,b,c,d,e,f,g) 逻辑:a 与 b 匹配则执行 c;否则 d 与 e 匹配则执行 f;均不匹配则执行 g。
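The seven-argument form described above can be sketched as a small lookup macro. The macro name `kind` and the animal names are invented for illustration; `$1` is quoted inside the body so the argument is compared as literal text.

```shell
# ifelse(a, b, c, d, e, f, g): if a equals b then c,
# else if d equals e then f, else g.
out=$(m4 <<'EOF'
define(`kind', `ifelse(`$1', `cat', `feline', `$1', `dog', `canine', `unknown')')dnl
kind(`cat') kind(`dog') kind(`emu')
EOF
)
printf '%s\n' "$out"
```

The three calls take the first, second, and fallback branches in turn, giving `feline canine unknown`.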
The m4 info file shows more looping and decision techniques, including a macro named forloop() that implements a nestable for-loop.
m4 官方 info 文档包含更多循环与条件写法,其中 forloop() 宏可实现嵌套 for 循环。
This section showed some basic math operations. (The info file shows more.) You've seen that you can quote a single macro argument that contains a completely separate string (in this case, a string that prints a number, then runs ifelse to do some more work). This one-line example (broken onto two lines here) is a good hint of m4's power. It's a minimalist language, for sure, and you'd be right to complain about its tricky evaluation in a global environment, leaving lots of room for trouble if you aren't careful. But you might find this expressive little language to be challenging enough that it's addictive.
本章节展示了基础数学运算用法,更多高级运算可查阅官方文档。示例中可为宏入参添加引号,入参内部可嵌套独立逻辑字符串(输出数字并调用 ifelse 执行后续逻辑)。本单行拆分两行的示例,足以体现 m4 的能力特性。m4 属于极简语法语言,全局作用域下的求值规则较为复杂,使用不慎易引发异常;但其表达能力极强,深入学习后会具备独特使用乐趣。
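The same recursion pattern as countdown() can carry a computation instead of printing text. As a minimal sketch, a hypothetical `fact` macro combines ifelse, eval, and decr to compute a factorial; it assumes a small non-negative integer argument.

```shell
# fact(n): 1 when n is 0, otherwise n * fact(n - 1).
# The recursive branch is quoted so it is expanded only when selected.
out=$(m4 <<'EOF'
define(`fact', `ifelse($1, 0, 1, `eval($1 * fact(decr($1)))')')dnl
fact(5)
EOF
)
printf '%s\n' "$out"
```

Each level waits for the inner call to finish before eval multiplies, so fact(5) expands to 120.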
Building Web Pages
网页生成应用
Let's wrap up this m4 introduction with a typical use: feeding an input file to a set of macros to generate an output file. Here, the macro file html.m4 defines three macros: _startpage(), _ul(), and _endpage(). (The names start with underscore characters to help prevent false matches with non-macro text. For instance, _ul() won't match the HTML tag <ul>.) The _startpage() macro accepts one argument: the page title, which is also copied into a level-1 heading that appears at the start of the page. The _ul() macro makes an HTML unordered list. Its arguments (an unlimited number) become the list items. And _endpage() makes the closing HTML text, including a "last change" date taken from the Linux date utility.
以典型应用场景收尾 m4 基础入门:通过宏文件处理输入文本,生成目标输出文件。html.m4 宏文件定义 _startpage()、_ul()、_endpage() 三个宏,宏名以下划线开头,避免与普通文本误匹配,例如 _ul() 不会与 HTML 标签 <ul> 混淆。_startpage() 接收页面标题入参,同时将标题设为页面一级标题;_ul() 用于生成 HTML 无序列表,支持不限数量入参作为列表项;_endpage() 生成 HTML 结尾标签,并调用 Linux date 工具注入最后修改时间。
Listing Two shows the input file, and Listing Three is the HTML output. The m4 macros that do all the work are shown in Listing Four. (Both the input file and the macros are available online at http://www.linux-mag.com/downloads/2005-02/power.)
输入文件见示例二,生成的 HTML 结果见示例三,核心 m4 宏定义见示例四。输入文件与宏文件可在线获取:http://www.linux-mag.com/downloads/2005-02/power
Listing Two:
示例二:未展开的网页源文件 webpage.m4h
_startpage(`Sample List')
_ul(`First item', `Second item',
`Third item, longer than the first two')
_endpage
Listing Three:
示例三:m4 生成的网页文件
bash
$ m4 html.m4 webpage.m4h > list.html
$ cat list.html
<html><head><title>Sample List</title></head><body><h1>Sample List</h1><ul><li>First item</li><li>Second item</li><li>Third item, longer than the first two</li></ul>
<p>Last change: Fri Jan 14 15:32:06 MST 2005</p></body></html>
In Listing Four, both _startpage() and _endpage() are straightforward. The esyscmd macro is one of the many m4 macros we haven't covered --- it runs a Linux command line, then uses the command's output as input to m4. The _ul() macro outputs opening and closing HTML <ul> tags, passing its arguments to the _listitems() macro via $@, which expands into the quoted list of arguments.
示例四中 _startpage() 与 _endpage() 逻辑简洁直观。esyscmd 是未讲解的内置宏,可执行 Linux 终端命令,并将命令输出作为 m4 输入内容。_ul() 输出 HTML <ul> 首尾标签,通过 $@ 将所有入参传递给 _listitems() 宏,$@ 会展开为带引号的全部入参列表。
_listitems() is similar to the countdown() macro shown earlier: _listitems() makes a recursive loop. At the base condition (the end of recursion), when $# (the number of arguments) is 1, ifelse simply outputs the last list item inside a pair of <li> tags. Otherwise, there's more than one argument, so the macro starts by outputting the first argument inside <li> tags, then calls _listitems() recursively to output the other list items. The argument to the recursive call is shift($@). The m4 shift macro returns its list of arguments without its first argument --- which, here, is all of the arguments we haven't processed yet.
_listitems() 宏与前文 countdown() 逻辑类似,依托递归实现循环。$# 代表入参个数。入参数量等于 0 时,ifelse 走空分支,不输出任何内容;入参数量等于 1 时(递归终止条件),直接将唯一入参包裹在 <li> 标签中输出;否则先输出包裹在 <li> 标签中的首个入参,再以 shift($@) 为入参递归调用 _listitems() 输出其余列表项。shift 宏返回去掉首个入参后的入参列表,即尚未处理的全部入参。
Notice the nested quoting: some of the arguments inside the (quoted) definition of _listitems() are quoted themselves. This delays interpretation until the macro is called. (m4 tracing, which we'll cover next month, can help you see what's happening.)
注意嵌套引号用法:_listitems() 宏定义内部的部分入参额外添加引号,用于延迟宏求值逻辑,仅在宏调用时才展开。下月讲解的 m4 追踪调试功能,可直观查看宏展开全过程。
Listing Four:
示例四:生成 HTML 的宏文件 html.m4
define(`_startpage', `<html><head><title>$1</title></head><body><h1>$1</h1>')dnl
define(`_endpage', `<p>Last change: esyscmd(date)</p></body></html>')dnl
define(`_listitems', `ifelse($#, 0, ,$#, 1, `<li>$1</li>',`<li>$1</li>_listitems(shift($@))')')dnl
define(`_ul', `<ul>_listitems($@)</ul>')dnl
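The $#, $@, and shift combination used by _listitems() can be tried in isolation. The sketch below uses a hypothetical macro named `dashjoin` that joins its arguments with dashes; it follows the same shape as _listitems(), minus the HTML.

```shell
# With one argument left, emit it; otherwise emit the first argument,
# a dash, and recurse on the remaining arguments via shift($@).
out=$(m4 <<'EOF'
define(`dashjoin', `ifelse($#, 1, `$1', `$1-dashjoin(shift($@))')')dnl
dashjoin(`a', `b', `c')
EOF
)
printf '%s\n' "$out"
```

Each recursive call sees one argument fewer, so the three-argument call produces `a-b-c`.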
This month, let's dig deeper into m4 and look at included files, diversions, frozen files, and debugging and tracing. Along the way, we'll see some of the rough edges of m4's minimalist language and explore workarounds. Before we start, though, here's a warning from the GNU m4 info page:
本月继续深入学习 m4,涵盖文件引入、分流输出、冻结文件、调试追踪等内容,同时剖析 m4 极简语法的细节特性与兼容适配方案。正式开始前,引用 GNU m4 官方 info 文档的一段趣味提示:
Some people [find] m4 to be fairly addictive. They first use m4 for simple problems, then take bigger and bigger challenges, learning how to write complex m4 sets of macros along the way. Once really addicted, users pursue writing of sophisticated m4 applications even to solve simple problems, devoting more time debugging their m4 scripts than doing real work. Beware that m4 may be dangerous for the health of compulsive programmers.
部分使用者会对 m4 产生浓厚使用兴趣:初期仅用 m4 处理简单场景,随后逐步尝试复杂需求并编写复杂宏集。深度熟悉后,即便简单任务也倾向于开发复杂 m4 应用,花费在脚本调试上的时间甚至超过业务工作本身。热衷于编程的使用者需留意,m4 极易让人沉浸其中。
So take a deep breath... Good. Now let's dig in again!
稍作休整,继续深入学习。
Included Files
文件引入
m4's built-in include() macro takes m4's input from a named file until the end of that file, when the previous input resumes. sinclude() works like include() except that it won't complain if the included file doesn't exist.
m4 内置 include() 宏可读取指定文件内容作为输入,文件读取完毕后恢复原有输入流。sinclude() 功能与 include() 一致,但若引入文件不存在,不会抛出报错信息。
If an included file isn't in the current directory, GNU m4 searches the directories specified with the -I command-line option, followed by any directories in the colon-separated M4PATH environment variable.
若引入文件不在当前目录,GNU m4 会优先检索 -I 命令行参数指定目录,再检索以冒号分隔的 M4PATH 环境变量目录。
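A minimal sketch of include(), sinclude(), and the -I search path, assuming a throwaway directory created with mktemp. The file names `defs.m4` and `no-such-file.m4` and the macro `project` are made up for this example.

```shell
# Put a definitions file in a temporary directory, then let m4 find it
# through -I. sinclude() of a missing file is silently ignored.
dir=$(mktemp -d)
cat > "$dir/defs.m4" <<'EOF'
define(`project', `demo')dnl
EOF
out=$(m4 -I "$dir" <<'EOF'
include(`defs.m4')dnl
sinclude(`no-such-file.m4')dnl
project
EOF
)
rm -rf "$dir"
printf '%s\n' "$out"
```

The included file contributes only a definition, so the single line of output is `demo`. Replacing sinclude with include for the missing file would make m4 report an error instead.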
Including files is often used to read in other m4 code, but can also be used to read plain text files. However, if you're reading plain text files, watch out for files that contain text that can confuse m4, such as quotes, commas, and parentheses. One way to work around that problem and read the contents of a random file is by using changequote() to temporarily override the quoting characters and also replacing include() with esyscmd(), which filters the file through a Linux utility like tr or sed.
文件引入多用于加载外部 m4 宏代码,也可读取纯文本文件。读取纯文本时需注意,文件中的引号、逗号、括号等字符可能干扰 m4 解析。适配方案:通过 changequote() 临时修改引号规则,同时用 esyscmd() 替代 include(),借助 tr、sed 等 Linux 工具过滤文件内容后再送入 m4。
Listing One has a contrived example that shows one way to read /etc/hosts, replacing parentheses with square brackets and commas with dashes.
示例一为实操演示,读取 /etc/hosts 文件,将文件中的括号替换为方括号、逗号替换为短横线。
Listing One:
示例一:m4 include() 宏的文件过滤用法
% cat readfile.m4
dnl readfile: display file named on
dnl command line in -Dfile=
dnl converting () to [] and , to -
file `file on '
esyscmd(`hostname')
changequote({,})dnl
esyscmd({tr '(),' '[]-' < }file)dnl
That's all.
changequote
% cat /etc/hosts
127.0.0.1 localhost
216.123.4.56 foo.bar.com foo
# Following lines are for `IPv6'
# (added automatically, we hope)
::1 ip6-localhost ip6-loopback
...
% m4 -Dfile=/etc/hosts readfile.m4
/etc/hosts file on foo
127.0.0.1 localhost
216.123.4.56 foo.bar.com foo
# Following lines are for `IPv6'
# [added automatically- we hope]
::1 ip6-localhost ip6-loopback
...
That's all.
The option -D or --define lets you define a macro from the command line, before any input files are read. (Later, we'll see a cleaner way to read text from arbitrary files with GNU m4's undivert().)
-D 与 --define 选项可在读取输入文件前,通过命令行定义宏。后文会介绍借助 GNU m4 的 undivert() 读取任意文本文件的更简洁方案。
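For a quick feel of -D on its own, the following sketch defines a hypothetical macro NAME on the command line and expands it in text read from standard input.

```shell
# NAME is defined before any input is read, so it expands in the text.
out=$(printf 'Hello, NAME!\n' | m4 -DNAME=world)
printf '%s\n' "$out"
```

This prints `Hello, world!`. Without the -D option the input would pass through unchanged.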
Diversions: An Overview
分流输出:基础概述
Normally, all output is written directly to m4's standard output. But you can use the divert() macro to collect output into temporary storage places. This is one of m4's handiest features.
默认情况下,m4 所有输出直接写入标准输出。借助 divert() 宏可将输出内容暂存至临时存储区域,是 m4 极具实用价值的特性之一。
The argument to divert() is typically a stream number, the ID of the diversion that should get the output from now on.
divert() 入参通常为流编号,用于指定后续输出内容的存储分流标识。
-
Diversion 0 is the default. Text written to diversion 0 goes to m4's standard output. If you've been diverting text to another stream, you can call divert(0) or just divert to resume normal output.
分流编号 0 为默认输出通道,写入该分流的内容直接输出至标准输出。切换至其他分流后,执行 divert(0) 或无参 divert 即可恢复默认输出。
-
Text written to diversions 1, 2, and so on is held until m4 exits or until you call undivert(). (More about that in a moment.)
写入编号 1、2 及后续分流的内容会临时缓存,直至 m4 进程退出或调用 undivert() 才会输出。
-
Any text written to diversion -1 isn't emitted. Instead, diversion -1 is "nowhere," like the Linux pseudo-file /dev/null. It's often used to comment code and to define macros without using the pesky dnl macro at the ends of lines.
写入编号 -1 分流的内容不会输出,等同于 Linux 空设备 /dev/null。常用于代码注释、批量定义宏,无需在每行末尾添加 dnl。
-
The divnum macro outputs the current diversion number.
divnum 宏可输出当前所处的分流编号。
Standard m4 supports diversions -1 through 9, while GNU m4 can handle an essentially unlimited number of diversions. The latter version of m4 holds diverted text in memory until it runs out of memory and then moves the largest chunks of data to temporary files. (So, in theory, the number of diversions in GNU m4 is limited to the number of available file descriptors.)
标准 m4 仅支持 -1 至 9 共 11 个分流,GNU m4 无数量上限。GNU m4 优先将分流内容存入内存,内存不足时将大块数据写入临时文件,理论上分流量受系统可用文件描述符数量限制。
All diversions 1, 2, ..., are output at the end of processing in ascending order of stream number. To output diverted text sooner, simply call undivert() with the stream number. undivert() outputs text from a diversion and then empties the diversion. So, immediately calling undivert() again on the same diversion outputs nothing.
编号 1、2 等分流会在程序处理末尾按编号升序统一输出。需提前输出某分流内容时,传入分流编号调用 undivert() 即可;该函数输出分流内容后会清空缓存,重复调用同一分流编号无任何输出。
"Undiverted" text is output to the current diversion, which isn't always the standard output! You can use this to move text from one diversion to another. Output from a diversion is not rescanned for macros.
执行分流释放后的内容会写入当前所在分流,不一定是标准输出,可借此实现分流间内容迁移。分流缓存的内容释放输出时,不会再次执行宏二次扫描。
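The ordering rules above can be seen in a short sketch: text is written to diversions 2, 1, and 0 in that order, yet it appears on standard output as 0 first, then 1 and 2 in ascending order when m4 exits. The words used as text are arbitrary.

```shell
# Diversion 0 is written immediately; diversions 1 and 2 are held back
# and flushed in ascending order at end of input.
out=$(m4 <<'EOF'
divert(2)dnl
second
divert(1)dnl
first
divert(0)dnl
zero
EOF
)
printf '%s\n' "$out"
```

The output lines are `zero`, `first`, `second`, regardless of the order in which the text was written.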
Diverse Diversions
分流输出多元用法
Before looking at the more-obvious uses of numbered diversions, let's look at a few surprising ones.
讲解编号分流常规用法前,先介绍几种小众实用场景。
As was mentioned, diversion -1 discards output. One of the most irritating types of m4 output is the newline characters after macro definitions. You can stop them by calling dnl after each define, but you can also stop them by defining macros after a call to divert(-1).
前文提及 -1 分流可丢弃输出内容。m4 宏定义后多余换行符是常见干扰项,除每行 define 后添加 dnl 外,也可将宏定义逻辑放入 divert(-1) 作用域内屏蔽换行输出。
Here are two examples. This first example, nl, doesn't suppress the newline from define...
以下两组对比示例,nl 示例不会屏蔽 define 产生的换行:
`The result is:'
define(`name', `value')
name
...but the next example, nonl, does, by defining the macro inside a diversion:
nonl 示例借助分流定义宏,可屏蔽多余换行:
`The result is:'
divert(-1)
define(`name', `value')
divert(0)dnl
name
Let's compare the nl and nonl versions.
执行对比效果:
bash
$ m4 nl
The result is:

value
$ m4 nonl
The result is:
value
The second divert() ends with dnl, which eats the following newline. Adding the argument (0), which is actually the default, lets you write dnl without a space before it (which would otherwise be output). You can use divert`'dnl instead, because an empty quoted string (`') is another way to separate the divert and dnl macro calls.
第二处 divert() 末尾的 dnl 用于吸收后续换行符。显式传入入参 (0)(默认值),可直接紧贴 dnl 书写无需空格(空格会被输出);也可使用 divert`'dnl 写法,空引号字符串可作为两个宏调用的分隔标识。
Of course, that trick is more reasonably done around a group of several defines. You can also write comments inside the same kind of diversion. This is an easy way to write blocks of comments without putting dnl at the start of each line. Just remember that macros are recognized inside the diversion (even though they don't make output). So, the following code increments i twice:
该技巧更适合批量多行 define 定义场景,也可在分流内编写整块注释,无需每行开头添加 dnl。需注意:分流内仍会识别并展开宏(仅屏蔽输出),如下代码会使变量 i 自增两次:
divert(-1)
Now we run define(`i', incr(i)):
define(`i', incr(i))
divert`'dnl
dnl can start comments, and that works on even the oldest versions of m4. Generally, # is also a comment character. If you put it at the start of the comment above, as in #Now ..., then i won't be incremented.
dnl 可作为注释起始标识,兼容所有老旧 m4 版本。# 同样可作为注释符,若将上文注释改为 #Now ...,注释内的宏不会被解析,i 不会额外自增。
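The point about macros still being recognized inside diversion -1 can be verified with a sketch along these lines. It initializes a hypothetical counter `i` to 0, mentions a define() call once in plain prose and once after a `#`, then runs the real call.

```shell
# The prose line inside divert(-1) is still scanned, so its define()
# runs; the line starting with # is a comment and is not expanded.
out=$(m4 <<'EOF'
define(`i', `0')dnl
divert(-1)
Prose that mentions define(`i', incr(i)) still runs it.
# But a comment mentioning define(`i', incr(i)) is skipped.
define(`i', incr(i))
divert(0)dnl
i
EOF
)
printf '%s\n' "$out"
```

The counter ends at 2: one increment from the prose line, one from the real call, and none from the `#` comment.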
Before seeing the "obvious" uses of diversions, here's one last item from the bag of diversion tricks. GNU m4 lets you output a file's contents by calling undivert() instead of include(). The advantage is that, like undiverting a diversion, "undiverting" a file doesn't scan the file's contents for macros. This lets you avoid the really ugly workaround shown in Listing One.
讲解分流常规用法前,补充最后一个实用技巧:GNU m4 可通过 undivert() 替代 include() 读取文件内容,优势与分流释放一致,不会扫描解析文件内的宏,可规避示例一中复杂的文件过滤适配写法。
With GNU m4, you could have written simply:
GNU m4 中可直接简洁写法读取文件:
undivert(`/etc/hosts')
Diversions as Diversions
分流输出常规业务用法
The previous section showed some offbeat uses of divert(). Now let's see a more obvious use: splitting output into parts and reassembling those parts in a different order.
前文介绍了 divert() 小众用法,本节讲解核心常规用途:将输出内容拆分至不同分流,再按需调整顺序重新组合输出。
Listings Two, Three, and Four show an HTML generator that outputs the text of each top-level heading in two places: in a table of contents at the start of the web page, and again, later, in the body of the web page. The table of contents includes links to the actual headings later in the document, which will have an anchor (an HTML id).
示例二、三、四实现 HTML 生成器,页面一级标题同时在两处输出:网页开头目录、正文对应位置;目录添加锚点链接,可跳转至正文对应标题位置,锚点依托 HTML id 属性实现。
Listing Two has the file, htmltext.m4, with the macro calls. Listing Three shows the HTML output from the macros (which omits the blank lines, because HTML parsers ignore them). Listing Four shows the macros, which call include() to bring in the htmltext.m4 file at the proper place. (Blank lines have been added to the macros to make the start and end of each macro more obvious.)
示例二为包含宏调用的 htmltext.m4 源文件;示例三为最终生成 HTML(省略空行,HTML 解析器会自动忽略空行);示例四为宏定义文件,通过 include() 在指定位置引入 htmltext.m4,宏定义中添加空行便于区分宏边界。
Listing Two:
示例二:m4 宏调用文件 htmltext.m4
_h1(`First heading')
_p(`The first paragraph.')
_h1(`Second heading')
_p(`The second paragraph.Yadda yadda yadda')
_h1(`Third heading')
_p(`The third paragraph.')
Listing Three:
示例三:生成的 HTML 输出
html
<strong>Table of contents:</strong>
<ol>
<li><a href="#H1_1">First heading</a></li>
<li><a href="#H1_2">Second heading</a></li>
<li><a href="#H1_3">Third heading</a></li>
</ol>
<h1 id="H1_1">First heading</h1>
<p>The first paragraph.</p>
<h1 id="H1_2">Second heading</h1>
<p>The second paragraph.Yadda yadda yadda</p>
<h1 id="H1_3">Third heading</h1>
<p>The third paragraph.</p>
Listing Four:
示例四:生成 HTML 的 m4 宏代码
define(`_h1count', 0)
define(`_h1', `divert(9)define(`_h1count', incr(_h1count))<li><a href="`#'H1`_'_h1count">$1</a></li>divert(1)<h1 id="H1`_'_h1count">$1</h1>divert')
define(`_p', `divert(1)<p>$1</p>divert')
include(`htmltext.m4')
<strong>Table of contents:</strong>
<ol>undivert(9)</ol>
undivert(1)
Let's look at the code in Listing Four.
示例四代码逻辑解析:
-
The _h1count macro sets the number used at the end of each HTML id. It's incremented by a define call inside the _h1 macro.
_h1count 宏用于计数,为每个 HTML id 提供后缀编号,每次调用 _h1 宏时自动自增。
-
The _h1 (heading level 1) macro starts by calling divert(9). The code used diversion 9 to store the HTML for the table of contents. After incrementing _h1count, the macro outputs a list item surrounded by <li> and </li> tags. (The <ol> tags come later: when the code undiverts diversion 9.) Notice that the # is quoted to keep it from being treated as an m4 comment character. In the same way, the underscore is quoted (`_'), since it's used as part of the HTML id string (for instance, href="#H1_2"). A final call to divert switches output back to the normal diversion 0, which is m4's standard output.
_h1 一级标题宏先切换至 9 分流,用于存储目录 HTML 内容;_h1count 自增后,生成包裹在 <li> 与 </li> 标签中的目录列表项,<ol> 有序列表标签在后续释放 9 分流时补充。# 与下划线添加引号包裹,避免被 m4 识别为注释符或特殊语法字符,同时保留在 HTML id 与链接锚点中。最后调用 divert 恢复默认 0 分流输出。
-
The _p (paragraph) macro is straightforward. It stores a pair of <p> tags with the first macro argument in-between in diversion 1.
_p 段落宏逻辑简洁,将段落内容包裹 <p> 标签后存入 1 分流。
-
A call to include() brings in the file htmltext.m4 (Listing Two). This could have done this in several other ways, on the m4 command line, for instance.
通过 include() 引入示例二的 htmltext.m4,也可通过 m4 命令行参数加载该文件。
-
Finally, the call undivert(9) outputs the table of contents surrounded by a pair of ordered-list tags, followed by the headers and paragraphs from undivert(1).
最后调用 undivert(9) 输出目录内容并包裹有序列表标签,再调用 undivert(1) 输出正文标题与段落内容。
This example shows one use of diversions: to output text in more than one way. Another common use --- in sendmail, for instance --- is gathering various text into "bunches" by its type or purpose.
该示例体现分流核心价值:同一份内容多位置输出。另一高频场景(如 sendmail 配置):按类型或用途将不同文本归类缓存,统一整合输出。
Frozen Files
冻结文件
Large m4 applications can take time to load. GNU m4 supports a feature called frozen files that speeds up loading of common base files. For instance, if your common definitions are stored in a file named common.m4, you can pre-process that file to create a frozen file containing the m4 state information:
大型 m4 应用加载耗时较长,GNU m4 提供冻结文件特性,可加速通用基础宏文件加载。若通用宏定义存放于 common.m4,可预处理生成冻结文件,保存 m4 运行状态:
bash
$ m4 -F common.m4f common.m4
Then, instead of using m4 common.m4, you use m4 -R common.m4f for faster access to the common definitions.
后续无需重新加载 common.m4,执行 m4 -R common.m4f 即可快速加载所有通用宏定义。
Frozen files work in a majority of cases, but there are gotchas. Be sure to read the m4 info file (type info m4) before you use this feature.
冻结文件适用于多数场景,但存在部分使用限制,启用前建议通过 info m4 查阅官方说明文档。
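A sketch of the freeze-and-reload cycle, assuming GNU m4 (the -F and -R options are GNU extensions) and a temporary directory from mktemp. The macro name `greeting` and the file contents are hypothetical.

```shell
# Step 1: freeze the state left behind by common.m4 into common.m4f.
# Step 2: reload that state with -R and expand a macro from it.
dir=$(mktemp -d)
cat > "$dir/common.m4" <<'EOF'
define(`greeting', `hello from a frozen file')dnl
EOF
m4 -F "$dir/common.m4f" "$dir/common.m4"
out=$(printf 'greeting\n' | m4 -R "$dir/common.m4f")
rm -rf "$dir"
printf '%s\n' "$out"
```

The second m4 run never reads common.m4 itself; the definition of `greeting` comes entirely from the frozen state file.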
Debugging and Tracing
调试与追踪
m4's recursion and quoting can make debugging a challenge. A thorough understanding of m4 helps, of course, and the techniques shown in the next section are worth studying. Here are some built-in debugging techniques:
m4 的递归逻辑与引号嵌套增加了调试难度,扎实掌握语法基础可降低排障成本,同时可借助内置调试能力辅助排查:
-
To see a macro definition, use dumpdef(), which was covered last month. dumpdef() shows you what's left after the initial layer of quoting is stripped off of a macro definition and any substitutions are made.
查看宏定义使用前文介绍的 dumpdef(),该函数会剥离宏定义首层引号并完成基础替换,展示最终宏有效值。
-
The traceon() macro traces the execution of the macros you name as arguments, or, without a list of macros, it traces all macros. The trace output shows the depth of expansion, which is typically 1, but can be greater if a macro contains macro calls. Use traceoff to stop tracing.
traceon() 可追踪指定宏的执行流程,无入参时追踪所有宏;追踪输出会显示宏展开深度,常规为 1,嵌套宏调用会增大深度;traceoff 用于关闭追踪功能。
-
The debugmode() macro gives you a lot of control over debugging output. It accepts a string of flags, which are described in the m4 info file. You can also specify debugging flags on the command line with -d or --debug. These flags also affect the output of dumpdef() and traceon().
debugmode() 可精细化控制调试输出,接收标识字符串参数,标识含义参考官方文档;也可通过 -d 与 --debug 命令行参数设置调试标识,该参数同时影响 dumpdef() 与 traceon() 的输出格式。
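As a small sketch of tracing, the following turns traceon() on for one hypothetical macro named `double`. Trace lines go to standard error, so they appear on the terminal while the normal output is captured in `out`; the exact trace format depends on the m4 version and the active debug flags, so it is not reproduced here.

```shell
# Only stdout is captured; the trace of double(21) is written to stderr.
out=$(m4 <<'EOF'
define(`double', `eval($1 * 2)')dnl
traceon(`double')dnl
double(21)
traceoff(`double')dnl
EOF
)
printf '%s\n' "$out"
```

The captured output is 42. Calling traceon with no arguments instead would also trace eval and every other macro used in the input.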
More about m4
m4 拓展学习资源
Last month and this, you've seen some highlights of m4. If you have the GNU version of m4, its info page (info m4) is a good place to learn more.
通过本月及上月内容,已覆盖 m4 核心常用能力。使用 GNU m4 可通过 info m4 官方文档系统学习进阶知识。
R.K. Owen's quoting page (http://owen.sj.ca.us/rkowen/howto/webpaging/m4tipsquote.html) has lots of tips about --- what else --- quoting in m4. His site also has other m4 information and examples.
R.K. Owen 个人页面包含大量 m4 引号用法技巧及示例:https://owen.sj.ca.us/rkowen/howto/webpaging/m4tipsquote.html(见下文)
Ken Turner's technical report "CSM-126: Exploiting the m4 Macro Language," available from http://www.cs.stir.ac.uk/research/publications/techreps/previous.html, shows a number of m4 techniques.
Ken Turner 技术报告《CSM-126: Exploiting the m4 Macro Language》包含大量 m4 实战技巧,查阅地址:http://www.cs.stir.ac.uk/research/publications/techreps/previous.html
A Google search for "m4 macro" turns up a variety of references. To find example code, try a search with an m4-specific macro name, like "m4 dnl" and "m4 divert -motorway". (In Google, the -motorway avoids matches of the British road named the M4. You can also add -sendmail to skip sendmail-specific information.)
检索学习资料可搜索 "m4 macro";查找示例代码可搭配专属宏名检索,如 "m4 dnl"、"m4 divert -motorway"。Google 检索添加 -motorway 可屏蔽英国 M4 公路相关无关结果,添加 -sendmail 可过滤 sendmail 专属配置内容。
Mailing lists about m4 are at https://savannah.gnu.org/mail/?group=m4.
m4 官方邮件讨论组:https://savannah.gnu.org/mail/?group=m4
Happy m4 hacking!
祝 m4 学习与开发顺利!
Web Paging - Tips and Hints on M4 quoting
网页编写:M4 引号使用技巧与提示
The `m4' macro processor is widely available on all UNIXes. Usually, only a small percentage of users are aware of its existence. However, those who do often become committed users.
`m4' 宏处理器在各类 UNIX 系统上广泛可用,但通常仅有少量使用者了解其存在,而接触过的使用者大多会成为忠实用户。
...
Some people found `m4' to be fairly addictive. They first use `m4' for simple problems, then take bigger and bigger challenges, learning how to write complex `m4' sets of macros along the way. Once really addicted, they pursue writing of sophisticated `m4' applications even to solve simple problems, devoting more time debugging their `m4' scripts than doing real work. Beware that `m4' may be dangerous for the health of compulsive programmers.
部分使用者会对 `m4' 产生极强的使用黏性。起初只用 `m4' 处理简单事务,之后不断承接更复杂的需求,在过程中掌握编写复杂 `m4' 宏集的方式。一旦深度沉迷,即便面对简单任务也会刻意开发复杂 `m4' 应用,投入在脚本调试上的时间甚至多于实际业务工作。习惯沉浸式编程的使用者需留意,`m4' 容易让人过度投入。
GNU m4 info page
GNU m4 官方信息文档
The m4 macro language isn't very hard to master, but does pose some unique challenges once past the rudimentary simple macro substitution. However, it's certainly an "addiction" where the accomplished artist seeks out ever more complicated and dangerous tasks.
m4 宏语言的入门门槛不高,但跨过基础宏替换阶段后,会遇到诸多独有的难点。同时它具备独特的吸引力,熟练使用者会主动尝试更复杂、更有难度的编写场景。
Quite simply, there are things you can do with m4 that can't easily be done any other way. On this page, I highlight some quoting techniques that lead to more efficient macro substitutions and may help avoid problems. These tips aren't obvious and only came after gaining much experience in the crafting of this m4 -to-html package.
直白来说,部分借助 m4 实现的逻辑很难用其他方式简易替代。本文整理若干引号使用技巧,可提升宏替换效率并规避常见问题。这些用法并无直观规律,均是在编写 m4 转 HTML 工具包的大量实践中总结得出。
Order of Nested Macro Substitutions
嵌套宏替换的执行顺序
Macros with unquoted arguments are evaluated from inside out. For example:
无引号入参的宏遵循由内向外的求值规则,示例如下:
define(`_i', 0)dnl define counter
define(`_xinc', `define(`$1',incr($1))')dnl x++
define(`_m1', `_xinc(`_i') macro 1 :_i: ($*)')dnl
define(`_m2', `_xinc(`_i') macro 2 :_i: ($*)')dnl
dnl
_m1(_m2(some text :_i:))
The dnl macro discards any text after it, up to and including the newline character. It's a good way to add comments to tricky code. The result of passing the text to m4 is shown below:
dnl 会舍弃自身后续所有字符,包含换行符,适合为复杂代码添加注释。将以上代码交由 m4 处理,输出结果如下:
macro 1 :2: ( macro 2 :1: (some text :0:))
Notice that the "increment" variable _i, which is initialized to 0, shows that the text inside macro _m2 is evaluated first, then _m2 itself, and only then _m1. This is exactly the opposite of what you might expect.
初始值为 0 的自增变量 _i 可以直观印证执行顺序:_m2 内部逻辑最先求值,随后执行 _m2 整体,最后才执行 _m1,与常规直观认知的执行顺序完全相反。
Order of Nested Quoted Macro Substitutions
带引号嵌套宏替换的执行顺序
Quoting the arguments to each of the macros (shown below)
为每层宏的入参添加引号,写法如下:
_m1(`_m2(`some text :_i:')')
will yield the following results.
对应输出结果:
macro 1 :1: ( macro 2 :2: (some text :2:))
This clearly shows that _m1 is expanded fully before m4 continues with the text inside it. Note that putting quotes around the $* in the definition has no effect here, but may have unwanted side effects later.
结果可明确看出,_m1 会先完整求值完毕,再处理内部嵌套文本。在宏定义中为 $* 单独添加引号不会改变本次执行逻辑,却可能在后续场景引发非预期行为。
What you should do depends on what you're trying to accomplish. Generally, if you're passing a macro as an argument to another macro, quote the argument. This forces outer-to-inner evaluation of the macros.
可根据业务需求灵活选用规则;通用原则:若将一个宏作为入参传入另一个宏,需为入参添加引号,强制遵循由外向内的求值顺序。
When not to Quote Macro Arguments
无需为宏入参添加引号的场景
Given the above sections, it would appear that you should always quote any arguments. There are some macros that you should probably not quote. These are the numerical ones - incr, decr, and eval. The following example illustrates some uses.
结合前文内容易产生一个认知:所有宏入参都应添加引号。但数值运算类宏不适合加引号,包含 incr、decr、eval,通过以下示例可直观理解。
define(`swp',1)dnl
define(`swapcolors',`define(`swp',eval(incr(swp) % 2))dnl
ifelse(swp,0,`Red',`Blue')')dnl
swapcolors swapcolors swapcolors
The macro swp is our variable, which contains a numerical value. The macro swapcolors is our procedure, which changes the variable swp by redefining it and outputs text that depends on its value. If the eval or incr macros had their arguments quoted, interpretation would be delayed and they would see text instead of numerical values, which causes m4 to complain: Non-numeric argument to built-in `incr'.
swp 作为数值变量使用,swapcolors 作为过程宏,通过重定义 swp 修改数值,并依据数值输出对应文本。若为 eval、incr 的入参添加引号,会延迟表达式解析,宏只会读取字面文本而非数值,触发 m4 报错:Non-numeric argument to built-in `incr'(内置宏 incr 收到非数值入参)。
The output looks like this
程序输出如下:
Red Blue Red
Evaluations of ifdefs and ifelses
ifdef 与 ifelse 的求值逻辑
Take for example the following m4 code:
参考以下 m4 代码示例:
define(`_m', `macro :$*:')dnl
ifdef(_x,_m(some text))dnl
which unremarkably produces no output since _x is not defined. However, by looking at the GNU m4 --debug=aeqt output
因 _x 未定义,代码无任何输出。开启 GNU m4 调试参数 --debug=aeqt 后,可看到底层执行细节:
m4trace: -1- define(`_m', `macro :$*:')
m4trace: -1- dnl
m4trace: -2- _m(`some text') -> `macro :some text:'
m4trace: -1- ifdef(`_x', `macro :some text:')
m4trace: -1- dnl
This demonstrates that the _m macro is being needlessly evaluated even though it will not ultimately be viewed. By quoting the conditional argument
日志可看出:即便最终不会输出内容,_m 宏仍被无效求值。为条件分支内的宏调用添加引号即可优化:
ifdef(_x,`_m(some text)')dnl
We see that the _m macro evaluation is avoided.
优化后可跳过 _m 宏的无效求值:
m4trace: -1- define(`_m', `macro :$*:')
m4trace: -1- dnl
m4trace: -1- ifdef(`_x', `_m(some text)')
m4trace: -1- dnl
This example is linked to the nested macro example above and shows that judicious quoting can lead to more efficient macro evaluations especially with conditional macros, ifdef and ifelse.
该示例与前文嵌套宏规则关联,合理使用引号可减少无效求值,提升条件宏 ifdef、ifelse 的执行效率。
Having one macro redefine another macro
单宏重定义其他宏的写法
A useful trick sometimes is to have one macro, when called, redefine another macro. This is useful if you need to modify the actions of the other macro for special circumstances.
实用编写技巧:调用某个宏时,动态重定义另一个宏。适合特殊场景下临时修改原有宏逻辑的需求。
The following might be the naïve approach:
以下是常规但存在缺陷的写法:
define(`xxx', `XXX-$1')dnl
xxx(abc)
define(`redxxx',
`--$1--
pushdef(`xxx', `YYY=$1')dnl')dnl
redxxx(abc)
xxx(xyz)
popdef(`xxx')dnl
xxx(xyz)
Here we define `xxx', then modify it when `redxxx' is invoked. The argument passed to `redxxx' is for some banner, and is not intended to be part of `xxx' in any fashion. This code generates the following:
代码预先定义 xxx 宏,调用 redxxx 时尝试重定义 xxx;传入 redxxx 的入参仅用作装饰文本,不应参与 xxx 宏的赋值逻辑。代码实际输出:
XXX-abc
--abc--
YYY=abc
XXX-xyz
We expected the redefined `xxx(xyz)' to show
预期重定义后的 `xxx(xyz)' 应输出:
YYY=xyz
The problem is that the $1 in the `pushdef' redefinition of `xxx' is interpreted as the first argument of the macro `redxxx', and is then substituted into the redefinition. This is not what we intended.
问题根源:pushdef 重定义 xxx 时,内部 $1 被识别为 redxxx 的入参,直接带入宏定义,违背编写初衷。
We need to delay the interpretation of $1. That can be done by inserting quote marks:
解决方案:添加引号延迟 $1 的解析时机:
define(`redxxx',
`--$1--
pushdef(`xxx', `YYY=$'`1')dnl')dnl
which will give the correct results. However, this technique can be difficult to get right. You need to know the correct level of macro nesting.
修正后可得到符合预期的输出,但这种引号嵌套写法对层级把控要求较高,易出错。
Another, and in my opinion, easier approach is to define an ``alternative '' macro and use it later:
更简洁易懂的替代方案:预先定义备用宏,再通过栈操作替换原有宏:
define(`xxx', `XXX-$1')dnl
xxx(abc)
define(`altxxx', `YYY=$1')dnl
define(`redxxx',
`--$1--
pushdef(`xxx', defn(`altxxx'))dnl')dnl
redxxx(abc)
xxx(xyz)
popdef(`xxx')dnl
xxx(xyz)
This looks up the definition of `altxxx' with defn and pushes it onto the definition stack for `xxx'. Now we get the expected results:
该方式读取 altxxx 的完整定义,替换存入 xxx 宏定义栈,执行结果符合预期:
XXX-abc
--abc--
YYY=xyz
XXX-xyz
Note the use of `pushdef' and `popdef' to "push" and "pop" the definitions on and off the stack. This way you can easily recall the previous definition when the exceptional condition is no longer in effect.
补充说明:pushdef 与 popdef 可将宏定义压入、弹出栈结构,临时替换宏逻辑后,能便捷恢复原有定义,适配临时特殊场景的开发需求。
Last Modified:
最后修改时间:
2002/12/03 17:56:40
Brought to you by:
供稿:
R.K. Owen,Ph.D.
R.K. 欧文 博士
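The M4 Macro Processor and Its Rescanning Mechanism
1. Overview of the M4 macro processor
M4 is a macro-processing and preprocessing language. It is neither a compiled language nor a general-purpose programming language in the usual sense: its job is to generate code, configuration, and other text, and what it produces is text rather than a runnable program.
2. M4 syntax rules
(1) Defining and calling macros
Define a macro with define(name, replacement); writing the macro name in the input is enough to call it.
(2) Macros with parameters
Parameters are referenced as $1, $2, ..., $9 in traditional implementations, while GNU m4 does not limit the number of arguments. $@ stands for all arguments, each one quoted; $* is the same list unquoted, and $# is the argument count.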
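(3) Quoting
A quoted string begins with a backquote ` and ends with a single quote '. Quoting delays expansion and prevents unintended substitution.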
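(4) Rescanning
The text produced by a macro expansion is pushed back onto the input stream and parsed again; much of m4's expressive power rests on this mechanism. Rescanning is not a separate second pass over the whole file: m4 reads its input once, and each expansion is rescanned at the point where it is produced. Because names inside a macro body are only looked up at that moment, nested macros expand completely, and a body may mention macros that are defined later, provided they exist by the time the outer macro is called.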
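(5) Built-in macros
| Macro | Purpose |
|---|---|
| dnl | Discards input up to and including the end of the line |
| ifelse | Conditional |
| eval | Integer expression evaluation (+, -, *, /, %, ** for exponentiation in GNU m4, comparison and logical operators) |
| incr | Adds 1 to a number |
| decr | Subtracts 1 from a number |
| divert, undivert | Output diversion |
| include, sinclude | File inclusion |

eval is the general-purpose arithmetic macro and accepts C-style integer expressions: unary plus and minus, exponentiation, multiplication, division and modulo, addition and subtraction, relational comparison, and logical NOT, AND, and OR. incr and decr are just shorthand for adding or subtracting 1. M4 itself has no floating-point arithmetic or mathematical functions; those need an external program called through esyscmd. GNU M4 also adds the format macro (similar to C's printf, for formatted output); both format and esyscmd are commonly used when processing numbers.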
(6) Recursion and loops
M4 has no native loop construct; loops are built from recursion and ifelse.
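(7) Case sensitivity
Macro names in M4 are case-sensitive.
(8) Comments
- dnl: discards the rest of the line, newline included, so the comment never reaches the output.
- #: starts a comment that runs to the end of the line; the text is not expanded, but it is copied to the output.
- divert(-1): discards everything up to the next divert, which works as a block comment.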
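3. Rescanning in detail
(1) What rescanning does
- Allows forward references inside macro bodies: a body can mention a macro that is defined only later, because the name is looked up when the body is expanded and rescanned, not when it is defined. A name that appears in plain input before its definition is simply copied through unchanged.
- Expands nested macros layer by layer until no macro call is left in the text.
- Resolves chains of dependent macros in the order in which their expansions appear in the input.
- Lets arithmetic, string concatenation, and branching macros feed one another, since the result of one expansion becomes input to the next.
- Makes deferred evaluation possible: quoted text is carried along unexpanded and is evaluated only when a later rescan strips its quotes.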
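(2) Comparable mechanisms
The following systems rely on some form of repeated or deferred expansion that is comparable in spirit to rescanning, although the details differ:
- Macro processing: C #define macros, the C preprocessor (cpp), and the GNU M4 macro processor.
- Template engines: Jinja2, Smarty, and Mustache, which fill in variables, resolve layout inheritance, and render loops in several parsing stages.
- Configuration and build: deferred variable expansion in Makefiles, shell variable and command substitution, and generic structured configuration parsers.
- Compilation: two-pass compilation in compiled languages, where the first pass builds the symbol table and the second performs checking and address binding.
- Dynamic execution: eval in Perl and Python and new Function in JavaScript, which hold code as text and execute it later, giving staged parsing.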
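(3) Rescanning versus a preprocessor
- Scope: a preprocessor is a standalone tool that handles the whole set of up-front tasks, such as text preprocessing, macro substitution, conditional compilation, and file inclusion. Rescanning is a low-level processing technique that preprocessors, parsers, template engines, and similar programs can build on.
- Coverage: a preprocessor covers file import, conditional branches, comment stripping, character escaping, and more. Rescanning only handles the re-expansion of substituted text and nested macros, and cannot do a full preprocessing job on its own.
- Relationship: macro preprocessors generally rescan expansion results in order to support nested macros. The technique itself is reusable and is not tied to preprocessors.
- Granularity: preprocessors are mostly applied to whole files in a project, whereas rescanning operates on a single piece of text, a single expression, or a fragment of a macro.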
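(4) Other tools with comparable re-expansion behaviour
- Compiler preprocessing: the C/C++ preprocessor cpp and GCC's built-in preprocessing stage.
- Build scripts: GNU Make and the CMake script interpreter.
- Typesetting and stylesheets: TeX / LaTeX and the SASS / SCSS precompiler.
- Scripts and configuration: the Bash/Zsh interpreters, the Nginx configuration parser, and the Ansible template parser.
- Code generation: Autoconf, Automake, and the SWIG interface generator.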
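(5) Using rescanning in practice
- Writing macros: when writing deeply nested macros or macros that refer to one another across a file, rely on the tool's built-in rescanning so that expansions are carried through completely, and use quoting to control when each layer expands.
- Build configuration: in Makefiles and CMake, use deferred variables so that evaluation is postponed until the point of use, which accommodates nested variable dependencies.
- Templates: design template inheritance, nested layouts, and dynamic variables around the template engine's multi-stage processing, which assembles the structure and then renders the variables.
- Code generation: with M4, Autoconf, and similar tools, put the macro definitions and parameters first, then expand them to generate the final source or configuration text.
- Custom parsers: when writing your own configuration parser or text-processing tool, a two-pass design is one option: the first pass records all identifiers and definitions, and the second substitutes and evaluates section by section, which supports forward references and nested expressions.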
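4. Where M4 is used
- Generating configuration files: the classic example is sendmail.cf, and the same approach works for producing many kinds of configuration files in bulk.
- Build-system component: GNU Autoconf and Automake are built on M4, which makes it an important part of that build system.
- Code and template generation: producing C code, scripts, and configuration templates so that fragments can be reused.
- Static web pages: defining the page structure as macros and generating HTML pages in bulk.
- Text preprocessing, bulk replacement, and formatting: converting text in bulk and reusing code fragments.
- Rendering configuration for embedded and system tools: M4 is lightweight and has no dependencies, which suits system-level tools.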
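5. Strengths and weaknesses of M4
(1) Strengths
- Very lightweight, cross-platform, and almost always installed: it ships with nearly every UNIX / Linux system.
- Pure text processing: it does not depend on a compiler toolchain.
- A complete macro facility: recursion, conditionals, arithmetic, file operations, and output diversion.
- Well suited to generating code and configuration, often more concisely than a scripting language.
- Easy to embed in a build as a preprocessing step.
(2) Weaknesses
- Obscure syntax: nested quoting is very easy to get wrong.
- Rescanning makes debugging comparatively hard.
- Few modern language features: no interactive debugger, type system, or module system.
- Poor readability: complex macros are expensive to maintain.
- Not a general-purpose language: it is not meant for writing applications.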
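6. How M4 macro programming differs from other programming languages
- Purpose: M4 is a macro preprocessor and text-substitution tool; other programming languages are used to implement logic and build programs.
- Execution model: M4 scans, substitutes, and emits text; a programming language is compiled or interpreted, executes instructions, and performs computation.
- No executable output: M4 produces only text, code, and configuration, never a binary executable.
- No full runtime: M4 has no typed variables, memory management, or general I/O, only macros, substitution, and text operations.
- Rescanning: re-parsing the result of every substitution is central to M4, whereas general-purpose languages typically evaluate an expression once and do not re-parse the result.
- Use case: M4 is used to generate code; other programming languages are used to run programs.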
Reference
- Introduction to the m4 Macro Processor - m4.pdf
  https://www.nesssoftware.com/home/mwc/doc/coherent/manual/pdf/m4.pdf
- Macro Magic: M4 Complete Guide
  https://www.linuxtoday.com/blog/macro-m4-guide/
- Web Paging - Tips and Hints on M4 quoting
  http://owen.sj.ca.us/rkowen/howto/webpaging/m4tipsquote.html
- 宏 - 让这世界再多一份 GNU m4 教程 (1) - 死循环 - SegmentFault 思否
  https://segmentfault.com/a/1190000004104696
- 宏 - 让这世界再多一份 GNU m4 教程 (2) - 死循环 - SegmentFault 思否
  https://segmentfault.com/a/1190000004108113
- gnu-m4 - 让这世界再多一份 GNU m4 教程 (3) - 死循环 - SegmentFault 思否
  https://segmentfault.com/a/1190000004128102
- gnu-m4 - 让这世界再多一份 GNU m4 教程 (4) - 死循环 - SegmentFault 思否
  https://segmentfault.com/a/1190000004131031
- gnu-m4 - 让这世界再多一份 GNU m4 教程(终结篇) - 死循环 - SegmentFault 思否
  https://segmentfault.com/a/1190000004137562