Have your own parser (a C#-implemented generator for LALR(1) syntax parsers and miniDFA lexical analyzers)

Following the input format of lex and yacc and the algorithms of the Tiger Book (Modern Compiler Implementation in C), and relying on no third-party libraries, I implemented a heavily integrated and optimized C# generator for LALR(1) syntax parsers and miniDFA lexical analyzers (tentatively named bitParser).

The complete parser code for ANSI C, GLSL 4.60.8, and various test cases can be downloaded at (https://gitee.com/bitzhuwei/bitParser-demos).

All the algorithms used to implement the lexer generator and the parser generator are listed at (https://www.cnblogs.com/bitzhuwei/p/18544785).

The syntax-tree construction process for 1234+567+89+0+0 ☟

The syntax-tree construction process for (1234+567)/(89-0) ☟

More syntax-tree construction processes can be viewed at (https://www.cnblogs.com/bitzhuwei/p/18679529). (If you would like to see the construction process for your own expression, please leave a comment.)

The lexical analyzer generator

  • Generates lexer code (state-transition tables, reserved words, Token types, etc.) for both the DFA and the minimized DFA (hereafter miniDFA).

  • Supports the full Unicode character set, the / trailing-context suffix, the <'Vt'> prefix, and state signals <signal1, signal2, ..>, which makes it easy to recognize cases such as id = refId, struct type_name, <Comment>[^*\n]*, and subroutine(type_name_list). Similar to lex, but not identical.

  • No need to explicitly declare Token types, state signals, or reserved words; that is, nothing like lex's %s NUM ID .., %x Comment End Text .., or [(] { return ('('); } is required (a tiny sketch of the rule format follows this list).

  • Thoroughly commented: every conditional branch of every lexical state documents, in comments, its regular expression, the Token type (and signal) it leads to, and so on.

  • Generates state diagrams (mermaid format) of the ε-NFA, NFA, DFA, and miniDFA, plus a state diagram (mermaid format) for each Token type, which helps with learning and debugging.
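
As a taste of the rule format: a lexical rule pairs a regular expression with a quoted Vt name. The 'number' rule below appears in the Calc.st example later in this article; the 'id' rule is my own illustration and is not taken from bitParser's documentation:
smalltalk
%%[0-9]+%% 'number'
%%[a-zA-Z_][a-zA-Z0-9_]*%% 'id'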

The syntax parser generator

  • Generates parser code (parse tables, production lists, syntax-tree node types, etc.) for LL(1), LR(0), SLR(1), LALR(1), and LR(1).

  • Supports the precedence directives %nonassoc, %left, %right, and %prec; automatically resolves Shift/Reduce and Reduce/Reduce conflicts and lists them in the comments of the parse-table code.

  • Thoroughly commented: the comments list each syntax state's LR items and lookaheads, as well as the numbers of conflicts, resolved conflicts, and unresolved conflicts.

  • Generates state diagrams (mermaid format) and state tables (markdown format) for LL(1), LR(0), SLR(1), LALR(1), and LR(1), which helps with learning and debugging.

  • Generates documentation of nullable, the FIRST sets, and the FOLLOW sets.

Other features

  • No need for lex/yacc directives such as %option, %union, %define, %{ %}, %parse-param, %lex-param, %pure-parser, %expect, %token, or %type. The generated parser is a class library and is called directly as follows:
csharp
var compiler = new CompilerXxx();
var sourceCode = File.ReadAllText("input.st");
var tokens = compiler.Analyze(sourceCode);
var syntaxTree = compiler.Parse(tokens);
var extractedObj = compiler.Extract(syntaxTree, tokens, sourceCode);
// use extractedObj for user-defined business ..
  • Supports the block-comment directive %blockComment on/off and the inline-comment directive %inlineComment on/off. The defaults are C-style /* */ and //, and the formats can be customized; for example, when parsing VRML files, the inline comment can be defined to run from # to the end of the line:
smalltalk
%%#[^\r\n]*%% 'inlineComment'
  • Generates a framework that traverses the syntax tree to extract semantic information, and provides a source-code formatting algorithm applicable to all kinds of languages. This can be used for formatting, for generating intermediate code, and for other downstream business logic.

  • Heavily optimized: generating all the parser code plus documentation takes only about 3 seconds for ANSI C and about 9 seconds for GLSL 4.60.8.

Click to view: other features

  • Supports the scope directive %validScopeChars and the global directive %validGlobalChars; both default to [\u0001-\uFFFF] (i.e. all Unicode characters except '\0') and can be customized.

  • Supports the %omit directive for specifying the whitespace characters to ignore. Defaults to ' ', '\t', '\n', '\r', and '\0'.

  • Supports %start for specifying the start symbol of the grammar.

Example: Calc.st

The input file Calc.st

The grammar of a parser that handles addition, subtraction, multiplication, division, and parentheses is as follows:

smalltalk
// input file Calc.st
Exp    : Exp '+' Term
       | Exp '-' Term
       | Term ;
Term   : Term '*' Factor
       | Term '/' Factor
       | Factor ;
Factor : '(' Exp ')'
       | 'number' ;

%%[0-9]+%% 'number' // the example only handles non-negative integers
// no need to write %%[+]%% '+' and the like

From this grammar, the following can be generated.

The generated lexical analyzer

The state diagram of the generated ε-NFA lexer is as follows:

The state diagram of the generated miniDFA lexer is as follows:

Click to view the generated code for terminals (Vt) and non-terminals (Vn)

csharp
// delete this array if it is not needed
public static readonly IReadOnlyList<string> stArray = new string[] {
    "'¥'", // @终 = 0;
    "'+'", // @Plus符 = 1; // '+'
    "'-'", // @Dash符 = 2; // '-'
    "'*'", // @Asterisk符 = 3; // '*'
    "'/'", // @Slash符 = 4; // '/'
    "'('", // @LeftParenthesis符 = 5; // '('
    "')'", // @RightParenthesis符 = 6; // ')'
    "'number'", // @number = 7; // 'number'
    // end of 1 + 7 Vt
    "Exp", // Exp枝 = 8; // Exp
    "Term", // Term枝 = 9; // Term
    "Factor", // Factor枝 = 10; // Factor
    // end of 3 Vn
};
/// <summary>
/// Vt types are used both for lexical-analyze and syntax-parse.
/// <para>Vn types are only for syntax-parse.</para>
/// <para>Vt is quoted in ''.</para>
/// <para>Vn is not quoted in ''.</para>
/// </summary>
public static class st {
    // Vt
    /// <summary>
    /// Something wrong within the source code.
    /// </summary>
    public const int Error错 = -1; // "'×'";

    /// <summary>
    /// end of token list.
    /// </summary>
    public const int @终 = 0; // "'¥'";
   
    /// <summary>
    /// '+'
    /// </summary>
    public const int @Plus符 = 1; // "'+'"
    /// <summary>
    /// '-'
    /// </summary>
    public const int @Dash符 = 2; // "'-'"
    /// <summary>
    /// '*'
    /// </summary>
    public const int @Asterisk符 = 3; // "'*'"
    /// <summary>
    /// '/'
    /// </summary>
    public const int @Slash符 = 4; // "'/'"
    /// <summary>
    /// '('
    /// </summary>
    public const int @LeftParenthesis符 = 5; // "'('"
    /// <summary>
    /// ')'
    /// </summary>
    public const int @RightParenthesis符 = 6; // "')'"
    /// <summary>
    /// 'number'
    /// </summary>
    public const int @number = 7; // "'number'"
    /// <summary>
    /// count of ('¥' + user-defined Vt)
    /// </summary>
    public const int VtCount = 8;


    // Vn
    /// <summary>
    /// Exp
    /// </summary>
    public const int Exp枝 = 8; // "Exp"
    /// <summary>
    /// Term
    /// </summary>
    public const int Term枝 = 9; // "Term"
    /// <summary>
    /// Factor
    /// </summary>
    public const int Factor枝 = 10; // "Factor"
}

Click to view the generated reserved-word (i.e. the language's keywords) code

csharp
public static class reservedWord {
    /// <summary>
    /// +
    /// </summary>
    public const string @Plus符 = "+";
    /// <summary>
    /// -
    /// </summary>
    public const string @Dash符 = "-";
    /// <summary>
    /// *
    /// </summary>
    public const string @Asterisk符 = "*";
    /// <summary>
    /// /
    /// </summary>
    public const string @Slash符 = "/";
    /// <summary>
    /// (
    /// </summary>
    public const string @LeftParenthesis符 = "(";
    /// <summary>
    /// )
    /// </summary>
    public const string @RightParenthesis符 = ")";
}

/// <summary>
/// if <paramref name="token"/> is a reserved word, assign the corresponding type and return true.
/// <para>otherwise, return false.</para>
/// </summary>
/// <param name="token"></param>
/// <returns></returns>
private static bool CheckReservedWord(AnalyzingToken token) {
    bool isReservedWord = true;
    switch (token.value) {
    case reservedWord.@Plus符: token.type = st.@Plus符; break;
    case reservedWord.@Dash符: token.type = st.@Dash符; break;
    case reservedWord.@Asterisk符: token.type = st.@Asterisk符; break;
    case reservedWord.@Slash符: token.type = st.@Slash符; break;
    case reservedWord.@LeftParenthesis符: token.type = st.@LeftParenthesis符; break;
    case reservedWord.@RightParenthesis符: token.type = st.@RightParenthesis符; break;

    default: isReservedWord = false; break;
    }

    return isReservedWord;
}

Below is the lexical state-transition table implemented with 8 Action<LexicalContext, char, CurrentStateWrapper> delegates.
Click to view the generated code for lexical state 0

csharp
// lexicalState0
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState0 =
static (context, c, wrapper) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* [0-9] */
	else if (/* possible Vt : 'number' */
	'0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {
		BeginToken(context);
		ExtendToken(context);
		wrapper.currentState = lexicalState1;
	}
	/* \) */
	else if (/* possible Vt : ')' */
	c == ')'/*'\u0029'(41)*/) {
		BeginToken(context);
		ExtendToken(context);
		wrapper.currentState = lexicalState2;
	}
	/* \( */
	else if (/* possible Vt : '(' */
	c == '('/*'\u0028'(40)*/) {
		BeginToken(context);
		ExtendToken(context);
		wrapper.currentState = lexicalState3;
	}
	/* \/ */
	else if (/* possible Vt : '/' */
	c == '/'/*'\u002F'(47)*/) {
		BeginToken(context);
		ExtendToken(context);
		wrapper.currentState = lexicalState4;
	}
	/* \* */
	else if (/* possible Vt : '*' */
	c == '*'/*'\u002A'(42)*/) {
		BeginToken(context);
		ExtendToken(context);
		wrapper.currentState = lexicalState5;
	}
	/* - */
	else if (/* possible Vt : '-' */
	c == '-'/*'\u002D'(45)*/) {
		BeginToken(context);
		ExtendToken(context);
		wrapper.currentState = lexicalState6;
	}
	/* \+ */
	else if (/* possible Vt : '+' */
	c == '+'/*'\u002B'(43)*/) {
		BeginToken(context);
		ExtendToken(context);
		wrapper.currentState = lexicalState7;
	}
	/* deal with everything else. */
	else if (c == ' ' || c == '\r' || c == '\n' || c == '\t' || c == '\0') {
		wrapper.currentState = lexicalState0; // skip them.
	}
	else { // unexpected char.
		BeginToken(context);
		ExtendToken(context);
		AcceptToken(st.Error错, context);
		wrapper.currentState = lexicalState0;
	}
};

Click to view the generated code for lexical state 1

csharp
// lexicalState1
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState1 =
static (context, c, wrapper) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* [0-9] */
	else if (/* possible Vt : 'number' */
	'0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {
		ExtendToken(context);
		wrapper.currentState = lexicalState1;
	}
	/* deal with everything else. */
	else {
		AcceptToken(st.@number, context);
		wrapper.currentState = lexicalState0;
	}
};

Click to view the generated code for lexical state 2

csharp
// lexicalState2
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState2 =
static (context, c, wrapper) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* deal with everything else. */
	else {
		AcceptToken(st.@RightParenthesis符, context);
		wrapper.currentState = lexicalState0;
	}
};

Click to view the generated code for lexical state 3

csharp
// lexicalState3
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState3 =
static (context, c, wrapper) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* deal with everything else. */
	else {
		AcceptToken(st.@LeftParenthesis符, context);
		wrapper.currentState = lexicalState0;
	}
};

Click to view the generated code for lexical state 4

csharp
// lexicalState4
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState4 =
static (context, c, wrapper) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* deal with everything else. */
	else {
		AcceptToken(st.@Slash符, context);
		wrapper.currentState = lexicalState0;
	}
};

Click to view the generated code for lexical state 5

csharp
// lexicalState5
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState5 =
static (context, c, wrapper) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* deal with everything else. */
	else {
		AcceptToken(st.@Asterisk符, context);
		wrapper.currentState = lexicalState0;
	}
};

Click to view the generated code for lexical state 6

csharp
// lexicalState6
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState6 =
static (context, c, wrapper) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* deal with everything else. */
	else {
		AcceptToken(st.@Dash符, context);
		wrapper.currentState = lexicalState0;
	}
};

Click to view the generated code for lexical state 7

csharp
// lexicalState7
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState7 =
static (context, c, wrapper) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* deal with everything else. */
	else {
		AcceptToken(st.@Plus符, context);
		wrapper.currentState = lexicalState0;
	}
};

It is driven as follows:

csharp
// analyze the specified sourceCode and return a list of Token.
public TokenList Analyze(string sourceCode) {
    var context = new LexicalContext(sourceCode);
    var wrapper = new CurrentStateWrapper(this.initialState);
    do {
		char currentChar = context.MoveForward();
        wrapper.currentState(context, currentChar, wrapper);
        // wrapper.currentState will be set to next lexi-state.
    } while (!context.EOF);

    return context.result;
}

Below is the lexical state-transition table implemented with a two-dimensional array ElseIf[][]; it takes less memory, and it also runs somewhat faster.

csharp
private static readonly ElseIf[][] lexiStates = new ElseIf[8][];
static void InitializeLexiTable() {
	lexiStates[0] = new ElseIf[] {
	// possible Vt: '('
	/*0*/new('('/*'\u0028'(40)*/, Acts.Begin | Acts.Extend, 3),
	// possible Vt: ')'
	/*1*/new(')'/*'\u0029'(41)*/, Acts.Begin | Acts.Extend, 2),
	// possible Vt: '*'
	/*2*/new('*'/*'\u002A'(42)*/, Acts.Begin | Acts.Extend, 5),
	// possible Vt: '+'
	/*3*/new('+'/*'\u002B'(43)*/, Acts.Begin | Acts.Extend, 7),
	// possible Vt: '-'
	/*4*/new('-'/*'\u002D'(45)*/, Acts.Begin | Acts.Extend, 6),
	// possible Vt: '/'
	/*5*/new('/'/*'\u002F'(47)*/, Acts.Begin | Acts.Extend, 4),
	// possible Vt: 'number'
	/*6*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Begin | Acts.Extend, 1),
	};
	lexiStates[1] = new ElseIf[] {
	// possible Vt: 'number'
	/*0*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Extend, 1),
	// possible Vt: 'number'
	/*1*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@number),
	};
	lexiStates[2] = new ElseIf[] {
	// possible Vt: ')'
	/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@RightParenthesis符),
	};
	lexiStates[3] = new ElseIf[] {
	// possible Vt: '('
	/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@LeftParenthesis符),
	};
	lexiStates[4] = new ElseIf[] {
	// possible Vt: '/'
	/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@Slash符),
	};
	lexiStates[5] = new ElseIf[] {
	// possible Vt: '*'
	/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@Asterisk符),
	};
	lexiStates[6] = new ElseIf[] {
	// possible Vt: '-'
	/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@Dash符),
	};
	lexiStates[7] = new ElseIf[] {
	// possible Vt: '+'
	/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@Plus符),
	};
}

As the name suggests, one ElseIf plays the same role as one else if ('0' <= c && c <= '9') { .. } branch in the delegate version. In this way, one ElseIf[] replaces one delegate, and when the table is consulted, binary search can quickly locate the matching ElseIf.
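
To make the table above easier to read, here is a minimal sketch of what such an ElseIf entry could look like; the field names, constructor shapes, and Acts flag values are inferred from the generated code above and are assumptions, not bitParser's actual source:
csharp
// Sketch only: one ElseIf entry = one `else if (min <= c && c <= max) { actions; goto nextState; }`.
[Flags]
public enum Acts { None = 0, Begin = 1, Extend = 2, Accept = 4, Accept2 = 8 }

public sealed class ElseIf {
    public readonly char min;        // inclusive lower bound of the char range
    public readonly char max;        // inclusive upper bound of the char range
    public readonly Acts scripts;    // which actions to run when the char matches
    public readonly int nextStateId; // index of the next lexical state
    public readonly int Vt;          // token type to accept (when Acts.Accept is set)
    public readonly int[]? ifVts;    // candidate token types (when Acts.Accept2 is set)

    // single-char branch, e.g. new('(', Acts.Begin | Acts.Extend, 3)
    public ElseIf(char c, Acts scripts, int nextState, int vt = -1, int[]? ifVts = null)
        : this(c, c, scripts, nextState, vt, ifVts) { }

    // char-range branch, e.g. new('0', '9', Acts.Extend, 1)
    public ElseIf(char min, char max, Acts scripts, int nextState, int vt = -1, int[]? ifVts = null) {
        this.min = min; this.max = max; this.scripts = scripts;
        this.nextStateId = nextState; this.Vt = vt; this.ifVts = ifVts;
    }
}
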
Click to view the code that drives the array-based lexer

csharp
// skip '\0' at lexi-state 0
private static readonly ElseIf skipZero = new(
    char.MinValue, char.MaxValue, Acts.None,
    nextState: 0);
// construct an error token at lexi-state 0
private static readonly ElseIf unexpectedChar = new(
    char.MinValue, char.MaxValue, Acts.Begin | Acts.Extend | Acts.Accept,
    nextState: 0, -1);// -1 means error("'×'");
// construct an error token at the other lexi-states
private static readonly ElseIf errorToken = new(
    char.MinValue, char.MaxValue, Acts.Extend | Acts.Accept,
    nextState: 0, -1);// -1 means error("'×'");

// analyze the specified sourceCode and return a list of Token.
public TokenList Analyze(string sourceCode) {
    var context = new LexicalContext(sourceCode);
    var currentStateId = 0;
    do {
        // read current char,
        char currentChar = context.MoveForward();
        ElseIf[] lexiState = lexiStates[currentStateId];
        // binary search for the segment( else if (left <= c && c <= right) { ... } )
        var segment = BinarySearch(currentChar, lexiState, currentStateId != 0);
        if (segment is null) {
            if (currentStateId == 0) { // the initial state of lexical analyze.
                segment = BinarySearch(currentChar, this.omitChars, currentStateId != 0);
                if (segment is null) {
                    // '\0' must be skipped.
                    if (currentChar == '\0') { segment = skipZero; }
                    else { segment = unexpectedChar; }
                }
            }
            else { // token with error type
                segment = errorToken;
            }
        }
        // construct the next token,
        var scripts = segment.scripts;
        if (scripts != 0) { // it is 0 in most cases.
            if ((scripts & Acts.Begin) != 0) {
                this.beginToken(context);
            }
            if ((scripts & Acts.Extend) != 0) {
                this.extendToken(context);
            }

            if ((scripts & Acts.Accept) != 0) {
                this.acceptToken(context, segment.Vt);
            }
            else if ((scripts & Acts.Accept2) != 0) {
                this.acceptToken2(context, segment.ifVts);
            }
        }
        // point to next state.
        currentStateId = segment.nextStateId;
    } while (!context.EOF);

    return context.result;
}

private ElseIf? BinarySearch(char currentChar, ElseIf[] lexiState, bool specialLast) {
    var left = 0; var right = lexiState.Length - (specialLast ? 2 : 1);
    if (right < 0) { }
    else {
        var result = -1;
        while (left < right) {
            var mid = (left + right) / 2;
            var current = lexiState[mid];
            if (currentChar < current.min) { result = -1; }
            else if (current.max < currentChar) { result = 1; }
            else { return current; }

            if (result < 0) { right = mid; }
            else { left = mid + 1; }
        }
        {
            var current = lexiState[left];
            if (current.min <= currentChar && currentChar <= current.max) {
                return current;
            }
        }
    }
    if (specialLast) {
        var last = lexiState[lexiState.Length - 1];
        return last;
        // no need to compare, because it's ['\0', '\uFFFF']
        //if (last.min <= currentChar && currentChar <= last.max) {
        //    return last;
        //}
    }

    return null;
}

The generated syntax parser

Click to view nullable, the FIRST sets, and the FOLLOW sets

C
nullable:
[0]: nullable( Exp' ) = False
[1]: nullable( Exp ) = False
[2]: nullable( Term ) = False
[3]: nullable( Factor ) = False
[4]: nullable( '¥' ) = False
[5]: nullable( '+' ) = False
[6]: nullable( '-' ) = False
[7]: nullable( '*' ) = False
[8]: nullable( '/' ) = False
[9]: nullable( '(' ) = False
[10]: nullable( ')' ) = False
[11]: nullable( 'number' ) = False

FIRST sets:
[0]: FIRST( Exp' ) = { '(' 'number' }
[1]: FIRST( Exp ) = { '(' 'number' }
[2]: FIRST( Term ) = { '(' 'number' }
[3]: FIRST( Factor ) = { '(' 'number' }
[4]: FIRST( '¥' ) = { '¥' }
[5]: FIRST( '+' ) = { '+' }
[6]: FIRST( '-' ) = { '-' }
[7]: FIRST( '*' ) = { '*' }
[8]: FIRST( '/' ) = { '/' }
[9]: FIRST( '(' ) = { '(' }
[10]: FIRST( ')' ) = { ')' }
[11]: FIRST( 'number' ) = { 'number' }
[12]: FIRST( Exp '+' Term ) = { '(' 'number' }
[13]: FIRST( Exp '-' Term ) = { '(' 'number' }
[14]: FIRST( Term '*' Factor ) = { '(' 'number' }
[15]: FIRST( Term '/' Factor ) = { '(' 'number' }
[16]: FIRST( '(' Exp ')' ) = { '(' }

FOLLOW sets:
[0]: FOLLOW( Exp' ) = { '¥' }
[1]: FOLLOW( Exp ) = { '-' ')' '+' '¥' }
[2]: FOLLOW( Term ) = { '-' ')' '*' '/' '+' '¥' }
[3]: FOLLOW( Factor ) = { '-' ')' '*' '/' '+' '¥' }
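
As a worked check of two of these entries: FOLLOW( Exp ) contains '+' and '-' because of Exp : Exp '+' Term and Exp : Exp '-' Term, it contains ')' because of Factor : '(' Exp ')', and it contains '¥' because Exp is the entire right-hand side of the augmented start production Exp' : Exp. FOLLOW( Term ) additionally picks up '*' and '/' from Term : Term '*' Factor and Term : Term '/' Factor, and inherits the rest of FOLLOW( Exp ) because Term ends the right-hand side of Exp : Exp '+' Term and the other Exp productions.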

Click to view the generated code for the reduction rules (productions)

csharp
public static IReadOnlyList<Regulation> Regulations => regulations;
private static readonly Regulation[] regulations = new Regulation[] {
	// [0]=Exp : Exp '+' Term ;
	new(0, st.Exp枝, st.Exp枝, st.@Plus符, st.Term枝), 
	// [1]=Exp : Exp '-' Term ;
	new(1, st.Exp枝, st.Exp枝, st.@Dash符, st.Term枝), 
	// [2]=Exp : Term ;
	new(2, st.Exp枝, st.Term枝), 
	// [3]=Term : Term '*' Factor ;
	new(3, st.Term枝, st.Term枝, st.@Asterisk符, st.Factor枝), 
	// [4]=Term : Term '/' Factor ;
	new(4, st.Term枝, st.Term枝, st.@Slash符, st.Factor枝), 
	// [5]=Term : Factor ;
	new(5, st.Term枝, st.Factor枝), 
	// [6]=Factor : '(' Exp ')' ;
	new(6, st.Factor枝, st.@LeftParenthesis符, st.Exp枝, st.@RightParenthesis符), 
	// [7]=Factor : 'number' ;
	new(7, st.Factor枝, st.@number), 
};

Click to view the generated LALR(1) parse-table code

csharp
const int syntaxStateCount = 16;
// LALR(1) syntax parse table
private static readonly Dictionary<string/*LRNode.type*/, LRParseAction>[]
    syntaxStates = new Dictionary<string, LRParseAction>[syntaxStateCount];
private static void InitializeSyntaxStates() {
	var states = CompilerExp.syntaxStates;
	// 78 actions
	// conflicts(0)=not solved(0)+solved(0)(0 warnings)
	#region create objects of syntax states
	states[0] = new(capacity: 5);
	states[1] = new(capacity: 3);
	states[2] = new(capacity: 6);
	states[3] = new(capacity: 6);
	states[4] = new(capacity: 5);
	states[5] = new(capacity: 6);
	states[6] = new(capacity: 4);
	states[7] = new(capacity: 4);
	states[8] = new(capacity: 3);
	states[9] = new(capacity: 3);
	states[10] = new(capacity: 3);
	states[11] = new(capacity: 6);
	states[12] = new(capacity: 6);
	states[13] = new(capacity: 6);
	states[14] = new(capacity: 6);
	states[15] = new(capacity: 6);
	#endregion create objects of syntax states

	#region re-used actions
	LRParseAction aGoto2 = new(LRParseAction.Kind.Goto, states[2]);// referred 2 times
	LRParseAction aGoto3 = new(LRParseAction.Kind.Goto, states[3]);// referred 4 times
	LRParseAction aShift4 = new(LRParseAction.Kind.Shift, states[4]);// referred 6 times
	LRParseAction aShift5 = new(LRParseAction.Kind.Shift, states[5]);// referred 6 times
	LRParseAction aShift6 = new(LRParseAction.Kind.Shift, states[6]);// referred 2 times
	LRParseAction aShift7 = new(LRParseAction.Kind.Shift, states[7]);// referred 2 times
	LRParseAction aShift8 = new(LRParseAction.Kind.Shift, states[8]);// referred 3 times
	LRParseAction aShift9 = new(LRParseAction.Kind.Shift, states[9]);// referred 3 times
	LRParseAction aReduce2 = new(regulations[2]);// referred 4 times
	LRParseAction aReduce5 = new(regulations[5]);// referred 6 times
	LRParseAction aReduce7 = new(regulations[7]);// referred 6 times
	LRParseAction aReduce0 = new(regulations[0]);// referred 4 times
	LRParseAction aReduce1 = new(regulations[1]);// referred 4 times
	LRParseAction aReduce3 = new(regulations[3]);// referred 6 times
	LRParseAction aReduce4 = new(regulations[4]);// referred 6 times
	LRParseAction aReduce6 = new(regulations[6]);// referred 6 times
	#endregion re-used actions

	// 78 actions
	// conflicts(0)=not solved(0)+solved(0)(0 warnings)
	#region init actions of syntax states
	// syntaxStates[0]:
	// [-1] Exp' : ⏳ Exp ;☕ '¥' 
	// [0] Exp : ⏳ Exp '+' Term ;☕ '-' '+' '¥' 
	// [1] Exp : ⏳ Exp '-' Term ;☕ '-' '+' '¥' 
	// [2] Exp : ⏳ Term ;☕ '-' '+' '¥' 
	// [3] Term : ⏳ Term '*' Factor ;☕ '-' '*' '/' '+' '¥' 
	// [4] Term : ⏳ Term '/' Factor ;☕ '-' '*' '/' '+' '¥' 
	// [5] Term : ⏳ Factor ;☕ '-' '*' '/' '+' '¥' 
	// [6] Factor : ⏳ '(' Exp ')' ;☕ '-' '*' '/' '+' '¥' 
	// [7] Factor : ⏳ 'number' ;☕ '-' '*' '/' '+' '¥' 
	/*0*/states[0].Add(st.Exp枝, new(LRParseAction.Kind.Goto, states[1]));
	/*1*/states[0].Add(st.Term枝, aGoto2);
	/*2*/states[0].Add(st.Factor枝, aGoto3);
	/*3*/states[0].Add(st.@LeftParenthesis符, aShift4);
	/*4*/states[0].Add(st.@number, aShift5);
	// syntaxStates[1]:
	// [-1] Exp' : Exp ⏳ ;☕ '¥' 
	// [0] Exp : Exp ⏳ '+' Term ;☕ '-' '+' '¥' 
	// [1] Exp : Exp ⏳ '-' Term ;☕ '-' '+' '¥' 
	/*5*/states[1].Add(st.@Plus符, aShift6);
	/*6*/states[1].Add(st.@Dash符, aShift7);
	/*7*/states[1].Add(st.@终, LRParseAction.accept);
	// syntaxStates[2]:
	// [2] Exp : Term ⏳ ;☕ '-' ')' '+' '¥' 
	// [3] Term : Term ⏳ '*' Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	// [4] Term : Term ⏳ '/' Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	/*8*/states[2].Add(st.@Asterisk符, aShift8);
	/*9*/states[2].Add(st.@Slash符, aShift9);
	/*10*/states[2].Add(st.@Dash符, aReduce2);
	/*11*/states[2].Add(st.@RightParenthesis符, aReduce2);
	/*12*/states[2].Add(st.@Plus符, aReduce2);
	/*13*/states[2].Add(st.@终, aReduce2);
	// syntaxStates[3]:
	// [5] Term : Factor ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	/*14*/states[3].Add(st.@Dash符, aReduce5);
	/*15*/states[3].Add(st.@RightParenthesis符, aReduce5);
	/*16*/states[3].Add(st.@Asterisk符, aReduce5);
	/*17*/states[3].Add(st.@Slash符, aReduce5);
	/*18*/states[3].Add(st.@Plus符, aReduce5);
	/*19*/states[3].Add(st.@终, aReduce5);
	// syntaxStates[4]:
	// [6] Factor : '(' ⏳ Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : ⏳ Exp '+' Term ;☕ '-' ')' '+' 
	// [1] Exp : ⏳ Exp '-' Term ;☕ '-' ')' '+' 
	// [2] Exp : ⏳ Term ;☕ '-' ')' '+' 
	// [3] Term : ⏳ Term '*' Factor ;☕ '-' ')' '*' '/' '+' 
	// [4] Term : ⏳ Term '/' Factor ;☕ '-' ')' '*' '/' '+' 
	// [5] Term : ⏳ Factor ;☕ '-' ')' '*' '/' '+' 
	// [6] Factor : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' 
	// [7] Factor : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' 
	/*20*/states[4].Add(st.Exp枝, new(LRParseAction.Kind.Goto, states[10]));
	/*21*/states[4].Add(st.Term枝, aGoto2);
	/*22*/states[4].Add(st.Factor枝, aGoto3);
	/*23*/states[4].Add(st.@LeftParenthesis符, aShift4);
	/*24*/states[4].Add(st.@number, aShift5);
	// syntaxStates[5]:
	// [7] Factor : 'number' ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	/*25*/states[5].Add(st.@Dash符, aReduce7);
	/*26*/states[5].Add(st.@RightParenthesis符, aReduce7);
	/*27*/states[5].Add(st.@Asterisk符, aReduce7);
	/*28*/states[5].Add(st.@Slash符, aReduce7);
	/*29*/states[5].Add(st.@Plus符, aReduce7);
	/*30*/states[5].Add(st.@终, aReduce7);
	// syntaxStates[6]:
	// [0] Exp : Exp '+' ⏳ Term ;☕ '-' ')' '+' '¥' 
	// [3] Term : ⏳ Term '*' Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	// [4] Term : ⏳ Term '/' Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	// [5] Term : ⏳ Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	// [6] Factor : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [7] Factor : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' 
	/*31*/states[6].Add(st.Term枝, new(LRParseAction.Kind.Goto, states[11]));
	/*32*/states[6].Add(st.Factor枝, aGoto3);
	/*33*/states[6].Add(st.@LeftParenthesis符, aShift4);
	/*34*/states[6].Add(st.@number, aShift5);
	// syntaxStates[7]:
	// [1] Exp : Exp '-' ⏳ Term ;☕ '-' ')' '+' '¥' 
	// [3] Term : ⏳ Term '*' Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	// [4] Term : ⏳ Term '/' Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	// [5] Term : ⏳ Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	// [6] Factor : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [7] Factor : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' 
	/*35*/states[7].Add(st.Term枝, new(LRParseAction.Kind.Goto, states[12]));
	/*36*/states[7].Add(st.Factor枝, aGoto3);
	/*37*/states[7].Add(st.@LeftParenthesis符, aShift4);
	/*38*/states[7].Add(st.@number, aShift5);
	// syntaxStates[8]:
	// [3] Term : Term '*' ⏳ Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	// [6] Factor : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [7] Factor : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' 
	/*39*/states[8].Add(st.Factor枝, new(LRParseAction.Kind.Goto, states[13]));
	/*40*/states[8].Add(st.@LeftParenthesis符, aShift4);
	/*41*/states[8].Add(st.@number, aShift5);
	// syntaxStates[9]:
	// [4] Term : Term '/' ⏳ Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	// [6] Factor : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [7] Factor : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' 
	/*42*/states[9].Add(st.Factor枝, new(LRParseAction.Kind.Goto, states[14]));
	/*43*/states[9].Add(st.@LeftParenthesis符, aShift4);
	/*44*/states[9].Add(st.@number, aShift5);
	// syntaxStates[10]:
	// [6] Factor : '(' Exp ⏳ ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : Exp ⏳ '+' Term ;☕ '-' ')' '+' 
	// [1] Exp : Exp ⏳ '-' Term ;☕ '-' ')' '+' 
	/*45*/states[10].Add(st.@RightParenthesis符, new(LRParseAction.Kind.Shift, states[15]));
	/*46*/states[10].Add(st.@Plus符, aShift6);
	/*47*/states[10].Add(st.@Dash符, aShift7);
	// syntaxStates[11]:
	// [0] Exp : Exp '+' Term ⏳ ;☕ '-' ')' '+' '¥' 
	// [3] Term : Term ⏳ '*' Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	// [4] Term : Term ⏳ '/' Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	/*48*/states[11].Add(st.@Asterisk符, aShift8);
	/*49*/states[11].Add(st.@Slash符, aShift9);
	/*50*/states[11].Add(st.@Dash符, aReduce0);
	/*51*/states[11].Add(st.@RightParenthesis符, aReduce0);
	/*52*/states[11].Add(st.@Plus符, aReduce0);
	/*53*/states[11].Add(st.@终, aReduce0);
	// syntaxStates[12]:
	// [1] Exp : Exp '-' Term ⏳ ;☕ '-' ')' '+' '¥' 
	// [3] Term : Term ⏳ '*' Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	// [4] Term : Term ⏳ '/' Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	/*54*/states[12].Add(st.@Asterisk符, aShift8);
	/*55*/states[12].Add(st.@Slash符, aShift9);
	/*56*/states[12].Add(st.@Dash符, aReduce1);
	/*57*/states[12].Add(st.@RightParenthesis符, aReduce1);
	/*58*/states[12].Add(st.@Plus符, aReduce1);
	/*59*/states[12].Add(st.@终, aReduce1);
	// syntaxStates[13]:
	// [3] Term : Term '*' Factor ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	/*60*/states[13].Add(st.@Dash符, aReduce3);
	/*61*/states[13].Add(st.@RightParenthesis符, aReduce3);
	/*62*/states[13].Add(st.@Asterisk符, aReduce3);
	/*63*/states[13].Add(st.@Slash符, aReduce3);
	/*64*/states[13].Add(st.@Plus符, aReduce3);
	/*65*/states[13].Add(st.@终, aReduce3);
	// syntaxStates[14]:
	// [4] Term : Term '/' Factor ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	/*66*/states[14].Add(st.@Dash符, aReduce4);
	/*67*/states[14].Add(st.@RightParenthesis符, aReduce4);
	/*68*/states[14].Add(st.@Asterisk符, aReduce4);
	/*69*/states[14].Add(st.@Slash符, aReduce4);
	/*70*/states[14].Add(st.@Plus符, aReduce4);
	/*71*/states[14].Add(st.@终, aReduce4);
	// syntaxStates[15]:
	// [6] Factor : '(' Exp ')' ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	/*72*/states[15].Add(st.@Dash符, aReduce6);
	/*73*/states[15].Add(st.@RightParenthesis符, aReduce6);
	/*74*/states[15].Add(st.@Asterisk符, aReduce6);
	/*75*/states[15].Add(st.@Slash符, aReduce6);
	/*76*/states[15].Add(st.@Plus符, aReduce6);
	/*77*/states[15].Add(st.@终, aReduce6);
	#endregion init actions of syntax states
}

The state diagram and state table of the generated LALR(1) parser are shown below.

Because mermaid can only handle a limited number of characters, it often cannot display all the syntax states of a language (such as C) together with their full LR items and lookaheads, so by default the generated state diagram shows only the first 3 LR items + lookaheads of each syntax state. The complete LR items + lookaheads can be found in the generated parse-table code.

| State | '+' | '-' | '*' | '/' | '(' | ')' | 'number' | '¥' | Exp | Term | Factor |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  |  |  |  | S4 |  | S5 |  | G1 | G2 | G3 |
| 1 | S6 | S7 |  |  |  |  |  | ✅ |  |  |  |
| 2 | R[2] | R[2] | S8 | S9 |  | R[2] |  | R[2] |  |  |  |
| 3 | R[5] | R[5] | R[5] | R[5] |  | R[5] |  | R[5] |  |  |  |
| 4 |  |  |  |  | S4 |  | S5 |  | G10 | G2 | G3 |
| 5 | R[7] | R[7] | R[7] | R[7] |  | R[7] |  | R[7] |  |  |  |
| 6 |  |  |  |  | S4 |  | S5 |  |  | G11 | G3 |
| 7 |  |  |  |  | S4 |  | S5 |  |  | G12 | G3 |
| 8 |  |  |  |  | S4 |  | S5 |  |  |  | G13 |
| 9 |  |  |  |  | S4 |  | S5 |  |  |  | G14 |
| 10 | S6 | S7 |  |  |  | S15 |  |  |  |  |  |
| 11 | R[0] | R[0] | S8 | S9 |  | R[0] |  | R[0] |  |  |  |
| 12 | R[1] | R[1] | S8 | S9 |  | R[1] |  | R[1] |  |  |  |
| 13 | R[3] | R[3] | R[3] | R[3] |  | R[3] |  | R[3] |  |  |  |
| 14 | R[4] | R[4] | R[4] | R[4] |  | R[4] |  | R[4] |  |  |  |
| 15 | R[6] | R[6] | R[6] | R[6] |  | R[6] |  | R[6] |  |  |  |

In the table above:

  • S6 means Shift and go to state 6;

  • R[2] means reduce by regulations[2] = Exp : Term ;

  • G1 means Goto state 1;

  • ✅ means Accept, i.e. parsing is complete and the syntax tree has been built successfully;

  • a blank cell means a syntax error has been encountered.
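
As a hand-worked example (my own trace, not generator output), here is how the table above parses the input 1+2*3; each row shows the state stack, the remaining input, and the action taken:

smalltalk
state stack      remaining input    action
0                1 + 2 * 3 ¥        S5   (shift 'number' 1)
0 5              + 2 * 3 ¥          R[7] Factor : 'number' ;      then G3
0 3              + 2 * 3 ¥          R[5] Term : Factor ;          then G2
0 2              + 2 * 3 ¥          R[2] Exp : Term ;             then G1
0 1              + 2 * 3 ¥          S6   (shift '+')
0 1 6            2 * 3 ¥            S5   (shift 'number' 2)
0 1 6 5          * 3 ¥              R[7] Factor : 'number' ;      then G3
0 1 6 3          * 3 ¥              R[5] Term : Factor ;          then G11
0 1 6 11         * 3 ¥              S8   (shift '*')
0 1 6 11 8       3 ¥                S5   (shift 'number' 3)
0 1 6 11 8 5     ¥                  R[7] Factor : 'number' ;      then G13
0 1 6 11 8 13    ¥                  R[3] Term : Term '*' Factor ; then G11
0 1 6 11         ¥                  R[0] Exp : Exp '+' Term ;     then G1
0 1              ¥                  ✅ Accept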

Click to view the code that drives the parse table

csharp
public SyntaxTree Parse(TokenList tokenList) {
    var context = new LRSyntaxContext(tokenList, this.initialState, this.EOT, this.stArray);
    var accept = false;
    do {
        Token token = context.CurrentToken;
        int nodeType = token.type;
        while (nodeType == blockComment || nodeType == inlineComment) {
            // skip comment token
            context.cursor++;
            token = context.CurrentToken;
            nodeType = token.type;
        }

        Dictionary<int/*Node.type*/, LRParseAction> currentState =
            context.stateStack.Peek();
        if (currentState.TryGetValue(nodeType, out var parseAction)) {
            parseAction.Execute(context);
            accept = parseAction.kind == LRParseAction.Kind.Accept;
        }
        else { // syntax error happened.
            return new SyntaxTree(context);
        }
    } while (!accept);

    var root = context.root;
    Debug.Assert(root != null);
    return new SyntaxTree(root);
}

public class LRParseAction {
    public enum Kind {
        Error,
        Shift,
        Reduce,
        Goto,
        Accept,
    }

    public readonly Kind kind;
    [StructLayout(LayoutKind.Explicit)]
    struct Union {
        [FieldOffset(0)] public Dictionary<int/*Node.type*/, LRParseAction> nextState;
        [FieldOffset(0)] public Regulation regulation;
    }
    private Union union;

    // Execute one parse action.
    public void Execute(LRSyntaxContext context) {
        switch (this.kind) {
        case Kind.Error: { throw new NotImplementedException(); }
        //break;
        case Kind.Shift: {
            var token = context.CurrentToken;
            var leaf = new LRNode(token);
            context.nodeStack.Push(leaf);
            var nextState = this.union.nextState;
            context.stateStack.Push(nextState);
            // prepare for next loop.
            context.cursor++;
        }
        break;
        case Kind.Reduce: {
            var regulation = this.union.regulation;
            int count = regulation.right.Length;
            var children = new LRNode[count];
            var start = Token.empty; LRNode? lastNode = null;
            var first = true;
            for (int i = 0; i < count; i++) {
                var state = context.stateStack.Pop();// popped only; not used afterwards.
                var node = context.nodeStack.Pop();
                children[count - i - 1] = node;
                if (node.start.index >= 0) { // this node includes token
                    if (first) { lastNode = node; first = false; }
                    start = node.start;
                }
            }
            int tokenCount = 0;
            if (lastNode is not null) {
                tokenCount =
                    lastNode.start.index   // comment tokens inside of parent are included.
                    - start.index          // comment tokens before parent are excluded.
                    + lastNode.tokenCount; // comment tokens after parent are excluded.
            }
            var parent = new LRNode(regulation, start, tokenCount, children);
            for (var i = 0; i < count; i++) { children[i].parent = parent; }
            context.nodeStack.Push(parent);
            // goto next syntax-state
            Dictionary<int/*Node.type*/, LRParseAction> currentState =
               context.stateStack.Peek();
            var nodeType = regulation.left;
            if (currentState.TryGetValue(nodeType, out var parseAction)) {
                parseAction.Execute(context); // parseAction is supposed to be a Goto action
            }
            Debug.Assert(parseAction != null && parseAction.kind == Kind.Goto);
        }
        break;
        case Kind.Goto: {
            var nextState = this.union.nextState;
            context.stateStack.Push(nextState);
        }
        break;
        case Kind.Accept: {
            var state = context.stateStack.Pop();
            context.root = context.nodeStack.Pop();
            context.Finish(context.root, 40, context.stArray);
        }
        break;
        default: { throw new NotImplementedException(); }
        }
    }
}

The generated semantic extractor

For example, for the input 1234+567+89+0+0, parsing with Calc.st produces the syntax tree shown below (the construction process is shown at the beginning of this article):

smalltalk
R[0]=Exp : Exp '+' Term ;⛪T[0->8]
 ├─R[0]=Exp : Exp '+' Term ;⛪T[0->6]
 │  ├─R[0]=Exp : Exp '+' Term ;⛪T[0->4]
 │  │  ├─R[0]=Exp : Exp '+' Term ;⛪T[0->2]
 │  │  │  ├─R[2]=Exp : Term ;⛪T[0]
 │  │  │  │  └─R[5]=Term : Factor ;⛪T[0]
 │  │  │  │     └─R[7]=Factor : 'number' ;⛪T[0]
 │  │  │  │        └─T[0]='number' 1234
 │  │  │  ├─T[1]='+' +
 │  │  │  └─R[5]=Term : Factor ;⛪T[2]
 │  │  │     └─R[7]=Factor : 'number' ;⛪T[2]
 │  │  │        └─T[2]='number' 567
 │  │  ├─T[3]='+' +
 │  │  └─R[5]=Term : Factor ;⛪T[4]
 │  │     └─R[7]=Factor : 'number' ;⛪T[4]
 │  │        └─T[4]='number' 89
 │  ├─T[5]='+' +
 │  └─R[5]=Term : Factor ;⛪T[6]
 │     └─R[7]=Factor : 'number' ;⛪T[6]
 │        └─T[6]='number' 0
 ├─T[7]='+' +
 └─R[5]=Term : Factor ;⛪T[8]
    └─R[7]=Factor : 'number' ;⛪T[8]
       └─T[8]='number' 0

From top-left to bottom-right, the 4 consecutive R[0] nodes noticeably increase the depth of the tree; they reflect the left-associative grouping ((((1234+567)+89)+0)+0).

The semantic extractor automatically generated by bitParser traverses this syntax tree in post-order and extracts the node information.
Click to view the generic syntax-tree traversal code

csharp
// Extract some data structure from syntax tree.
// <para>post-order traverse <paramref name="root"/> with stack(without recursion).</para>
public T? Extract(LRNode root, TokenList tokens, string sourceCode) {
	var context = new TContext<T>(root, tokens, sourceCode);

	var nodeStack = new Stack<LRNode>(); var indexStack = new Stack<int>();
	// init stack.
	{
		// push nextLeft and its next pending children.
		var nextLeft = root; var index = 0;
		nodeStack.Push(nextLeft); indexStack.Push(index);
		while (nextLeft.children.Count > 0) {
			nextLeft = nextLeft.children[0];
			nodeStack.Push(nextLeft); indexStack.Push(0);
		}
	}

	while (nodeStack.Count > 0) {
		var current = nodeStack.Pop(); var index = indexStack.Pop() + 1;
		if (index < current.children.Count) {
			// push this node back again.
			nodeStack.Push(current); indexStack.Push(index);

			// push nextLeft and its next pending children.
			var nextLeft = current.children[index];
			nodeStack.Push(nextLeft); indexStack.Push(0);
			while (nextLeft.children.Count > 0) {
				nextLeft = nextLeft.children[0];
				nodeStack.Push(nextLeft); indexStack.Push(0);
			}
		}
		else {
			// Visit(current);
			if (extractorDict.TryGetValue(current.type, out Action<LRNode, TContext<T>>? action)) {
				action(current, context);
			}
		}
	}

	{
		var current = this.EOT;
		// Visit(current);
		if (extractorDict.TryGetValue(current.type, out Action<LRNode, TContext<T>>? action)) {
			action(current, context);
		}
	}

	return context.result;
}

Click to view the generated code that extracts node information while traversing the syntax tree

csharp
private static readonly Dictionary<int/*LRNode.type*/,
    Action<LRNode, TContext<Exp>>> @expExtractorDict = new();

private static readonly Action<LRNode, TContext<Exp>> VtHandler =
(node, context) => {
    var token = node.start;
    context.objStack.Push(token);
};

// initialize dict for extractor.
private static void InitializeExtractorDict() {
    var extractorDict = @expExtractorDict;
    extractorDict.Add(st.@Plus符, VtHandler);
    extractorDict.Add(st.@Dash符, VtHandler);
    extractorDict.Add(st.@Asterisk符, VtHandler);
    extractorDict.Add(st.@Slash符, VtHandler);
    extractorDict.Add(st.@LeftParenthesis符, VtHandler);
    extractorDict.Add(st.@RightParenthesis符, VtHandler);
    extractorDict.Add(st.@number, VtHandler);
    extractorDict.Add(st.@终,
    static (node, context) => {
        // [-1]=Exp' : Exp ;
        // dumped by ExternalExtractor
        var @final = (VnExp?)context.objStack.Pop();
        var left = new Exp(@final);
        context.result = left; // final step, no need to push into stack.
    }); // end of extractorDict.Add(st.@终, (node, context) => { ... });
    extractorDict.Add(st.Exp枝,
    static (node, context) => {
        switch (node.regulation.index) {
        case 0: { // [0]=Exp : Exp '+' Term ;
            // dumped by ListExtractor 2
            var r0 = (VnTerm?)context.objStack.Pop();
            var r1 = (Token?)context.objStack.Pop();
            var r2 = (VnExp?)context.objStack.Pop();
            var left = r2;
            left.Add(r1, r0);
            context.objStack.Push(left);
        }
        break;
        case 1: { // [1]=Exp : Exp '-' Term ;
            // dumped by ListExtractor 2
            var r0 = (VnTerm?)context.objStack.Pop();
            var r1 = (Token?)context.objStack.Pop();
            var r2 = (VnExp?)context.objStack.Pop();
            var left = r2;
            left.Add(r1, r0);
            context.objStack.Push(left);
        }
        break;
        case 2: { // [2]=Exp : Term ;
            // dumped by ListExtractor 1
            var r0 = (VnTerm?)context.objStack.Pop();
            var left = new VnExp(r0);
            context.objStack.Push(left);
        }
        break;
        default: throw new NotImplementedException();
        }
    }); // end of extractorDict.Add(st.Exp枝, (node, context) => { ... });
    extractorDict.Add(st.Term枝,
    static (node, context) => {
        switch (node.regulation.index) {
        case 3: { // [3]=Term : Term '*' Factor ;
            // dumped by ListExtractor 2
            var r0 = (VnFactor?)context.objStack.Pop();
            var r1 = (Token?)context.objStack.Pop();
            var r2 = (VnTerm?)context.objStack.Pop();
            var left = r2;
            left.Add(r1, r0);
            context.objStack.Push(left);
        }
        break;
        case 4: { // [4]=Term : Term '/' Factor ;
            // dumped by ListExtractor 2
            var r0 = (VnFactor?)context.objStack.Pop();
            var r1 = (Token?)context.objStack.Pop();
            var r2 = (VnTerm?)context.objStack.Pop();
            var left = r2;
            left.Add(r1, r0);
            context.objStack.Push(left);
        }
        break;
        case 5: { // [5]=Term : Factor ;
            // dumped by ListExtractor 1
            var r0 = (VnFactor?)context.objStack.Pop();
            var left = new VnTerm(r0);
            context.objStack.Push(left);
        }
        break;
        default: throw new NotImplementedException();
        }
    }); // end of extractorDict.Add(st.Term枝, (node, context) => { ... });
    extractorDict.Add(st.Factor枝,
    static (node, context) => {
        switch (node.regulation.index) {
        case 6: { // [6]=Factor : '(' Exp ')' ;
            // dumped by DefaultExtractor
            var r0 = (Token?)context.objStack.Pop();
            var r1 = (VnExp?)context.objStack.Pop();
            var r2 = (Token?)context.objStack.Pop();
            var left = new VnFactor(r2, r1, r0);
            context.objStack.Push(left);
        }
        break;
        case 7: { // [7]=Factor : 'number' ;
            // dumped by DefaultExtractor
            var r0 = (Token?)context.objStack.Pop();
            var left = new VnFactor(r0);
            context.objStack.Push(left);
        }
        break;
        default: throw new NotImplementedException();
        }
    }); // end of extractorDict.Add(st.Factor枝, (node, context) => { ... });
}

The extracted node information, when displayed, looks like this:

smalltalk
VnExp⛪T[0->8]
 ├─VnTerm⛪T[0]
 │  └─VnFactor⛪T[0]
 │     └─T[0]='number' 1234
 └─List T[1->8]
    ├─Item T[1->2]
    │  ├─T[1]='+' +
    │  └─VnTerm⛪T[2]
    │     └─VnFactor⛪T[2]
    │        └─T[2]='number' 567
    ├─Item T[3->4]
    │  ├─T[3]='+' +
    │  └─VnTerm⛪T[4]
    │     └─VnFactor⛪T[4]
    │        └─T[4]='number' 89
    ├─Item T[5->6]
    │  ├─T[5]='+' +
    │  └─VnTerm⛪T[6]
    │     └─VnFactor⛪T[6]
    │        └─T[6]='number' 0
    └─Item T[7->8]
       ├─T[7]='+' +
       └─VnTerm⛪T[8]
          └─VnFactor⛪T[8]
             └─T[8]='number' 0

Reduction rules of the form Exp : Exp '+' Term ; make the syntax tree very deep. The purpose of this semantic-extraction step is to "flatten" such tree structures; with the flattened semantic information, it is easy to format the source code.
Click to view the formatting code for the VnExp node

csharp
/// <summary>
/// Correspond to the Vn node Exp in the grammar(Exp).
/// </summary>
internal partial class VnExp : IFullFormatter {
	// [0]=Exp : Exp '+' Term ;
	// [1]=Exp : Exp '-' Term ;
	// [2]=Exp : Term ;

	private readonly VnTerm first0;
	public class PostItem : IFullFormatter {
		private readonly Token r1;
		private readonly VnTerm r0;

		public PostItem(Token r1, VnTerm r0) {
			this.r1 = r1;
			this.r0 = r0;
			this._scope = new TokenRange(r1, r0);
		}
		private readonly TokenRange _scope;
		public TokenRange Scope => _scope;

		public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
			context.PrintBlanksAnd(this.r1, preConfig, writer);
			// one space between '+' or '-' and the tokens on either side of it
			var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
            context.PrintCommentsBetween(this.r1, this.r0, config, writer);
			context.PrintBlanksAnd(this.r0, config, writer);
		}
	}
	public class PostItemList : IFullFormatter {
		private readonly List<PostItem> list = new();
		public PostItemList(PostItem item) {
			this.list.Add(item);
			this._scope = new TokenRange(item);
		}
		public void Add(PostItem item) {
			this.list.Add(item);
			this._scope.end = item.Scope.end;
		}
		private readonly TokenRange _scope;
		public TokenRange Scope => _scope;

		public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
			// one space between '+' or '-' and the tokens on either side of it
			var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
			for (int i = 0; i < list.Count; i++) {
				if (i == 0) {
					context.PrintBlanksAnd(list[i], preConfig, writer);
				}
				else {
					context.PrintCommentsBetween(list[i - 1], list[i], config, writer);
					context.PrintBlanksAnd(list[i], config, writer);
				}
			}
		}
	}
	private PostItemList? list;

	private readonly TokenRange _scope;
	public TokenRange Scope => _scope;

	internal VnExp(VnTerm first0) {
		this.first0 = first0;
		this._scope = new TokenRange(first0);
	}

	// [0]=Exp : Exp '+' Term ;
	// [1]=Exp : Exp '-' Term ;
	internal void Add(Token r1, VnTerm r0) {
		if (this.list == null) {
			this.list = new PostItemList(new(r1, r0));
		}
		else {
			this.list.Add(new(r1, r0));
		}
		this._scope.end = this.list.Scope.end;
	}

	public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
		context.PrintBlanksAnd(this.first0, preConfig, writer);
		if (this.list != null) {
			// one space between '+' or '-' and the tokens on either side of it
			var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
			context.PrintCommentsBetween(this.first0, this.list, config, writer);
			context.PrintBlanksAnd(this.list, config, writer);
		}
	}
}

Click to view the formatting code for the VnTerm node

csharp
/// <summary>
/// Correspond to the Vn node Term in the grammar(Exp).
/// </summary>
internal partial class VnTerm : IFullFormatter {
	// [3]=Term : Term '*' Factor ;
	// [4]=Term : Term '/' Factor ;
	// [5]=Term : Factor ;

	private readonly VnFactor first0;
	public class PostItem : IFullFormatter {
		private readonly Token r1;
		private readonly VnFactor r0;

		public PostItem(Token r1, VnFactor r0) {
			this.r1 = r1;
			this.r0 = r0;
			this._scope = new TokenRange(r1, r0);
		}
		private readonly TokenRange _scope;
		public TokenRange Scope => _scope;

		public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
			context.PrintBlanksAnd(this.r1, preConfig, writer);
			// one space between '+' or '-' and the tokens on either side of it
			var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
            context.PrintCommentsBetween(this.r1, this.r0, config, writer);
			context.PrintBlanksAnd(this.r0, config, writer);
		}
	}
	public class PostItemList : IFullFormatter {
		private readonly List<PostItem> list = new();
		public PostItemList(PostItem item) {
			this.list.Add(item);
			this._scope = new TokenRange(item);
		}
		public void Add(PostItem item) {
			this.list.Add(item);
			this._scope.end = item.Scope.end;
		}

		private readonly TokenRange _scope;
		public TokenRange Scope => _scope;

		public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
			// one space between '*' or '/' and the tokens on either side of it
			var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
			for (int i = 0; i < list.Count; i++) {
				if (i == 0) {
					context.PrintBlanksAnd(list[i], preConfig, writer);
				}
				else {
					context.PrintCommentsBetween(list[i - 1], list[i], config, writer);
					context.PrintBlanksAnd(list[i], config, writer);
				}
			}
		}
	}
	private PostItemList? list;

	private readonly TokenRange _scope;
	public TokenRange Scope => _scope;

	// [5]=Term : Factor ;
	internal VnTerm(VnFactor first0) {
		this.first0 = first0;
		this._scope = new TokenRange(first0);
	}

	// [3]=Term : Term '*' Factor ;
	// [4]=Term : Term '/' Factor ;
	internal void Add(Token r1, VnFactor r0) {
		if (this.list == null) {
			this.list = new PostItemList(new(r1, r0));
		}
		else {
			this.list.Add(new(r1, r0));
		}
		this._scope.end = this.list.Scope.end;
	}

	public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
		context.PrintBlanksAnd(this.first0, preConfig, writer);
		if (this.list != null) {
			// one space between '*' or '/' and the tokens on either side of it
			var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
			context.PrintCommentsBetween(this.first0, this.list, config, writer);
			context.PrintBlanksAnd(this.list, config, writer);
		}
	}
}

Click to view the formatting code for the VnFactor node

csharp
/// <summary>
/// Correspond to the Vn node Factor in the grammar(Exp).
/// </summary>
internal abstract partial class VnFactor : IFullFormatter {
	// [6]=Factor : '(' Exp ')' ;
	// [7]=Factor : 'number' ;

	public class C0 : VnFactor {
		// [6]=Factor : '(' Exp ')' ;
		public C0(Token r2, VnExp r1, Token r0) {
			this.r2 = r2;
			this.r1 = r1;
			this.r0 = r0;
			this._scope = new TokenRange(r2, r0);
		}
		private readonly Token r2;
		private readonly VnExp r1;
		private readonly Token r0;

		private readonly TokenRange _scope;
		public override TokenRange Scope => _scope;

		public override void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
			context.PrintBlanksAnd(this.r2, preConfig, writer);
			// no spaces inside ( Exp )
			var config = new BlankConfig(inlineBlank: 0, forceNewline: false);
			context.PrintCommentsBetween(this.r2, this.r1, config, writer); 
			context.PrintBlanksAnd(this.r1, config, writer);
			context.PrintCommentsBetween(this.r1, this.r0, config, writer); 
			context.PrintBlanksAnd(this.r0, config, writer);
		}
	}
	public class C1 : VnFactor {
		// [7]=Factor : 'number' ;
		public C1(Token r0) {
			this.r0 = r0;
			this._scope = new TokenRange(r0);
		}
		private readonly Token r0;
		private readonly TokenRange _scope;
		public override TokenRange Scope => _scope;

		public override void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
			// print this node's single token using the preConfig passed down by the parent
			context.PrintBlanksAnd(this.r0, preConfig, writer);
		}
	}

	public abstract TokenRange Scope { get; }
	public abstract void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context);
}

Finally, 1234+567+89+0+0 formatted looks like this (one space added on each side of every operator):

csharp
1234 + 567 + 89 + 0 + 0

More examples:

csharp
// formatting 1-2*3
1 - 2 * 3

// formatting (1+2)/3
(1 + 2) / 3

// formatting (1+2)*(3-4)
(1 + 2) * (3 - 4)

Click to view GLSL formatting example 1

GLSL
// example 1: before formatting
void main() {
    int a=0;
    a++++;
    ++++a;
}
// example 1: after formatting
void main() {
    int a = 0;
    a++ ++;
    ++ ++a;
}

Click to view GLSL formatting example 2

GLSL
// example 2: before formatting
in vec3  passNormal;
in vec2  passTexCoord;

uniform sampler2D textureMap;
uniform   vec3      lihtDirection=vec3(1,1,1);
uniform   vec3       diffuseColor;
uniform bool     transparent=false;

out vec4 outColor;

void main(  ){
    if (transparent) {
        if (int(gl_FragCoord.x + gl_FragCoord.y) % 2 == 1) discard;}

    if (passTexCoord==vec2(-1,-1)){   // when texture coordinate not exists..
    float diffuse=max(dot(normalize(lihtDirection),normalize(passNormal)),0);
     outColor = vec4(diffuseColor * diffuse, 1.0);
    }
    else {     outColor = texture(textureMap, passTexCoord);}
}
// example 2: after formatting
in vec3 passNormal;
in vec2 passTexCoord;

uniform sampler2D textureMap;
uniform vec3 lihtDirection = vec3(1, 1, 1);
uniform vec3 diffuseColor;
uniform bool transparent = false;

out vec4 outColor;

void main() {
    if (transparent) {
        if (int(gl_FragCoord.x + gl_FragCoord.y) % 2 == 1) discard;
    }

    if (passTexCoord == vec2(-1, -1)) {   // when texture coordinate not exists..
        float diffuse = max(dot(normalize(lihtDirection), normalize(passNormal)), 0);
        outColor = vec4(diffuseColor * diffuse, 1.0);
    }
    else { outColor = texture(textureMap, passTexCoord); }
}

For a detailed introduction to this formatting algorithm, see my other article, GLSL Shader的格式化算法(LALR解析器) (the formatting algorithm for GLSL shaders, based on the LALR parser).

Example: automatically resolving Shift/Reduce conflicts

Both C and GLSL have the dangling-else problem, which is caused by the following productions:

smalltalk
selection_statement :
      'if' '(' expression ')' selection_rest_statement
 ;
selection_rest_statement :
      statement 'else' statement
    | statement
 ;

So when the parser reads an 'else' Token, should it Shift the 'else', or should it reduce by selection_rest_statement : statement ;? This is a Shift/Reduce conflict.
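
For a concrete feel of the two choices, here is the standard dangling-else illustration (general C-family behavior, written in C#; it is not taken from bitParser's documentation):
csharp
// Preferring Shift binds the 'else' to the nearest unmatched 'if'.
static int DanglingElse(int a, int b) {
    int x = 0;
    if (a > 0)
        if (b > 0)
            x = 1;
        else      // Shift: this 'else' belongs to `if (b > 0)`
            x = 2;
    return x;     // DanglingElse(1, -1) == 2; under the Reduce reading it would be 0
}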

bitParser automatically chooses to Shift, and comments out the entry that would Reduce, as shown in the following code:

csharp
const int syntaxStateCount = 477;
// LALR(1) syntax parse table
private static readonly Dictionary<string/*LRNode.type*/, LRParseAction>[]
    syntaxStates = new Dictionary<string, LRParseAction>[syntaxStateCount];
private static void InitializeSyntaxStates() {
    var states = CompilerGLSL.syntaxStates;
    ...
    // 30814 actions
    // conflicts(1)=not solved(0)+solved(1)(1 warnings)
    #region init actions of syntax states
    ...
    // syntaxStates[454]:
    // [324] selection_rest_statement : statement ⏳ 'else' statement ;☕ '--' '-' ';' '!' '(' '{' '}' '+' ..
    // [325] selection_rest_statement : statement ⏳ ;☕ '--' '-' ';' '!' '(' '{' '}' '+' ..
    // 'else' repeated 2 times
    states[454]/*28145*/.Add(st.@else, new(LRParseAction.Kind.Shift, states[466]));
    // ⚔ PreferShiftToReduce states[454]/*28146*/.Add(st.@else, new(regulations[325]));
    states[454]/*28147*/.Add(st.@Dash符Dash符, new(regulations[325]));
	...
    #endregion init actions of syntax states
}

Example: the precedence directives %nonassoc, %left, %right, and %prec

If we write Calc.st in the most intuitive way, it might look like this:

smalltalk
Exp : Exp '+' Exp
    | Exp '-' Exp
    | Exp '*' Exp
    | Exp '/' Exp
    | '(' Exp ')'
    | 'number' ;

%%[0-9]+%% 'number'

Click to view the state-transition table code of its LALR(1) parser

csharp
const int syntaxStateCount = 14;
/// <summary>
/// LALR(1) syntax parse table
/// </summary>
private static readonly Dictionary<string/*Node.type*/, LRParseAction>[] syntaxStates =
	new Dictionary<string/*Node.type*/, LRParseAction>[syntaxStateCount];
private static void InitializeSyntaxStates() {
	var states = CompilerExp.syntaxStates;
	// 80 actions
	// conflicts(16)=not solved(0)+solved(16)(16 warnings)
	#region create objects of syntax states
	states[0] = new(capacity: 3);
	states[1] = new(capacity: 5);
	states[2] = new(capacity: 3);
	states[3] = new(capacity: 6);
	states[4] = new(capacity: 3);
	states[5] = new(capacity: 3);
	states[6] = new(capacity: 3);
	states[7] = new(capacity: 3);
	states[8] = new(capacity: 5);
	states[9] = new(capacity: 6);
	states[10] = new(capacity: 6);
	states[11] = new(capacity: 6);
	states[12] = new(capacity: 6);
	states[13] = new(capacity: 6);
	#endregion create objects of syntax states

	#region re-used actions
	LRParseAction aShift2 = new(LRParseAction.Kind.Shift, states[2]);// referred 6 times
	LRParseAction aShift3 = new(LRParseAction.Kind.Shift, states[3]);// referred 6 times
	LRParseAction aShift4 = new(LRParseAction.Kind.Shift, states[4]);// referred 6 times
	LRParseAction aShift5 = new(LRParseAction.Kind.Shift, states[5]);// referred 6 times
	LRParseAction aShift6 = new(LRParseAction.Kind.Shift, states[6]);// referred 6 times
	LRParseAction aShift7 = new(LRParseAction.Kind.Shift, states[7]);// referred 6 times
	LRParseAction aReduce5 = new(regulations[5]);// referred 6 times
	LRParseAction aReduce0 = new(regulations[0]);// referred 6 times
	LRParseAction aReduce1 = new(regulations[1]);// referred 6 times
	LRParseAction aReduce2 = new(regulations[2]);// referred 6 times
	LRParseAction aReduce3 = new(regulations[3]);// referred 6 times
	LRParseAction aReduce4 = new(regulations[4]);// referred 6 times
	#endregion re-used actions

	// 80 actions
	// conflicts(16)=not solved(0)+solved(16)(16 warnings)
	#region init actions of syntax states
	// syntaxStates[0]:
	// [-1] Exp' : ⏳ Exp ;☕ '¥' 
	// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' '*' '/' '+' '¥' 
	// [5] Exp : ⏳ 'number' ;☕ '-' '*' '/' '+' '¥' 
	states[0]/*0*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[1]));
	states[0]/*1*/.Add(st.@LeftParenthesis符, aShift2);
	states[0]/*2*/.Add(st.@number, aShift3);
	// syntaxStates[1]:
	// [-1] Exp' : Exp ⏳ ;☕ '¥' 
	// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' '*' '/' '+' '¥' 
	states[1]/*3*/.Add(st.@Plus符, aShift4);
	states[1]/*4*/.Add(st.@Dash符, aShift5);
	states[1]/*5*/.Add(st.@Asterisk符, aShift6);
	states[1]/*6*/.Add(st.@Slash符, aShift7);
	states[1]/*7*/.Add(st.@终, LRParseAction.accept);
	// syntaxStates[2]:
	// [4] Exp : '(' ⏳ Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' 
	// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' 
	// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' 
	// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' 
	// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' 
	// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' 
	states[2]/*8*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[8]));
	states[2]/*9*/.Add(st.@LeftParenthesis符, aShift2);
	states[2]/*10*/.Add(st.@number, aShift3);
	// syntaxStates[3]:
	// [5] Exp : 'number' ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	states[3]/*11*/.Add(st.@Dash符, aReduce5);
	states[3]/*12*/.Add(st.@RightParenthesis符, aReduce5);
	states[3]/*13*/.Add(st.@Asterisk符, aReduce5);
	states[3]/*14*/.Add(st.@Slash符, aReduce5);
	states[3]/*15*/.Add(st.@Plus符, aReduce5);
	states[3]/*16*/.Add(st.@终, aReduce5);
	// syntaxStates[4]:
	// [0] Exp : Exp '+' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' 
	states[4]/*17*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[9]));
	states[4]/*18*/.Add(st.@LeftParenthesis符, aShift2);
	states[4]/*19*/.Add(st.@number, aShift3);
	// syntaxStates[5]:
	// [1] Exp : Exp '-' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' 
	states[5]/*20*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[10]));
	states[5]/*21*/.Add(st.@LeftParenthesis符, aShift2);
	states[5]/*22*/.Add(st.@number, aShift3);
	// syntaxStates[6]:
	// [2] Exp : Exp '*' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' 
	states[6]/*23*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[11]));
	states[6]/*24*/.Add(st.@LeftParenthesis符, aShift2);
	states[6]/*25*/.Add(st.@number, aShift3);
	// syntaxStates[7]:
	// [3] Exp : Exp '/' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' 
	states[7]/*26*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[12]));
	states[7]/*27*/.Add(st.@LeftParenthesis符, aShift2);
	states[7]/*28*/.Add(st.@number, aShift3);
	// syntaxStates[8]:
	// [4] Exp : '(' Exp ⏳ ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' 
	// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' 
	// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' 
	// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' 
	states[8]/*29*/.Add(st.@RightParenthesis符, new(LRParseAction.Kind.Shift, states[13]));
	states[8]/*30*/.Add(st.@Plus符, aShift4);
	states[8]/*31*/.Add(st.@Dash符, aShift5);
	states[8]/*32*/.Add(st.@Asterisk符, aShift6);
	states[8]/*33*/.Add(st.@Slash符, aShift7);
	// syntaxStates[9]:
	// [0] Exp : Exp '+' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// '+' repeated 2 times
	states[9]/*34*/.Add(st.@Plus符, aShift4);
	// ⚔ PreferShiftToReduce states[9]/*35*/.Add(st.@Plus符, aReduce0);
	// '-' repeated 2 times
	states[9]/*36*/.Add(st.@Dash符, aShift5);
	// ⚔ PreferShiftToReduce states[9]/*37*/.Add(st.@Dash符, aReduce0);
	// '*' repeated 2 times
	states[9]/*38*/.Add(st.@Asterisk符, aShift6);
	// ⚔ PreferShiftToReduce states[9]/*39*/.Add(st.@Asterisk符, aReduce0);
	// '/' repeated 2 times
	states[9]/*40*/.Add(st.@Slash符, aShift7);
	// ⚔ PreferShiftToReduce states[9]/*41*/.Add(st.@Slash符, aReduce0);
	states[9]/*42*/.Add(st.@RightParenthesis符, aReduce0);
	states[9]/*43*/.Add(st.@终, aReduce0);
	// syntaxStates[10]:
	// [1] Exp : Exp '-' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// '+' repeated 2 times
	states[10]/*44*/.Add(st.@Plus符, aShift4);
	// ⚔ PreferShiftToReduce states[10]/*45*/.Add(st.@Plus符, aReduce1);
	// '-' repeated 2 times
	states[10]/*46*/.Add(st.@Dash符, aShift5);
	// ⚔ PreferShiftToReduce states[10]/*47*/.Add(st.@Dash符, aReduce1);
	// '*' repeated 2 times
	states[10]/*48*/.Add(st.@Asterisk符, aShift6);
	// ⚔ PreferShiftToReduce states[10]/*49*/.Add(st.@Asterisk符, aReduce1);
	// '/' repeated 2 times
	states[10]/*50*/.Add(st.@Slash符, aShift7);
	// ⚔ PreferShiftToReduce states[10]/*51*/.Add(st.@Slash符, aReduce1);
	states[10]/*52*/.Add(st.@RightParenthesis符, aReduce1);
	states[10]/*53*/.Add(st.@终, aReduce1);
	// syntaxStates[11]:
	// [2] Exp : Exp '*' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// '+' repeated 2 times
	states[11]/*54*/.Add(st.@Plus符, aShift4);
	// ⚔ PreferShiftToReduce states[11]/*55*/.Add(st.@Plus符, aReduce2);
	// '-' repeated 2 times
	states[11]/*56*/.Add(st.@Dash符, aShift5);
	// ⚔ PreferShiftToReduce states[11]/*57*/.Add(st.@Dash符, aReduce2);
	// '*' repeated 2 times
	states[11]/*58*/.Add(st.@Asterisk符, aShift6);
	// ⚔ PreferShiftToReduce states[11]/*59*/.Add(st.@Asterisk符, aReduce2);
	// '/' repeated 2 times
	states[11]/*60*/.Add(st.@Slash符, aShift7);
	// ⚔ PreferShiftToReduce states[11]/*61*/.Add(st.@Slash符, aReduce2);
	states[11]/*62*/.Add(st.@RightParenthesis符, aReduce2);
	states[11]/*63*/.Add(st.@终, aReduce2);
	// syntaxStates[12]:
	// [3] Exp : Exp '/' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// '+' repeated 2 times
	states[12]/*64*/.Add(st.@Plus符, aShift4);
	// ⚔ PreferShiftToReduce states[12]/*65*/.Add(st.@Plus符, aReduce3);
	// '-' repeated 2 times
	states[12]/*66*/.Add(st.@Dash符, aShift5);
	// ⚔ PreferShiftToReduce states[12]/*67*/.Add(st.@Dash符, aReduce3);
	// '*' repeated 2 times
	states[12]/*68*/.Add(st.@Asterisk符, aShift6);
	// ⚔ PreferShiftToReduce states[12]/*69*/.Add(st.@Asterisk符, aReduce3);
	// '/' repeated 2 times
	states[12]/*70*/.Add(st.@Slash符, aShift7);
	// ⚔ PreferShiftToReduce states[12]/*71*/.Add(st.@Slash符, aReduce3);
	states[12]/*72*/.Add(st.@RightParenthesis符, aReduce3);
	states[12]/*73*/.Add(st.@终, aReduce3);
	// syntaxStates[13]:
	// [4] Exp : '(' Exp ')' ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	states[13]/*74*/.Add(st.@Dash符, aReduce4);
	states[13]/*75*/.Add(st.@RightParenthesis符, aReduce4);
	states[13]/*76*/.Add(st.@Asterisk符, aReduce4);
	states[13]/*77*/.Add(st.@Slash符, aReduce4);
	states[13]/*78*/.Add(st.@Plus符, aReduce4);
	states[13]/*79*/.Add(st.@终, aReduce4);
	#endregion init actions of syntax states
}

When the parser is in syntaxStates[9] and the next Token is '+', it ought to Reduce, but bitParser follows the default rule of "prefer Shift over Reduce" and chooses Shift. Clearly this cannot handle the precedence of '+' '-' versus '*' '/' (nor their left associativity) correctly: with Shift always preferred, 1*2+3 is parsed as 1*(2+3), and 1-2-3 as 1-(2-3).

If you do not want to rewrite the grammar into the Exp/Term/Factor form shown at the beginning of this article, you can instead declare the precedence of the four operators with precedence directives and still obtain a correct parse table.

smalltalk 复制代码
Exp : Exp '+' Exp
    | Exp '-' Exp
    | Exp '*' Exp
    | Exp '/' Exp
    | '(' Exp ')'
    | 'number' ;

%%[0-9]+%% 'number'

%left '+' '-' // '+' and '-' share the same precedence and prefer Reduce
%left '*' '/' // '*' and '/' share the same precedence, higher than that of '+' '-', and prefer Reduce

Click to view the code of the LALR(1) parse table generated with the precedence directives

csharp 复制代码
const int syntaxStateCount = 14;
/// <summary>
/// LALR(1) syntax parse table
/// </summary>
private static readonly Dictionary<string/*Node.type*/, LRParseAction>[] syntaxStates =
	new Dictionary<string/*Node.type*/, LRParseAction>[syntaxStateCount];
private static void InitializeSyntaxStates() {
	var states = CompilerExp.syntaxStates;
	// 80 actions
	// conflicts(16)=not solved(0)+solved(16)(0 warnings)
	#region create objects of syntax states
	states[0] = new(capacity: 3);
	states[1] = new(capacity: 5);
	states[2] = new(capacity: 3);
	states[3] = new(capacity: 6);
	states[4] = new(capacity: 3);
	states[5] = new(capacity: 3);
	states[6] = new(capacity: 3);
	states[7] = new(capacity: 3);
	states[8] = new(capacity: 5);
	states[9] = new(capacity: 6);
	states[10] = new(capacity: 6);
	states[11] = new(capacity: 6);
	states[12] = new(capacity: 6);
	states[13] = new(capacity: 6);
	#endregion create objects of syntax states

	#region re-used actions
	LRParseAction aShift2 = new(LRParseAction.Kind.Shift, states[2]);// referred 6 times
	LRParseAction aShift3 = new(LRParseAction.Kind.Shift, states[3]);// referred 6 times
	LRParseAction aShift4 = new(LRParseAction.Kind.Shift, states[4]);// referred 6 times
	LRParseAction aShift5 = new(LRParseAction.Kind.Shift, states[5]);// referred 6 times
	LRParseAction aShift6 = new(LRParseAction.Kind.Shift, states[6]);// referred 6 times
	LRParseAction aShift7 = new(LRParseAction.Kind.Shift, states[7]);// referred 6 times
	LRParseAction aReduce5 = new(regulations[5]);// referred 6 times
	LRParseAction aReduce0 = new(regulations[0]);// referred 6 times
	LRParseAction aReduce1 = new(regulations[1]);// referred 6 times
	LRParseAction aReduce2 = new(regulations[2]);// referred 6 times
	LRParseAction aReduce3 = new(regulations[3]);// referred 6 times
	LRParseAction aReduce4 = new(regulations[4]);// referred 6 times
	#endregion re-used actions

	// 80 actions
	// conflicts(16)=not solved(0)+solved(16)(0 warnings)
	#region init actions of syntax states
	// syntaxStates[0]:
	// [-1] Exp' : ⏳ Exp ;☕ '¥' 
	// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' '*' '/' '+' '¥' 
	// [5] Exp : ⏳ 'number' ;☕ '-' '*' '/' '+' '¥' 
	states[0]/*0*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[1]));
	states[0]/*1*/.Add(st.@LeftParenthesis符, aShift2);
	states[0]/*2*/.Add(st.@number, aShift3);
	// syntaxStates[1]:
	// [-1] Exp' : Exp ⏳ ;☕ '¥' 
	// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' '*' '/' '+' '¥' 
	states[1]/*3*/.Add(st.@Plus符, aShift4);
	states[1]/*4*/.Add(st.@Dash符, aShift5);
	states[1]/*5*/.Add(st.@Asterisk符, aShift6);
	states[1]/*6*/.Add(st.@Slash符, aShift7);
	states[1]/*7*/.Add(st.@终, LRParseAction.accept);
	// syntaxStates[2]:
	// [4] Exp : '(' ⏳ Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' 
	// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' 
	// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' 
	// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' 
	// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' 
	// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' 
	states[2]/*8*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[8]));
	states[2]/*9*/.Add(st.@LeftParenthesis符, aShift2);
	states[2]/*10*/.Add(st.@number, aShift3);
	// syntaxStates[3]:
	// [5] Exp : 'number' ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	states[3]/*11*/.Add(st.@Dash符, aReduce5);
	states[3]/*12*/.Add(st.@RightParenthesis符, aReduce5);
	states[3]/*13*/.Add(st.@Asterisk符, aReduce5);
	states[3]/*14*/.Add(st.@Slash符, aReduce5);
	states[3]/*15*/.Add(st.@Plus符, aReduce5);
	states[3]/*16*/.Add(st.@终, aReduce5);
	// syntaxStates[4]:
	// [0] Exp : Exp '+' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' 
	states[4]/*17*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[9]));
	states[4]/*18*/.Add(st.@LeftParenthesis符, aShift2);
	states[4]/*19*/.Add(st.@number, aShift3);
	// syntaxStates[5]:
	// [1] Exp : Exp '-' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' 
	states[5]/*20*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[10]));
	states[5]/*21*/.Add(st.@LeftParenthesis符, aShift2);
	states[5]/*22*/.Add(st.@number, aShift3);
	// syntaxStates[6]:
	// [2] Exp : Exp '*' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' 
	states[6]/*23*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[11]));
	states[6]/*24*/.Add(st.@LeftParenthesis符, aShift2);
	states[6]/*25*/.Add(st.@number, aShift3);
	// syntaxStates[7]:
	// [3] Exp : Exp '/' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' 
	states[7]/*26*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[12]));
	states[7]/*27*/.Add(st.@LeftParenthesis符, aShift2);
	states[7]/*28*/.Add(st.@number, aShift3);
	// syntaxStates[8]:
	// [4] Exp : '(' Exp ⏳ ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' 
	// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' 
	// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' 
	// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' 
	states[8]/*29*/.Add(st.@RightParenthesis符, new(LRParseAction.Kind.Shift, states[13]));
	states[8]/*30*/.Add(st.@Plus符, aShift4);
	states[8]/*31*/.Add(st.@Dash符, aShift5);
	states[8]/*32*/.Add(st.@Asterisk符, aShift6);
	states[8]/*33*/.Add(st.@Slash符, aShift7);
	// syntaxStates[9]:
	// [0] Exp : Exp '+' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// '+' repeated 2 times
	// ⚔ LeftShouldReduce states[9]/*34*/.Add(st.@Plus符, aShift4);
	states[9]/*35*/.Add(st.@Plus符, aReduce0);
	// '-' repeated 2 times
	// ⚔ LeftShouldReduce states[9]/*36*/.Add(st.@Dash符, aShift5);
	states[9]/*37*/.Add(st.@Dash符, aReduce0);
	// '*' repeated 2 times
	states[9]/*38*/.Add(st.@Asterisk符, aShift6);
	// ⚔ LowPrecedence states[9]/*39*/.Add(st.@Asterisk符, aReduce0);
	// '/' repeated 2 times
	states[9]/*40*/.Add(st.@Slash符, aShift7);
	// ⚔ LowPrecedence states[9]/*41*/.Add(st.@Slash符, aReduce0);
	states[9]/*42*/.Add(st.@RightParenthesis符, aReduce0);
	states[9]/*43*/.Add(st.@终, aReduce0);
	// syntaxStates[10]:
	// [1] Exp : Exp '-' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// '+' repeated 2 times
	// ⚔ LeftShouldReduce states[10]/*44*/.Add(st.@Plus符, aShift4);
	states[10]/*45*/.Add(st.@Plus符, aReduce1);
	// '-' repeated 2 times
	// ⚔ LeftShouldReduce states[10]/*46*/.Add(st.@Dash符, aShift5);
	states[10]/*47*/.Add(st.@Dash符, aReduce1);
	// '*' repeated 2 times
	states[10]/*48*/.Add(st.@Asterisk符, aShift6);
	// ⚔ LowPrecedence states[10]/*49*/.Add(st.@Asterisk符, aReduce1);
	// '/' repeated 2 times
	states[10]/*50*/.Add(st.@Slash符, aShift7);
	// ⚔ LowPrecedence states[10]/*51*/.Add(st.@Slash符, aReduce1);
	states[10]/*52*/.Add(st.@RightParenthesis符, aReduce1);
	states[10]/*53*/.Add(st.@终, aReduce1);
	// syntaxStates[11]:
	// [2] Exp : Exp '*' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// '+' repeated 2 times
	// ⚔ LowPrecedence states[11]/*54*/.Add(st.@Plus符, aShift4);
	states[11]/*55*/.Add(st.@Plus符, aReduce2);
	// '-' repeated 2 times
	// ⚔ LowPrecedence states[11]/*56*/.Add(st.@Dash符, aShift5);
	states[11]/*57*/.Add(st.@Dash符, aReduce2);
	// '*' repeated 2 times
	// ⚔ LeftShouldReduce states[11]/*58*/.Add(st.@Asterisk符, aShift6);
	states[11]/*59*/.Add(st.@Asterisk符, aReduce2);
	// '/' repeated 2 times
	// ⚔ LeftShouldReduce states[11]/*60*/.Add(st.@Slash符, aShift7);
	states[11]/*61*/.Add(st.@Slash符, aReduce2);
	states[11]/*62*/.Add(st.@RightParenthesis符, aReduce2);
	states[11]/*63*/.Add(st.@终, aReduce2);
	// syntaxStates[12]:
	// [3] Exp : Exp '/' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// '+' repeated 2 times
	// ⚔ LowPrecedence states[12]/*64*/.Add(st.@Plus符, aShift4);
	states[12]/*65*/.Add(st.@Plus符, aReduce3);
	// '-' repeated 2 times
	// ⚔ LowPrecedence states[12]/*66*/.Add(st.@Dash符, aShift5);
	states[12]/*67*/.Add(st.@Dash符, aReduce3);
	// '*' repeated 2 times
	// ⚔ LeftShouldReduce states[12]/*68*/.Add(st.@Asterisk符, aShift6);
	states[12]/*69*/.Add(st.@Asterisk符, aReduce3);
	// '/' repeated 2 times
	// ⚔ LeftShouldReduce states[12]/*70*/.Add(st.@Slash符, aShift7);
	states[12]/*71*/.Add(st.@Slash符, aReduce3);
	states[12]/*72*/.Add(st.@RightParenthesis符, aReduce3);
	states[12]/*73*/.Add(st.@终, aReduce3);
	// syntaxStates[13]:
	// [4] Exp : '(' Exp ')' ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	states[13]/*74*/.Add(st.@Dash符, aReduce4);
	states[13]/*75*/.Add(st.@RightParenthesis符, aReduce4);
	states[13]/*76*/.Add(st.@Asterisk符, aReduce4);
	states[13]/*77*/.Add(st.@Slash符, aReduce4);
	states[13]/*78*/.Add(st.@Plus符, aReduce4);
	states[13]/*79*/.Add(st.@终, aReduce4);
	#endregion init actions of syntax states

}

The precedence directives %nonassoc, %left, %right and %prec in bitParser behave the same way as in yacc:

  • a directive written later (further down the file) has higher precedence;

  • %left prefers Reduce;

  • %right prefers Shift;

  • %nonassoc treats the conflicting input as a syntax error;

  • %prec explicitly names a Token type whose precedence the rule should use, instead of the default (the precedence of the rightmost Vt in the production being reduced). The named Vt may even be one that does not occur in the grammar at all, i.e. a pure placeholder; see the sketch below.
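
For example, %prec is typically used to give unary minus a higher precedence than the binary '-'. The snippet below is only a hypothetical sketch: the placeholder name 'NEG' and the yacc-style placement of %prec at the end of a production are assumptions made for illustration, not text taken from bitParser's documentation.

smalltalk
Exp : Exp '-' Exp
    | '-' Exp %prec 'NEG' // this rule uses the precedence of 'NEG' instead of that of '-'
    | 'number' ;

%%[0-9]+%% 'number'

%left  '-'
%right 'NEG' // declared later, so higher precedence; 'NEG' never appears in the input and is a pure placeholder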

Example - the '/' postfix

In a STEP-format file, the 1 in 1=2 should be treated as an 'entityId', while the 2 should be treated as a 'refEntity', i.e. a reference to "entity 2".

How do we tell the two apart? If a number is followed by =, it is an 'entityId'; otherwise it is a 'refEntity'. This is exactly what the '/' postfix (trailing context) is for, as shown below:

smalltalk 复制代码
// postfix.st
// regulations:
Items : Items Item | Item ;
Item : 'entityId' '=' 'refEntity' ;
// lexi statements:
%%[0-9]+/[ \t]*=%% 'entityId' // digits followed by optional blanks and '=' form an 'entityId'
%%[0-9]+%% 'refEntity' 

Click to view the generated code for lexical state 0

csharp 复制代码
// lexicalState0
private static readonly Action<LexicalContext, char> lexicalState0 =
static (context, c) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* = */
	else if (/* possible Vt : '=' */
	c == '='/*'\u003D'(61)*/) {
		BeginToken(context);
		ExtendToken(context);
		context.currentState = lexicalState2;
	}
	/* [0-9] */
	else if (/* possible Vt : 'entityId' 'refEntity' */
	'0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {
		BeginToken(context);
		ExtendToken(context);
		context.currentState = lexicalState3;
	}
	/* deal with everything else. */
	else if (c == ' ' || c == '\r' || c == '\n' || c == '\t' || c == '\0') {
		context.currentState = lexicalState0; // skip them.
	}
	else { // unexpected char.
		BeginToken(context);
		ExtendToken(context);
		AcceptToken(st.Error错, context);
		context.currentState = lexicalState0;
	}
};

Click to view the generated code for lexical state 1

csharp 复制代码
// lexicalState1
private static readonly Action<LexicalContext, char> lexicalState1 =
static (context, c) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* = */
	else if (/* possible Vt : 'entityId' */
	c == '='/*'\u003D'(61)*/) {
		context.currentState = lexicalState4;
	}
	/* [ \t] */
	else if (/* possible Vt : 'entityId' */
	(c == ' '/*'\u0020'(32)*/)
	|| (c == '\t'/*'\u0009'(9)*/)) {
		context.currentState = lexicalState1;
	}
	/* deal with everything else. */
	else { // token with error type
		ExtendToken(context);
		AcceptToken(st.Error错, context);
		context.currentState = lexicalState0;
	}
};

Click to view the generated code for lexical state 2

csharp 复制代码
// lexicalState2
private static readonly Action<LexicalContext, char> lexicalState2 =
static (context, c) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* deal with everything else. */
	else {
		AcceptToken(st.@Equal符, context);
		context.currentState = lexicalState0;
	}
};

Click to view the generated code for lexical state 3

csharp 复制代码
// lexicalState3
private static readonly Action<LexicalContext, char> lexicalState3 =
static (context, c) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* = */
	else if (/* possible Vt : 'entityId' */
	c == '='/*'\u003D'(61)*/) {
		context.currentState = lexicalState4;
	}
	/* [ \t] */
	else if (/* possible Vt : 'entityId' */
	(c == ' '/*'\u0020'(32)*/)
	|| (c == '\t'/*'\u0009'(9)*/)) {
		context.currentState = lexicalState1;
	}
	/* [0-9] */
	else if (/* possible Vt : 'entityId' 'refEntity' */
	'0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {
		ExtendToken(context);
		context.currentState = lexicalState3;
	}
	/* deal with everything else. */
	else {
		AcceptToken(st.@refEntity, context);
		context.currentState = lexicalState0;
	}
};

Click to view the generated code for lexical state 4

csharp 复制代码
// lexicalState4
private static readonly Action<LexicalContext, char> lexicalState4 =
static (context, c) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* deal with everything else. */
	else {
		AcceptToken(st.@entityId, context);
		context.currentState = lexicalState0;
	}
};

Below is the state-transition table in its two-dimensional ElseIf[][] form; it reuses some of the ElseIf objects, which further reduces the memory footprint. (A sketch of how such a table can drive scanning follows the generated code.)
Click to view the generated code of the two-dimensional-array transition table

csharp 复制代码
private static readonly ElseIf[][] lexiStates = new ElseIf[5][];
static void InitializeLexiTable() {
	ElseIf s9_9_0_1 = new('\t'/*'\u0009'(9)*/, Acts.None, 1);//referred 2 times
	ElseIf s32_32_0_1 = new(' '/*'\u0020'(32)*/, Acts.None, 1);//referred 2 times
	ElseIf s61_61_0_4 = new('='/*'\u003D'(61)*/, Acts.None, 4);//referred 2 times
	lexiStates[0] = new ElseIf[] {
	// possible Vt: 'entityId' 'refEntity'
	/*0*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Begin | Acts.Extend, 3),
	// possible Vt: '='
	/*1*/new('='/*'\u003D'(61)*/, Acts.Begin | Acts.Extend, 2),
	};
	lexiStates[1] = new ElseIf[] {
	// possible Vt: 'entityId'
	/*0*/ //new('\t'/*'\u0009'(9)*/, Acts.None, 1),
	/*0*/s9_9_0_1,
	// possible Vt: 'entityId'
	/*1*///new(' '/*'\u0020'(32)*/, Acts.None, 1),
	/*1*/s32_32_0_1,
	// possible Vt: 'entityId'
	/*2*///new('='/*'\u003D'(61)*/, Acts.None, 4),
	/*2*/s61_61_0_4,
	};
	lexiStates[2] = new ElseIf[] {
	// possible Vt: '='
	/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*(65535)*/, Acts.Accept, 0, st.@Equal符),
	};
	lexiStates[3] = new ElseIf[] {
	// possible Vt: 'entityId'
	/*0*///new('\t'/*'\u0009'(9)*/, Acts.None, 1),
	/*0*/s9_9_0_1,
	// possible Vt: 'entityId'
	/*1*///new(' '/*'\u0020'(32)*/, Acts.None, 1),
	/*1*/s32_32_0_1,
	// possible Vt: 'entityId' 'refEntity'
	/*2*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Extend, 3),
	// possible Vt: 'entityId'
	/*3*///new('='/*'\u003D'(61)*/, Acts.None, 4),
	/*3*/s61_61_0_4,
	// possible Vt: 'refEntity'
	/*4*/new('\u0000'/*(0)*/, '\uFFFF'/*(65535)*/, Acts.Accept, 0, st.@refEntity),
	};
	lexiStates[4] = new ElseIf[] {
	// possible Vt: 'entityId'
	/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*(65535)*/, Acts.Accept, 0, st.@entityId),
	};
}
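
The Action-based lexical states shown earlier and this ElseIf[][] table are two encodings of the same miniDFA. To illustrate how a range-based table of this shape can drive scanning, here is a self-contained sketch; CharRange, Act and the driver loop below are simplified stand-ins written for this article, not the actual ElseIf / Acts / LexicalContext types emitted by bitParser.

csharp
using System;
using System.Collections.Generic;

// Simplified stand-ins for the generated ElseIf/Acts types (names assumed, not bitParser's).
[Flags] enum Act { None = 0, Begin = 1, Extend = 2, Accept = 4 }

record CharRange(char Min, char Max, Act Acts, int NextState, int TokenType);

static class TableScanner {
    // A tiny one-token table in the spirit of the generated lexiStates:
    // state 0: a digit begins a number (Begin|Extend, go to state 1);
    // state 1: more digits extend it; anything else accepts token type 0 and falls back to state 0.
    static readonly CharRange[][] states = {
        new[] { new CharRange('0', '9', Act.Begin | Act.Extend, 1, -1) },
        new[] {
            new CharRange('0', '9', Act.Extend, 1, -1),
            new CharRange('\u0000', '\uffff', Act.Accept, 0, 0),
        },
    };

    static IEnumerable<(int type, string value)> Scan(string source) {
        int state = 0, start = 0, end = -1;
        for (int i = 0; i <= source.Length; i++) {
            char c = i < source.Length ? source[i] : '\0';        // pseudo end-of-text char
            CharRange hit = null;
            foreach (var r in states[state])                      // first matching range wins,
                if (r.Min <= c && c <= r.Max) { hit = r; break; } // just like the else-if chain
            if (hit == null) { state = 0; continue; }             // unmatched chars (blanks etc.) are skipped
            if (hit.Acts.HasFlag(Act.Begin)) start = i;
            if (hit.Acts.HasFlag(Act.Extend)) end = i;
            if (hit.Acts.HasFlag(Act.Accept)) {
                yield return (hit.TokenType, source[start..(end + 1)]);
                i--;                                              // the accepting char is re-read from state 0
            }
            state = hit.NextState;
        }
    }

    static void Main() {
        foreach (var (type, value) in Scan("12 345"))
            Console.WriteLine($"type={type} value={value}");      // prints "12" and "345" as type 0
    }
}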

Its miniDFA state diagram is as follows:

Combining the code with the state diagram, it is easy to see:

  • In state miniDFA3, if the next character is a space, \t or =, the Token being scanned must be an 'entityId' (or the error type 'Error错'); otherwise it is a 'refEntity' (and the machine returns to state miniDFA0). This matches our expectation.

  • In state miniDFA0 (the initial state), a character in [0-9] may start either an 'entityId' or a 'refEntity'; at this point the type cannot yet be determined uniquely.

In other words, this example also involves converting an NFA into a DFA, because its two Token types 'entityId' and 'refEntity' begin with the same pattern [0-9]+.

Let us compare its NFA and DFA.

Below is its NFA state diagram:

Below is its DFA state diagram:

We can see:

  • In the NFA state diagram, state NFA0-0 has two choices on [0-9]: state NFA2-1 and state NFA3-1. In the corresponding DFA state diagram, NFA2-1 and NFA3-1 are merged into a single DFA state (DFA2). This is achieved by the subset construction; a minimal sketch of the idea follows.
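
In a nutshell, subset construction works like this: each DFA state is a set of NFA states, and the DFA's transition on a character is the ε-closure of the union of all NFA transitions on that character from the set. Below is a minimal, self-contained sketch of the idea; the data structures and state numbers are my own simplification (a single 'd' stands for the whole class [0-9]) and do not match bitParser's internals or the labels in the diagrams above.

csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal subset-construction sketch; not bitParser's internal representation.
class SubsetConstruction {
    const char Epsilon = '\0'; // '\0' is (ab)used here as the ε label of this toy NFA

    static HashSet<int> EpsilonClosure(Dictionary<int, List<(char label, int to)>> nfa, IEnumerable<int> states) {
        var closure = new HashSet<int>(states);
        var work = new Stack<int>(closure);
        while (work.Count > 0) {
            if (!nfa.TryGetValue(work.Pop(), out var edges)) continue;
            foreach (var (label, to) in edges)
                if (label == Epsilon && closure.Add(to)) work.Push(to);
        }
        return closure;
    }

    static void Main() {
        // Toy NFA in the spirit of the 'entityId'/'refEntity' example:
        // 0 --ε--> 1 (branch for 'entityId'),  0 --ε--> 2 (branch for 'refEntity'),
        // 1 --d--> 1,  2 --d--> 2   ('d' stands for any digit [0-9])
        var nfa = new Dictionary<int, List<(char label, int to)>> {
            [0] = new() { (Epsilon, 1), (Epsilon, 2) },
            [1] = new() { ('d', 1) },
            [2] = new() { ('d', 2) },
        };
        char[] alphabet = { 'd' };

        // Each DFA state is a set of NFA states; start from the ε-closure of the NFA start state.
        var start = EpsilonClosure(nfa, new[] { 0 });
        var dfaStates = new List<HashSet<int>> { start };
        var pending = new Queue<HashSet<int>>();
        pending.Enqueue(start);
        while (pending.Count > 0) {
            var current = pending.Dequeue();
            foreach (char c in alphabet) {
                var moved = current.Where(nfa.ContainsKey)
                                   .SelectMany(s => nfa[s])
                                   .Where(e => e.label == c)
                                   .Select(e => e.to);
                var next = EpsilonClosure(nfa, moved);
                if (next.Count == 0) continue;
                if (!dfaStates.Any(d => d.SetEquals(next))) { dfaStates.Add(next); pending.Enqueue(next); }
            }
        }
        for (int i = 0; i < dfaStates.Count; i++)
            Console.WriteLine($"DFA{i} = {{ NFA {string.Join(", ", dfaStates[i].OrderBy(x => x))} }}");
        // DFA0 = { NFA 0, 1, 2 }, DFA1 = { NFA 1, 2 }: the two NFA branches are merged into one DFA state.
    }
}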

Incidentally, here is another example, this time to show miniDFA construction:

smalltalk 复制代码
// regulations:
Left : 'min' ;
// lexical statements
%%[a-zA-Z_][a-zA-Z0-9_]*/,%% 'min' // an identifier followed by ',' is a 'min'

The DFA for this 'min' is as follows:

The miniDFA for this 'min' is as follows:

As we can see, the miniDFA merges states DFA1 and DFA3 of the DFA. The two can be merged because, on any given char, they both jump to the same next state. This is achieved with Hopcroft's algorithm; a simplified partition-refinement sketch of the same idea follows.
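
For reference, here is the plain partition-refinement formulation of DFA minimization: start from the accepting/non-accepting split and keep splitting any group whose members disagree, on some character, about which group they move to. This is the simple O(n²) variant; Hopcroft's algorithm organizes the same splitting with a smarter worklist. The transition table below is a hand-written stand-in (character classes collapsed to 'l', 'd' and ',') in which states 1 and 3 play the role of DFA1 and DFA3 above.

csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Partition-refinement sketch of DFA minimization (the simple variant; Hopcroft's algorithm
// performs the same splitting with a more efficient worklist).
class Minimization {
    static void Main() {
        // Hand-simplified DFA for [a-zA-Z_][a-zA-Z0-9_]*/, : 'l' = letter/_, 'd' = digit, ',' = comma.
        // 0 --l--> 1;  1 --l,d--> 3;  3 --l,d--> 3;  1 --,--> 2;  3 --,--> 2;  state 2 accepts 'min'.
        char[] alphabet = { 'l', 'd', ',' };
        var delta = new Dictionary<(int, char), int> {
            [(0, 'l')] = 1,
            [(1, 'l')] = 3, [(1, 'd')] = 3, [(1, ',')] = 2,
            [(3, 'l')] = 3, [(3, 'd')] = 3, [(3, ',')] = 2,
        };
        var accepting = new HashSet<int> { 2 };
        int[] states = { 0, 1, 2, 3 };

        // Start with the accepting / non-accepting split.
        var groups = new List<HashSet<int>> {
            new(states.Where(s => !accepting.Contains(s))),
            new(states.Where(accepting.Contains)),
        };
        bool changed = true;
        while (changed) {
            changed = false;
            foreach (var group in groups.ToList()) {
                foreach (char c in alphabet) {
                    // Signature of a member on c: the index of the group its c-successor belongs to (-1 = none).
                    var parts = group.GroupBy(s => delta.TryGetValue((s, c), out int t)
                                                   ? groups.FindIndex(g => g.Contains(t)) : -1)
                                     .ToList();
                    if (parts.Count > 1) {                       // members disagree: split the group
                        groups.Remove(group);
                        groups.AddRange(parts.Select(p => new HashSet<int>(p)));
                        changed = true;
                        break;
                    }
                }
                if (changed) break;
            }
        }
        foreach (var g in groups)
            Console.WriteLine("{ " + string.Join(", ", g.OrderBy(x => x)) + " }");
        // States 1 and 3 end up in the same group: on 'l'/'d' both stay in that group, on ',' both go to state 2.
    }
}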

Example - the <'Vt'> prefix

In GLSL, to make syntax parsing easier, I need the Point in struct Point { float x; float y; } to be recognized as a Token of type "type_name" rather than of type "identifier". This can be achieved with a prefix.

smalltalk 复制代码
// ..
struct_specifier :
      'struct' 'type_name' '{' struct_declaration_list '}' ;
// ..

// lexical statements
%%<'struct'>[a-zA-Z_][a-zA-Z0-9_]*%% 'type_name' // a Token that follows 'struct' should be typed as "type_name"
%%[a-zA-Z_][a-zA-Z0-9_]*%% 'identifier' // otherwise it should be typed as "identifier"

Adding the <'struct'> prefix does not change the set of lexer states; it only means that, when the Token's type is being assigned, the lexer checks whether the previous Token is of type "struct": if so, the new Token is typed "type_name", otherwise "identifier".

In addition, a list is needed to record every "type_name" recognized so far, so that when one of them is encountered again it can also be typed as "type_name".
Click to view the code of the part of the state machine affected by the prefix

csharp 复制代码
// lexicalState1
private static readonly Action<LexicalContext, char> lexicalState1 =
static (context, c) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* [a-zA-Z0-9_] */
	else if (/* possible Vt : 'type_name' 'identifier' */
	('a' <= c && c <= 'z')
	|| ('A' <= c && c <= 'Z')
	|| ('0' <= c && c <= '9')
	|| (c == '_')) {
		ExtendToken(context);
		context.currentState = lexicalState3;
	}
	/* deal with everything else. */
	else {
		AcceptToken2(context
		// if the previous Token is of type 'struct', the new Token is a 'type_name'
		, new(/*<'Vt'>*/st.@struct, st.@type_name)
		// otherwise the new Token is an 'identifier'
		, new(/*default preVt*/0, st.@identifier));
		context.currentState = lexicalState0;
	}
};

The accompanying AcceptToken2(context, ifVts) function is accordingly somewhat more complex:
Click to view the code of void AcceptToken2(LexicalContext context, params IfVt[] ifVts)

csharp 复制代码
struct IfVt {
	public readonly int preVt;
	public readonly int Vt;
	public IfVt(int preVt, int Vt) {
		this.preVt = preVt;
		this.Vt = Vt;
	}
}

private static void AcceptToken2(LexicalContext context, params IfVt[] ifVts) {
	var startIndex = context.analyzingToken.start.index;
	var endIndex = context.analyzingToken.end.index;
	context.analyzingToken.value = context.sourceCode.Substring(
		startIndex, endIndex - startIndex + 1);
	var typeSet = false;
	const string key = "type_name"; var hadThisTypeName = false;
	if (!typeSet) {
		if (context.tagDict.TryGetValue(key, out var type_nameList)) {
			var list = type_nameList as List<string>;
			if (list.Contains(context.analyzingToken.value)) {
				// this value is an already-recognized type_name
				context.analyzingToken.type = st.type_name;
				typeSet = true;
				hadThisTypeName = true;
			}
		}
	}
	if (!typeSet) {
		int lastType = 0;
		if (context.lastSyntaxValidToken != null) {
			lastType = context.lastSyntaxValidToken.type;
		}
		for (var i = 0; i < ifVts.Length; i++) {
			var ifVt = ifVts[i];
			if (ifVt.preVt == 0 // the default: no prefix was specified
			  || ifVt.preVt == lastType) { // the previous Token matches the <'Vt'> prefix
				context.analyzingToken.type = ifVt.Vt;
				typeSet = true;
				break;
			}
		}
	}
	if (!typeSet) {
		// we failed to assign type according to lexi statements.
		// this indicates token error in source code or inappropriate lexi statements.
		context.analyzingToken.type = st.Error错;
		context.signalCondition = LexicalContext.defaultSignal;
	}

	// cancel forward steps for post-regex
	var backStep = context.cursor.index - context.analyzingToken.end.index;
	if (backStep > 0) { context.MoveBack(backStep); }
	// next operation: context.MoveForward();

	var token = context.analyzingToken.Dump();
	context.result.Add(token);
	// skip comment tokens
	if (context.analyzingToken.type != st.blockComment
	 && context.analyzingToken.type != st.inlineComment) {
		context.lastSyntaxValidToken = token;
	}

	if (!hadThisTypeName && context.analyzingToken.type == st.type_name) {
		// add the newly recognized type_name to the list
		if (!context.tagDict.TryGetValue(key, out var type_nameList)) {
			type_nameList = new List<string>();
			context.tagDict.Add(key, type_nameList);
		}
		var list = type_nameList as List<string>;
		list.Add(context.analyzingToken.value);
	}
}

Note that syntax parsing does not need blockComment and inlineComment Tokens (Tokens whose type is a kind of comment), so comments must be skipped when recording the previous Token's type.

Example - state signals <signal1, signal2, ..>

In GLSL, to make syntax parsing easier, I need the r1 and r2 in subroutine ( r1, r2 ) to be recognized as Tokens of type "type_name" rather than of type "identifier". This cannot be done with a prefix, but it can be done with the state signal LexicalContext.signal.

The state signal works as follows:

When a Token of type "subroutine" is read, LexicalContext.signal is set to subroutine0;

When a Token of type "(" is read, if LexicalContext.signal is subroutine0, it is set to subroutine1;

When an identifier matching [a-zA-Z_][a-zA-Z0-9_]* is read, if LexicalContext.signal is subroutine1 (meaning the lexer has just read 'subroutine' followed by '('), it is recognized as a "type_name" Token, otherwise as an "identifier" Token;

When a Token of type ")" is read, if LexicalContext.signal is subroutine1, it is set back to default (the default state), i.e. the signal no longer has any effect. A stand-alone sketch of this gating logic follows the lexical statements below.

smalltalk 复制代码
storage_qualifier :
    | 'subroutine' '(' type_name_list ')' ;

// lexical statements
%%subroutine%%             'subroutine' subroutine0
<subroutine0>%%[(]%%       '('          subroutine1
<subroutine1>%%[a-zA-Z_][a-zA-Z0-9_]*%% 'type_name'
<subroutine1>%%[,]%%       ','
<subroutine1>%%[)]%%       ')'          default
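
The generated gating code is not shown here; the following stand-alone sketch only illustrates the idea of the extra signal state machine. Every name in it (SignalSketch, Accept, DefaultSignal) is invented for this article rather than taken from the generated code.

csharp
using System;

// Stand-alone sketch of the signal mechanism described above; all names are invented for illustration.
class SignalSketch {
    const string DefaultSignal = "default";
    string signal = DefaultSignal;              // the extra state machine layered on top of the lexer

    // Called when a token has just been recognized: decides its type and updates the signal.
    (string type, string value) Accept(string value) {
        string type;
        if (value == "subroutine") { type = "subroutine"; signal = "subroutine0"; }
        else if (value == "(") {
            type = "(";
            if (signal == "subroutine0") signal = "subroutine1";   // we just read 'subroutine' '('
        }
        else if (value == ")") {
            type = ")";
            if (signal == "subroutine1") signal = DefaultSignal;   // leave the subroutine(..) context
        }
        else if (value == ",") { type = ","; }                     // ',' keeps the current signal
        else {
            // identifier-shaped token: its type depends on the current signal
            type = signal == "subroutine1" ? "type_name" : "identifier";
        }
        return (type, value);
    }

    static void Main() {
        var lexer = new SignalSketch();
        foreach (var word in new[] { "subroutine", "(", "r1", ",", "r2", ")", "foo" })
            Console.WriteLine(lexer.Accept(word));
        // r1 and r2 come out as type_name; foo, outside the parentheses, comes out as identifier.
    }
}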

A state signal layers a second state machine on top of the lexer's own state machine, so it is relatively complex to apply and easy to get wrong; use it as sparingly as possible.

Example - Chinese characters

If I want to recognize every Chinese character in a text file (assuming Chinese characters lie in the range \u4E00 to \u9FFF), I can write:

smalltalk 复制代码
Items : Items Item | Item ;
Item : 'chineseChar' | 'other' ;

%%[\u4E00-\u9FFF]%% 'chineseChar'
%%[^\u4E00-\u9FFF]%% 'other'

End
